TensorFlow Utility Functions
- ray.rllib.utils.tf_utils.explained_variance(y: Union[numpy.ndarray, tf.Tensor, torch.Tensor], pred: Union[numpy.ndarray, tf.Tensor, torch.Tensor]) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Computes the explained variance for a pair of labels and predictions.
The formula used is: max(-1.0, 1.0 - (std(y - pred)^2 / std(y)^2))
- Parameters
y – The labels.
pred – The predictions.
- Returns
The explained variance given a pair of labels and predictions.
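For illustration, a minimal sketch (assuming tf2 eager mode; the constants are arbitrary):
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import explained_variance
>>> y = tf.constant([1.0, 2.0, 3.0, 4.0])
>>> pred = tf.constant([1.1, 1.9, 3.2, 3.8])
>>> # 1.0 means pred explains all of y's variance; 0.0 means pred is
>>> # no better than always predicting the mean of y.
>>> explained_variance(y, pred)  # -> ~0.98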
- ray.rllib.utils.tf_utils.flatten_inputs_to_1d_tensor(inputs: Union[numpy.ndarray, tf.Tensor, torch.Tensor, dict, tuple], spaces_struct: Optional[Union[gym.spaces.Space, dict, tuple]] = None, time_axis: bool = False) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Flattens arbitrary input structs according to the given spaces struct.
Returns a single 1D tensor resulting from the different input components’ values.
Thereby:
- Boxes (any shape) get flattened to (B, [T]?, -1). Note that image Boxes are not treated differently from other types of Boxes and get flattened as well.
- Discrete (int) values are one-hot’d, e.g. a batch of [1, 0, 3] (B=3 with Discrete(4) space) results in [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]].
- MultiDiscrete values are multi-one-hot’d, e.g. a batch of [[0, 2], [1, 4]] (B=2 with MultiDiscrete([2, 5]) space) results in [[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0, 1]].
- Parameters
inputs – The inputs to be flattened.
spaces_struct – The structure of the spaces that lie behind the inputs.
time_axis – Whether all inputs have a time axis (after the batch axis). If True, will keep not only the batch axis (0th), but also the time axis (1st) as-is and flatten everything from the 2nd axis up.
- Returns
A single 1D tensor resulting from concatenating all flattened/one-hot’d input components. Depending on the time_axis flag, the shape is (B, n) or (B, T, n).
Examples
>>> # B=2
>>> from ray.rllib.utils.tf_utils import flatten_inputs_to_1d_tensor
>>> from gym.spaces import Discrete, Box
>>> out = flatten_inputs_to_1d_tensor(
...     {"a": [1, 0], "b": [[[0.0], [0.1]], [[1.0], [1.1]]]},
...     spaces_struct=dict(a=Discrete(2), b=Box(shape=(2, 1)))
... )
>>> print(out)
[[0.0, 1.0, 0.0, 0.1], [1.0, 0.0, 1.0, 1.1]]  # B=2 n=4
>>> # B=2; T=2
>>> out = flatten_inputs_to_1d_tensor(
...     ([[1, 0], [0, 1]],
...      [[[0.0, 0.1], [1.0, 1.1]], [[2.0, 2.1], [3.0, 3.1]]]),
...     spaces_struct=tuple([Discrete(2), Box(shape=(2, ))]),
...     time_axis=True
... )
>>> print(out)
[[[0.0, 1.0, 0.0, 0.1], [1.0, 0.0, 1.0, 1.1]],
 [[1.0, 0.0, 2.0, 2.1], [0.0, 1.0, 3.0, 3.1]]]  # B=2 T=2 n=4
- ray.rllib.utils.tf_utils.get_gpu_devices() -> List[str]
Returns a list of GPU device names, e.g. [“/gpu:0”, “/gpu:1”].
Supports both tf1.x and tf2.x.
- Returns
List of GPU device names (str).
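For instance (hedged; the result depends entirely on the machine):
>>> from ray.rllib.utils.tf_utils import get_gpu_devices
>>> get_gpu_devices()  # e.g. ["/gpu:0"] on a single-GPU machine; [] on CPU-only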
- ray.rllib.utils.tf_utils.get_placeholder(*, space: Optional[gym.Space] = None, value: Optional[Any] = None, name: Optional[str] = None, time_axis: bool = False, flatten: bool = True) -> tf1.placeholder
Returns a tf1.placeholder object given optional hints, such as a space.
Note that the returned placeholder will always have a leading batch dimension (None).
- Parameters
space – An optional gym.Space to hint the shape and dtype of the placeholder.
value – An optional value to hint the shape and dtype of the placeholder.
name – An optional name for the placeholder.
time_axis – Whether the placeholder should also receive a time dimension (None).
flatten – Whether to flatten the given space into a plain Box space and then create the placeholder from the resulting space.
- Returns
The tf1 placeholder.
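A minimal sketch, assuming tf1 graph (non-eager) mode; the Box space is an arbitrary example:
>>> import gym
>>> from ray.rllib.utils.framework import try_import_tf
>>> from ray.rllib.utils.tf_utils import get_placeholder
>>> tf1, tf, tfv = try_import_tf()
>>> tf1.disable_eager_execution()
>>> ph = get_placeholder(space=gym.spaces.Box(-1.0, 1.0, shape=(4,)), name="obs")
>>> ph.shape.as_list()  # leading batch dim is always None
[None, 4]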
- ray.rllib.utils.tf_utils.get_tf_eager_cls_if_necessary(orig_cls: Type[Policy], config: dict) -> Type[Policy]
Returns the corresponding tf-eager class for a given TFPolicy class.
- Parameters
orig_cls – The original TFPolicy class to get the corresponding tf-eager class for.
config – The Algorithm config dict.
- Returns
The tf eager policy class corresponding to the given TFPolicy class.
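A hedged sketch; DQNTFPolicy serves only as an illustration here, and its exact import path varies across Ray versions:
>>> from ray.rllib.algorithms.dqn.dqn_tf_policy import DQNTFPolicy
>>> from ray.rllib.utils.tf_utils import get_tf_eager_cls_if_necessary
>>> config = {"framework": "tf2", "eager_tracing": True}
>>> # Returns an eager-compatible (traced) policy class for tf2 configs.
>>> eager_cls = get_tf_eager_cls_if_necessary(DQNTFPolicy, config)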
- ray.rllib.utils.tf_utils.huber_loss(x: Union[numpy.ndarray, tf.Tensor, torch.Tensor], delta: float = 1.0) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Computes the Huber loss for a given term and delta parameter.
Reference: https://en.wikipedia.org/wiki/Huber_loss
Note that the factor of 0.5 is implicitly included in the calculation.
- Formula:
L = 0.5 * x^2                        for abs(x) <= delta
L = delta * (abs(x) - 0.5 * delta)   for abs(x) > delta
- Parameters
x – The input term, e.g. a TD error.
delta – The delta parameter in the above formula.
- Returns
The Huber loss resulting from x and delta.
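For example (the values follow directly from the formula above):
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import huber_loss
>>> td_error = tf.constant([0.5, -2.0])
>>> # |0.5| <= delta: 0.5 * 0.5^2 = 0.125
>>> # |-2.0| > delta: 1.0 * (2.0 - 0.5 * 1.0) = 1.5
>>> huber_loss(td_error, delta=1.0)  # -> [0.125, 1.5]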
- ray.rllib.utils.tf_utils.l2_loss(x: Union[numpy.ndarray, tf.Tensor, torch.Tensor]) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Computes half the L2 norm over a tensor’s values without the sqrt.
output = 0.5 * sum(x ** 2)
- Parameters
x – The input tensor.
- Returns
0.5 times the L2 norm over the given tensor’s values (w/o sqrt).
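For example (following the formula above):
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import l2_loss
>>> l2_loss(tf.constant([1.0, 2.0]))  # 0.5 * (1^2 + 2^2) = 2.5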
- ray.rllib.utils.tf_utils.make_tf_callable(session_or_none: Optional[tf1.Session], dynamic_shape: bool = False) -> Callable
Returns a function that can be executed in either graph or eager mode.
The function must take only positional args.
If eager is enabled, this will act as just a function. Otherwise, it will build a function that executes a session run with placeholders internally.
- Parameters
session_or_none – tf.Session if in graph mode, else None.
dynamic_shape – True if the placeholders should have a dynamic batch dimension. Otherwise they will be fixed shape.
- Returns
A function that can be called in either eager or static-graph mode.
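A minimal sketch, assuming tf2 eager mode (pass None for the session; the wrapped function then behaves like a plain function):
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import make_tf_callable
>>> @make_tf_callable(None)
... def add(a, b):
...     return a + b
>>> add(tf.constant(1.0), tf.constant(2.0))  # -> 3.0
In graph mode, pass the tf1.Session instead; the returned function then feeds its arguments through internally created placeholders and runs the session per call.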
- ray.rllib.utils.tf_utils.minimize_and_clip(optimizer: Union[tf.keras.optimizers.Optimizer, torch.optim.Optimizer], objective: Union[numpy.ndarray, tf.Tensor, torch.Tensor], var_list: List[tf.Variable], clip_val: float = 10.0) -> Union[List[Tuple[Union[numpy.ndarray, tf.Tensor, torch.Tensor], Union[numpy.ndarray, tf.Tensor, torch.Tensor]]], List[Union[numpy.ndarray, tf.Tensor, torch.Tensor]]]
Computes, then clips gradients using objective, optimizer and var list.
Ensures the norm of the gradients for each variable is clipped to clip_val.
- Parameters
optimizer – Either a shim optimizer (tf eager) containing a tf.GradientTape under self.tape, or a tf1 local optimizer object.
objective – The loss tensor to calculate gradients on.
var_list – The list of tf.Variables to compute gradients over.
clip_val – The global norm clip value. Will clip around -clip_val and +clip_val.
- Returns
The resulting model gradients, either as a list of (grad, var) tuples or as a plain list of gradient tensors, corresponding to the input var_list.
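A minimal sketch, assuming tf1 graph mode with a tf1 local optimizer (variable, loss, and learning rate are arbitrary):
>>> from ray.rllib.utils.framework import try_import_tf
>>> from ray.rllib.utils.tf_utils import minimize_and_clip
>>> tf1, tf, tfv = try_import_tf()
>>> tf1.disable_eager_execution()
>>> w = tf1.get_variable("w", initializer=[1.0, 2.0])
>>> loss = tf1.reduce_sum(tf1.square(w))
>>> opt = tf1.train.AdamOptimizer(learning_rate=1e-3)
>>> # Each gradient's norm is clipped to at most clip_val before being returned.
>>> grads_and_vars = minimize_and_clip(opt, loss, var_list=[w], clip_val=10.0)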
- ray.rllib.utils.tf_utils.one_hot(x: Union[numpy.ndarray, tf.Tensor, torch.Tensor], space: gym.Space) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Returns a one-hot tensor, given an int tensor and a space.
Handles the MultiDiscrete case as well.
- Parameters
x – The input tensor.
space – The space to use for generating the one-hot tensor.
- Returns
The resulting one-hot tensor.
- Raises
ValueError – If the given space is not a discrete one.
Examples
>>> import gym
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import one_hot
>>> x = tf.Variable([0, 3], dtype=tf.int32)  # batch-dim=2
>>> # Discrete space with 4 (one-hot) slots per batch item.
>>> s = gym.spaces.Discrete(4)
>>> one_hot(x, s)
<tf.Tensor 'one_hot:0' shape=(2, 4) dtype=float32>
>>> x = tf.Variable([[0, 1, 2, 3]], dtype=tf.int32)  # batch-dim=1
>>> # MultiDiscrete space with 5 + 4 + 4 + 7 = 20 (one-hot) slots
>>> # per batch item.
>>> s = gym.spaces.MultiDiscrete([5, 4, 4, 7])
>>> one_hot(x, s)
<tf.Tensor 'concat:0' shape=(1, 20) dtype=float32>
- ray.rllib.utils.tf_utils.reduce_mean_ignore_inf(x: Union[numpy.ndarray, tf.Tensor, torch.Tensor], axis: Optional[int] = None) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Same as tf.reduce_mean() but ignores -inf values.
- Parameters
x – The input tensor to reduce mean over.
axis – The axis over which to reduce. None for all axes.
- Returns
The mean reduced inputs, ignoring inf values.
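A small sketch; note that RLlib typically uses tf.float32.min as the "-inf" sentinel for masked-out entries (e.g. in action masking), so that is the value assumed to be ignored below (worth verifying against your RLlib version):
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import reduce_mean_ignore_inf
>>> x = tf.constant([1.0, 3.0, tf.float32.min])  # last entry acts as -inf
>>> reduce_mean_ignore_inf(x)  # -> 2.0, the mean over the two valid entries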
- ray.rllib.utils.tf_utils.scope_vars(scope: Union[str, tf1.VariableScope], trainable_only: bool = False) -> List[tf.Variable]
Get variables inside a given scope.
- Parameters
scope – Scope in which the variables reside.
trainable_only – Whether or not to return only the variables that were marked as trainable.
- Returns
The list of variables in the given scope.
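A minimal sketch, assuming tf1 graph mode (the scope name "model" is arbitrary):
>>> from ray.rllib.utils.framework import try_import_tf
>>> from ray.rllib.utils.tf_utils import scope_vars
>>> tf1, tf, tfv = try_import_tf()
>>> tf1.disable_eager_execution()
>>> with tf1.variable_scope("model"):
...     w = tf1.get_variable("w", shape=(4, 2))
>>> scope_vars("model")  # -> [w]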
- ray.rllib.utils.tf_utils.zero_logps_from_actions(actions: Union[numpy.ndarray, tf.Tensor, torch.Tensor, dict, tuple]) -> Union[numpy.ndarray, tf.Tensor, torch.Tensor]
Helper function useful for returning dummy logp’s (0) for some actions.
- Parameters
actions – The input actions. This can be any struct of complex action components or a simple tensor of different dimensions, e.g. [B], [B, 2], or {“a”: [B, 4, 5], “b”: [B]}.
- Returns
A 1D tensor of 0.0s (dummy logp’s) matching the batch dim of actions (shape=[B]).
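For example (a sketch with an arbitrary complex action struct, B=5):
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import zero_logps_from_actions
>>> actions = {"a": tf.zeros((5, 4, 5)), "b": tf.zeros((5,))}
>>> zero_logps_from_actions(actions)  # -> tensor of shape (5,), all 0.0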