TensorFlow Utility Functions

ray.rllib.utils.tf_utils.explained_variance(y: Union[numpy.array, tf.Tensor, torch.Tensor], pred: Union[numpy.array, tf.Tensor, torch.Tensor]) → Union[numpy.array, tf.Tensor, torch.Tensor]

Computes the explained variance for a pair of labels and predictions.

The formula used is: max(-1.0, 1.0 - (std(y - pred)^2 / std(y)^2))

  • y – The labels.

  • pred – The predictions.


The explained variance given a pair of labels and predictions.
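The formula above can be sketched in plain NumPy. This is an illustration only, not the library's implementation; the helper name explained_variance_np is hypothetical, and the sketch does not guard against a zero variance in y.

```python
import numpy as np

def explained_variance_np(y, pred):
    """NumPy sketch of max(-1.0, 1.0 - std(y - pred)^2 / std(y)^2)."""
    y = np.asarray(y, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    # std(.)^2 is the variance, so np.var is used directly.
    residual_var = np.var(y - pred)
    return max(-1.0, 1.0 - residual_var / np.var(y))

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = explained_variance_np(y, y)                  # residuals are all zero
constant = explained_variance_np(y, np.full(4, 2.5))   # always predict the mean
reversed_pred = explained_variance_np(y, y[::-1])      # worse than the mean, clipped
```

A perfect prediction gives the upper bound of 1.0, predicting the mean of y gives 0.0, and anything that would fall below -1.0 is clipped to -1.0.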

ray.rllib.utils.tf_utils.flatten_inputs_to_1d_tensor(inputs: Union[numpy.array, tf.Tensor, torch.Tensor, dict, tuple], spaces_struct: Optional[Union[gym.spaces.Space, dict, tuple]] = None, time_axis: bool = False) → Union[numpy.array, tf.Tensor, torch.Tensor]

Flattens arbitrary input structs according to the given spaces struct.

Returns a single 1D tensor resulting from the different input components’ values.

The input components are handled as follows:

  • Boxes (any shape) get flattened to (B, [T]?, -1). Note that image boxes are not treated differently from other types of Boxes and get flattened as well.

  • Discrete (int) values are one-hot’d, e.g. a batch of [1, 0, 3] (B=3 with Discrete(4) space) results in [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]].

  • MultiDiscrete values are multi-one-hot’d, e.g. a batch of [[0, 2], [1, 4]] (B=2 with MultiDiscrete([2, 5]) space) results in [[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0, 1]].

  • inputs – The inputs to be flattened.

  • spaces_struct – The structure of the spaces that lie behind the inputs.

  • time_axis – Whether all inputs have a time-axis (after the batch axis). If True, will keep not only the batch axis (0th), but the time axis (1st) as-is and flatten everything from the 2nd axis up.


A single 1D tensor resulting from concatenating all flattened/one-hot’d input components. Depending on the time_axis flag, the shape is (B, n) or (B, T, n).
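The flattening rules described above can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not RLlib's code: the helpers one_hot_np, multi_one_hot_np and flatten_box_np are hypothetical names, and the sketch takes category counts directly instead of gym space objects.

```python
import numpy as np

def one_hot_np(x, n):
    """One-hot encode an int array x with n categories (adds a last axis)."""
    return np.eye(n, dtype=np.float32)[np.asarray(x)]

def multi_one_hot_np(x, nvec):
    """Concatenate one one-hot block per MultiDiscrete sub-space."""
    x = np.asarray(x)
    return np.concatenate(
        [one_hot_np(x[..., i], n) for i, n in enumerate(nvec)], axis=-1
    )

def flatten_box_np(x, time_axis=False):
    """Keep the batch (and optionally time) axis, flatten everything else."""
    x = np.asarray(x, dtype=np.float32)
    keep = 2 if time_axis else 1
    return x.reshape(x.shape[:keep] + (-1,))

discrete = one_hot_np([1, 0, 3], 4)                  # Discrete(4), B=3
multi = multi_one_hot_np([[0, 2], [1, 4]], [2, 5])   # MultiDiscrete([2, 5]), B=2

# A Discrete(2) component plus a Box component of shape (2, 1), B=2.
flat = np.concatenate(
    [one_hot_np([1, 0], 2), flatten_box_np([[[0.0], [0.1]], [[1.0], [1.1]]])],
    axis=-1,
)
```

Here discrete and multi reproduce the one-hot rows listed in the description, and flat has shape (2, 4): two one-hot slots followed by the two flattened Box values per batch item.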


>>> # B=2
>>> from ray.rllib.utils.tf_utils import flatten_inputs_to_1d_tensor
>>> from gym.spaces import Discrete, Box
>>> out = flatten_inputs_to_1d_tensor( 
...     {"a": [1, 0], "b": [[[0.0], [0.1]], [[1.0], [1.1]]]},
...     spaces_struct=dict(a=Discrete(2), b=Box(-10.0, 10.0, shape=(2, 1)))  # bounds are illustrative
... ) 
>>> print(out) 
[[0.0, 1.0,  0.0, 0.1], [1.0, 0.0,  1.0, 1.1]]  # B=2 n=4
>>> # B=2; T=2
>>> out = flatten_inputs_to_1d_tensor( 
...     ([[1, 0], [0, 1]],
...      [[[0.0, 0.1], [1.0, 1.1]], [[2.0, 2.1], [3.0, 3.1]]]),
...     spaces_struct=tuple([Discrete(2), Box(-10.0, 10.0, shape=(2, ))]),  # bounds are illustrative
...     time_axis=True
... ) 
>>> print(out) 
[[[0.0, 1.0, 0.0, 0.1], [1.0, 0.0, 1.0, 1.1]],
 [[1.0, 0.0, 2.0, 2.1], [0.0, 1.0, 3.0, 3.1]]]  # B=2 T=2 n=4

ray.rllib.utils.tf_utils.get_gpu_devices() → List[str]

Returns a list of GPU device names, e.g. ["/gpu:0", "/gpu:1"].

Supports both tf1.x and tf2.x.


List of GPU device names (str).

ray.rllib.utils.tf_utils.get_placeholder(*, space: Optional[gym.Space] = None, value: Optional[Any] = None, name: Optional[str] = None, time_axis: bool = False, flatten: bool = True) → tensorflow.python.ops.array_ops.placeholder

Returns a tf1.placeholder object given optional hints, such as a space.

Note that the returned placeholder will always have a leading batch dimension (None).

  • space – An optional gym.Space to hint the shape and dtype of the placeholder.

  • value – An optional value to hint the shape and dtype of the placeholder.

  • name – An optional name for the placeholder.

  • time_axis – Whether the placeholder should also receive a time dimension (None).

  • flatten – Whether to flatten the given space into a plain Box space and then create the placeholder from the resulting space.


The tf1 placeholder.

ray.rllib.utils.tf_utils.get_tf_eager_cls_if_necessary(orig_cls: Type[Policy], config: dict) → Type[Policy]

Returns the corresponding tf-eager class for a given TFPolicy class.

  • orig_cls – The original TFPolicy class to get the corresponding tf-eager class for.

  • config – The Algorithm config dict.


The tf eager policy class corresponding to the given TFPolicy class.

ray.rllib.utils.tf_utils.huber_loss(x: Union[numpy.array, tf.Tensor, torch.Tensor], delta: float = 1.0) → Union[numpy.array, tf.Tensor, torch.Tensor]

Computes the huber loss for a given term and delta parameter.

Reference: https://en.wikipedia.org/wiki/Huber_loss

Note that the factor of 0.5 is implicitly included in the calculation.

The formula is:

L = 0.5 * x^2 for abs(x) < delta
L = delta * (abs(x) - 0.5 * delta) otherwise

  • x – The input term, e.g. a TD error.

  • delta – The delta parameter in the above formula.


The Huber loss resulting from x and delta.
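The piecewise definition can be sketched in NumPy as shown below. The helper name huber_loss_np is hypothetical, and the choice of a strict comparison at the threshold is an assumption; both branches evaluate to 0.5 * delta^2 at abs(x) == delta, so the result does not depend on it.

```python
import numpy as np

def huber_loss_np(x, delta=1.0):
    """Quadratic inside the delta threshold, linear outside of it."""
    x = np.asarray(x, dtype=np.float64)
    abs_x = np.abs(x)
    return np.where(
        abs_x < delta,
        0.5 * x**2,                       # small errors: squared loss
        delta * (abs_x - 0.5 * delta),    # large errors: linear loss
    )

td_errors = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
losses = huber_loss_np(td_errors, delta=1.0)
```

With delta=1.0, the errors of magnitude 0.5 fall in the quadratic branch (0.125) and the errors of magnitude 3.0 fall in the linear branch (2.5), which is why large TD errors contribute less than they would under a squared loss.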

ray.rllib.utils.tf_utils.l2_loss(x: Union[numpy.array, tf.Tensor, torch.Tensor]) → Union[numpy.array, tf.Tensor, torch.Tensor]

Computes half the L2 norm over a tensor’s values without the sqrt.

output = 0.5 * sum(x ** 2)


x – The input tensor.


0.5 times the L2 norm over the given tensor’s values (w/o sqrt).
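As a quick check of the formula output = 0.5 * sum(x ** 2), here is a NumPy sketch (l2_loss_np is a hypothetical helper name, not part of RLlib). It also shows the relation to the ordinary L2 norm: the result equals half the squared norm.

```python
import numpy as np

def l2_loss_np(x):
    """Half the sum of squares over all elements of x."""
    x = np.asarray(x, dtype=np.float64)
    return 0.5 * np.sum(x**2)

weights = np.array([[3.0, 4.0], [0.0, 0.0]])
loss = l2_loss_np(weights)
# Same quantity expressed through the (Frobenius) norm of the tensor.
via_norm = 0.5 * np.linalg.norm(weights) ** 2
```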

ray.rllib.utils.tf_utils.make_tf_callable(session_or_none: Optional[tensorflow.python.client.session.Session], dynamic_shape: bool = False) → Callable

Returns a function that can be executed in either graph or eager mode.

The function must take only positional args.

If eager is enabled, this will act as just a function. Otherwise, it will build a function that executes a session run with placeholders internally.

  • session_or_none – tf.Session if in graph mode, else None.

  • dynamic_shape – True if the placeholders should have a dynamic batch dimension. Otherwise they will be fixed shape.


A function that can be called in either eager or static-graph mode.

ray.rllib.utils.tf_utils.minimize_and_clip(optimizer: Union[tf.keras.optimizers.Optimizer, torch.optim.Optimizer], objective: Union[numpy.array, tf.Tensor, torch.Tensor], var_list: List[tf.Variable], clip_val: float = 10.0) → Union[List[Tuple[Union[numpy.array, tf.Tensor, torch.Tensor], Union[numpy.array, tf.Tensor, torch.Tensor]]], List[Union[numpy.array, tf.Tensor, torch.Tensor]]]

Computes, then clips gradients using objective, optimizer and var list.

Ensures the norm of the gradients for each variable is clipped to clip_val.

  • optimizer – Either a shim optimizer (tf eager) containing a tf.GradientTape under self.tape or a tf1 local optimizer object.

  • objective – The loss tensor to calculate gradients on.

  • var_list – The list of tf.Variables to compute gradients over.

  • clip_val – The global norm clip value. Will clip around -clip_val and +clip_val.


The resulting model gradients (a list of tuples of grads + vars) corresponding to the input var_list.
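The clipping step alone can be illustrated with NumPy. This sketch assumes that each gradient is rescaled independently so that its L2 norm does not exceed clip_val, as the description "for each variable" suggests; the helper names and the handling of None gradients are assumptions made for this illustration and say nothing about how the gradients themselves are computed.

```python
import numpy as np

def clip_by_norm_np(grad, clip_val):
    """Rescale grad so that its L2 norm is at most clip_val."""
    grad = np.asarray(grad, dtype=np.float64)
    norm = np.linalg.norm(grad)
    if norm > clip_val:
        return grad * (clip_val / norm)
    return grad

def clip_grads_and_vars(grads_and_vars, clip_val=10.0):
    """Clip every gradient in a list of (grad, var) tuples independently."""
    return [
        (clip_by_norm_np(g, clip_val) if g is not None else None, v)
        for g, v in grads_and_vars
    ]

# Variables are represented by plain names here (assumption for illustration).
grads_and_vars = [
    (np.array([30.0, 40.0]), "w"),   # norm 50, gets scaled down to norm 10
    (np.array([0.3, 0.4]), "b"),     # norm 0.5, left unchanged
    (None, "unused"),                # no gradient, passed through
]
clipped = clip_grads_and_vars(grads_and_vars, clip_val=10.0)
```

Rescaling keeps the direction of each gradient and only shortens it, which differs from clamping every element to the range [-clip_val, +clip_val].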

ray.rllib.utils.tf_utils.one_hot(x: Union[numpy.array, tf.Tensor, torch.Tensor], space: gym.Space) → Union[numpy.array, tf.Tensor, torch.Tensor]

Returns a one-hot tensor, given an int tensor and a space.

Handles the MultiDiscrete case as well.

  • x – The input tensor.

  • space – The space to use for generating the one-hot tensor.


The resulting one-hot tensor.


ValueError – If the given space is not a discrete one.


>>> import gym
>>> import tensorflow as tf
>>> from ray.rllib.utils.tf_utils import one_hot
>>> x = tf.Variable([0, 3], dtype=tf.int32)  # batch-dim=2
>>> # Discrete space with 4 (one-hot) slots per batch item.
>>> s = gym.spaces.Discrete(4)
>>> one_hot(x, s) 
<tf.Tensor 'one_hot:0' shape=(2, 4) dtype=float32>
>>> x = tf.Variable([[0, 1, 2, 3]], dtype=tf.int32)  # batch-dim=1
>>> # MultiDiscrete space with 5 + 4 + 4 + 7 = 20 (one-hot) slots
>>> # per batch item.
>>> s = gym.spaces.MultiDiscrete([5, 4, 4, 7])
>>> one_hot(x, s) 
<tf.Tensor 'concat:0' shape=(1, 20) dtype=float32>

ray.rllib.utils.tf_utils.reduce_mean_ignore_inf(x: Union[numpy.array, tf.Tensor, torch.Tensor], axis: Optional[int] = None) → Union[numpy.array, tf.Tensor, torch.Tensor]

Same as tf.reduce_mean() but ignores -inf values.

  • x – The input tensor to reduce mean over.

  • axis – The axis over which to reduce. None for all axes.


The mean-reduced inputs, ignoring -inf values.
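A masked mean of this kind can be sketched in NumPy as follows. This is an illustration of the idea rather than the library's implementation: reduce_mean_ignore_inf_np is a hypothetical name, and the sketch detects -inf with np.isneginf, which is an assumption about how masked entries are marked.

```python
import numpy as np

def reduce_mean_ignore_inf_np(x, axis=None):
    """Mean over axis, counting only the entries that are not -inf."""
    x = np.asarray(x, dtype=np.float64)
    mask = ~np.isneginf(x)
    # Zero out the masked entries so they do not affect the sum.
    x_zeroed = np.where(mask, x, 0.0)
    return x_zeroed.sum(axis=axis) / mask.sum(axis=axis)

# E.g. Q-values where unavailable actions were set to -inf.
q_values = np.array([[1.0, -np.inf, 3.0], [-np.inf, -np.inf, 4.0]])
row_means = reduce_mean_ignore_inf_np(q_values, axis=1)
overall = reduce_mean_ignore_inf_np(q_values)
```

In this sketch, a slice consisting only of -inf values divides zero by zero and yields nan, so callers would need to handle that case separately.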

ray.rllib.utils.tf_utils.scope_vars(scope: Union[str, tensorflow.python.ops.variable_scope.VariableScope], trainable_only: bool = False) → List[tensorflow.python.ops.variables.Variable]

Get variables inside a given scope.

  • scope – Scope in which the variables reside.

  • trainable_only – Whether or not to return only the variables that were marked as trainable.


The list of variables in the given scope.

ray.rllib.utils.tf_utils.zero_logps_from_actions(actions: Union[numpy.array, tf.Tensor, torch.Tensor, dict, tuple]) → Union[numpy.array, tf.Tensor, torch.Tensor]

Helper function useful for returning dummy logp’s (0) for some actions.


actions – The input actions. This can be any struct of complex action components or a simple tensor of different dimensions, e.g. [B], [B, 2], or {"a": [B, 4, 5], "b": [B]}.


A 1D tensor of 0.0 (dummy logp’s) matching the batch dim of actions (shape=[B]).
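The shape contract can be illustrated with a small NumPy sketch. It assumes that the batch size is read from the first leaf of the (possibly nested) action struct; that lookup strategy and the helper names first_leaf and zero_logps_np are assumptions made for this example only.

```python
import numpy as np

def first_leaf(struct):
    """Return the first non-container component of a nested dict/tuple."""
    while isinstance(struct, (dict, tuple)):
        if isinstance(struct, dict):
            struct = next(iter(struct.values()))
        else:
            struct = struct[0]
    return np.asarray(struct)

def zero_logps_np(actions):
    """Return zeros of shape [B], where B is the leading dim of the first leaf."""
    leaf = first_leaf(actions)
    return np.zeros(leaf.shape[0], dtype=np.float32)

B = 3
simple = zero_logps_np(np.ones((B, 2)))                                 # actions of shape [B, 2]
nested = zero_logps_np({"a": np.ones((B, 4, 5)), "b": (np.ones(B),)})   # dict struct
```

Both calls return an array of shape (3,) filled with 0.0, regardless of how many trailing dimensions the action components have.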