RLlib Utilities#

Here is a list of all the utilities available in RLlib.

MetricsLogger API#

RLlib uses the MetricsLogger API to log stats and metrics for the various components. Users can also

For example:

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

logger = MetricsLogger()

# Log a scalar float value under the `loss` key. By default, all logged
# values under that key are averaged, once `reduce()` is called.
logger.log_value("loss", 0.05, reduce="mean", window=2)
logger.log_value("loss", 0.1)
logger.log_value("loss", 0.2)

logger.peek("loss")  # expect: 0.15 (mean of last 2 values: 0.1 and 0.2)

`MetricsLogger`	A generic class collecting and processing metrics in RL training and evaluation.
`MetricsLogger.peek`	Returns the reduced values found in this MetricsLogger.
`MetricsLogger.log_value`	Logs a new value under a (possibly nested) key to the logger.
`MetricsLogger.log_dict`	Logs all leafs of a possibly nested dict of values to this logger.
`MetricsLogger.aggregate`	Merges n stats_dicts and logs result by merging on the time axis with existing stats.
`MetricsLogger.log_time`	Measures and logs a time delta value under `key` when used with a with-block.

Scheduler API#

RLlib uses the Scheduler API to set scheduled values for variables, in Python or PyTorch, dependent on an int timestep input. The type of the schedule is always a PiecewiseSchedule, which defines a list of increasing time steps, starting at 0, associated with values to be reached at these particular timesteps. PiecewiseSchedule interpolates values for all intermittent timesteps. The computed values are usually float32 types.

For example:

from ray.rllib.utils.schedules.scheduler import Scheduler

scheduler = Scheduler([[0, 0.1], [50, 0.05], [60, 0.001]])
print(scheduler.get_current_value())  # <- expect 0.1

# Up the timestep.
scheduler.update(timestep=45)
print(scheduler.get_current_value())  # <- expect 0.055

# Up the timestep.
scheduler.update(timestep=100)
print(scheduler.get_current_value())  # <- expect 0.001 (keep final value)

`Scheduler`	Class to manage a scheduled (framework-dependent) tensor variable.
`Scheduler.validate`	Performs checking of a certain schedule configuration.
`Scheduler.get_current_value`	Returns the current value (as a tensor variable).
`Scheduler.update`	Updates the underlying (framework specific) tensor variable.
`Scheduler._create_tensor_variable`	Creates a framework-specific tensor variable to be scheduled.

Framework Utilities#

Import utilities#

try_import_torch

Tries importing torch and returns the module (or None).

Torch utilities#

`clip_gradients`	Performs gradient clipping on a grad-dict based on a clip value and clip mode.
`compute_global_norm`	Computes the global norm for a gradients dict.
`convert_to_torch_tensor`	Converts any (possibly nested) structure to torch.Tensors.
`explained_variance`	Computes the explained variance for a pair of labels and predictions.
`flatten_inputs_to_1d_tensor`	Flattens arbitrary input structs according to the given spaces struct.
`global_norm`	Returns the global L2 norm over a list of tensors.
`one_hot`	Returns a one-hot tensor, given and int tensor and a space.
`reduce_mean_ignore_inf`	Same as torch.mean() but ignores -inf values.
`sequence_mask`	Offers same behavior as tf.sequence_mask for torch.
`set_torch_seed`	Sets the torch random seed to the given value.
`softmax_cross_entropy_with_logits`	Same behavior as tf.nn.softmax_cross_entropy_with_logits.
`update_target_network`	Updates a torch.nn.Module target network using Polyak averaging.

Numpy utilities#

`aligned_array`
`concat_aligned`
`convert_to_numpy`	Converts values in `stats` to non-Tensor numpy or python types.
`fc`	Calculates FC (dense) layer outputs given weights/biases and input.
`flatten_inputs_to_1d_tensor`	Flattens arbitrary input structs according to the given spaces struct.
`make_action_immutable`	Flags actions immutable to notify users when trying to change them.
`huber_loss`	Reference: https://en.wikipedia.org/wiki/Huber_loss.
`l2_loss`	Computes half the L2 norm of a tensor (w/o the sqrt): sum(x**2) / 2.
`lstm`	Calculates LSTM layer output given weights/biases, states, and input.
`one_hot`	One-hot utility function for numpy.
`relu`	Implementation of the leaky ReLU function.
`sigmoid`	Returns the sigmoid function applied to x.
`softmax`	Returns the softmax values for x.

Checkpoint utilities#

`try_import_msgpack`	Tries importing msgpack and msgpack_numpy and returns the patched msgpack module.
`Checkpointable`	Abstract base class for a component of RLlib that can be checkpointed to disk.