Note

Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.

If you’re still using the old API stack, see the New API stack migration guide for details on how to migrate.

RLlib Utilities#

Here is a list of all the utilities available in RLlib.

Scheduler API#

RLlib uses the Scheduler API to set scheduled values for variables, in Python or PyTorch, as a function of an int timestep input. The schedule type is always PiecewiseSchedule, which defines a list of increasing timesteps, starting at 0, each paired with the value to reach at that timestep. PiecewiseSchedule linearly interpolates values for all intermediate timesteps and keeps the final value once the last timestep has passed. The computed values are usually float32 types.

For example:

from ray.rllib.utils.schedules.scheduler import Scheduler

scheduler = Scheduler([[0, 0.1], [50, 0.05], [60, 0.001]])
print(scheduler.get_current_value())  # <- expect 0.1

# Up the timestep.
scheduler.update(timestep=45)
print(scheduler.get_current_value())  # <- expect 0.055

# Up the timestep.
scheduler.update(timestep=100)
print(scheduler.get_current_value())  # <- expect 0.001 (keep final value)
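The interpolation behind these values can be sketched in plain Python, without RLlib. This is an illustration of piecewise linear interpolation only, not RLlib's implementation; the helper name `piecewise_value` is hypothetical, and clamping to the first and last values outside the schedule range is an assumption that matches the example above.

```python
from bisect import bisect_right


def piecewise_value(schedule, timestep):
    """Linearly interpolate a value from [timestep, value] pairs.

    Hypothetical helper for illustration; not part of RLlib.
    """
    times = [t for t, _ in schedule]
    # Outside the schedule range, clamp to the first or last value.
    if timestep <= times[0]:
        return schedule[0][1]
    if timestep >= times[-1]:
        return schedule[-1][1]
    # Index of the first schedule point strictly after `timestep`.
    right = bisect_right(times, timestep)
    (t0, v0), (t1, v1) = schedule[right - 1], schedule[right]
    # Fraction of the way from t0 to t1.
    alpha = (timestep - t0) / (t1 - t0)
    return v0 + alpha * (v1 - v0)


schedule = [[0, 0.1], [50, 0.05], [60, 0.001]]
for t in (0, 45, 100):
    print(t, round(piecewise_value(schedule, t), 6))
```

At timestep 45 the value lies 90% of the way from 0.1 to 0.05, which is where the 0.055 in the example comes from.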

Scheduler

Class to manage a scheduled (framework-dependent) tensor variable.

Scheduler.validate

Performs checking of a certain schedule configuration.

Scheduler.get_current_value

Returns the current value (as a tensor variable).

Scheduler.update

Updates the underlying (framework specific) tensor variable.

Scheduler._create_tensor_variable

Creates a framework-specific tensor variable to be scheduled.

Framework Utilities#

Import utilities#

try_import_torch

Tries importing torch and returns the module (or None).

Torch utilities#

clip_gradients

Performs gradient clipping on a grad-dict based on a clip value and clip mode.

compute_global_norm

Computes the global norm for a gradients dict.

convert_to_torch_tensor

Converts any struct to torch.Tensors.

explained_variance

Computes the explained variance for a pair of labels and predictions.

flatten_inputs_to_1d_tensor

Flattens arbitrary input structs according to the given spaces struct.

global_norm

Returns the global L2 norm over a list of tensors.

one_hot

Returns a one-hot tensor, given an int tensor and a space.

reduce_mean_ignore_inf

Same as torch.mean() but ignores -inf values.

sequence_mask

Offers the same behavior as tf.sequence_mask, for torch.

set_torch_seed

Sets the torch random seed to the given value.

softmax_cross_entropy_with_logits

Same behavior as tf.nn.softmax_cross_entropy_with_logits.

update_target_network

Updates a torch.nn.Module target network using Polyak averaging.
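To make the math behind a few of these utilities concrete, here is a minimal plain-Python sketch of a global L2 norm, clipping by that norm, and a Polyak (soft) target update. It works on lists of floats instead of torch tensors, and none of the function names or signatures are RLlib's. Clipping by global norm is only one possible clip mode, and the convention that `tau` weights the main network is an assumption.

```python
import math


def global_l2_norm(grads):
    """L2 norm over all values in a dict of flat gradient lists."""
    return math.sqrt(sum(g * g for values in grads.values() for g in values))


def clip_by_global_norm(grads, clip_value):
    """Scale all gradients so their global L2 norm is at most `clip_value`."""
    norm = global_l2_norm(grads)
    # Only scale down; gradients already within the limit stay unchanged.
    scale = min(1.0, clip_value / norm) if norm > 0.0 else 1.0
    return {name: [g * scale for g in values] for name, values in grads.items()}


def polyak_update(target, main, tau):
    """Move each target weight a fraction `tau` toward the main weight.

    Assumption: new_target = tau * main + (1 - tau) * target.
    """
    return [tau * m + (1.0 - tau) * t for t, m in zip(target, main)]


grads = {"w": [3.0, 0.0], "b": [4.0]}
print(global_l2_norm(grads))
clipped = clip_by_global_norm(grads, clip_value=1.0)
print(global_l2_norm(clipped))
print(polyak_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

Scaling every gradient by the same factor preserves the update direction, which is why clipping by global norm is often preferred over clipping each value independently.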

Numpy utilities#

aligned_array

concat_aligned

convert_to_numpy

Converts values in stats to non-Tensor numpy or python types.

fc

Calculates FC (dense) layer outputs given weights/biases and input.

flatten_inputs_to_1d_tensor

Flattens arbitrary input structs according to the given spaces struct.

make_action_immutable

Flags actions immutable to notify users when trying to change them.

huber_loss

Computes the Huber loss. Reference: https://en.wikipedia.org/wiki/Huber_loss.

l2_loss

Computes half the squared L2 norm of a tensor (that is, without the sqrt): sum(x**2) / 2.

lstm

Calculates LSTM layer output given weights/biases, states, and input.

one_hot

One-hot utility function for numpy.

relu

Implementation of the leaky ReLU function.

sigmoid

Returns the sigmoid function applied to x.

softmax

Returns the softmax values for x.
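The formulas behind several of these helpers are short enough to sketch directly. The following plain-Python versions operate on floats and lists rather than numpy arrays and are illustrations only, not RLlib's code. The default values `delta=1.0` and `alpha=0.0` are assumptions, and subtracting the maximum inside the softmax is a common stability trick, not something the listing above states.

```python
import math


def huber_loss(x, delta=1.0):
    """Quadratic for |x| <= delta, linear beyond it."""
    if abs(x) <= delta:
        return 0.5 * x * x
    return delta * (abs(x) - 0.5 * delta)


def l2_loss(xs):
    """Half the sum of squares: sum(x**2) / 2."""
    return sum(x * x for x in xs) / 2.0


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Softmax with the maximum subtracted first for numerical stability."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, alpha=0.0):
    """Leaky ReLU for 0 <= alpha <= 1; alpha=0.0 gives the standard ReLU."""
    return max(x, alpha * x)


def one_hot(index, depth):
    """List of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


print(huber_loss(0.5), huber_loss(3.0))
print(l2_loss([3.0, 4.0]))
print(softmax([1.0, 2.0, 3.0]))
print(leaky_relu(-2.0, alpha=0.1))
print(one_hot(2, depth=4))
```

The Huber loss matches the L2 loss for small errors and grows only linearly for large ones, which makes it less sensitive to outliers.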

Checkpoint utilities#

try_import_msgpack

Tries importing msgpack and msgpack_numpy and returns the patched msgpack module.

Checkpointable

Abstract base class for a component of RLlib that can be checkpointed to disk.