Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you’re still using the old API stack, see the New API stack migration guide for details on how to migrate.
RLlib Utilities
Here is a list of all the utilities available in RLlib.
Scheduler API

RLlib uses the Scheduler API to compute scheduled values for variables, in Python or PyTorch, as a function of an int timestep input. The type of the schedule is always a PiecewiseSchedule, which defines a list of increasing timesteps, starting at 0, each associated with the value to be reached at that particular timestep. PiecewiseSchedule interpolates values for all intermediate timesteps; in the example below, the value at timestep 45 is the linear interpolation between 0.1 (at timestep 0) and 0.05 (at timestep 50), which is 0.055. The computed values are usually of type float32.

For example:
from ray.rllib.utils.schedules.scheduler import Scheduler
scheduler = Scheduler([[0, 0.1], [50, 0.05], [60, 0.001]])
print(scheduler.get_current_value()) # <- expect 0.1
# Up the timestep.
scheduler.update(timestep=45)
print(scheduler.get_current_value()) # <- expect 0.055
# Up the timestep.
scheduler.update(timestep=100)
print(scheduler.get_current_value()) # <- expect 0.001 (keep final value)
The Scheduler API provides the following:

- Class to manage a scheduled (framework-dependent) tensor variable.
- Performs checking of a certain schedule configuration.
- Returns the current value (as a tensor variable).
- Updates the underlying (framework-specific) tensor variable.
- Creates a framework-specific tensor variable to be scheduled.
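The following sketch shows how a Scheduler could drive a decaying coefficient across a training loop. It's a minimal, hypothetical loop that relies only on the constructor and the update/get_current_value calls shown in the example above:

from ray.rllib.utils.schedules.scheduler import Scheduler

# Decay an entropy coefficient from 0.01 (at timestep 0) to 0.0 (at timestep 10000).
entropy_schedule = Scheduler([[0, 0.01], [10000, 0.0]])

for iteration in range(20):
    # A hypothetical count of environment steps sampled so far.
    global_timestep = iteration * 1000
    # Advance the schedule to the current global timestep ...
    entropy_schedule.update(timestep=global_timestep)
    # ... then read off the value to use in this iteration's loss.
    entropy_coeff = entropy_schedule.get_current_value()
    print(f"iteration={iteration}, entropy_coeff={entropy_coeff}")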
Framework Utilities

Import utilities

- Tries importing torch and returns the module (or None).
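The import helper summarized above is presumably ray.rllib.utils.framework.try_import_torch. A minimal usage sketch, assuming it returns the torch and torch.nn modules, with torch being None when PyTorch isn't installed:

from ray.rllib.utils.framework import try_import_torch

# Returns the torch and torch.nn modules; torch is None if PyTorch isn't installed.
torch, nn = try_import_torch()

if torch is not None:
    print(f"Using torch {torch.__version__}")
else:
    print("PyTorch isn't available.")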
Torch utilities

- Performs gradient clipping on a grad-dict based on a clip value and clip mode.
- Computes the global norm for a gradients dict.
- Converts any struct to torch.Tensors.
- Computes the explained variance for a pair of labels and predictions.
- Flattens arbitrary input structs according to the given spaces struct.
- Returns the global L2 norm over a list of tensors.
- Returns a one-hot tensor, given an int tensor and a space.
- Same as torch.mean() but ignores -inf values.
- Offers the same behavior as tf.sequence_mask for torch.
- Sets the torch random seed to the given value.
- Same behavior as tf.nn.softmax_cross_entropy_with_logits.
- Updates a torch.nn.Module target network using Polyak averaging (see the sketch after this list).
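As an illustration of the Polyak-averaging technique the last entry refers to, here is a standalone PyTorch sketch. It isn't the RLlib helper itself; the function and variable names are made up for the example:

import torch
import torch.nn as nn

def polyak_update(target_net: nn.Module, main_net: nn.Module, tau: float = 0.005) -> None:
    """Soft-updates target_net toward main_net: target <- tau * main + (1 - tau) * target."""
    with torch.no_grad():
        for target_param, main_param in zip(target_net.parameters(), main_net.parameters()):
            target_param.mul_(1.0 - tau).add_(tau * main_param)

main = nn.Linear(4, 2)
target = nn.Linear(4, 2)
target.load_state_dict(main.state_dict())  # Start from identical weights.
polyak_update(target, main, tau=0.01)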
Numpy utilities

- Converts values in an arbitrary (possibly nested) struct to non-tensor numpy or Python types.
- Calculates FC (dense) layer outputs given weights/biases and input.
- Flattens arbitrary input structs according to the given spaces struct.
- Flags actions immutable to notify users when trying to change them.
- Computes the Huber loss (reference: https://en.wikipedia.org/wiki/Huber_loss); see the sketch after this list.
- Computes half the L2 norm of a tensor (without the sqrt): sum(x**2) / 2.
- Calculates LSTM layer output given weights/biases, states, and input.
- One-hot utility function for numpy.
- Implementation of the leaky ReLU function.
- Returns the sigmoid function applied to x.
- Returns the softmax values for x.
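As a quick reference for what a few of these functions compute, here is a standalone numpy sketch of the softmax, sigmoid, and Huber-loss formulas mentioned above. These are illustrative re-implementations, not the RLlib utilities themselves:

import numpy as np

def softmax_1d(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def huber(x: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Element-wise Huber loss: quadratic for |x| <= delta, linear beyond."""
    return np.where(
        np.abs(x) <= delta,
        0.5 * np.square(x),
        delta * (np.abs(x) - 0.5 * delta),
    )

print(softmax_1d(np.array([1.0, 2.0, 3.0])))  # sums to 1.0
print(sigmoid(np.array([0.0])))               # [0.5]
print(huber(np.array([0.5, 2.0])))            # [0.125, 1.5]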
Checkpoint utilities

- Tries importing msgpack and msgpack_numpy and returns the patched msgpack module.
- Abstract base class for a component of RLlib that can be checkpointed to disk.
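To illustrate why the patched msgpack module is useful for checkpointing, the sketch below round-trips a numpy array through msgpack with msgpack_numpy applied. It assumes the msgpack and msgpack-numpy packages are installed and isn't RLlib's own checkpointing code:

import msgpack
import msgpack_numpy
import numpy as np

# Patch msgpack so it can (de)serialize numpy arrays.
msgpack_numpy.patch()

weights = {"layer_0": np.arange(6, dtype=np.float32).reshape(2, 3)}

# Serialize to bytes and restore again.
packed = msgpack.packb(weights)
restored = msgpack.unpackb(packed)

assert np.array_equal(restored["layer_0"], weights["layer_0"])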