Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The team is currently transitioning algorithms, example scripts, and documentation to the new code base throughout the subsequent minor releases leading up to Ray 3.0.

See here for more details on how to activate and use the new API stack.

RLlib Utilities#

Here is a list of all the utilities available in RLlib.

Exploration API#

Exploration is crucial in RL for enabling a learning agent to find new, potentially high-reward states by reaching unexplored areas of the environment.

RLlib has several built-in exploration components that the different algorithms use. You can also customize an algorithm’s exploration behavior by subclassing the `Exploration` base class and implementing your own logic.

Built-in Exploration components#

`Exploration`	Implements an exploration strategy for Policies.
`Random`	A random action selector (deterministic/greedy for explore=False).
`StochasticSampling`	An exploration that simply samples from a distribution.
`EpsilonGreedy`	Epsilon-greedy Exploration class that produces exploration actions.
`GaussianNoise`	An exploration that adds white noise to continuous actions.
`OrnsteinUhlenbeckNoise`	An exploration that adds Ornstein-Uhlenbeck noise to continuous actions.
`RE3`	Random Encoder for Efficient Exploration.
`Curiosity`	Implementation of "Curiosity-driven Exploration by Self-supervised Prediction" (Pathak, Agrawal, Efros, and Darrell, UC Berkeley, ICML 2017).
`ParameterNoise`	An exploration that changes a Model's parameters.
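
To make the idea behind an epsilon-greedy component concrete, here is a minimal standalone sketch in plain Python. It does not use RLlib's `EpsilonGreedy` class; the function name `epsilon_greedy_action` and its parameters are hypothetical and chosen only for illustration. With `explore=False` the selection is greedy.

```python
import random


def epsilon_greedy_action(q_values, epsilon, explore=True, rng=random):
    """Pick an action index from a list of Q-values.

    Hypothetical helper for illustration; not part of RLlib.
    """
    # Index of the action with the highest Q-value (first one on ties).
    greedy = max(range(len(q_values)), key=lambda i: q_values[i])
    if explore and rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: return the greedy action.
    return greedy


# Greedy selection when exploration is switched off.
action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.5, explore=False)
```

Passing a seeded `random.Random` instance as `rng` makes the exploratory choices reproducible.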

Inference#

get_exploration_action

Returns a (possibly) exploratory action and its log-likelihood.
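
The following sketch illustrates what "an action and its log-likelihood" can look like for a categorical action distribution. It is plain Python, not RLlib code: `sample_action_with_logp` and its arguments are hypothetical names, and RLlib's own components may report the log-likelihood differently when exploration is off.

```python
import math
import random


def sample_action_with_logp(logits, explore=True, rng=random):
    """Return (action, log-likelihood) for a categorical distribution.

    Hypothetical helper for illustration; not RLlib's implementation.
    """
    # Numerically stable log-softmax over the logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]
    if explore:
        # Sample an action in proportion to its probability.
        probs = [math.exp(lp) for lp in log_probs]
        action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    else:
        # Deterministic: take the most likely action.
        action = max(range(len(logits)), key=lambda i: logits[i])
    return action, log_probs[action]
```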

Callback hooks#

`before_compute_actions`	Hook for preparations before policy.compute_actions() is called.
`on_episode_start`	Handles necessary exploration logic at the beginning of an episode.
`on_episode_end`	Handles necessary exploration logic at the end of an episode.
`postprocess_trajectory`	Handles post-processing of done episode trajectories.

Setting and getting states#

`get_state`	Returns the current exploration state.
`set_state`	Sets the Exploration object's state to the given values.

Scheduler API#

Use a scheduler to compute time-dependent values for variables (in Python, PyTorch, or TensorFlow) from an int64 timestep input. The computed values are usually of type float32.

Built-in Scheduler components#

`Schedule`	Schedule classes implement various time-dependent scheduling schemas.
`ConstantSchedule`	A Schedule where the value remains constant over time.
`LinearSchedule`	Linear interpolation between `initial_p` and `final_p`.
`PiecewiseSchedule`	Implements a Piecewise Scheduler.
`ExponentialSchedule`	Exponential decay schedule from `initial_p` to `final_p`.
`PolynomialSchedule`	Polynomial interpolation between `initial_p` and `final_p`.
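
As an illustration of interpolating between `initial_p` and `final_p`, here is a small standalone function. It is a sketch, not RLlib's `LinearSchedule` or `PolynomialSchedule`: the function name `polynomial_value` and the parameters `schedule_timesteps` and `power` are assumptions made for this example. With `power=1.0` the interpolation is linear.

```python
def polynomial_value(t, schedule_timesteps, initial_p, final_p, power=1.0):
    """Interpolate from initial_p (at t=0) to final_p (at t>=schedule_timesteps).

    Hypothetical standalone function; power=1.0 gives linear interpolation.
    """
    # Clamp the timestep so the value stays at final_p after the schedule ends.
    t = min(max(t, 0), schedule_timesteps)
    # Fraction of the schedule that is still ahead, in [0, 1].
    remaining = 1.0 - t / schedule_timesteps
    return final_p + (initial_p - final_p) * remaining ** power


# Example: decay an exploration rate from 1.0 to 0.1 over 100 timesteps.
halfway = polynomial_value(50, 100, initial_p=1.0, final_p=0.1)
```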

Methods#

`value`	Generates the value given a timestep (based on schedule's logic).
`__call__`	Simply calls self.value(t).
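
The relationship between `value` and `__call__` can be sketched with a small piecewise-linear schedule. The class below is a hypothetical stand-in written for this example, not RLlib's `PiecewiseSchedule`; the constructor arguments `endpoints` and `outside_value` are assumptions.

```python
class PiecewiseLinear:
    """Hypothetical schedule class for illustration; not RLlib's implementation."""

    def __init__(self, endpoints, outside_value):
        # endpoints: list of (timestep, value) pairs, sorted by timestep here.
        self.endpoints = sorted(endpoints)
        self.outside_value = outside_value

    def value(self, t):
        # Find the segment that contains t and interpolate linearly within it.
        for (l_t, l_v), (r_t, r_v) in zip(self.endpoints[:-1], self.endpoints[1:]):
            if l_t <= t < r_t:
                alpha = (t - l_t) / (r_t - l_t)
                return l_v + alpha * (r_v - l_v)
        # t lies outside all segments.
        return self.outside_value

    def __call__(self, t):
        # Calling the object is shorthand for value(t).
        return self.value(t)


schedule = PiecewiseLinear([(0, 1.0), (10, 0.5), (20, 0.0)], outside_value=0.0)
```

Here `schedule(5)` and `schedule.value(5)` return the same number.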

Training Operations Utilities#

`multi_gpu_train_one_step`	Multi-GPU version of train_one_step.
`train_one_step`	Function that improves all policies in `train_batch` on the local worker.

Framework Utilities#

Import utilities#

`try_import_torch`	Tries importing torch and returns the module (or None).
`try_import_tf`	Tries importing tf and returns the module (or None).
`try_import_tfp`	Tries importing tfp and returns the module (or None).
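
The "try importing and return the module (or None)" pattern can be sketched generically with the standard library. The helper `try_import` below is hypothetical and only illustrates the pattern; RLlib's own functions are framework-specific and their exact return values may differ.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it cannot be imported.

    Hypothetical generic helper for illustration; not part of RLlib.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Callers check for None before using an optional dependency.
torch = try_import("torch")
if torch is None:
    print("torch is not installed; using a code path that does not need it.")
```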

Tensorflow utilities#

`explained_variance`	Computes the explained variance for a pair of labels and predictions.
`flatten_inputs_to_1d_tensor`	Flattens arbitrary input structs according to the given spaces struct.
`get_gpu_devices`	Returns a list of GPU device names, e.g. ["/gpu:0", "/gpu:1"].
`get_placeholder`	Returns a tf1.placeholder object given optional hints, such as a space.
`huber_loss`	Computes the huber loss for a given term and delta parameter.
`l2_loss`	Computes half the L2 norm over a tensor's values without the sqrt.
`make_tf_callable`	Returns a function that can be executed in either graph or eager mode.
`minimize_and_clip`	Computes, then clips gradients using objective, optimizer and var list.
`one_hot`	Returns a one-hot tensor, given an int tensor and a space.
`reduce_mean_ignore_inf`	Same as tf.reduce_mean() but ignores -inf values.
`scope_vars`	Get variables inside a given scope.
`warn_if_infinite_kl_divergence`
`zero_logps_from_actions`	Helper function useful for returning dummy logp's (0) for some actions.
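
For reference, the math behind `huber_loss`, `l2_loss`, and `explained_variance` can be sketched in plain Python on scalars and lists. These functions are illustrations only, not the TensorFlow implementations; in particular, the lower clamp at -1.0 in `explained_variance` and the default `delta=1.0` are assumptions of this sketch.

```python
def huber_loss(x, delta=1.0):
    """Quadratic for |x| < delta, linear beyond that."""
    abs_x = abs(x)
    if abs_x < delta:
        return 0.5 * x * x
    return delta * (abs_x - 0.5 * delta)


def l2_loss(values):
    """Half the sum of squares (no square root): sum(x**2) / 2."""
    return sum(v * v for v in values) / 2.0


def _variance(values):
    # Population variance of a list of numbers.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(y, pred):
    """1 - Var(y - pred) / Var(y), clamped below at -1.0 (assumption)."""
    diff = [a - b for a, b in zip(y, pred)]
    return max(-1.0, 1.0 - _variance(diff) / _variance(y))
```

A perfect prediction yields an explained variance of 1.0, and always predicting the mean of `y` yields 0.0.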

Torch utilities#

`apply_grad_clipping`	Applies gradient clipping to already computed grads inside `optimizer`.
`concat_multi_gpu_td_errors`	Concatenates multi-GPU (per-tower) TD error tensors given TorchPolicy.
`convert_to_torch_tensor`	Converts any struct to torch.Tensors.
`explained_variance`	Computes the explained variance for a pair of labels and predictions.
`flatten_inputs_to_1d_tensor`	Flattens arbitrary input structs according to the given spaces struct.
`global_norm`	Returns the global L2 norm over a list of tensors.
`huber_loss`	Computes the huber loss for a given term and delta parameter.
`l2_loss`	Computes half the L2 norm over a tensor's values without the sqrt.
`minimize_and_clip`	Clips grads found in `optimizer.param_groups` to given value in place.
`one_hot`	Returns a one-hot tensor, given an int tensor and a space.
`reduce_mean_ignore_inf`	Same as torch.mean() but ignores -inf values.
`sequence_mask`	Offers same behavior as tf.sequence_mask for torch.
`warn_if_infinite_kl_divergence`
`set_torch_seed`	Sets the torch random seed to the given value.
`softmax_cross_entropy_with_logits`	Same behavior as tf.nn.softmax_cross_entropy_with_logits.
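
The ideas behind `global_norm`, gradient clipping, and `sequence_mask` can be sketched without torch, using nested lists as stand-ins for tensors. The function `clip_by_global_norm` is a hypothetical name, and clipping by the global norm is one common approach assumed here, not necessarily what the utilities above do internally.

```python
import math


def global_norm(tensors):
    """L2 norm over all values in a list of flat lists (stand-ins for tensors)."""
    return math.sqrt(sum(v * v for t in tensors for v in t))


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their global norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm or norm == 0.0:
        # Already within bounds: return an unchanged copy.
        return [list(g) for g in grads]
    scale = max_norm / norm
    return [[v * scale for v in g] for g in grads]


def sequence_mask(lengths, maxlen=None):
    """Row i is True for the first lengths[i] positions, as in tf.sequence_mask."""
    if maxlen is None:
        maxlen = max(lengths)
    return [[j < n for j in range(maxlen)] for n in lengths]
```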

Numpy utilities#

`aligned_array`
`concat_aligned`
`convert_to_numpy`	Converts values in `stats` to non-Tensor numpy or python types.
`fc`	Calculates FC (dense) layer outputs given weights/biases and input.
`flatten_inputs_to_1d_tensor`	Flattens arbitrary input structs according to the given spaces struct.
`make_action_immutable`	Flags actions immutable to notify users when trying to change them.
`huber_loss`	Reference: https://en.wikipedia.org/wiki/Huber_loss.
`l2_loss`	Computes half the L2 norm of a tensor (w/o the sqrt): sum(x**2) / 2.
`lstm`	Calculates LSTM layer output given weights/biases, states, and input.
`one_hot`	One-hot utility function for numpy.
`relu`	Implementation of the leaky ReLU function.
`sigmoid`	Returns the sigmoid function applied to x.
`softmax`	Returns the softmax values for x.
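
The elementwise functions in this table are simple enough to sketch in plain Python on scalars and lists, which keeps the example runnable without NumPy. These are illustrations of the math only, not RLlib's NumPy implementations; the name `leaky_relu` and the default `alpha=0.0` are assumptions of this sketch.

```python
import math


def softmax(x):
    """Softmax over a list of numbers, shifted by the max for stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic sigmoid of a scalar."""
    return 1.0 / (1.0 + math.exp(-x))


def leaky_relu(x, alpha=0.0):
    """Leaky ReLU for 0 <= alpha < 1; alpha=0.0 reduces to the standard ReLU."""
    return max(x, alpha * x)


def one_hot(index, depth):
    """One-hot list of length depth with a 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(depth)]
```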

Checkpoint utilities#

`Checkpointable`	Abstract base class for a component of RLlib that can be checkpointed to disk.
`convert_to_msgpack_checkpoint`	Converts an Algorithm checkpoint (pickle based) to a msgpack based one.
`convert_to_msgpack_policy_checkpoint`	Converts a Policy checkpoint (pickle based) to a msgpack based one.
`get_checkpoint_info`	Returns a dict with information about an Algorithm/Policy checkpoint.
`try_import_msgpack`	Tries importing msgpack and msgpack_numpy and returns the patched msgpack module.

Policy utilities#

`compute_log_likelihoods_from_input_dict`	Returns log likelihood for actions in given batch for policy.
`create_policy_for_framework`	Framework-specific policy creation logic.
`local_policy_inference`	Run a connector enabled policy using environment observation.
`parse_policy_specs_from_checkpoint`	Read and parse policy specifications from a checkpoint file.

Other utilities#

`utils.tensor_dtype.get_np_dtype`	Returns the NumPy dtype of the given tensor or array.
`common.CLIArguments`	Dataclass for CLI arguments and options.
`common.FrameworkEnum`	Supported frameworks for RLlib, used for CLI argument validation.
`common.SupportedFileType`	Supported file types for RLlib, used for CLI argument validation.
`core.rl_module.validate_module_id`	Makes sure the given `policy_id` is valid.
`train.load_experiments_from_file`	Load experiments from a file.