RLlib Utilities
RLlib Utilities#
Here is a list of all the utilities available in RLlib.
Exploration API#
Exploration is crucial in RL for enabling a learning agent to find new, potentially high-reward states by reaching unexplored areas of the environment.
RLlib has several built-in exploration components that the different algorithms use. You can also customize an algorithm’s exploration behavior by sub-classing the Exploration base class and implementing your own logic:
Built-in Exploration components#
|
Implements an exploration strategy for Policies. |
|
A random action selector (deterministic/greedy for explore=False). |
|
An exploration that simply samples from a distribution. |
|
Epsilon-greedy Exploration class that produces exploration actions. |
|
An exploration that adds white noise to continuous actions. |
|
An exploration that adds Ornstein-Uhlenbeck noise to continuous actions. |
|
Random Encoder for Efficient Exploration. |
|
Implementation of: [1] Curiosity-driven Exploration by Self-supervised Prediction. Pathak, Agrawal, Efros, and Darrell, UC Berkeley, ICML 2017. |
|
An exploration that changes a Model's parameters. |
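To make the epsilon-greedy entry above concrete, here is a minimal sketch in plain Python. It does not use RLlib's classes; the function name `epsilon_greedy_action` and its parameters are hypothetical and chosen only for illustration. With `explore=False` the choice is purely greedy, mirroring the deterministic behavior the table describes for non-exploring calls.

```python
import random


def epsilon_greedy_action(q_values, epsilon, explore=True, rng=random):
    """Pick an action index from a list of Q-values.

    Hypothetical helper for illustration only (not an RLlib API).
    With probability `epsilon` (and only if `explore` is True) a uniformly
    random action is returned; otherwise the greedy (argmax) action.
    """
    if explore and rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: greedy when exploration is off, regardless of epsilon.
greedy = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=1.0, explore=False)
```

In practice, epsilon is usually annealed over time, which is where a schedule comes in.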
Inference#
|
Returns a (possibly) exploratory action and its log-likelihood. |
Callback hooks#
|
Hook for preparations before policy.compute_actions() is called. |
|
Handles necessary exploration logic at the beginning of an episode. |
|
Handles necessary exploration logic at the end of an episode. |
|
Handles post-processing of done episode trajectories. |
Setting and getting states#
|
Returns the current exploration state. |
|
Sets the Exploration object's state to the given values. |
Scheduler API#
Use a scheduler to set scheduled values for variables (in Python, PyTorch, or TensorFlow) based on an (int64) timestep input. The computed values are usually float32 types.
Built-in Scheduler components#
|
Schedule classes implement various time-dependent scheduling schemas. |
|
A Schedule where the value remains constant over time. |
|
Linear interpolation between |
|
Implements a Piecewise Scheduler. |
|
Exponential decay schedule from |
|
Polynomial interpolation between |
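The idea behind time-dependent schedules can be sketched with a few standalone functions. These are illustrative assumptions, not RLlib's implementations: the function names, parameter names, and the exact decay formula are hypothetical and may differ from the library's.

```python
def linear_schedule(t, initial, final, horizon):
    """Linearly interpolate from `initial` to `final` over `horizon` steps.

    Timesteps outside [0, horizon] are clamped, so the value stays at
    `final` once the horizon has passed.
    """
    frac = min(max(t, 0), horizon) / horizon
    return initial + frac * (final - initial)


def exponential_schedule(t, initial, decay_rate, horizon):
    """Decay `initial` by a factor of `decay_rate` every `horizon` steps."""
    return initial * decay_rate ** (t / horizon)


def piecewise_schedule(t, endpoints, outside_value):
    """Interpolate linearly between consecutive (timestep, value) endpoints.

    `endpoints` must be sorted by timestep. If `t` falls outside every
    segment, `outside_value` is returned.
    """
    for (t0, v0), (t1, v1) in zip(endpoints[:-1], endpoints[1:]):
        if t0 <= t < t1:
            alpha = (t - t0) / (t1 - t0)
            return v0 + alpha * (v1 - v0)
    return outside_value
```

For example, `linear_schedule(50, 1.0, 0.0, 100)` lies halfway between the two endpoint values.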
Methods#
|
Generates the value given a timestep (based on schedule's logic). |
|
Simply calls self.value(t). |
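The relationship between the two methods above can be sketched as follows. The class names `SketchSchedule` and `ConstantSketch` are hypothetical stand-ins, not RLlib classes; the only point illustrated is that calling the object delegates to `value(t)`.

```python
from abc import ABC, abstractmethod


class SketchSchedule(ABC):
    """Hypothetical base class: subclasses define value(t)."""

    @abstractmethod
    def value(self, t):
        """Return the scheduled value at timestep `t`."""

    def __call__(self, t):
        # Calling the schedule object is just shorthand for value(t).
        return self.value(t)


class ConstantSketch(SketchSchedule):
    """A schedule whose value never changes."""

    def __init__(self, constant):
        self.constant = constant

    def value(self, t):
        return self.constant


schedule = ConstantSketch(0.3)
same = schedule(10) == schedule.value(10)
```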
Training Operations Utilities#
|
Multi-GPU version of train_one_step. |
|
Function that improves all policies in |
Framework Utilities#
Import utilities#
|
Tries importing torch and returns the module (or None). |
|
Tries importing tf and returns the module (or None). |
|
Tries importing tfp and returns the module (or None). |
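The "try importing, else return None" pattern described above can be sketched with the standard library. The helper `try_import` below is hypothetical and much simpler than RLlib's own import utilities, whose exact return values may differ; it only shows the general idea of making a deep learning framework an optional dependency.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration (not an RLlib API).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Usage: branch on availability instead of failing at import time.
json_module = try_import("json")
missing = try_import("surely_not_an_installed_module_xyz")
```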
TensorFlow utilities#
|
Computes the explained variance for a pair of labels and predictions. |
|
Flattens arbitrary input structs according to the given spaces struct. |
|
Returns a list of GPU device names, e.g. |
|
Returns a tf1.placeholder object given optional hints, such as a space. |
|
Computes the Huber loss for a given term and delta parameter. |
|
Computes half the L2 norm over a tensor's values without the sqrt. |
|
Returns a function that can be executed in either graph or eager mode. |
|
Computes, then clips gradients using objective, optimizer and var list. |
|
Returns a one-hot tensor, given an int tensor and a space. |
|
Same as tf.reduce_mean() but ignores -inf values. |
|
Get variables inside a given scope. |
|
|
|
Helper function useful for returning dummy logp's (0) for some actions. |
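As a framework-free illustration of two of the loss helpers listed above, the following sketch computes the Huber loss for a single scalar term and the "half L2 norm without the sqrt" for a list of numbers. The function names mirror the descriptions but are written here in plain Python as an assumption; the library versions operate on tensors.

```python
def huber_loss(x, delta=1.0):
    """Huber loss of a scalar error term `x`.

    Quadratic for |x| < delta, linear beyond that, which makes it less
    sensitive to outliers than a pure squared error.
    """
    abs_x = abs(x)
    if abs_x < delta:
        return 0.5 * x * x
    return delta * (abs_x - 0.5 * delta)


def l2_loss(values):
    """Half the sum of squares: sum(v**2) / 2, with no square root."""
    return sum(v * v for v in values) / 2.0
```

With the default `delta=1.0`, an error of `0.5` falls in the quadratic region and an error of `3.0` falls in the linear region.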
Torch utilities#
|
Applies gradient clipping to already computed grads inside |
|
Concatenates multi-GPU (per-tower) TD error tensors given TorchPolicy. |
|
Converts any struct to torch.Tensors. |
|
Computes the explained variance for a pair of labels and predictions. |
|
Flattens arbitrary input structs according to the given spaces struct. |
|
Returns a torch device depending on a config and current worker index. |
|
Returns the global L2 norm over a list of tensors. |
|
Computes the Huber loss for a given term and delta parameter. |
|
Computes half the L2 norm over a tensor's values without the sqrt. |
|
Clips grads found in |
|
Returns a one-hot tensor, given an int tensor and a space. |
|
Same as torch.mean() but ignores -inf values. |
|
Offers same behavior as tf.sequence_mask for torch. |
|
|
|
Sets the torch random seed to the given value. |
|
Same behavior as tf.nn.softmax_cross_entropy_with_logits. |
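Explained variance, listed above for both frameworks, measures how much of the variance in the labels is captured by the predictions. The sketch below is a plain-Python assumption of the usual definition, 1 - Var(y - pred) / Var(y), clipped below at -1. The clipping and the use of population variance are choices made for this example and may not match the library exactly.

```python
from statistics import pvariance


def explained_variance(y, pred):
    """Return max(-1, 1 - Var(y - pred) / Var(y)) for two equal-length lists.

    1.0 means perfect predictions; 0.0 means no better than predicting
    the mean of `y`. Assumes `y` is not constant.
    """
    y_var = pvariance(y)
    if y_var == 0:
        raise ValueError("Labels must have non-zero variance.")
    diff_var = pvariance([yi - pi for yi, pi in zip(y, pred)])
    return max(-1.0, 1.0 - diff_var / y_var)


labels = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(labels, labels)
mean_only = explained_variance(labels, [2.5, 2.5, 2.5, 2.5])
```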
Numpy utilities#
|
|
|
|
|
Converts values in |
|
Calculates FC (dense) layer outputs given weights/biases and input. |
|
Flattens arbitrary input structs according to the given spaces struct. |
|
Flags actions immutable to notify users when trying to change them. |
|
Reference: https://en.wikipedia.org/wiki/Huber_loss. |
|
Computes half the L2 norm of a tensor (w/o the sqrt): sum(x**2) / 2. |
|
Calculates LSTM layer output given weights/biases, states, and input. |
|
One-hot utility function for numpy. |
|
Implementation of the leaky ReLU function. |
|
Returns the sigmoid function applied to x. |
|
Returns the softmax values for x. |
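Several of the activation and encoding helpers above are short enough to sketch directly. The versions below use only the standard library and work on Python lists and floats, which is an assumption made to keep the example dependency-free; the library versions operate on NumPy arrays, and their signatures may differ.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs):
    """Softmax over a list; the max is subtracted for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, alpha=0.0):
    """Leaky ReLU: x for positive inputs, alpha * x otherwise (0 <= alpha <= 1)."""
    return max(x, alpha * x)


def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]
```

Subtracting the maximum in `softmax` does not change the result but keeps `exp` from overflowing on large inputs.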