RLlib Utilities#

Here is a list of all the utilities available in RLlib.

Exploration API#

Exploration is crucial in RL for enabling a learning agent to find new, potentially high-reward states by reaching unexplored areas of the environment.

RLlib has several built-in exploration components that the different algorithms use. You can also customize an algorithm’s exploration behavior by sub-classing the Exploration base class and implementing your own logic.

Built-in Exploration components#

Exploration

Implements an exploration strategy for Policies.

Random

A random action selector (deterministic/greedy for explore=False).

StochasticSampling

An exploration that simply samples from a distribution.

EpsilonGreedy

Epsilon-greedy Exploration class that produces exploration actions.

GaussianNoise

An exploration that adds white noise to continuous actions.

OrnsteinUhlenbeckNoise

An exploration that adds Ornstein-Uhlenbeck noise to continuous actions.

RE3

Random Encoder for Efficient Exploration.

Curiosity

Implementation of "Curiosity-driven Exploration by Self-supervised Prediction" (Pathak, Agrawal, Efros, and Darrell; UC Berkeley; ICML 2017).

ParameterNoise

An exploration that changes a Model's parameters.
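To make the idea behind one of these components concrete, here is a minimal, framework-free sketch of epsilon-greedy action selection. It does not use RLlib's EpsilonGreedy class; the function name `epsilon_greedy_action` and its parameters are assumptions chosen for illustration only.

```python
import random


def epsilon_greedy_action(q_values, epsilon, explore=True, rng=random):
    """Pick an action index from a list of Q-values (hypothetical helper)."""
    # Index of the highest Q-value, i.e. the greedy action.
    greedy = max(range(len(q_values)), key=lambda i: q_values[i])
    # With explore=False, always act greedily.
    if not explore:
        return greedy
    # With probability epsilon, pick a uniformly random action instead.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy


# Usage: a seeded generator keeps the random branch reproducible.
rng = random.Random(0)
action = epsilon_greedy_action([0.1, 0.9, 0.3], epsilon=0.2, rng=rng)
```

In practice epsilon is usually annealed over time, which is where a schedule comes in.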

Inference#

get_exploration_action

Returns a (possibly) exploratory action and its log-likelihood.

Callback hooks#

before_compute_actions

Hook for preparations before policy.compute_actions() is called.

on_episode_start

Handles necessary exploration logic at the beginning of an episode.

on_episode_end

Handles necessary exploration logic at the end of an episode.

postprocess_trajectory

Handles post-processing of done episode trajectories.

Setting and getting states#

get_state

Returns the current exploration state.

set_state

Sets the Exploration object's state to the given values.
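The methods listed above suggest a rough shape for an exploration object: one method that returns an action together with its log-likelihood, plus a state getter and setter. The toy class below only mirrors those method names. `ToyStochasticSampling`, its constructor, and its arguments are hypothetical and are not RLlib's signatures.

```python
import math
import random


class ToyStochasticSampling:
    """Hypothetical stand-in for illustration; not an RLlib class."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.last_timestep = 0

    def get_exploration_action(self, probs, timestep, explore=True):
        """Return (action, log-likelihood) for a categorical distribution."""
        self.last_timestep = timestep
        if explore:
            # Sample an action index in proportion to its probability.
            action = self._rng.choices(range(len(probs)), weights=probs)[0]
        else:
            # Deterministic: take the most likely action.
            action = max(range(len(probs)), key=lambda i: probs[i])
        return action, math.log(probs[action])

    def get_state(self):
        # Everything needed to restore this object later.
        return {"last_timestep": self.last_timestep}

    def set_state(self, state):
        self.last_timestep = state["last_timestep"]


# Usage: copy the state from one instance to another.
source = ToyStochasticSampling(seed=1)
source.get_exploration_action([0.2, 0.8], timestep=5, explore=False)
target = ToyStochasticSampling()
target.set_state(source.get_state())
```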

Scheduler API#

Use a scheduler to set scheduled values for variables (in Python, PyTorch, or TensorFlow) based on an (int64) timestep input. The computed values are usually float32 types.

Built-in Scheduler components#

Schedule

Schedule classes implement various time-dependent scheduling schemas.

ConstantSchedule

A Schedule where the value remains constant over time.

LinearSchedule

Linear interpolation between initial_p and final_p.

PiecewiseSchedule

Implements a Piecewise Scheduler.

ExponentialSchedule

Exponential decay schedule from initial_p to final_p.

PolynomialSchedule

Polynomial interpolation between initial_p and final_p.

Methods#

value

Generates the value given a timestep (based on schedule's logic).

__call__

Simply calls self.value(t).
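The `value` / `__call__` pattern can be sketched without any framework. The two classes below are simplified stand-ins, not RLlib's implementations; the class names, the `endpoints` and `outside_value` arguments, and the clamping behavior are assumptions made for this example.

```python
class LinearScheduleSketch:
    """Hypothetical linear schedule; not RLlib's LinearSchedule."""

    def __init__(self, schedule_timesteps, initial_p=1.0, final_p=0.0):
        self.schedule_timesteps = schedule_timesteps
        self.initial_p = initial_p
        self.final_p = final_p

    def value(self, t):
        # Fraction of the schedule completed, clamped to [0, 1].
        frac = min(max(t / self.schedule_timesteps, 0.0), 1.0)
        return self.initial_p + frac * (self.final_p - self.initial_p)

    def __call__(self, t):
        return self.value(t)


class PiecewiseScheduleSketch:
    """Hypothetical piecewise-linear schedule over (timestep, value) pairs."""

    def __init__(self, endpoints, outside_value):
        # endpoints must be sorted by timestep.
        self.endpoints = endpoints
        self.outside_value = outside_value

    def value(self, t):
        pairs = zip(self.endpoints[:-1], self.endpoints[1:])
        for (left_t, left_v), (right_t, right_v) in pairs:
            if left_t <= t < right_t:
                # Interpolate linearly inside the matching segment.
                alpha = (t - left_t) / (right_t - left_t)
                return left_v + alpha * (right_v - left_v)
        # t falls outside every segment.
        return self.outside_value

    def __call__(self, t):
        return self.value(t)


# Usage: anneal an exploration rate from 1.0 to 0.1 over 100 timesteps.
epsilon = LinearScheduleSketch(100, initial_p=1.0, final_p=0.1)
halfway = epsilon(50)
```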

Training Operations Utilities#

multi_gpu_train_one_step

Multi-GPU version of train_one_step.

train_one_step

Function that improves all policies in train_batch on the local worker.

Framework Utilities#

Import utilities#

try_import_torch

Tries importing torch and returns the module (or None).

try_import_tf

Tries importing tf and returns the module (or None).

try_import_tfp

Tries importing tfp and returns the module (or None).
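The "import or return None" pattern behind these helpers can be illustrated with the standard library alone. The generic `try_import` function below is a hypothetical sketch, not one of the RLlib helpers, and the real helpers may differ in what they return.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Usage: guard optional code paths on the result.
json_mod = try_import("json")  # stdlib module, so this succeeds
missing = try_import("surely_not_an_installed_module_xyz")
if missing is None:
    # Fall back to a code path that does not need the optional dependency.
    pass
```

Returning None instead of raising lets a library support several optional frameworks while only requiring the one a user actually installs.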

TensorFlow utilities#

explained_variance

Computes the explained variance for a pair of labels and predictions.

flatten_inputs_to_1d_tensor

Flattens arbitrary input structs according to the given spaces struct.

get_gpu_devices

Returns a list of GPU device names.

get_placeholder

Returns a tf1.placeholder object given optional hints, such as a space.

huber_loss

Computes the huber loss for a given term and delta parameter.

l2_loss

Computes half the L2 norm over a tensor's values without the sqrt.

make_tf_callable

Returns a function that can be executed in either graph or eager mode.

minimize_and_clip

Computes, then clips gradients using objective, optimizer and var list.

one_hot

Returns a one-hot tensor, given an int tensor and a space.

reduce_mean_ignore_inf

Same as tf.reduce_mean() but ignores -inf values.

scope_vars

Get variables inside a given scope.

warn_if_infinite_kl_divergence

zero_logps_from_actions

Helper function useful for returning dummy logp's (0) for some actions.
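Two of the numeric helpers above are easy to state in plain Python. The sketch below works on flat lists rather than tensors and follows the textbook definitions; the function names are hypothetical, and the RLlib versions may clamp or mask values differently.

```python
import math


def variance(xs):
    """Population variance of a non-empty list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def explained_variance_sketch(y, pred):
    """1 - Var(y - pred) / Var(y); 1.0 means a perfect fit.

    Assumes Var(y) is non-zero.
    """
    residuals = [a - b for a, b in zip(y, pred)]
    return 1.0 - variance(residuals) / variance(y)


def mean_ignore_neg_inf(xs):
    """Mean over the entries that are not -inf.

    Assumes at least one entry is finite.
    """
    kept = [x for x in xs if x != -math.inf]
    return sum(kept) / len(kept)


# Usage: a constant prediction explains none of the variance.
score = explained_variance_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```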

Torch utilities#

apply_grad_clipping

Applies gradient clipping to already computed grads inside optimizer.

concat_multi_gpu_td_errors

Concatenates multi-GPU (per-tower) TD error tensors given TorchPolicy.

convert_to_torch_tensor

Converts any struct to torch.Tensors.

explained_variance

Computes the explained variance for a pair of labels and predictions.

flatten_inputs_to_1d_tensor

Flattens arbitrary input structs according to the given spaces struct.

get_device

Returns a torch device depending on a config and current worker index.

global_norm

Returns the global L2 norm over a list of tensors.

huber_loss

Computes the huber loss for a given term and delta parameter.

l2_loss

Computes half the L2 norm over a tensor's values without the sqrt.

minimize_and_clip

Clips grads found in optimizer.param_groups to given value in place.

one_hot

Returns a one-hot tensor, given an int tensor and a space.

reduce_mean_ignore_inf

Same as torch.mean() but ignores -inf values.

sequence_mask

Offers same behavior as tf.sequence_mask for torch.

warn_if_infinite_kl_divergence

set_torch_seed

Sets the torch random seed to the given value.

softmax_cross_entropy_with_logits

Same behavior as tf.nn.softmax_cross_entropy_with_logits.
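As a framework-free illustration of three ideas from this list (a global L2 norm, gradient clipping, and a sequence mask), here is a sketch on plain Python lists. The function names are hypothetical, and clipping by global norm is only one common scheme; the listing above does not say which scheme each RLlib helper uses.

```python
import math


def global_norm_sketch(tensors):
    """L2 norm over all values in a list of flat lists."""
    return math.sqrt(sum(v * v for t in tensors for v in t))


def clip_by_global_norm(tensors, max_norm):
    """Rescale all values so their global norm is at most max_norm."""
    norm = global_norm_sketch(tensors)
    if norm <= max_norm:
        # Already within the limit; return copies unchanged.
        return [list(t) for t in tensors]
    scale = max_norm / norm
    return [[v * scale for v in t] for t in tensors]


def sequence_mask_sketch(lengths, maxlen):
    """Row i is True for the first lengths[i] positions, False after."""
    return [[i < n for i in range(maxlen)] for n in lengths]


# Usage: clip two "gradient" lists to a global norm of 1.0.
clipped = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
mask = sequence_mask_sketch([1, 3], maxlen=3)
```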

Numpy utilities#

aligned_array

concat_aligned

convert_to_numpy

Converts values in stats to non-Tensor numpy or python types.

fc

Calculates FC (dense) layer outputs given weights/biases and input.

flatten_inputs_to_1d_tensor

Flattens arbitrary input structs according to the given spaces struct.

make_action_immutable

Flags actions immutable to notify users when trying to change them.

huber_loss

Reference: https://en.wikipedia.org/wiki/Huber_loss.

l2_loss

Computes half the L2 norm of a tensor (w/o the sqrt): sum(x**2) / 2.

lstm

Calculates LSTM layer output given weights/biases, states, and input.

one_hot

One-hot utility function for numpy.

relu

Implementation of the leaky ReLU function.

sigmoid

Returns the sigmoid function applied to x.

softmax

Returns the softmax values for x.
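Several of the functions above have short closed forms. The sketch below restates them in plain Python on scalars and lists, not on numpy arrays. It follows the standard definitions (for `l2_loss`, the `sum(x**2) / 2` formula given above); the default arguments such as `delta=1.0` and `alpha=0.0` are assumptions.

```python
import math


def huber_loss(x, delta=1.0):
    """Quadratic near zero, linear in the tails."""
    if abs(x) < delta:
        return 0.5 * x * x
    return delta * (abs(x) - 0.5 * delta)


def l2_loss(xs):
    """Half the sum of squares: sum(x**2) / 2."""
    return sum(x * x for x in xs) / 2


def one_hot(index, depth):
    """List of length depth with 1.0 at index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(xs):
    """Softmax with the max subtracted first for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, alpha=0.0):
    """x for positive inputs, alpha * x otherwise (plain ReLU if alpha=0)."""
    return x if x > 0 else alpha * x
```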