RLlib Utilities#

Here is a list of all the utilities available in RLlib.

Exploration API#

Exploration is crucial in RL for enabling a learning agent to find new, potentially high-reward states by reaching unexplored areas of the environment.

RLlib has several built-in exploration components that the different algorithms use. You can also customize an algorithm's exploration behavior by subclassing the Exploration base class and implementing your own logic (see the configuration and subclassing sketches below):

Built-in Exploration components#

Exploration(action_space, *, framework, ...)

Implements an exploration strategy for Policies.

Random(action_space, *, model, framework, ...)

A random action selector (deterministic/greedy for explore=False).

StochasticSampling(action_space, *, ...)

An exploration that simply samples from a distribution.

EpsilonGreedy(action_space, *, framework, ...)

Epsilon-greedy Exploration class that produces exploration actions.

GaussianNoise(action_space, *, framework, ...)

An exploration that adds white noise to continuous actions.

OrnsteinUhlenbeckNoise(action_space, *, ...)

An exploration that adds Ornstein-Uhlenbeck noise to continuous actions.

RE3(action_space, *, framework, model, ...)

Random Encoder for Efficient Exploration.

Curiosity(action_space, *, framework, model, ...)

Implementation of "Curiosity-driven Exploration by Self-supervised Prediction" (Pathak, Agrawal, Efros, and Darrell; UC Berkeley; ICML 2017).

ParameterNoise(action_space, *, framework, ...)

An exploration that changes a Model's parameters.
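
For example, to configure an algorithm's exploration through its config, you can switch DQN to an annealed epsilon-greedy scheme. A minimal sketch (the environment name and epsilon values are placeholders):

```python
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .exploration(
        explore=True,
        exploration_config={
            "type": "EpsilonGreedy",
            "initial_epsilon": 1.0,      # start fully random ...
            "final_epsilon": 0.02,       # ... anneal down to 2% random actions
            "epsilon_timesteps": 10000,  # ... over this many timesteps
        },
    )
)
algo = config.build()
```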

Inference#

get_exploration_action(*, ...[, explore])

Returns a (possibly) exploratory action and its log-likelihood.
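
A custom Exploration subclass typically only needs to override this method. A minimal sketch mirroring StochasticSampling (the class name MyStochasticSampling is hypothetical):

```python
from ray.rllib.utils.exploration.exploration import Exploration


class MyStochasticSampling(Exploration):
    """Sketch: sample while exploring, act greedily otherwise."""

    def get_exploration_action(self, *, action_distribution, timestep, explore=True):
        if explore:
            # Draw a stochastic sample from the action distribution.
            action = action_distribution.sample()
        else:
            # Deterministic/greedy action when exploration is switched off.
            action = action_distribution.deterministic_sample()
        # Log-likelihood of the chosen action under the distribution.
        return action, action_distribution.sampled_action_logp()
```

You can then plug the class in via exploration_config={"type": MyStochasticSampling}.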

Callback hooks#

before_compute_actions(*[, timestep, ...])

Hook for preparations before policy.compute_actions() is called.

on_episode_start(policy, *[, environment, ...])

Handles necessary exploration logic at the beginning of an episode.

on_episode_end(policy, *[, environment, ...])

Handles necessary exploration logic at the end of an episode.

postprocess_trajectory(policy, sample_batch)

Handles post-processing of done episode trajectories.
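
postprocess_trajectory is where schemes such as Curiosity or RE3 inject intrinsic rewards. A minimal sketch (subclassing StochasticSampling so action selection stays intact; the flat bonus value is arbitrary):

```python
from ray.rllib.utils.exploration.stochastic_sampling import StochasticSampling


class BonusExploration(StochasticSampling):
    """Sketch: add a small, constant intrinsic bonus to finished trajectories."""

    def postprocess_trajectory(self, policy, sample_batch, tf_sess=None):
        # Curiosity-style schemes recompute rewards here; this sketch just
        # adds a flat bonus to every reward in the batch.
        sample_batch["rewards"] = sample_batch["rewards"] + 0.01
        return sample_batch
```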

Setting and getting states#

get_state([sess])

Returns the current exploration state.

set_state(state[, sess])

Sets the Exploration object's state to the given values.
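
These two methods make exploration state (for example, the current epsilon or noise scale) part of policy checkpoints. A usage sketch (assumes an already built policy with an exploration attribute, as on the old API stack):

```python
# Capture the exploration state (e.g., current epsilon and timestep) ...
state = policy.exploration.get_state()
# ... and restore it later, e.g., into a freshly created policy.
policy.exploration.set_state(state)
```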

Scheduler API#

Use a scheduler to set scheduled values for variables (in Python, PyTorch, or TensorFlow) based on an (int64) timestep input. The computed values are usually of type float32.

Built-in Scheduler components#

Schedule(framework)

Schedule classes implement various time-dependent scheduling schemas.

ConstantSchedule(value[, framework])

A Schedule where the value remains constant over time.

LinearSchedule(**kwargs)

Linear interpolation between initial_p and final_p.

PiecewiseSchedule(endpoints, ...)

Implements a Piecewise Scheduler.

ExponentialSchedule(schedule_timesteps[, ...])

Exponential decay schedule from initial_p to final_p.

PolynomialSchedule(schedule_timesteps, ...)

Polynomial interpolation between initial_p and final_p.

Methods#

value(t)

Generates the value given a timestep (based on the schedule's logic).

__call__(t)

Simply calls self.value(t).
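
For example, a piecewise schedule evaluated in plain Python (endpoint values here are placeholders):

```python
from ray.rllib.utils.schedules import PiecewiseSchedule

# 1.0 at t=0, decaying linearly to 0.1 by t=10000, then constant.
schedule = PiecewiseSchedule(
    endpoints=[(0, 1.0), (10000, 0.1)],
    outside_value=0.1,  # returned for any t past the last endpoint
    framework=None,     # plain Python; "tf" and "torch" are also supported
)

print(schedule.value(5000))  # 0.55 (linear interpolation)
print(schedule(20000))       # 0.1 (__call__ forwards to value())
```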

Training Operations Utilities#

multi_gpu_train_one_step(algorithm, train_batch)

Multi-GPU version of train_one_step.

train_one_step(algorithm, train_batch[, ...])

Improves all policies in train_batch on the local worker.
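
Both are building blocks for an Algorithm's training_step() method. A rough sketch of a custom override (assumes the old API stack's synchronous_parallel_sample helper for collecting the batch):

```python
from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
from ray.rllib.execution.train_ops import multi_gpu_train_one_step, train_one_step


def training_step(self):
    # Collect a train batch from the rollout workers ...
    train_batch = synchronous_parallel_sample(worker_set=self.workers)
    # ... then update all policies on the local worker, using the
    # multi-GPU code path only when more than one GPU is configured.
    if self.config["num_gpus"] > 1:
        return multi_gpu_train_one_step(self, train_batch)
    return train_one_step(self, train_batch)
```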

Framework Utilities#

Import utilities#

try_import_torch([error])

Tries importing torch and returns the module (or None).

try_import_tf([error])

Tries importing tf and returns the module (or None).

try_import_tfp([error])

Tries importing tfp and returns the module (or None).
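
These helpers let RLlib (and your own code) run without hard dependencies on any one framework. Typical usage:

```python
from ray.rllib.utils.framework import try_import_tf, try_import_tfp, try_import_torch

tf1, tf, tfv = try_import_tf()  # tf1: tf.compat.v1; tfv: major version (1 or 2)
torch, nn = try_import_torch()  # both None if torch isn't installed (error=False)
tfp = try_import_tfp()          # tensorflow_probability, or None
```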

TensorFlow utilities#

explained_variance(y, pred)

Computes the explained variance for a pair of labels and predictions.

flatten_inputs_to_1d_tensor(inputs[, ...])

Flattens arbitrary input structs according to the given spaces struct.

get_gpu_devices()

Returns a list of GPU device names.

get_placeholder(*[, space, value, name, ...])

Returns a tf1.placeholder object given optional hints, such as a space.

huber_loss(x[, delta])

Computes the Huber loss for a given term and delta parameter.

l2_loss(x)

Computes half the L2 norm over a tensor's values without the sqrt.

make_tf_callable(session_or_none[, ...])

Returns a function that can be executed in either graph or eager mode.

minimize_and_clip(optimizer, objective, var_list)

Computes, then clips gradients using objective, optimizer and var list.

one_hot(x, space)

Returns a one-hot tensor, given an int tensor and a space.

reduce_mean_ignore_inf(x[, axis])

Same as tf.reduce_mean() but ignores -inf values.

scope_vars(scope[, trainable_only])

Get variables inside a given scope.

warn_if_infinite_kl_divergence(policy, mean_kl)

Warns the user if the mean KL divergence becomes non-finite.

zero_logps_from_actions(actions)

Helper function useful for returning dummy logp's (0) for some actions.
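
A short usage sketch for two of these helpers (assumes the ray.rllib.utils.tf_utils module path of recent Ray versions):

```python
from ray.rllib.utils.framework import try_import_tf
from ray.rllib.utils.tf_utils import explained_variance, huber_loss

tf1, tf, tfv = try_import_tf()

y = tf.constant([1.0, 2.0, 3.0, 4.0])
pred = tf.constant([1.1, 1.9, 3.2, 3.8])

# Fraction of the labels' variance captured by the predictions (max. 1.0).
ev = explained_variance(y, pred)

# Quadratic around zero, linear beyond |x| > delta -> robust to outliers.
loss = tf.reduce_mean(huber_loss(y - pred, delta=1.0))
```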

Torch utilities#

apply_grad_clipping(policy, optimizer, loss)

Applies gradient clipping to already computed grads inside optimizer.

concat_multi_gpu_td_errors(policy)

Concatenates multi-GPU (per-tower) TD error tensors given TorchPolicy.

convert_to_torch_tensor(x[, device])

Converts any struct to torch.Tensors.

explained_variance(y, pred)

Computes the explained variance for a pair of labels and predictions.

flatten_inputs_to_1d_tensor(inputs[, ...])

Flattens arbitrary input structs according to the given spaces struct.

get_device(config)

Returns a torch device depending on a config and the current worker index.

global_norm(tensors)

Returns the global L2 norm over a list of tensors.

huber_loss(x[, delta])

Computes the Huber loss for a given term and delta parameter.

l2_loss(x)

Computes half the L2 norm over a tensor's values without the sqrt.

minimize_and_clip(optimizer[, clip_val])

Clips grads found in optimizer.param_groups to given value in place.

one_hot(x, space)

Returns a one-hot tensor, given an int tensor and a space.

reduce_mean_ignore_inf(x[, axis])

Same as torch.mean() but ignores -inf values.

sequence_mask(lengths[, maxlen, dtype, ...])

Offers same behavior as tf.sequence_mask for torch.

warn_if_infinite_kl_divergence(policy, ...)

Warns the user if the mean KL divergence becomes non-finite.

set_torch_seed([seed])

Sets the torch random seed to the given value.

softmax_cross_entropy_with_logits(logits, labels)

Same behavior as tf.nn.softmax_cross_entropy_with_logits.
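
A short sketch for a few of the above (assumes the ray.rllib.utils.torch_utils module path of recent Ray versions):

```python
import numpy as np

from ray.rllib.utils.framework import try_import_torch
from ray.rllib.utils.torch_utils import convert_to_torch_tensor, sequence_mask

torch, nn = try_import_torch()

# Recursively convert a (possibly nested) struct of arrays into tensors.
batch = {"obs": np.zeros((2, 4), np.float32), "rewards": np.array([0.5, 1.0])}
tensors = convert_to_torch_tensor(batch, device="cpu")

# Boolean mask for padded sequences: True wherever t < length.
mask = sequence_mask(torch.tensor([2, 3]), maxlen=4)
# -> [[True, True, False, False], [True, True, True, False]]
```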

NumPy utilities#

aligned_array(*args, **kwargs)

Returns an aligned numpy array of the given size and dtype.

concat_aligned(*args, **kwargs)

Concatenates numpy arrays into an aligned numpy array.

convert_to_numpy(x[, reduce_type, reduce_floats])

Converts values in stats to non-Tensor numpy or python types.

fc(x, weights[, biases, framework])

Calculates FC (dense) layer outputs given weights/biases and input.

flatten_inputs_to_1d_tensor(inputs[, ...])

Flattens arbitrary input structs according to the given spaces struct.

make_action_immutable(obj)

Flags actions as immutable to notify users when they try to change them.

huber_loss(x[, delta])

Computes the Huber loss. Reference: https://en.wikipedia.org/wiki/Huber_loss.

l2_loss(x)

Computes half the L2 norm of a tensor (w/o the sqrt): sum(x**2) / 2.

lstm(x, weights[, biases, ...])

Calculates LSTM layer output given weights/biases, states, and input.

one_hot(x[, depth, on_value, off_value])

One-hot utility function for numpy.

relu(x[, alpha])

Implementation of the leaky ReLU function.

sigmoid(x[, derivative])

Returns the sigmoid function applied to x.

softmax(x[, axis, epsilon])

Returns the softmax values for x.
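
A few of these in action (a minimal sketch):

```python
import numpy as np

from ray.rllib.utils.numpy import one_hot, relu, sigmoid, softmax

x = np.array([1.0, 2.0, 3.0])

softmax(x)                              # probabilities summing to 1.0
one_hot(np.array([0, 2]), depth=3)      # [[1, 0, 0], [0, 0, 1]]
relu(np.array([-2.0, 2.0]), alpha=0.1)  # leaky ReLU -> [-0.2, 2.0]
sigmoid(np.array([0.0]))                # [0.5]
```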