ray.rllib.policy.policy.Policy
- class ray.rllib.policy.policy.Policy(observation_space: gymnasium.Space, action_space: gymnasium.Space, config: dict)
RLlib’s base class for all Policy implementations.
Policy is the abstract superclass for all DL-framework-specific sub-classes (e.g. TFPolicy or TorchPolicy). It exposes APIs to (see the sketch after this list):
- Compute actions from observation (and possibly other) inputs.
- Manage the Policy's NN model(s), like exporting and loading their weights.
- Postprocess a given trajectory from the environment or other input via the postprocess_trajectory method.
- Compute losses from a train batch.
- Perform updates from a train batch on the NN-models (this normally includes loss calculations), either:
  - in one monolithic step (learn_on_batch), or
  - via batch pre-loading, then n steps of actual loss computations and updates (load_batch_into_buffer + learn_on_loaded_batch).
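The snippet below is a minimal sketch of the action-computation side of this API, assuming Ray 2.x with RLlib and gymnasium installed; PPO and CartPole-v1 are illustrative choices, not part of the Policy API itself.

```python
# A minimal sketch, assuming Ray 2.x with RLlib and gymnasium installed.
# PPO and CartPole-v1 are illustrative choices, not requirements of the
# Policy API.
import gymnasium as gym
from ray.rllib.algorithms.ppo import PPOConfig

algo = PPOConfig().environment("CartPole-v1").build()
policy = algo.get_policy()  # the Algorithm's default Policy instance

env = gym.make("CartPole-v1")
obs, _ = env.reset()

# Compute a single (B=1) action from one observation.
action, rnn_state_outs, extra_fetches = policy.compute_single_action(obs)
obs, reward, terminated, truncated, info = env.step(action)

# Manage the Policy's NN model weights, e.g. to sync another policy.
weights = policy.get_weights()
policy.set_weights(weights)
```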
Methods
- __init__: Initializes a Policy instance.
- apply: Calls the given function with this Policy instance.
- apply_gradients: Applies the (previously) computed gradients.
- compute_actions: Computes actions for the current policy.
- compute_actions_from_input_dict: Computes actions from collected samples (across multiple agents).
- compute_gradients: Computes gradients given a batch of experiences.
- compute_log_likelihoods: Computes the log-prob/likelihood for a given action and observation.
- compute_single_action: Computes and returns a single (B=1) action value.
- export_checkpoint: Exports Policy checkpoint to a local directory and returns an AIR Checkpoint.
- export_model: Exports the Policy's Model to a local directory for serving.
- from_checkpoint: Creates new Policy instance(s) from a given Policy or Algorithm checkpoint (see the sketch after this table).
- from_state: Recovers a Policy from a state object.
- get_connector_metrics: Gets timing metrics from this Policy's connectors.
- get_exploration_state: Returns the state of this Policy's exploration component.
- get_host: Returns the computer's network name.
- get_initial_state: Returns the initial RNN state for the current policy.
- get_num_samples_loaded_into_buffer: Returns the number of currently loaded samples in the given buffer.
- get_session: Returns the tf.Session object to use for computing actions, or None.
- get_state: Returns the entire current state of this Policy.
- get_weights: Returns model weights.
- import_model_from_h5: Imports Policy from a local file.
- init_view_requirements: Maximal view requirements dict for learn_on_batch() and compute_actions calls.
- is_recurrent: Whether this Policy holds a recurrent Model.
- learn_on_batch: Performs one learning update, given samples.
- learn_on_batch_from_replay_buffer: Samples a batch from a given replay actor and performs an update.
- learn_on_loaded_batch: Runs a single step of SGD on already loaded data in a buffer.
- load_batch_into_buffer: Bulk-loads the given SampleBatch into the devices' memories.
- loss: Loss function for this Policy.
- maybe_remove_time_dimension: Removes a time dimension for recurrent RLModules.
- num_state_tensors: The number of internal states needed by the RNN model of the Policy.
- on_global_var_update: Called on an update to global vars.
- postprocess_trajectory: Implements algorithm-specific trajectory postprocessing.
- reset_connectors: Resets action- and agent-connectors for this Policy.
- restore_connectors: Restores agent and action connectors if configs are available.
- set_state: Restores the entire current state of this Policy from state.
- set_weights: Sets this Policy's model weights.
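To tie the state- and checkpoint-related methods above together, here is a hedged sketch of a save/restore round trip; it reuses the `policy` object from the earlier snippet, and the checkpoint directory is an illustrative placeholder.

```python
# A hedged sketch of state/checkpoint round trips; "policy" is the
# instance from the previous snippet and the path is a placeholder.
from ray.rllib.policy.policy import Policy

# Snapshot the entire Policy state and restore it in place ...
state = policy.get_state()
policy.set_state(state)

# ... recover a standalone Policy from that state object ...
restored = Policy.from_state(state)

# ... or write a checkpoint to disk and re-create the Policy from it.
policy.export_checkpoint("/tmp/example_policy_checkpoint")
restored_from_ckpt = Policy.from_checkpoint("/tmp/example_policy_checkpoint")
```

Note that from_checkpoint also accepts an Algorithm checkpoint, in which case it returns multiple instances, one per policy ID contained in that checkpoint, per the method summary above.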