Note
Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.
So far, however, only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and both still run with the old APIs by default. You can continue to use the existing custom (old stack) classes.
See here for more details on how to use the new API stack.
Sampling the Environment or offline data#
RLlib ingests data, whether from environment rollouts or from other data-generating methods (e.g. reading offline files), through EnvRunner instances. These sit inside an EnvRunnerGroup, together with other parallel EnvRunners, which the RLlib Algorithm exposes under its self.env_runner_group property.
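The containment described above can be pictured with a small stand-in written in plain Python. This is a sketch only: ToyEnvRunner and ToyEnvRunnerGroup are hypothetical names, not RLlib classes, and the "remote" runners here are ordinary local objects rather than Ray actors.

```python
import random


class ToyEnvRunner:
    """Hypothetical stand-in for an EnvRunner: collects steps from a toy env."""

    def __init__(self, runner_id, seed):
        self.runner_id = runner_id
        self._rng = random.Random(seed)

    def sample(self, num_steps):
        # Each "step" is a dict holding an observation, an action, and a reward.
        return [
            {
                "obs": self._rng.random(),
                "action": self._rng.randint(0, 1),
                "reward": 1.0,
            }
            for _ in range(num_steps)
        ]


class ToyEnvRunnerGroup:
    """Hypothetical stand-in for an EnvRunnerGroup: one local runner plus n others."""

    def __init__(self, num_remote):
        self.local_runner = ToyEnvRunner(runner_id=0, seed=0)
        self.remote_runners = [
            ToyEnvRunner(runner_id=i, seed=i) for i in range(1, num_remote + 1)
        ]

    def sample_all(self, num_steps_per_runner):
        # Gather one list of steps from every runner and concatenate them.
        batch = []
        for runner in [self.local_runner, *self.remote_runners]:
            batch.extend(runner.sample(num_steps_per_runner))
        return batch


group = ToyEnvRunnerGroup(num_remote=2)
batch = group.sample_all(num_steps_per_runner=4)
print(len(batch))  # 3 runners * 4 steps = 12
```

In this toy version the group simply loops over its runners; in a distributed setting the same loop would dispatch the calls in parallel.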
RolloutWorker API#
Constructor#
Common experience collection class. |
Multi agent#
Adds a new policy to this RolloutWorker. |
|
Removes a policy from this RolloutWorker. |
|
Return policy for the specified id, or None. |
|
Sets |
|
Sets |
|
Calls the given function with the specified policy as first arg. |
|
Calls the given function with each (policy, policy_id) tuple. |
|
Calls the given function with each (policy, policy_id) tuple. |
Setter and getter methods#
Returns a snapshot of filters. |
|
Returns the current |
|
Updates this worker's and all its policies' global vars. |
|
Returns the hostname of the process running this evaluator. |
|
Returns the thus-far collected metrics from this worker's rollouts. |
|
Returns the IP address of the node that this worker runs on. |
|
Returns each policy's model weights of this worker. |
|
Sets each policy's model weights of this worker. |
|
Threading#
Locks this RolloutWorker via its own threading.Lock. |
|
Unlocks this RolloutWorker via its own threading.Lock. |
Sampling API#
Returns a batch of experience sampled from this worker. |
|
Same as sample() but returns the count as a separate value. |
|
Samples a batch and learns on it. |
Training API#
Update policies based on the given batch. |
|
Join a torch process group for distributed SGD. |
|
Returns a gradient computed w.r.t. the specified samples. |
|
Applies the given gradients to this worker's models. |
Environment API#
Calls the given function with each sub-environment as arg. |
|
Calls given function with each sub-env plus env_ctx as args. |
Miscellaneous#
Releases all resources used by this RolloutWorker. |
|
Calls the given function with this Actor instance. |
|
Changes self's filter to given and rebases any accumulated delta. |
|
Finds a free port on the node that this worker runs on. |
|
Returns the kwargs dict used to create this worker. |
|
Checks that self.__init__() has been completed properly. |
EnvRunner API#
Base class for distributed RL-style data collection from an environment. |
EnvRunnerGroup API#
Constructor#
Set of EnvRunners with n @ray.remote workers and zero or one local worker. |
|
Calls |
|
Hard overrides the remote EnvRunners in this set with the provided ones. |
Worker Orchestration#
Creates and adds a number of remote workers to this worker set. |
|
Calls the given function with each EnvRunner as its argument. |
|
Calls the given function with each EnvRunner and its ID as its arguments. |
|
Calls the given function asynchronously with each worker as the argument. |
|
Gets results from outstanding asynchronous requests that are ready. |
|
Returns the number of in-flight async requests. |
|
Returns the number of healthy remote workers. |
|
Returns the number of all healthy workers, including the local worker. |
|
Total number of times managed remote workers have been restarted. |
|
Checks for unhealthy workers and tries restoring their states. |
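The asynchronous entries listed above (issue a call on every worker, count the in-flight requests, then fetch whatever is ready) follow a common submit-and-poll pattern. The sketch below imitates that pattern with a thread pool from the standard library. ToyAsyncGroup and all of its method names are hypothetical and do not mirror RLlib signatures.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class ToyAsyncGroup:
    """Hypothetical worker group that runs a function on each worker asynchronously."""

    def __init__(self, workers):
        self.workers = workers  # maps worker_id -> worker object
        self._pool = ThreadPoolExecutor(max_workers=max(1, len(workers)))
        self._in_flight = []  # list of (worker_id, future) pairs

    def foreach_async(self, func):
        # Submit one request per worker and remember it as in-flight.
        for worker_id, worker in self.workers.items():
            self._in_flight.append((worker_id, self._pool.submit(func, worker)))
        return len(self.workers)

    def num_in_flight(self):
        return len(self._in_flight)

    def fetch_ready(self):
        # Partition in a single pass so no future changes state between checks.
        ready, pending = [], []
        for worker_id, future in self._in_flight:
            if future.done():
                ready.append((worker_id, future.result()))
            else:
                pending.append((worker_id, future))
        self._in_flight = pending
        return ready

    def wait_all(self):
        # Block until every outstanding request has finished.
        wait([future for _, future in self._in_flight])

    def shutdown(self):
        self._pool.shutdown(wait=True)


group = ToyAsyncGroup({1: "worker-a", 2: "worker-b"})
group.foreach_async(lambda worker: worker.upper())
group.wait_all()
results = sorted(group.fetch_ready())
remaining = group.num_in_flight()
group.shutdown()
print(results)    # [(1, 'WORKER-A'), (2, 'WORKER-B')]
print(remaining)  # 0
```

Calling fetch_ready without waiting first is also valid; it then returns only the requests that have already completed and leaves the rest in flight.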
Pass-through methods#
Adds a policy to this EnvRunnerGroup's workers or a specific list of workers. |
|
Calls |
|
Calls |
|
Calls |
|
Apply |
|
Syncs model weights from the given weight source to all remote workers. |
Sampler API#
InputReader instances collect and return experiences from the environments. For more details on the InputReader classes used for offline RL (e.g. reading files of pre-recorded data), see the offline RL API reference here.
Input Reader API#
API for collecting and returning experiences during policy evaluation. |
|
Returns the next batch of read experiences. |
Input Sampler API#
Reads input experiences from an existing sampler. |
|
Called by |
|
Returns list of extra batches since the last call to this method. |
|
Returns list of episode metrics since the last call to this method. |
Synchronous Sampler API#
Sync SamplerInput that collects experiences when |
Offline Sampler API#
The InputReader API is used by an individual RolloutWorker to produce batches of experiences either from a simulator or from an offline source (e.g. a file).
Here are some example extensions of the InputReader API:
JSON reader API#
Reader object that loads experiences from JSON file chunks. |
|
Reads through all files and yields one SampleBatchType per line. |
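The "one batch per line" behavior described above can be illustrated with a line-delimited JSON reader. This is a minimal sketch that assumes each line holds one JSON object; read_json_lines is a hypothetical helper, and the column names in the toy files are illustrative rather than RLlib's actual on-disk format.

```python
import json
import os
import tempfile


def read_json_lines(paths):
    """Yield one decoded batch (a dict) per non-empty line across all files."""
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two small chunk files with three batches each.
    paths = []
    for file_idx in range(2):
        path = os.path.join(tmp_dir, f"chunk-{file_idx}.json")
        with open(path, "w", encoding="utf-8") as f:
            for row_idx in range(3):
                record = {"obs": [file_idx, row_idx], "rewards": [1.0]}
                f.write(json.dumps(record) + "\n")
        paths.append(path)

    # Consume the generator while the temporary files still exist.
    batches = list(read_json_lines(paths))

print(len(batches))  # 2 files * 3 lines = 6
```

Because the reader is a generator, files are opened lazily and only one line is decoded at a time.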
Mixed input reader#
Mixes input from a number of other input sources. |
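One way to mix several input sources is to draw each batch from a source chosen at random with a fixed probability. The sketch below assumes that weighting scheme; ToyMixedInput is a hypothetical class, and the two reader functions stand in for real input sources.

```python
import random


class ToyMixedInput:
    """Hypothetical mixer: picks one source per call according to fixed probabilities."""

    def __init__(self, sources, seed=None):
        # sources: list of (zero-argument callable returning a batch, probability)
        total = sum(prob for _, prob in sources)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities must sum to 1.0, got {total}")
        self._readers = [reader for reader, _ in sources]
        self._probs = [prob for _, prob in sources]
        self._rng = random.Random(seed)

    def next(self):
        # Choose a single reader, weighted by its probability, and read from it.
        reader = self._rng.choices(self._readers, weights=self._probs, k=1)[0]
        return reader()


def read_from_sampler():
    return {"source": "sampler"}


def read_from_file():
    return {"source": "file"}


mixed = ToyMixedInput([(read_from_sampler, 0.2), (read_from_file, 0.8)], seed=0)
counts = {"sampler": 0, "file": 0}
for _ in range(1000):
    counts[mixed.next()["source"]] += 1
print(counts)  # roughly a 20/80 split between the two sources
```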
D4RL reader#
Reader object that loads the dataset from the D4RL dataset. |
IOContext#
Class containing attributes to pass to input/output class constructors. |
|
Returns the RolloutWorker's SamplerInput object, if any. |
Policy Map API#
Maps policy IDs to Policy objects. |
|
Iterates over all policies, even the stashed ones. |
|
Returns all valid keys, even the stashed ones. |
|
Returns all valid values, even the stashed ones. |
Sample batch API#
Wrapper around a dictionary with string keys and array-like values. |
|
Sets a function to be called on every getitem. |
|
Sets the |
|
Returns the respective MultiAgentBatch |
|
Returns one column (by key) from the data or a default value. |
|
TODO: transfer batch to given device as framework tensor. |
|
Right (adding zeros at end) zero-pads this SampleBatch in-place. |
|
Returns a slice of the row data of this batch (w/o copying). |
|
Splits by |
|
Shuffles the rows of this batch in-place. |
|
Returns a list of the batch-data in the specified columns. |
|
Returns an iterator over data rows, i.e. dicts with column values. |
|
Creates a deep or shallow copy of this SampleBatch and returns it. |
|
Returns True if this SampleBatch only contains one trajectory. |
|
Returns True if |
|
Returns the same as len(self) (number of steps in this batch). |
|
Returns the same as len(self) (number of steps in this batch). |
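Several of the operations listed above (row slicing, right zero-padding, row iteration) are easier to follow on a toy column store. The sketch below uses plain Python lists. ToyBatch and its methods are hypothetical simplifications, not the SampleBatch implementation: slicing here copies the data, and padding treats the whole batch as a single sequence.

```python
class ToyBatch:
    """Hypothetical column store: string keys mapped to equal-length lists."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {key: list(values) for key, values in columns.items()}

    def __len__(self):
        # Number of rows (steps), not number of columns.
        return len(next(iter(self.columns.values()), []))

    def slice(self, start, end):
        # Note: list slicing copies; this differs from a zero-copy view.
        return ToyBatch({k: v[start:end] for k, v in self.columns.items()})

    def right_zero_pad(self, max_len):
        # Compute the pad width once, before any column grows.
        pad = max_len - len(self)
        if pad > 0:
            for values in self.columns.values():
                values.extend([0] * pad)

    def rows(self):
        # Yield each row as a dict of column name -> value.
        for i in range(len(self)):
            yield {k: v[i] for k, v in self.columns.items()}


batch = ToyBatch({"obs": [1, 2, 3], "rewards": [0.5, 1.0, 1.5]})
first_two = batch.slice(0, 2)
batch.right_zero_pad(max_len=5)
print(len(first_two), len(batch))  # 2 5
print(batch.columns["obs"])        # [1, 2, 3, 0, 0]
print(list(first_two.rows()))
```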
MultiAgent batch API#
A batch of experiences from multiple agents in the environment. |
|
The number of env steps (there are >= 1 agent steps per env step). |
|
The number of agent steps (there are >= 1 agent steps per env step). |
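The difference between the two counts above can be shown with a toy container. In this sketch, ToyMultiAgentBatch is a hypothetical class that stores per-policy rows plus an explicit env-step count; the policy IDs pol_a and pol_b are made up for illustration.

```python
class ToyMultiAgentBatch:
    """Hypothetical multi-agent batch: per-policy rows plus an env-step count."""

    def __init__(self, policy_batches, env_steps):
        self.policy_batches = policy_batches  # policy_id -> list of agent-step rows
        self._env_steps = env_steps

    def env_steps(self):
        # One env step is one call to the environment's step function.
        return self._env_steps

    def agent_steps(self):
        # Every row in every policy's batch is one agent step.
        return sum(len(rows) for rows in self.policy_batches.values())


# Three env steps in which two agents each act once per step.
ma_batch = ToyMultiAgentBatch(
    policy_batches={
        "pol_a": [{"t": 0}, {"t": 1}, {"t": 2}],
        "pol_b": [{"t": 0}, {"t": 1}, {"t": 2}],
    },
    env_steps=3,
)
print(ma_batch.env_steps())    # 3
print(ma_batch.agent_steps())  # 6
```

With a single agent the two counts coincide; with more agents acting per step, the agent-step count grows while the env-step count stays the same.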