Evaluation and Environment Rollout
Data ingest via either environment rollouts or other data-generating methods
(e.g. reading from offline files) is done in RLlib by RolloutWorkers, which sit
(together with other parallel RolloutWorkers) inside a WorkerSet in the RLlib
Trainer.
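The layout above can be sketched with toy classes. This is a minimal, hypothetical model of the worker hierarchy (not RLlib's actual implementation; class and method names here are illustrative stand-ins for the RLlib objects named in the text):

```python
class RolloutWorker:
    """Collects experience; in RLlib, the remote copies run as Ray actors."""

    def __init__(self, worker_index: int):
        self.worker_index = worker_index  # 0 denotes the local worker


class WorkerSet:
    """Holds exactly one local worker plus n parallel (remote) workers."""

    def __init__(self, num_workers: int):
        self._local_worker = RolloutWorker(worker_index=0)
        self._remote_workers = [
            RolloutWorker(worker_index=i + 1) for i in range(num_workers)
        ]

    def local_worker(self) -> RolloutWorker:
        return self._local_worker

    def remote_workers(self) -> list:
        return self._remote_workers


class Trainer:
    """Owns the WorkerSet used for data collection."""

    def __init__(self, num_workers: int = 2):
        self.workers = WorkerSet(num_workers)


trainer = Trainer(num_workers=2)
n_remote = len(trainer.workers.remote_workers())  # 2 parallel rollout workers
```

In the real library the remote workers are Ray actors, so their methods are invoked asynchronously; the toy version above only mirrors the ownership structure.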
A typical RLlib WorkerSet setup inside an RLlib Trainer: each WorkerSet holds
exactly one local RolloutWorker object and n remote RolloutWorker objects
(Ray actors). The workers contain a policy map (with one or more policies)
and, in case a simulator (env) is available, a vectorized environment
(containing m sub-environments) and a SamplerInput (either synchronous or
asynchronous), which controls the environment data collection loop.
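The vectorized-environment idea can be illustrated with a short sketch: m trivial sub-environments stepped together in one synchronous call. These classes are toy stand-ins assumed for illustration, not RLlib's actual vectorized-env or sampler classes:

```python
class SubEnv:
    """A trivial sub-environment whose state is a running counter."""

    def __init__(self):
        self.state = 0

    def step(self, action: int):
        self.state += action
        reward = float(action)
        return self.state, reward


class VectorEnv:
    """Steps m sub-environments together, as a vectorized env does."""

    def __init__(self, m: int):
        self.sub_envs = [SubEnv() for _ in range(m)]

    def step(self, actions: list):
        # One action per sub-environment; results come back as a list
        # of (observation, reward) pairs, one per sub-environment.
        return [env.step(a) for env, a in zip(self.sub_envs, actions)]


env = VectorEnv(m=4)
results = env.step([1, 0, 1, 1])  # one action per sub-environment
```

A synchronous sampler would call `step()` in a loop like this; an asynchronous one instead polls sub-environments as their results become available.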
In the online case (an environment is available) as well as the offline case
(no environment), the Trainer uses the workers' sample() method to collect
SampleBatch objects for training.
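Conceptually, sample() runs the data collection loop for some number of steps and returns the transitions as a column-oriented batch. The sketch below uses a toy stand-in for RLlib's SampleBatch (the API shown is illustrative, not RLlib's) with a hard-coded placeholder policy:

```python
class SampleBatch(dict):
    """Column-oriented batch: each key maps to an equal-length list."""

    def count(self) -> int:
        return len(self["obs"])


def sample(num_steps: int) -> SampleBatch:
    """Run a toy environment loop and collect num_steps transitions."""
    obs, actions, rewards = [], [], []
    state = 0
    for t in range(num_steps):
        action = t % 2          # placeholder policy: alternate 0 and 1
        state += action         # toy env dynamics: accumulate actions
        obs.append(state)
        actions.append(action)
        rewards.append(float(action))
    return SampleBatch(obs=obs, actions=actions, rewards=rewards)


batch = sample(num_steps=8)
```

In the offline case the loop body would instead read pre-recorded transitions from file, but the caller still receives the same batch shape either way.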