ray.rllib.evaluation.rollout_worker.RolloutWorker#

class ray.rllib.evaluation.rollout_worker.RolloutWorker(*, env_creator: Callable[[EnvContext], Any | gymnasium.Env | None], validate_env: Callable[[Any | gymnasium.Env, EnvContext], None] | None = None, config: AlgorithmConfig | None = None, worker_index: int = 0, num_workers: int | None = None, recreated_worker: bool = False, log_dir: str | None = None, spaces: Dict[str, Tuple[gymnasium.spaces.Space, gymnasium.spaces.Space]] | None = None, default_policy_class: Type[Policy] | None = None, dataset_shards: List[Dataset] | None = None, **kwargs)[source]#

Bases: ParallelIteratorWorker, EnvRunner

Common experience collection class.

This class wraps a policy instance and an environment class to collect experiences from the environment. You can create many replicas of this class as Ray actors to scale RL training.

This class supports vectorized and multi-agent policy evaluation (e.g., VectorEnv, MultiAgentEnv).

# Create a rollout worker and use it to collect experiences.
import gymnasium as gym
from ray.rllib.evaluation.rollout_worker import RolloutWorker
from ray.rllib.algorithms.ppo.ppo_tf_policy import PPOTF1Policy
worker = RolloutWorker(
  env_creator=lambda _: gym.make("CartPole-v1"),
  default_policy_class=PPOTF1Policy)
print(worker.sample())
SampleBatch({
    "obs": [[...]], "actions": [[...]], "rewards": [[...]],
    "terminateds": [[...]], "truncateds": [[...]], "new_obs": [[...]]}
)

# Create a multi-agent rollout worker.
from gymnasium.spaces import Discrete, Box
import random
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
# `MultiAgentTrafficGrid` and `PGTFPolicy` are placeholders for your own
# multi-agent env class and policy class.
MultiAgentTrafficGrid = ...
worker = RolloutWorker(
  env_creator=lambda _: MultiAgentTrafficGrid(num_cars=25),
  config=AlgorithmConfig().multi_agent(
    policies={
      # Use an ensemble of two policies for car agents
      "car_policy1":
        (PGTFPolicy, Box(...), Discrete(...),
         AlgorithmConfig.overrides(gamma=0.99)),
      "car_policy2":
        (PGTFPolicy, Box(...), Discrete(...),
         AlgorithmConfig.overrides(gamma=0.95)),
      # Use a single shared policy for all traffic lights
      "traffic_light_policy":
        (PGTFPolicy, Box(...), Discrete(...), {}),
    },
    policy_mapping_fn=(
      lambda agent_id, episode, **kwargs:
      random.choice(["car_policy1", "car_policy2"])
      if agent_id.startswith("car_") else "traffic_light_policy"),
    ),
)
print(worker.sample())
MultiAgentBatch({
    "car_policy1": SampleBatch(...),
    "car_policy2": SampleBatch(...),
    "traffic_light_policy": SampleBatch(...)}
)
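
The class description above notes that you can create many replicas of this class as Ray actors to scale experience collection. The following is a minimal sketch of that pattern, assuming Ray and TensorFlow are installed and reusing the CartPole/PPOTF1Policy setup from the first example; per-actor resource settings (CPUs/GPUs) are omitted.

import ray
import gymnasium as gym
from ray.rllib.evaluation.rollout_worker import RolloutWorker
from ray.rllib.algorithms.ppo.ppo_tf_policy import PPOTF1Policy
from ray.rllib.policy.sample_batch import concat_samples

ray.init()

# Turn the class into a Ray actor class and start a few replicas.
RemoteRolloutWorker = ray.remote(RolloutWorker)
workers = [
    RemoteRolloutWorker.remote(
        env_creator=lambda _: gym.make("CartPole-v1"),
        default_policy_class=PPOTF1Policy,
        worker_index=i + 1,
        num_workers=3,
    )
    for i in range(3)
]

# Each actor samples independently; gather and concatenate the results.
batches = ray.get([w.sample.remote() for w in workers])
print(concat_samples(batches).count)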

Methods

__init__

Initializes a RolloutWorker instance.

add_policy

Adds a new policy to this RolloutWorker.

apply

Calls the given function with this Actor instance.

apply_gradients

Applies the given gradients to this worker's models.

compute_gradients

Returns a gradient computed w.r.t. the specified samples.

creation_args

Returns the kwargs dict used to create this worker.

find_free_port

Finds a free port on the node that this worker runs on.

for_policy

Calls the given function with the specified policy as first arg.

foreach_env

Calls the given function with each sub-environment as arg.

foreach_env_with_context

Calls the given function with each sub-environment plus its env_ctx as args.

foreach_policy

Calls the given function with each (policy, policy_id) tuple.

foreach_policy_to_train

Calls the given function with each (policy, policy_id) tuple of the policies currently marked as to-train.

get_filters

Returns a snapshot of filters.

get_global_vars

Returns the current self.global_vars dict of this RolloutWorker.

get_host

Returns the hostname of the process running this evaluator.

get_metrics

Returns the thus-far collected metrics from this worker's rollouts.

get_node_ip

Returns the IP address of the node that this worker runs on.

get_policies_to_train

Returns all policies-to-train, given an optional batch.

get_policy

Returns the policy for the specified ID, or None.

get_weights

Returns the model weights of each of this worker's policies.

learn_on_batch

Updates policies based on the given batch (see the usage sketch after this method list).

lock

Locks this RolloutWorker via its own threading.Lock.

make_module

Creates the RLModule for this EnvRunner and assigns it to self.module.

par_iter_init

Implements ParallelIterator worker init.

par_iter_next

Implements ParallelIterator worker item fetch.

par_iter_next_batch

Batches par_iter_next.

par_iter_slice

Iterates in increments of step starting from start.

par_iter_slice_batch

Batches par_iter_slice.

ping

Pings the actor.

remove_policy

Removes a policy from this RolloutWorker.

sample

Returns a batch of experience sampled from this worker.

sample_and_learn

Samples a batch and learns on it.

sample_with_count

Same as sample() but returns the count as a separate value.

set_global_vars

Updates this worker's and all its policies' global vars.

set_is_policy_to_train

Sets self.is_policy_to_train() to a new callable.

set_policy_mapping_fn

Sets self.policy_mapping_fn to a new callable (if provided).

set_weights

Sets the model weights of each of this worker's policies.

setup_torch_data_parallel

Joins a torch process group for distributed SGD.

stop

Releases all resources used by this RolloutWorker.

sync_filters

Replaces this worker's filters with the given ones and rebases any accumulated deltas.

unlock

Unlocks this RolloutWorker via its own threading.Lock.
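
As referenced in the learn_on_batch entry above, the following is a minimal local usage sketch showing how sample(), learn_on_batch(), get_weights(), and set_weights() compose. It reuses the CartPole/PPOTF1Policy setup from the first example and is only an illustration; in practice these calls are orchestrated by RLlib's Algorithm.

import gymnasium as gym
from ray.rllib.evaluation.rollout_worker import RolloutWorker
from ray.rllib.algorithms.ppo.ppo_tf_policy import PPOTF1Policy

worker = RolloutWorker(
    env_creator=lambda _: gym.make("CartPole-v1"),
    default_policy_class=PPOTF1Policy,
)

for _ in range(3):
    # Collect a batch of experiences from the environment ...
    batch = worker.sample()
    # ... and update this worker's policy (or policies) on it.
    print(worker.learn_on_batch(batch))

# Weights can be pulled from one worker and pushed to others, e.g. to keep
# remote rollout workers in sync with a central learner copy.
weights = worker.get_weights()
worker.set_weights(weights)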