AlgorithmConfig.rollouts(*, env_runner_cls: Optional[type] = NotProvided, num_rollout_workers: Optional[int] = NotProvided, num_envs_per_worker: Optional[int] = NotProvided, create_env_on_local_worker: Optional[bool] = NotProvided, sample_collector: Optional[Type[SampleCollector]] = NotProvided, sample_async: Optional[bool] = NotProvided, enable_connectors: Optional[bool] = NotProvided, use_worker_filter_stats: Optional[bool] = NotProvided, update_worker_filter_stats: Optional[bool] = NotProvided, rollout_fragment_length: Optional[Union[int, str]] = NotProvided, batch_mode: Optional[str] = NotProvided, remote_worker_envs: Optional[bool] = NotProvided, remote_env_batch_wait_ms: Optional[float] = NotProvided, validate_workers_after_construction: Optional[bool] = NotProvided, preprocessor_pref: Optional[str] = NotProvided, observation_filter: Optional[str] = NotProvided, compress_observations: Optional[bool] = NotProvided, enable_tf1_exec_eagerly: Optional[bool] = NotProvided, sampler_perf_stats_ema_coef: Optional[float] = NotProvided, ignore_worker_failures=-1, recreate_failed_workers=-1, restart_failed_sub_environments=-1, num_consecutive_worker_failures_tolerance=-1, worker_health_probe_timeout_s=-1, worker_restore_timeout_s=-1, synchronize_filter=-1) → AlgorithmConfig

Sets the rollout worker configuration.
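As a sketch of typical usage (the numeric values below are illustrative, not recommendations, and running it requires `ray[rllib]` to be installed):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .rollouts(
        num_rollout_workers=2,           # two parallel sampling actors
        num_envs_per_worker=4,           # 4 vectorized envs per worker
        rollout_fragment_length="auto",  # let RLlib match train_batch_size
        batch_mode="truncate_episodes",
        compress_observations=True,
    )
)
```

Like the other `AlgorithmConfig.*()` setters, `rollouts()` returns the config object itself, so calls can be chained.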

  • env_runner_cls – The EnvRunner class to use for environment rollouts (data collection).

  • num_rollout_workers – Number of rollout worker actors to create for parallel sampling. Setting this to 0 will force rollouts to be done in the local worker (driver process or the Algorithm’s actor when using Tune).

  • num_envs_per_worker – Number of environments to evaluate vector-wise per worker. This enables model inference batching, which can improve performance for inference-bottlenecked workloads.

  • sample_collector – The SampleCollector class to be used to collect and retrieve environment-, model-, and sampler data. Override the SampleCollector base class to implement your own collection/buffering/retrieval logic.

  • create_env_on_local_worker – When num_rollout_workers > 0, the driver (local_worker; worker-idx=0) does not need an environment, because it neither samples (done by remote_workers; worker_indices > 0) nor evaluates (done by evaluation workers; see below). Set this to True to create an environment on the local worker anyway (e.g., for debugging).

  • sample_async – Use a background thread for sampling (slightly off-policy, usually not advisable to turn on unless your env specifically requires it).

  • enable_connectors – Use the connector-based environment runner, so that all preprocessing of observations and postprocessing of actions is done via agent and action connectors.

  • use_worker_filter_stats – Whether to use the workers in the WorkerSet to update the central filters (held by the local worker). If False, stats from the workers are discarded without being used.

  • update_worker_filter_stats – Whether to push filter updates from the central filters (held by the local worker) to the remote workers’ filters. Setting this to True might be useful within the evaluation config in order to disable the usage of evaluation trajectories for syncing the central filter (used for training).

  • rollout_fragment_length – Divide episodes into fragments of this many steps each during rollouts. Trajectories of this size are collected from rollout workers and combined into a larger batch of train_batch_size for learning. For example, given rollout_fragment_length=100 and train_batch_size=1000: 1. RLlib collects 10 fragments of 100 steps each from rollout workers. 2. These fragments are concatenated and an epoch of SGD is performed. When using multiple envs per worker, the fragment size is multiplied by num_envs_per_worker, because steps are collected from multiple envs in parallel. For example, if num_envs_per_worker=5, then rollout workers will return experiences in chunks of 5*100 = 500 steps. The dataflow here can vary per algorithm. For example, PPO further divides the train batch into minibatches for multi-epoch SGD. Set to “auto” to have RLlib compute an exact rollout_fragment_length to match the given batch size.

  • batch_mode – How to build individual batches with the EnvRunner(s). Batches coming from distributed EnvRunners are usually concatenated to form the train batch. Note that “steps” below can mean different things (either env- or agent-steps) and depends on the count_steps_by setting, adjustable via AlgorithmConfig.multi_agent(count_steps_by=..): 1) “truncate_episodes”: Each call to EnvRunner.sample() will return a batch of at most rollout_fragment_length * num_envs_per_worker in size. The batch will be exactly rollout_fragment_length * num_envs in size if postprocessing does not change batch sizes. Episodes may be truncated in order to meet this size requirement. This mode guarantees evenly sized batches, but increases variance, as the future return must now be estimated at truncation boundaries. 2) “complete_episodes”: Each call to EnvRunner.sample() will return a batch of at least rollout_fragment_length * num_envs_per_worker in size. Episodes will not be truncated, but multiple episodes may be packed within one batch to meet the (minimum) batch size. Note that when num_envs_per_worker > 1, episode steps will be buffered until the episode completes, and hence batches may contain significant amounts of off-policy data.

  • remote_worker_envs – If using num_envs_per_worker > 1, whether to create those new envs in remote processes instead of in the same worker. This adds overhead, but can make sense if your envs take a long time to step/reset (e.g., for StarCraft). Use this cautiously; the overhead is significant.

  • remote_env_batch_wait_ms – The timeout (in milliseconds) that remote workers wait when polling environments. 0 (continue as soon as at least one env is ready) is a reasonable default, but the optimal value can be found by measuring your environment's step/reset times and your model's inference performance.

  • validate_workers_after_construction – Whether to validate that each created remote worker is healthy after its construction process.

  • preprocessor_pref – Whether to use “rllib” or “deepmind” preprocessors by default. Set to None to use no preprocessor; in that case, the model will have to handle possibly complex observations from the environment.

  • observation_filter – Element-wise observation filter, either “NoFilter” or “MeanStdFilter”.

  • compress_observations – Whether to LZ4 compress individual observations in the SampleBatches collected during rollouts.

  • enable_tf1_exec_eagerly – Explicitly tells the rollout worker to enable TF eager execution. This is useful for example when framework is “torch”, but a TF2 policy needs to be restored for evaluation or league-based purposes.

  • sampler_perf_stats_ema_coef – If specified, perf stats are reported as exponential moving averages (EMAs). This is the coefficient determining how much new data points contribute to the average. Default is None, which uses a simple global average instead. The EMA update rule is: updated = (1 - ema_coef) * old + ema_coef * new
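The EMA update rule above can be sketched in plain Python (the function name here is made up for illustration; RLlib applies this rule internally to its sampler perf stats):

```python
def ema_update(old: float, new: float, ema_coef: float) -> float:
    """Apply the EMA rule: updated = (1 - ema_coef) * old + ema_coef * new."""
    return (1.0 - ema_coef) * old + ema_coef * new

# A running stat smoothly tracks new measurements; a small ema_coef means
# new data points move the average only slightly.
stat = 10.0
for measurement in [12.0, 11.0, 13.0]:
    stat = ema_update(stat, measurement, ema_coef=0.2)
```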


Returns: This updated AlgorithmConfig object.
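The rollout_fragment_length arithmetic described in the parameter list above can be checked with a small back-of-the-envelope computation (plain Python, no RLlib imports; the variable names mirror the config parameters):

```python
import math

# Values from the example in the rollout_fragment_length description:
rollout_fragment_length = 100
num_envs_per_worker = 5
train_batch_size = 1000

# With vectorized envs, each EnvRunner.sample() call returns
# rollout_fragment_length * num_envs_per_worker steps in one chunk:
steps_per_sample_call = rollout_fragment_length * num_envs_per_worker

# Sample calls needed (across all workers) to assemble one train batch:
sample_calls_per_train_batch = math.ceil(train_batch_size / steps_per_sample_call)
```

With these numbers, each sample call yields 500 steps, so two calls fill a 1000-step train batch; setting rollout_fragment_length="auto" lets RLlib solve this arithmetic for you so the fragments divide train_batch_size exactly.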