ray.rllib.algorithms.algorithm_config.AlgorithmConfig.rollouts#

AlgorithmConfig.rollouts(*, env_runner_cls: type | None = <ray.rllib.utils.from_config._NotProvided object>, num_rollout_workers: int | None = <ray.rllib.utils.from_config._NotProvided object>, num_envs_per_worker: int | None = <ray.rllib.utils.from_config._NotProvided object>, sample_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, create_env_on_local_worker: bool | None = <ray.rllib.utils.from_config._NotProvided object>, sample_collector: ~typing.Type[~ray.rllib.evaluation.collectors.sample_collector.SampleCollector] | None = <ray.rllib.utils.from_config._NotProvided object>, enable_connectors: bool | None = <ray.rllib.utils.from_config._NotProvided object>, env_to_module_connector: ~typing.Callable[[~typing.Any | gymnasium.Env], ConnectorV2 | ~typing.List[ConnectorV2]] | None = <ray.rllib.utils.from_config._NotProvided object>, module_to_env_connector: ~typing.Callable[[~typing.Any | gymnasium.Env, RLModule], ConnectorV2 | ~typing.List[ConnectorV2]] | None = <ray.rllib.utils.from_config._NotProvided object>, add_default_connectors_to_env_to_module_pipeline: bool | None = <ray.rllib.utils.from_config._NotProvided object>, add_default_connectors_to_module_to_env_pipeline: bool | None = <ray.rllib.utils.from_config._NotProvided object>, episode_lookback_horizon: int | None = <ray.rllib.utils.from_config._NotProvided object>, use_worker_filter_stats: bool | None = <ray.rllib.utils.from_config._NotProvided object>, update_worker_filter_stats: bool | None = <ray.rllib.utils.from_config._NotProvided object>, rollout_fragment_length: int | str | None = <ray.rllib.utils.from_config._NotProvided object>, batch_mode: str | None = <ray.rllib.utils.from_config._NotProvided object>, remote_worker_envs: bool | None = <ray.rllib.utils.from_config._NotProvided object>, remote_env_batch_wait_ms: float | None = <ray.rllib.utils.from_config._NotProvided object>, validate_workers_after_construction: bool | None = <ray.rllib.utils.from_config._NotProvided object>, preprocessor_pref: str | None = <ray.rllib.utils.from_config._NotProvided object>, observation_filter: str | None = <ray.rllib.utils.from_config._NotProvided object>, compress_observations: bool | None = <ray.rllib.utils.from_config._NotProvided object>, enable_tf1_exec_eagerly: bool | None = <ray.rllib.utils.from_config._NotProvided object>, sampler_perf_stats_ema_coef: float | None = <ray.rllib.utils.from_config._NotProvided object>, ignore_worker_failures=-1, recreate_failed_workers=-1, restart_failed_sub_environments=-1, num_consecutive_worker_failures_tolerance=-1, worker_health_probe_timeout_s=-1, worker_restore_timeout_s=-1, synchronize_filter=-1) AlgorithmConfig[source]#

Sets the rollout worker configuration.

Parameters:
  • env_runner_cls – The EnvRunner class to use for environment rollouts (data collection).

  • num_rollout_workers – Number of rollout worker actors to create for parallel sampling. Setting this to 0 will force rollouts to be done in the local worker (driver process or the Algorithm’s actor when using Tune).

  • num_envs_per_worker – Number of environments to step through vector-wise per worker. This enables batching of model inference, which can improve performance for inference-bottlenecked workloads.

  • sample_timeout_s – The timeout in seconds for calling sample() on remote EnvRunner workers. Results (episode lists) from workers that take longer than this time are discarded. Only used by algorithms that sample synchronously, alternating with their update step (e.g. PPO or DQN). Not relevant for algorithms that sample asynchronously, such as APPO or IMPALA.

  • sample_collector – For the old API stack only. The SampleCollector class to be used to collect and retrieve environment-, model-, and sampler data. Override the SampleCollector base class to implement your own collection/buffering/retrieval logic.

  • create_env_on_local_worker – When num_rollout_workers > 0, the driver (local_worker; worker-idx=0) does not need an environment, because it neither samples (done by remote_workers; worker_indices > 0) nor evaluates (done by evaluation workers; see below). Set this to True to create an environment on the local worker anyway, e.g. for debugging purposes.

  • enable_connectors – Use connector-based environment runners, so that all preprocessing of observations and postprocessing of actions are done in agent and action connectors.

  • env_to_module_connector – A callable taking an Env as its input argument and returning an env-to-module ConnectorV2 object (which may be a pipeline).

  • module_to_env_connector – A callable taking an Env and an RLModule as input arguments and returning a module-to-env ConnectorV2 object (which may be a pipeline).

  • add_default_connectors_to_env_to_module_pipeline – If True (default), RLlib’s EnvRunners automatically add the default env-to-module ConnectorV2 pieces to the EnvToModulePipeline. These handle adding observations and states (for stateful Modules), agent-to-module mapping, batching, and conversion to tensor data. Set this to False only if you know exactly what you are doing. Note that this setting is only relevant when the new API stack (including the new EnvRunner classes) is used.

  • add_default_connectors_to_module_to_env_pipeline – If True (default), RLlib’s EnvRunners automatically add the default module-to-env ConnectorV2 pieces to the ModuleToEnvPipeline. These handle removing the additional time rank (if applicable, for stateful Modules), module-to-agent unmapping, un-batching (to lists), and conversion from tensor data to numpy. Set this to False only if you know exactly what you are doing. Note that this setting is only relevant when the new API stack (including the new EnvRunner classes) is used.

  • episode_lookback_horizon – The amount of data (in timesteps) to keep from the preceding episode chunk when a new chunk (for the same episode) is generated to continue sampling at a later time. The larger this value, the further back in time an env-to-module connector can look to compile RLModule input data. For example, if your custom env-to-module connector (and your custom RLModule) requires the previous 10 rewards as inputs, you must set this to at least 10.

  • use_worker_filter_stats – Whether to use the workers in the WorkerSet to update the central filters (held by the local worker). If False, stats from the workers are discarded and not used.

  • update_worker_filter_stats – Whether to push filter updates from the central filters (held by the local worker) to the remote workers’ filters. Setting this to True within the evaluation config might be useful in order to prevent evaluation trajectories from being used to synchronize the central filter (which is used for training).

  • rollout_fragment_length – Divide episodes into fragments of this many steps each during sampling. Trajectories of this size are collected from EnvRunners and combined into a larger batch of train_batch_size for learning. For example, given rollout_fragment_length=100 and train_batch_size=1000: 1. RLlib collects 10 fragments of 100 steps each from the rollout workers. 2. These fragments are concatenated and an epoch of SGD is performed. When using multiple envs per worker, the fragment size is multiplied by num_envs_per_worker, because steps are collected from multiple envs in parallel. For example, if num_envs_per_worker=5, EnvRunners return experiences in chunks of 5 * 100 = 500 steps. The dataflow here can vary per algorithm; for example, PPO further divides the train batch into minibatches for multi-epoch SGD. Set rollout_fragment_length to “auto” to have RLlib compute an exact value that matches the given batch size.

  • batch_mode – How to build individual batches with the EnvRunner(s). Batches coming from distributed EnvRunners are usually concatenated to form the train batch. Note that “steps” below can mean either env- or agent-steps, depending on the count_steps_by setting, adjustable via AlgorithmConfig.multi_agent(count_steps_by=..): 1) “truncate_episodes”: Each call to EnvRunner.sample() returns a batch of at most rollout_fragment_length * num_envs_per_worker in size. The batch is exactly rollout_fragment_length * num_envs_per_worker in size if postprocessing does not change batch sizes. Episodes may be truncated in order to meet this size requirement. This mode guarantees evenly sized batches, but increases variance, as the future return must now be estimated at truncation boundaries. 2) “complete_episodes”: Each call to EnvRunner.sample() returns a batch of at least rollout_fragment_length * num_envs_per_worker in size. Episodes are not truncated, but multiple episodes may be packed within one batch to meet the (minimum) batch size. Note that when num_envs_per_worker > 1, episode steps are buffered until the episode completes, and hence batches may contain significant amounts of off-policy data.

  • remote_worker_envs – If using num_envs_per_worker > 1, whether to create those new envs in remote processes instead of in the same worker. This adds overhead, but can make sense if your envs take a long time to step/reset (e.g., for StarCraft). Use this cautiously; the overhead is significant.

  • remote_env_batch_wait_ms – Timeout that remote workers wait for when polling environments. 0 (continue as soon as at least one env is ready) is a reasonable default, but the optimal value could be obtained by measuring your environment step/reset and model inference performance.

  • validate_workers_after_construction – Whether to validate that each created remote worker is healthy after its construction process.

  • preprocessor_pref – Whether to use “rllib” or “deepmind” preprocessors by default. Set to None to use no preprocessor, in which case the model will have to handle possibly complex observations from the environment.

  • observation_filter – Element-wise observation filter, either “NoFilter” or “MeanStdFilter”.

  • compress_observations – Whether to LZ4 compress individual observations in the SampleBatches collected during rollouts.

  • enable_tf1_exec_eagerly – Explicitly tells the rollout worker to enable TF eager execution. This is useful, for example, when the framework is “torch” but a TF2 policy needs to be restored for evaluation or league-based purposes.

  • sampler_perf_stats_ema_coef – If specified, perf stats are reported as exponential moving averages (EMAs). This is the coefficient determining how much new data points contribute to the average. The default is None, which uses a simple global average instead. The EMA update rule is: updated = (1 - ema_coef) * old + ema_coef * new (see the sketch right after this list).
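
A minimal sketch of this EMA update rule in plain Python (the helper name ema_update and the values below are illustrative only, not part of the RLlib API):

    def ema_update(old: float, new: float, ema_coef: float) -> float:
        # Exponential moving average as described above:
        # the new data point contributes with weight ema_coef.
        return (1.0 - ema_coef) * old + ema_coef * new

    # Example: with ema_coef=0.1, a new sample moves the running stat by 10%
    # of the gap between the old average and the new value.
    stat = 20.0  # e.g., previous mean env-step time in ms (hypothetical)
    stat = ema_update(stat, new=30.0, ema_coef=0.1)  # -> 21.0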

Returns:

This updated AlgorithmConfig object.
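
A minimal usage sketch (assuming PPO on the Gymnasium “CartPole-v1” env; any other AlgorithmConfig subclass works the same way, and the chosen values are illustrative only):

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .rollouts(
            num_rollout_workers=2,           # two remote EnvRunner actors for parallel sampling
            num_envs_per_worker=4,           # four vectorized sub-envs per worker (batched inference)
            rollout_fragment_length="auto",  # let RLlib derive the fragment size from train_batch_size
            batch_mode="truncate_episodes",  # evenly sized batches; episodes may be cut at fragment ends
            compress_observations=True,      # LZ4-compress individual observations in the SampleBatches
        )
    )
    algo = config.build()
    print(algo.train())

Because rollouts() returns the updated AlgorithmConfig, it can be chained with the other configuration methods as shown above.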