ray.rllib.algorithms.algorithm_config.AlgorithmConfig.env_runners

AlgorithmConfig.env_runners(
    *,
    env_runner_cls: type | None = NotProvided,
    num_env_runners: int | None = NotProvided,
    num_envs_per_env_runner: int | None = NotProvided,
    num_cpus_per_env_runner: int | None = NotProvided,
    num_gpus_per_env_runner: int | float | None = NotProvided,
    custom_resources_per_env_runner: dict | None = NotProvided,
    validate_env_runners_after_construction: bool | None = NotProvided,
    sample_timeout_s: float | None = NotProvided,
    max_requests_in_flight_per_env_runner: int | None = NotProvided,
    env_to_module_connector: typing.Callable[[typing.Any | gymnasium.Env], ConnectorV2 | typing.List[ConnectorV2]] | None = NotProvided,
    module_to_env_connector: typing.Callable[[typing.Any | gymnasium.Env, RLModule], ConnectorV2 | typing.List[ConnectorV2]] | None = NotProvided,
    add_default_connectors_to_env_to_module_pipeline: bool | None = NotProvided,
    add_default_connectors_to_module_to_env_pipeline: bool | None = NotProvided,
    episode_lookback_horizon: int | None = NotProvided,
    use_worker_filter_stats: bool | None = NotProvided,
    update_worker_filter_stats: bool | None = NotProvided,
    compress_observations: bool | None = NotProvided,
    rollout_fragment_length: int | str | None = NotProvided,
    batch_mode: str | None = NotProvided,
    explore: bool | None = NotProvided,
    exploration_config: dict | None = NotProvided,
    create_env_on_local_worker: bool | None = NotProvided,
    sample_collector: typing.Type[ray.rllib.evaluation.collectors.sample_collector.SampleCollector] | None = NotProvided,
    remote_worker_envs: bool | None = NotProvided,
    remote_env_batch_wait_ms: float | None = NotProvided,
    preprocessor_pref: str | None = NotProvided,
    observation_filter: str | None = NotProvided,
    enable_tf1_exec_eagerly: bool | None = NotProvided,
    sampler_perf_stats_ema_coef: float | None = NotProvided,
    num_rollout_workers=-1,
    num_envs_per_worker=-1,
    validate_workers_after_construction=-1,
    ignore_worker_failures=-1,
    recreate_failed_workers=-1,
    restart_failed_sub_environments=-1,
    num_consecutive_worker_failures_tolerance=-1,
    worker_health_probe_timeout_s=-1,
    worker_restore_timeout_s=-1,
    synchronize_filter=-1,
    enable_connectors=-1,
) → AlgorithmConfig

NotProvided is the sentinel ray.rllib.utils.from_config.NotProvided; a parameter left at this default keeps its current value in the config.

Sets the rollout worker configuration.

Parameters:
  • env_runner_cls – The EnvRunner class to use for environment rollouts (data collection).

  • num_env_runners – Number of EnvRunner actors to create for parallel sampling. Setting this to 0 forces sampling to be done in the local EnvRunner (main process or the Algorithm’s actor when using Tune).

  • num_envs_per_env_runner – Number of environments to step through (vector-wise) per EnvRunner. This enables batching when computing actions through RLModule inference, which can improve performance for inference-bottlenecked workloads.

  • num_cpus_per_env_runner – Number of CPUs to allocate per EnvRunner.

  • num_gpus_per_env_runner – Number of GPUs to allocate per EnvRunner. This can be fractional. This is usually needed only if your env itself requires a GPU (e.g., it is a GPU-intensive video game), or if model inference is unusually expensive.

  • custom_resources_per_env_runner – Any custom Ray resources to allocate per EnvRunner.

  • sample_timeout_s – The timeout in seconds for calling sample() on remote EnvRunner workers. Results (episode lists) from workers that take longer than this time are discarded. Only used by algorithms that sample synchronously, in turn with their update step (e.g., PPO or DQN). Not relevant for algorithms that sample asynchronously, such as APPO or IMPALA.

  • max_requests_in_flight_per_env_runner – Max number of in-flight requests to each EnvRunner worker. See the FaultTolerantActorManager class for more details. Tuning this value is important when running experiments with large sample batches, where there is a risk that the object store fills up and objects spill to disk. Spilling can make asynchronous requests very slow, which slows down the whole experiment. You can inspect the object store during your experiment by running the ray memory command on your head node, or by using the Ray dashboard. If you see the object store filling up, reduce the number of remote requests in flight or enable compression.

  • sample_collector – For the old API stack only. The SampleCollector class to be used to collect and retrieve environment-, model-, and sampler data. Override the SampleCollector base class to implement your own collection/buffering/retrieval logic.

  • create_env_on_local_worker – Whether to create an environment on the local worker even if it is not needed. When num_env_runners > 0, the driver (local_worker; worker-idx=0) does not need an environment, because it neither samples (done by remote_workers; worker_indices > 0) nor evaluates (done by evaluation workers).

  • env_to_module_connector – A callable taking an Env as input arg and returning an env-to-module ConnectorV2 (might be a pipeline) object.

  • module_to_env_connector – A callable taking an Env and an RLModule as input args and returning a module-to-env ConnectorV2 (might be a pipeline) object.

  • add_default_connectors_to_env_to_module_pipeline – If True (default), RLlib’s EnvRunners automatically add the default env-to-module ConnectorV2 pieces to the EnvToModulePipeline. These pieces add observations and states (in case of stateful Module(s)), perform agent-to-module mapping and batching, and convert the data to tensors. Set this to False only if you know exactly what you are doing. Note that this setting is only relevant if the new API stack is used (including the new EnvRunner classes).

  • add_default_connectors_to_module_to_env_pipeline – If True (default), RLlib’s EnvRunners automatically add the default module-to-env ConnectorV2 pieces to the ModuleToEnvPipeline. These pieces remove the additional time-rank (if applicable, in case of stateful Module(s)), perform module-to-agent unmapping and un-batching (to lists), and convert tensor data to numpy. Set this to False only if you know exactly what you are doing. Note that this setting is only relevant if the new API stack is used (including the new EnvRunner classes).

  • episode_lookback_horizon – The amount of data (in timesteps) to keep from the preceding episode chunk when a new chunk (for the same episode) is generated to continue sampling at a later time. The larger this value, the further an env-to-module connector can look back in time and compile RLModule input data from this information. For example, if your custom env-to-module connector (and your custom RLModule) requires the previous 10 rewards as inputs, you must set this to at least 10.

  • use_worker_filter_stats – Whether to use the workers in the EnvRunnerGroup to update the central filters (held by the local worker). If False, stats from the workers aren’t used and are discarded.

  • update_worker_filter_stats – Whether to push filter updates from the central filters (held by the local worker) to the remote workers’ filters. Setting this to True might be useful within the evaluation config in order to disable the usage of evaluation trajectories for syncing the central filter (used for training).

  • rollout_fragment_length – Divide episodes into fragments of this many steps each during sampling. Trajectories of this size are collected from EnvRunners and combined into a larger batch of train_batch_size for learning. For example, given rollout_fragment_length=100 and train_batch_size=1000: (1) RLlib collects 10 fragments of 100 steps each from the EnvRunners; (2) these fragments are concatenated and RLlib performs an epoch of SGD. When using multiple envs per EnvRunner, the fragment size is multiplied by num_envs_per_env_runner, because steps are collected from multiple envs in parallel. For example, if num_envs_per_env_runner=5, then EnvRunners return experiences in chunks of 5*100 = 500 steps. The dataflow can vary per algorithm. For example, PPO further divides the train batch into minibatches for multi-epoch SGD. Set rollout_fragment_length to “auto” to have RLlib compute an exact value to match the given batch size.

  • batch_mode – How to build individual batches with the EnvRunner(s). Batches coming from distributed EnvRunners are usually concatenated to form the train batch. Note that “steps” below can mean either env steps or agent steps, depending on the count_steps_by setting, adjustable via AlgorithmConfig.multi_agent(count_steps_by=..). Supported values:
      1) “truncate_episodes”: Each call to EnvRunner.sample() returns a batch of at most rollout_fragment_length * num_envs_per_env_runner in size. The batch is exactly rollout_fragment_length * num_envs_per_env_runner in size if postprocessing does not change batch sizes. Episodes may be truncated in order to meet this size requirement. This mode guarantees evenly sized batches, but increases variance because the future return must be estimated at truncation boundaries.
      2) “complete_episodes”: Each call to EnvRunner.sample() returns a batch of at least rollout_fragment_length * num_envs_per_env_runner in size. Episodes aren’t truncated, but multiple episodes may be packed within one batch to meet the (minimum) batch size. Note that when num_envs_per_env_runner > 1, episode steps are buffered until the episode completes, and hence batches may contain significant amounts of off-policy data.

  • explore – Default exploration behavior, used if and only if explore=None is passed into compute_action(s). Set to False for no exploration behavior (e.g., for evaluation).

  • exploration_config – A dict specifying the Exploration object’s config.

  • remote_worker_envs – If using num_envs_per_env_runner > 1, whether to create those envs in remote processes instead of in the same worker. This adds overhead, but can make sense if your envs take a long time to step / reset (e.g., for StarCraft). Use this cautiously; the overhead is significant.

  • remote_env_batch_wait_ms – Timeout in milliseconds that remote workers wait when polling environments. 0 (continue when at least one env is ready) is a reasonable default, but the optimal value can be found by measuring your environment’s step / reset time and your model’s inference performance.

  • validate_env_runners_after_construction – Whether to validate that each created remote EnvRunner is healthy after its construction process.

  • preprocessor_pref – Whether to use “rllib” or “deepmind” preprocessors by default. Set to None for using no preprocessor. In this case, the model has to handle possibly complex observations from the environment.

  • observation_filter – Element-wise observation filter, either “NoFilter” or “MeanStdFilter”.

  • compress_observations – Whether to LZ4 compress individual observations in the SampleBatches collected during rollouts.

  • enable_tf1_exec_eagerly – Explicitly tells the rollout worker to enable TF eager execution. This is useful for example when framework is “torch”, but a TF2 policy needs to be restored for evaluation or league-based purposes.

  • sampler_perf_stats_ema_coef – If specified, perf stats are tracked as exponential moving averages (EMAs). This is the coefficient that determines how much each new data point contributes to the average. Default is None, which uses a simple global average instead. The EMA update rule is: updated = (1 - ema_coef) * old + ema_coef * new
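
The rollout_fragment_length arithmetic described in the parameter list can be checked with a small pure-Python sketch. The helper names below (steps_per_sample_call, sample_calls_needed) are hypothetical and not part of RLlib; the sketch assumes “truncate_episodes” mode with no postprocessing that changes batch sizes, and it does not reproduce how RLlib resolves “auto”.

```python
def steps_per_sample_call(rollout_fragment_length: int,
                          num_envs_per_env_runner: int = 1) -> int:
    """Steps one EnvRunner returns per sample() call (hypothetical helper).

    Assumes "truncate_episodes" mode and no size-changing postprocessing.
    """
    return rollout_fragment_length * num_envs_per_env_runner


def sample_calls_needed(train_batch_size: int,
                        rollout_fragment_length: int,
                        num_envs_per_env_runner: int = 1) -> int:
    """Number of sample() results needed to reach train_batch_size."""
    chunk = steps_per_sample_call(rollout_fragment_length, num_envs_per_env_runner)
    # Integer ceiling division: a partial chunk still requires a full call.
    return (train_batch_size + chunk - 1) // chunk


# Example from the parameter description: 10 fragments of 100 steps each.
print(sample_calls_needed(1000, 100))      # 10
# With 5 envs per EnvRunner, each call yields 5 * 100 = 500 steps.
print(steps_per_sample_call(100, 5))       # 500
print(sample_calls_needed(1000, 100, 5))   # 2
```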

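The EMA update rule given for sampler_perf_stats_ema_coef can be illustrated the same way. The function ema_update is a hypothetical helper that only restates the formula; it is not RLlib’s internal implementation.

```python
def ema_update(old: float, new: float, ema_coef: float) -> float:
    """Apply the rule: updated = (1 - ema_coef) * old + ema_coef * new."""
    return (1.0 - ema_coef) * old + ema_coef * new


# Start from a stat of 10.0 and feed in two new measurements of 20.0.
stat = 10.0
for measurement in (20.0, 20.0):
    stat = ema_update(stat, measurement, ema_coef=0.5)

print(stat)  # 17.5
```

A larger ema_coef makes the average react faster to new measurements; a smaller one smooths more heavily.
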
Returns:

This updated AlgorithmConfig object.
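
Because the method returns the config object itself, calls can be chained. The following is a minimal configuration sketch, assuming a Ray release whose env_runners() signature matches the one documented here; PPO, the CartPole-v1 environment, and all parameter values are chosen purely for illustration.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # illustrative environment choice
    .env_runners(
        num_env_runners=2,               # two remote EnvRunner actors
        num_envs_per_env_runner=4,       # four vectorized envs per EnvRunner
        rollout_fragment_length="auto",  # let RLlib match the train batch size
        batch_mode="truncate_episodes",
        sample_timeout_s=60.0,
    )
)

# Settings passed to env_runners() are stored as attributes on the config.
print(config.num_env_runners)
print(config.num_envs_per_env_runner)
```

Parameters that are not passed keep their previous values, so env_runners() can be called several times to adjust individual settings.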