Environment Samplers

When a simulator (environment) is available, InputReaders (environment samplers) are used to collect and return experiences from the envs. For details on InputReaders used for offline RL (e.g. reading files of pre-recorded data), see the offline RL API reference.

The base sampler API (SamplerInput) is defined as follows:

Base Sampler class (ray.rllib.evaluation.sampler.SamplerInput)

class ray.rllib.evaluation.sampler.SamplerInput[source]

Reads input experiences from an existing sampler.

next() Union[SampleBatch, MultiAgentBatch][source]

Returns the next batch of read experiences.

Returns

The experience read (SampleBatch or MultiAgentBatch).

abstract get_data() Union[SampleBatch, MultiAgentBatch][source]

Called by self.next() to return the next batch of data.

Override this in child classes.

Returns

The next batch of data.

abstract get_metrics() List[ray.rllib.evaluation.metrics.RolloutMetrics][source]

Returns list of episode metrics since the last call to this method.

The list will contain one RolloutMetrics object per completed episode.

Returns

List of RolloutMetrics objects, one per completed episode since the last call to this method.
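
A minimal usage sketch of this method (the variable sampler below stands for any concrete SamplerInput instance, such as the SyncSampler or AsyncSampler documented further down; the RolloutMetrics attribute names are assumptions based on ray.rllib.evaluation.metrics):

    # Pull a batch, then drain the per-episode metrics gathered since the
    # last call to get_metrics().
    batch = sampler.next()
    for metrics in sampler.get_metrics():
        print(metrics.episode_length, metrics.episode_reward)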

abstract get_extra_batches() List[Union[SampleBatch, MultiAgentBatch]][source]

Returns list of extra batches since the last call to this method.

The list will contain all SampleBatches or MultiAgentBatches that the user has provided thus far. Users can add these “extra batches” to an episode by calling the episode’s add_extra_batch([SampleBatchType]) method. This can be done from inside an overridden Policy.compute_actions_from_input_dict(…, episodes) or from a custom callback’s on_episode_[start|step|end]() methods.

Returns

List of SampleBatches or MultiAgentBatches provided thus far by the user since the last call to this method.
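
As a minimal sketch (not part of the reference above), such extra batches could be provided from a custom callback like the following; the callback signature follows RLlib’s DefaultCallbacks API, and the contents of the extra SampleBatch are purely illustrative:

    from ray.rllib.agents.callbacks import DefaultCallbacks
    from ray.rllib.policy.sample_batch import SampleBatch

    class ExtraBatchCallbacks(DefaultCallbacks):
        def on_episode_end(self, *, worker, base_env, policies, episode,
                           env_index=None, **kwargs):
            # Attach a (hypothetical) extra batch to the finished episode. The
            # sampler's get_extra_batches() will return it on its next call.
            extra = SampleBatch({"episode_total_reward": [episode.total_reward]})
            episode.add_extra_batch(extra)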

SyncSampler (ray.rllib.evaluation.sampler.SyncSampler)

The synchronous sampler starts stepping through and collecting samples from an environment only when its next() method is called. Calling this method blocks until a SampleBatch has been built, which is then returned.

class ray.rllib.evaluation.sampler.SyncSampler(*, worker: RolloutWorker, env: ray.rllib.env.base_env.BaseEnv, clip_rewards: Union[bool, float], rollout_fragment_length: int, count_steps_by: str = 'env_steps', callbacks: DefaultCallbacks, horizon: int = None, multiple_episodes_in_batch: bool = False, normalize_actions: bool = True, clip_actions: bool = False, soft_horizon: bool = False, no_done_at_end: bool = False, observation_fn: Optional[ObservationFunction] = None, sample_collector_class: Optional[Type[ray.rllib.evaluation.collectors.sample_collector.SampleCollector]] = None, render: bool = False, policies=None, policy_mapping_fn=None, preprocessors=None, obs_filters=None, tf_sess=None)[source]

Sync SamplerInput that collects experiences when get_data() is called.

__init__(*, worker: RolloutWorker, env: ray.rllib.env.base_env.BaseEnv, clip_rewards: Union[bool, float], rollout_fragment_length: int, count_steps_by: str = 'env_steps', callbacks: DefaultCallbacks, horizon: int = None, multiple_episodes_in_batch: bool = False, normalize_actions: bool = True, clip_actions: bool = False, soft_horizon: bool = False, no_done_at_end: bool = False, observation_fn: Optional[ObservationFunction] = None, sample_collector_class: Optional[Type[ray.rllib.evaluation.collectors.sample_collector.SampleCollector]] = None, render: bool = False, policies=None, policy_mapping_fn=None, preprocessors=None, obs_filters=None, tf_sess=None)[source]

Initializes a SyncSampler instance.

Parameters
  • worker – The RolloutWorker that will use this Sampler for sampling.

  • env – Any Env object. Will be converted into an RLlib BaseEnv.

  • clip_rewards – True for +/-1.0 clipping, an actual float value for +/-value clipping, or False for no clipping.

  • rollout_fragment_length – The length of a fragment to collect before building a SampleBatch from the data and resetting the SampleBatchBuilder object.

  • count_steps_by – One of “env_steps” (default) or “agent_steps”. Use “agent_steps” if you want rollout lengths to be counted by individual agent steps. In a multi-agent env, a single env_step contains one or more agent_steps, depending on how many agents are present at any given time in the ongoing episode.

  • callbacks – The Callbacks object to use when episode events happen during rollout.

  • horizon – Hard-reset the Env after this many timesteps.

  • multiple_episodes_in_batch – Whether to pack multiple episodes into each batch. This guarantees batches will be exactly rollout_fragment_length in size.

  • normalize_actions – Whether to normalize actions to the action space’s bounds.

  • clip_actions – Whether to clip actions according to the given action_space’s bounds.

  • soft_horizon – If True, calculate bootstrapped values as if episode had ended, but don’t physically reset the environment when the horizon is hit.

  • no_done_at_end – Ignore the done=True at the end of the episode and instead record done=False.

  • observation_fn – Optional multi-agent observation function to use for preprocessing observations.

  • sample_collector_class – An optional SampleCollector subclass to use to collect, store, and retrieve environment, model, and sampler data.

  • render – Whether to try to render the environment after each step.

get_data() Union[SampleBatch, MultiAgentBatch][source]

Called by self.next() to return the next batch of data.

Override this in child classes.

Returns

The next batch of data.

get_metrics() List[ray.rllib.evaluation.metrics.RolloutMetrics][source]

Returns list of episode metrics since the last call to this method.

The list will contain one RolloutMetrics object per completed episode.

Returns

List of RolloutMetrics objects, one per completed episode since the last call to this method.

get_extra_batches() List[Union[SampleBatch, MultiAgentBatch]][source]

Returns list of extra batches since the last call to this method.

The list will contain all SampleBatches or MultiAgentBatches that the user has provided thus far. Users can add these “extra batches” to an episode by calling the episode’s add_extra_batch([SampleBatchType]) method. This can be done from inside an overridden Policy.compute_actions_from_input_dict(…, episodes) or from a custom callback’s on_episode_[start|step|end]() methods.

Returns

List of SampleBatches or MultiAgentBatches provided thus far by the user since the last call to this method.
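
A minimal usage sketch; note that the exact RolloutWorker constructor arguments and the worker.sampler attribute may differ across RLlib versions, and the env and policy choices are only placeholders:

    import gym
    from ray.rllib.agents.pg import PGTFPolicy
    from ray.rllib.evaluation.rollout_worker import RolloutWorker

    worker = RolloutWorker(
        env_creator=lambda _: gym.make("CartPole-v0"),
        policy_spec=PGTFPolicy,
        rollout_fragment_length=50,
    )
    # With the default sample_async=False, worker.sampler is a SyncSampler.
    # next() blocks until 50 timesteps have been collected, then returns the
    # built SampleBatch.
    batch = worker.sampler.next()
    print(batch.count)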

AsyncSampler (ray.rllib.evaluation.sampler.AsyncSampler)

The asynchronous sampler has a separate thread that keeps stepping through and collecting samples from an environment in the background. Calling its next() method gets the next enqueued SampleBatch from a queue and returns it immediately.
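
Whether a RolloutWorker uses the AsyncSampler or the SyncSampler is typically controlled through the trainer config. The sketch below assumes the common “sample_async” config key (not described in this reference) and uses PPO only as an example:

    import ray
    from ray.rllib.agents.ppo import PPOTrainer

    ray.init()
    trainer = PPOTrainer(
        env="CartPole-v0",
        config={
            "num_workers": 2,
            "rollout_fragment_length": 100,
            # Collect samples in a background thread (AsyncSampler) instead of
            # the default blocking SyncSampler.
            "sample_async": True,
        },
    )
    print(trainer.train()["episode_reward_mean"])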

class ray.rllib.evaluation.sampler.AsyncSampler(*, worker: RolloutWorker, env: ray.rllib.env.base_env.BaseEnv, clip_rewards: Union[bool, float], rollout_fragment_length: int, count_steps_by: str = 'env_steps', callbacks: DefaultCallbacks, horizon: Optional[int] = None, multiple_episodes_in_batch: bool = False, normalize_actions: bool = True, clip_actions: bool = False, soft_horizon: bool = False, no_done_at_end: bool = False, observation_fn: Optional[ObservationFunction] = None, sample_collector_class: Optional[Type[ray.rllib.evaluation.collectors.sample_collector.SampleCollector]] = None, render: bool = False, blackhole_outputs: bool = False, policies=None, policy_mapping_fn=None, preprocessors=None, obs_filters=None, tf_sess=None)[source]

Async SamplerInput that collects experiences in a background thread and queues them.

Once started, experiences are continuously collected in the background and put into a queue, from which the caller of get_data() can retrieve them.

__init__(*, worker: RolloutWorker, env: ray.rllib.env.base_env.BaseEnv, clip_rewards: Union[bool, float], rollout_fragment_length: int, count_steps_by: str = 'env_steps', callbacks: DefaultCallbacks, horizon: Optional[int] = None, multiple_episodes_in_batch: bool = False, normalize_actions: bool = True, clip_actions: bool = False, soft_horizon: bool = False, no_done_at_end: bool = False, observation_fn: Optional[ObservationFunction] = None, sample_collector_class: Optional[Type[ray.rllib.evaluation.collectors.sample_collector.SampleCollector]] = None, render: bool = False, blackhole_outputs: bool = False, policies=None, policy_mapping_fn=None, preprocessors=None, obs_filters=None, tf_sess=None)[source]

Initializes an AsyncSampler instance.

Parameters
  • worker – The RolloutWorker that will use this Sampler for sampling.

  • env – Any Env object. Will be converted into an RLlib BaseEnv.

  • clip_rewards – True for +/-1.0 clipping, an actual float value for +/-value clipping, or False for no clipping.

  • rollout_fragment_length – The length of a fragment to collect before building a SampleBatch from the data and resetting the SampleBatchBuilder object.

  • count_steps_by – One of “env_steps” (default) or “agent_steps”. Use “agent_steps” if you want rollout lengths to be counted by individual agent steps. In a multi-agent env, a single env_step contains one or more agent_steps, depending on how many agents are present at any given time in the ongoing episode.

  • horizon – Hard-reset the Env after this many timesteps.

  • multiple_episodes_in_batch – Whether to pack multiple episodes into each batch. This guarantees batches will be exactly rollout_fragment_length in size.

  • normalize_actions – Whether to normalize actions to the action space’s bounds.

  • clip_actions – Whether to clip actions according to the given action_space’s bounds.

  • blackhole_outputs – Whether to collect samples, but then not further process or store them (throw away all samples).

  • soft_horizon – If True, calculate bootstrapped values as if episode had ended, but don’t physically reset the environment when the horizon is hit.

  • no_done_at_end – Ignore the done=True at the end of the episode and instead record done=False.

  • observation_fn – Optional multi-agent observation function to use for preprocessing observations.

  • sample_collector_class – An optional SampleCollector subclass to use to collect, store, and retrieve environment, model, and sampler data.

  • render – Whether to try to render the environment after each step.

run()[source]

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.

get_data() Union[SampleBatch, MultiAgentBatch][source]

Called by self.next() to return the next batch of data.

Override this in child classes.

Returns

The next batch of data.

get_metrics() List[ray.rllib.evaluation.metrics.RolloutMetrics][source]

Returns list of episode metrics since the last call to this method.

The list will contain one RolloutMetrics object per completed episode.

Returns

List of RolloutMetrics objects, one per completed episode since the last call to this method.

get_extra_batches() List[Union[SampleBatch, MultiAgentBatch]][source]

Returns list of extra batches since the last call to this method.

The list will contain all SampleBatches or MultiAgentBatches that the user has provided thus far. Users can add these “extra batches” to an episode by calling the episode’s add_extra_batch([SampleBatchType]) method. This can be done from inside an overridden Policy.compute_actions_from_input_dict(…, episodes) or from a custom callback’s on_episode_[start|step|end]() methods.

Returns

List of SampleBatches or MultiAgentBatches provided thus far by the user since the last call to this method.