ray.rllib.env.multi_agent_episode.MultiAgentEpisode
class ray.rllib.env.multi_agent_episode.MultiAgentEpisode(id_: str | None = None, *, observations: List[Dict[Hashable, Any]] | None = None, observation_space: gymnasium.Space | None = None, infos: List[Dict[Hashable, Any]] | None = None, actions: List[Dict[Hashable, Any]] | None = None, action_space: gymnasium.Space | None = None, rewards: List[Dict[Hashable, Any]] | None = None, terminateds: Dict[Hashable, Any] | bool = False, truncateds: Dict[Hashable, Any] | bool = False, extra_model_outputs: List[Dict[Hashable, Any]] | None = None, env_t_started: int | None = None, agent_t_started: Dict[Hashable, int] | None = None, len_lookback_buffer: int | str = 'auto', agent_episode_ids: Dict[Hashable, str] | None = None, agent_module_ids: Dict[Hashable, str] | None = None, agent_to_module_mapping_fn: Callable[[Hashable, MultiAgentEpisode], str] | None = None)
Stores multi-agent episode data.
The central attribute of the class is the timestep mapping self.env_t_to_agent_t, which maps each AgentID's environment steps to that agent's own timestep scale.

Each AgentID in the MultiAgentEpisode has its own SingleAgentEpisode object in which this agent's data is stored. Together with the env_t_to_agent_t mapping, information can be extracted either on any individual agent's time scale or on the (global) multi-agent environment's time scale.

Extraction of data from a MultiAgentEpisode happens via the getter APIs, e.g. get_observations(), which work analogously to the ones implemented in the SingleAgentEpisode class.

Note that recorded terminateds/truncateds come as simple MultiAgentDicts mapping AgentID to bools and thus have no assignment to a certain timestep (analogous to a SingleAgentEpisode's single terminated/truncated boolean flag). Instead, they are assigned to the last recorded observation. Theoretically, edge cases could occur in some environments where an agent receives partial rewards and then terminates without a last observation; in these cases, the last observation is duplicated. Also, if no initial observation has been received yet for an agent but some rewards for this same agent have already occurred, the agent's data up to this point is deleted, because there is nothing to learn from these "premature" rewards.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
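For illustration, here is a minimal, hedged sketch of building such an episode by hand and reading it back. It assumes a toy two-agent setup; the agent ids ("a0", "a1"), the observation/action/reward values, and the "shared_module" id are all made up for this example.

```python
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode

# Toy two-agent episode; "a0" and "a1" are made-up agent ids.
episode = MultiAgentEpisode(
    # Map every agent to one (hypothetical) shared module id.
    agent_to_module_mapping_fn=lambda agent_id, eps: "shared_module",
)

# Env reset: only "a0" receives an initial observation ("a1" joins later).
episode.add_env_reset(observations={"a0": 0}, infos={"a0": {}})

# Env step 1: "a0" acts and is rewarded; "a1" receives its first observation.
episode.add_env_step(
    observations={"a0": 1, "a1": 1},
    actions={"a0": 0},
    rewards={"a0": 0.5},
)

print(episode.env_steps())           # 1 env step taken so far
print(episode.get_observations(-1))  # latest observation per agent
```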
Methods
__init__(): Initializes a MultiAgentEpisode.
add_env_reset(): Stores the initial observation.
add_env_step(): Adds a timestep to the episode.
agent_steps(): Returns the number of agent steps.
concat_episode(): Adds the given other MultiAgentEpisode to the right side of self.
cut(): Returns a successor episode chunk (of len=0) continuing from this Episode.
env_steps(): Returns the number of environment steps.
from_state(): Creates a multi-agent episode from a state dictionary.
get_actions(): Returns agents' actions or batched ranges thereof from this episode.
get_agents_that_stepped(): Returns a set of agent IDs of those agents that just finished stepping.
get_agents_to_act(): Returns a set of agent IDs required to send an action to env.step() next.
get_duration_s(): Returns the duration of this Episode (chunk) in seconds.
get_extra_model_outputs(): Returns agents' extra model outputs or batched ranges thereof from this episode.
get_infos(): Returns agents' info dicts or list (ranges) thereof from this episode.
get_observations(): Returns agents' observations or batched ranges thereof from this episode.
get_return(): Returns the all-agent return.
get_rewards(): Returns agents' rewards or batched ranges thereof from this episode.
get_sample_batch(): Converts this MultiAgentEpisode into a MultiAgentBatch.
get_state(): Returns the state of a multi-agent episode.
get_terminateds(): Gets the terminateds at given indices.
module_for(): Returns the ModuleID for a given AgentID.
print(): Prints this MultiAgentEpisode as a table of observations for the agents.
set_actions(): Overwrites all or some of this Episode's actions with the provided data.
set_extra_model_outputs(): Overwrites all or some of this Episode's extra model outputs with new_data.
set_observations(): Overwrites all or some of the single-agent Episodes' observations with the provided data.
set_rewards(): Overwrites all or some of this Episode's rewards with the provided data.
slice(): Returns a slice of this episode with the given slice object.
to_numpy(): Converts this Episode's list attributes to numpy arrays.
validate(): Validates the episode's data.
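As a hedged usage sketch, continuing the toy episode from the example above: by default the getters index on the global env-time axis, and get_state()/from_state() allow a round trip through a plain state dict. The agent_ids keyword shown here is assumed to filter the returned mapping; agents that did not step at a queried env timestep are simply absent from the result.

```python
# Query the toy episode built in the earlier example.
last_obs = episode.get_observations(-1)             # env-time axis (default)
a0_actions = episode.get_actions(agent_ids=["a0"])  # only agent "a0"
total_return = episode.get_return()                 # sum over all agents' rewards

# State round trip via get_state() / from_state().
state = episode.get_state()
restored = MultiAgentEpisode.from_state(state)
assert restored.env_steps() == episode.env_steps()
```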
Attributes
agent_episode_ids: Returns ids from each agent's SingleAgentEpisode.
agent_ids: Returns the agent ids.
is_done: Whether the episode is actually done (terminated or truncated).
is_numpy: True, if the data in this episode is already stored as numpy arrays.
is_reset: Returns True if self.add_env_reset() has already been called.
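To tie the attributes together, here is a hedged end-to-end sketch that drives a rollout until is_done and then freezes the collected data. Both env (a MultiAgentEnv-style object whose step() returns per-agent dicts) and sample_action (a stand-in for a real policy) are hypothetical and not part of this API.

```python
# `env` and `sample_action` are hypothetical stand-ins.
obs, infos = env.reset()
episode = MultiAgentEpisode()
episode.add_env_reset(observations=obs, infos=infos)

while not episode.is_done:
    # Only the agents that must act at this env step send actions.
    actions = {aid: sample_action(aid) for aid in episode.get_agents_to_act()}
    obs, rewards, terminateds, truncateds, infos = env.step(actions)
    episode.add_env_step(
        observations=obs,
        actions=actions,
        rewards=rewards,
        terminateds=terminateds,
        truncateds=truncateds,
        infos=infos,
    )

episode.to_numpy()   # convert list data to numpy arrays for batching
assert episode.is_numpy
```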