ray.rllib.env.multi_agent_episode.MultiAgentEpisode#

class ray.rllib.env.multi_agent_episode.MultiAgentEpisode(id_: str | None = None, *, observations: List[Dict[Any, Any]] | None = None, observation_space: gymnasium.Space | None = None, infos: List[Dict[Any, Any]] | None = None, actions: List[Dict[Any, Any]] | None = None, action_space: gymnasium.Space | None = None, rewards: List[Dict[Any, Any]] | None = None, terminateds: Dict[Any, Any] | bool = False, truncateds: Dict[Any, Any] | bool = False, extra_model_outputs: List[Dict[Any, Any]] | None = None, env_t_started: int | None = None, agent_t_started: Dict[Any, int] | None = None, len_lookback_buffer: int | str = 'auto', agent_episode_ids: Dict[Any, str] | None = None, agent_module_ids: Dict[Any, str] | None = None, agent_to_module_mapping_fn: Callable[[Any, MultiAgentEpisode], str] | None = None)[source]#

Stores multi-agent episode data.

The central attribute of the class is the timestep mapping self.env_t_to_agent_t, which maps each AgentID's environment steps to that agent's own timestep scale.

Each AgentID in the MultiAgentEpisode has its own SingleAgentEpisode object in which this agent’s data is stored. Together with the env_t_to_agent_t mapping, we can extract information either on any individual agent’s time scale or from the (global) multi-agent environment time scale.
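As a framework-free sketch (not RLlib's actual implementation), the env-to-agent timestep mapping can be pictured as a per-agent list recording which global env steps that agent participated in:

```python
# Illustrative sketch only: each agent keeps a list whose i-th entry is the
# global env timestep at which the agent's own timestep i occurred.
# Agent "a0" steps at every env step; agent "a1" only at even env steps.
env_t_to_agent_t = {
    "a0": [0, 1, 2, 3, 4],
    "a1": [0, 2, 4],
}

def agent_step_for_env_step(agent_id, env_t):
    """Translate a global env timestep into the agent's own timestep,
    or None if the agent did not step at that env timestep."""
    try:
        return env_t_to_agent_t[agent_id].index(env_t)
    except ValueError:
        return None

print(agent_step_for_env_step("a1", 4))  # a1's third step -> agent timestep 2
print(agent_step_for_env_step("a1", 3))  # a1 did not step at env t=3 -> None
```

The names `agent_step_for_env_step` and the concrete list layout are assumptions for illustration; the real class stores this mapping internally and handles lookback buffers and skip tags on top of it.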

Extraction of data from a MultiAgentEpisode happens via the getter APIs, e.g. get_observations(), which work analogously to those implemented in the SingleAgentEpisode class.
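A minimal sketch of the getter semantics, assuming (as a simplification) that env-step data is a list of dicts mapping AgentID to a value; a per-agent getter then collects one agent's values on that agent's own time scale:

```python
# Hypothetical stand-in for the getter semantics, NOT the real RLlib API:
# per env step, only the agents that actually observed appear in the dict.
env_observations = [
    {"a0": 0.1, "a1": 7.0},  # env step 0 (reset observations)
    {"a0": 0.2},             # env step 1: only a0 observed
    {"a0": 0.3, "a1": 8.0},  # env step 2
]

def get_observations_for(agent_id, env_obs):
    """Collect one agent's observations on the agent's own time scale."""
    return [step[agent_id] for step in env_obs if agent_id in step]

print(get_observations_for("a1", env_observations))  # [7.0, 8.0]
```

The real get_observations() additionally supports index ranges, lookback buffers, and batched returns; this sketch only illustrates the per-agent extraction idea.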

Note that recorded terminateds/truncateds come as simple MultiAgentDicts mapping AgentID to bools and thus have no assignment to a certain timestep (analogous to a SingleAgentEpisode's single terminated/truncated boolean flag). Instead, we assign them to the last observation recorded. Theoretically, edge cases could occur in some environments where an agent receives partial rewards and then terminates without a last observation. In these cases, we duplicate the last observation.
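For illustration, such a MultiAgentDict might look as follows (the "__all__" key is RLlib's convention for the whole-episode flag; treat the exact keys here as an assumed example):

```python
# Assumed example of a recorded terminateds MultiAgentDict: one bool per
# agent, plus the conventional "__all__" flag for the episode as a whole.
# An agent's flag applies to its last recorded observation, not to a
# specific timestep.
terminateds = {"a0": True, "a1": False, "__all__": False}

print(terminateds["a0"])       # a0 has terminated
print(terminateds["__all__"])  # but the episode as a whole has not
```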

Also, if no initial observation has been received yet for an agent, but some rewards for this same agent have already occurred, we delete the agent's data up to that point, because there is nothing to learn from these "premature" rewards.
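The pruning rule above can be sketched with a hypothetical helper (the function name and the (env_t, reward) pair representation are assumptions, not the real internals):

```python
def drop_premature_rewards(first_obs_env_t, rewards):
    """Discard (env_t, reward) pairs an agent received before its first
    observation: without an observation, there is nothing to learn from."""
    return [(t, r) for (t, r) in rewards if t >= first_obs_env_t]

# a1's first observation arrives at env step 2; earlier rewards are dropped.
rewards_a1 = [(0, 0.5), (1, 1.0), (3, 2.0)]
print(drop_premature_rewards(2, rewards_a1))  # [(3, 2.0)]
```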

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

__init__

Initializes a MultiAgentEpisode.

add_env_reset

Stores initial observation.

add_env_step

Adds a timestep to the episode.

add_temporary_timestep_data

Temporarily adds (until to_numpy() is called) per-timestep data to self.

agent_steps

Number of agent steps.

concat_episode

Adds the given other MultiAgentEpisode to the right side of self.

cut

Returns a successor episode chunk (of len=0) continuing from this Episode.

env_steps

Returns the number of environment steps.

from_state

Creates a multi-agent episode from a state dictionary.

get_actions

Returns agents' actions or batched ranges thereof from this episode.

get_agents_that_stepped

Returns a set of agent IDs of those agents that just finished stepping.

get_agents_to_act

Returns a set of agent IDs required to send an action to env.step() next.

get_duration_s

Returns the duration of this Episode (chunk) in seconds.

get_extra_model_outputs

Returns agents' extra model outputs or batched ranges thereof from this episode.

get_infos

Returns agents' info dicts or list (ranges) thereof from this episode.

get_observations

Returns agents' observations or batched ranges thereof from this episode.

get_return

Returns all-agent return.

get_rewards

Returns agents' rewards or batched ranges thereof from this episode.

get_sample_batch

Converts this MultiAgentEpisode into a MultiAgentBatch.

get_state

Returns the state of a multi-agent episode.

get_temporary_timestep_data

Returns all temporarily stored data items (list) under the given key.

get_terminateds

Gets the terminateds at given indices.

module_for

Returns the ModuleID for a given AgentID.

print

Prints this MultiAgentEpisode as a table of observations for the agents.

slice

Returns a slice of this episode with the given slice object.

to_numpy

Converts this Episode's list attributes to numpy arrays.

validate

Validates the episode's data.

Attributes

id_

agent_to_module_mapping_fn

observation_space

action_space

env_t_started

env_t

agent_t_started

env_t_to_agent_t

is_terminated

is_truncated

agent_episodes

SKIP_ENV_TS_TAG

agent_episode_ids

Returns ids from each agent's SingleAgentEpisode.

agent_ids

Returns the agent ids.

is_done

Whether the episode is actually done (terminated or truncated).

is_numpy

True, if the data in this episode is already stored as numpy arrays.

is_reset

Returns True if self.add_env_reset() has already been called.