ray.rllib.env.multi_agent_episode.MultiAgentEpisode
class ray.rllib.env.multi_agent_episode.MultiAgentEpisode(id_: str | None = None, *, observations: List[Dict[Any, Any]] | None = None, observation_space: gymnasium.Space | None = None, infos: List[Dict[Any, Any]] | None = None, actions: List[Dict[Any, Any]] | None = None, action_space: gymnasium.Space | None = None, rewards: List[Dict[Any, Any]] | None = None, terminateds: Dict[Any, Any] | bool = False, truncateds: Dict[Any, Any] | bool = False, extra_model_outputs: List[Dict[Any, Any]] | None = None, env_t_started: int | None = None, agent_t_started: Dict[Any, int] | None = None, len_lookback_buffer: int | str = 'auto', agent_episode_ids: Dict[Any, str] | None = None, agent_module_ids: Dict[Any, str] | None = None, agent_to_module_mapping_fn: Callable[[Any, MultiAgentEpisode], str] | None = None)
Stores multi-agent episode data.

The central attribute of the class is the timestep mapping self.env_t_to_agent_t, which maps each AgentID's environment steps to that agent's own scale/timesteps.

Each AgentID in the MultiAgentEpisode has its own SingleAgentEpisode object in which this agent's data is stored. Together with the env_t_to_agent_t mapping, information can be extracted either on any individual agent's time scale or on the (global) multi-agent environment time scale.

Extraction of data from a MultiAgentEpisode happens via the getter APIs, e.g. get_observations(), which work analogously to the ones implemented in the SingleAgentEpisode class.

Note that recorded terminateds/truncateds come as simple MultiAgentDicts mapping AgentID to bools and thus have no assignment to a certain timestep (analogous to a SingleAgentEpisode's single terminated/truncated boolean flag). Instead, they are assigned to the last recorded observation. Theoretically, edge cases could occur in some environments where an agent receives partial rewards and then terminates without a last observation; in these cases, the last observation is duplicated.

Also, if no initial observation has been received yet for an agent, but some rewards for this same agent have already occurred, the agent's data up to that point is deleted, because there is nothing to learn from these "premature" rewards.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
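For illustration, here is a minimal sketch of constructing an episode from already-collected data, based on the constructor signature above. The agent IDs and the observation/action/reward values are made up, and the comments about the getter calls assume that negative indices select the most recent env timestep; check the individual getter docs for the exact semantics.

```python
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode

# Two hypothetical agents; one MultiAgentDict per env timestep.
# Three observations -> two env steps (actions/rewards have one entry less).
episode = MultiAgentEpisode(
    observations=[
        {"agent_1": 0, "agent_2": 0},  # reset observation (env_t=0)
        {"agent_1": 1},                # only agent_1 got an observation at env_t=1
        {"agent_1": 2, "agent_2": 2},  # both agents observed at env_t=2
    ],
    actions=[
        {"agent_1": 0, "agent_2": 0},
        {"agent_1": 1},
    ],
    rewards=[
        {"agent_1": 1.0, "agent_2": 1.0},
        {"agent_1": 1.0},
    ],
    # Treat all of the provided data as the episode itself (no lookback buffer).
    len_lookback_buffer=0,
)

# Global (env) timesteps vs. summed per-agent timesteps.
print(episode.env_steps())    # 2 env steps
print(episode.agent_steps())  # agent_1 stepped twice, agent_2 once

# Getter APIs work on the global env time scale by default, e.g. the
# observations of all agents present at the most recent env timestep:
print(episode.get_observations(-1))
```

The constructor path shown here is mainly useful when re-building an episode from stored data; during sampling, episodes are typically built step by step instead (see the loop sketch after the Methods list below).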
Methods
__init__: Initializes a MultiAgentEpisode.
add_env_reset: Stores initial observation.
add_env_step: Adds a timestep to the episode.
add_temporary_timestep_data: Temporarily adds (until to_numpy() is called) per-timestep data to self.
agent_steps: Number of agent steps.
concat_episode: Adds the given other MultiAgentEpisode to the right side of self.
cut: Returns a successor episode chunk (of len=0) continuing from this Episode.
env_steps: Returns the number of environment steps.
from_state: Creates a multi-agent episode from a state dictionary.
get_actions: Returns agents' actions or batched ranges thereof from this episode.
get_agents_that_stepped: Returns a set of agent IDs of those agents that just finished stepping.
get_agents_to_act: Returns a set of agent IDs required to send an action to env.step() next.
get_duration_s: Returns the duration of this Episode (chunk) in seconds.
get_extra_model_outputs: Returns agents' extra model outputs or batched ranges thereof from this episode.
get_infos: Returns agents' info dicts or list (ranges) thereof from this episode.
get_observations: Returns agents' observations or batched ranges thereof from this episode.
get_return: Returns the all-agent return.
get_rewards: Returns agents' rewards or batched ranges thereof from this episode.
get_sample_batch: Converts this MultiAgentEpisode into a MultiAgentBatch.
get_state: Returns the state of a multi-agent episode.
get_temporary_timestep_data: Returns all temporarily stored data items (list) under the given key.
get_terminateds: Gets the terminateds at given indices.
module_for: Returns the ModuleID for a given AgentID.
print: Prints this MultiAgentEpisode as a table of observations for the agents.
slice: Returns a slice of this episode with the given slice object.
to_numpy: Converts this Episode's list attributes to numpy arrays.
validate: Validates the episode's data.
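Taken together, the add_* and get_* methods above support building an episode step by step while sampling from a multi-agent environment. The following is a rough sketch of such a loop. The DummyMultiAgentEnv stand-in, the chosen constant actions, and the exact keyword-argument names are assumptions made for illustration; check the per-method documentation for the precise signatures.

```python
import random

from ray.rllib.env.multi_agent_episode import MultiAgentEpisode


class DummyMultiAgentEnv:
    """Tiny stand-in for a MultiAgentEnv: two agents, five steps, random rewards."""

    def reset(self):
        self.t = 0
        return {"agent_1": 0, "agent_2": 0}, {}

    def step(self, actions):
        self.t += 1
        done = self.t >= 5
        obs = {aid: self.t for aid in actions}
        rewards = {aid: random.random() for aid in actions}
        terminateds = {"agent_1": done, "agent_2": done, "__all__": done}
        truncateds = {"agent_1": False, "agent_2": False, "__all__": False}
        return obs, rewards, terminateds, truncateds, {}


env = DummyMultiAgentEnv()
episode = MultiAgentEpisode()

# Log the reset observations.
obs, infos = env.reset()
episode.add_env_reset(observations=obs, infos=infos)

while not episode.is_done:
    # Only compute actions for agents that must act in the next env.step().
    actions = {aid: 0 for aid in episode.get_agents_to_act()}
    obs, rewards, terminateds, truncateds, infos = env.step(actions)
    episode.add_env_step(
        observations=obs,
        actions=actions,
        rewards=rewards,
        infos=infos,
        terminateds=terminateds,
        truncateds=truncateds,
    )

# Convert list-based storage to numpy arrays, then to a MultiAgentBatch.
episode.to_numpy()
batch = episode.get_sample_batch()
```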
Attributes
agent_episode_ids: Returns ids from each agent's SingleAgentEpisode.
agent_ids: Returns the agent ids.
is_done: Whether the episode is actually done (terminated or truncated).
is_numpy: True if the data in this episode is already stored as numpy arrays.
is_reset: Returns True if self.add_env_reset() has already been called.
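As a quick illustration of these flags, a freshly constructed (not yet reset) episode reports all of them as False; the commented expectations below follow directly from the attribute descriptions above.

```python
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode

episode = MultiAgentEpisode()

print(episode.is_reset)  # False: add_env_reset() hasn't been called yet.
print(episode.is_done)   # False: neither terminated nor truncated.
print(episode.is_numpy)  # False: data is still stored in python lists.

# Once data has been added (see the sampling-loop sketch above):
# episode.agent_ids          -> the AgentIDs seen so far in this episode
# episode.agent_episode_ids  -> mapping from AgentID to its SingleAgentEpisode's id
```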