ray.rllib.env.multi_agent_episode.MultiAgentEpisode
class ray.rllib.env.multi_agent_episode.MultiAgentEpisode(id_: str | None = None, *, observations: List[Dict[Hashable, Any]] | None = None, observation_space: gymnasium.Space | None = None, infos: List[Dict[Hashable, Any]] | None = None, actions: List[Dict[Hashable, Any]] | None = None, action_space: gymnasium.Space | None = None, rewards: List[Dict[Hashable, Any]] | None = None, terminateds: Dict[Hashable, Any] | bool = False, truncateds: Dict[Hashable, Any] | bool = False, extra_model_outputs: List[Dict[Hashable, Any]] | None = None, env_t_started: int | None = None, agent_t_started: Dict[Hashable, int] | None = None, len_lookback_buffer: int | str = 'auto', agent_episode_ids: Dict[Hashable, str] | None = None, agent_module_ids: Dict[Hashable, str] | None = None, agent_to_module_mapping_fn: Callable[[Hashable, MultiAgentEpisode], str] | None = None)
Stores multi-agent episode data.
The central attribute of the class is the timestep mapping self.env_t_to_agent_t, which maps each AgentID's environment steps to that agent's own timestep scale.

Each AgentID in the MultiAgentEpisode has its own SingleAgentEpisode object in which this agent's data is stored. Together with the env_t_to_agent_t mapping, information can be extracted either on any individual agent's time scale or on the (global) multi-agent environment's time scale.

Extraction of data from a MultiAgentEpisode happens via the getter APIs, e.g. get_observations(), which work analogously to the ones implemented in the SingleAgentEpisode class.

Note that recorded terminateds/truncateds come as simple MultiAgentDicts mapping AgentID to bools and thus have no assignment to a certain timestep (analogous to a SingleAgentEpisode's single terminated/truncated boolean flag). Instead, they are assigned to the last recorded observation. Theoretically, edge cases could occur in some environments where an agent receives partial rewards and then terminates without a last observation; in these cases, the last observation is duplicated. Also, if no initial observation has been received yet for an agent but some rewards for this same agent have already occurred, the agent's data up to this point is deleted, because there is nothing to learn from these "premature" rewards.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
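For illustration, here is a minimal, hedged sketch of building such an episode by hand and reading it back. It assumes a toy two-agent setup; the agent ids ("a0", "a1"), the observation/action/reward values, and the "shared_module" id are all made up for this example.

```python
from ray.rllib.env.multi_agent_episode import MultiAgentEpisode

# Toy two-agent episode; "a0" and "a1" are made-up agent ids.
episode = MultiAgentEpisode(
    # Map every agent to one (hypothetical) shared module id.
    agent_to_module_mapping_fn=lambda agent_id, eps: "shared_module",
)

# Env reset: only "a0" receives an initial observation ("a1" joins later).
episode.add_env_reset(observations={"a0": 0}, infos={"a0": {}})

# Env step 1: "a0" acts and is rewarded; "a1" receives its first observation.
episode.add_env_step(
    observations={"a0": 1, "a1": 1},
    actions={"a0": 0},
    rewards={"a0": 0.5},
)

print(episode.env_steps())           # 1 env step taken so far
print(episode.get_observations(-1))  # latest observation per agent
```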
Methods
__init__(): Initializes a MultiAgentEpisode.
add_env_reset(): Stores the initial observation.
add_env_step(): Adds a timestep to the episode.
agent_steps(): Returns the number of agent steps.
concat_episode(): Adds the given other MultiAgentEpisode to the right side of self.
cut(): Returns a successor episode chunk (of len=0) continuing from this Episode.
env_steps(): Returns the number of environment steps.
from_state(): Creates a multi-agent episode from a state dictionary.
get_actions(): Returns agents' actions or batched ranges thereof from this episode.
get_agents_that_stepped(): Returns a set of agent IDs of those agents that just finished stepping.
get_agents_to_act(): Returns a set of agent IDs required to send an action to env.step() next.
get_duration_s(): Returns the duration of this Episode (chunk) in seconds.
get_extra_model_outputs(): Returns agents' extra model outputs or batched ranges thereof from this episode.
get_infos(): Returns agents' info dicts or list (ranges) thereof from this episode.
get_observations(): Returns agents' observations or batched ranges thereof from this episode.
get_return(): Returns the all-agent return.
get_rewards(): Returns agents' rewards or batched ranges thereof from this episode.
get_sample_batch(): Converts this MultiAgentEpisode into a MultiAgentBatch.
get_state(): Returns the state of a multi-agent episode.
get_terminateds(): Gets the terminateds at given indices.
module_for(): Returns the ModuleID for a given AgentID.
print(): Prints this MultiAgentEpisode as a table of observations for the agents.
set_actions(): Overwrites all or some of this Episode's actions with the provided data.
set_extra_model_outputs(): Overwrites all or some of this Episode's extra model outputs with new_data.
set_observations(): Overwrites all or some of the single-agent Episodes' observations with the provided data.
set_rewards(): Overwrites all or some of this Episode's rewards with the provided data.
slice(): Returns a slice of this episode with the given slice object.
to_numpy(): Converts this Episode's list attributes to numpy arrays.
validate(): Validates the episode's data.
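As a hedged usage sketch, continuing the toy episode from the example above: by default the getters index on the global env-time axis, and get_state()/from_state() allow a round trip through a plain state dict. The agent_ids keyword shown here is assumed to filter the returned mapping; agents that did not step at a queried env timestep are simply absent from the result.

```python
# Query the toy episode built in the earlier example.
last_obs = episode.get_observations(-1)             # env-time axis (default)
a0_actions = episode.get_actions(agent_ids=["a0"])  # only agent "a0"
total_return = episode.get_return()                 # sum over all agents' rewards

# State round trip via get_state() / from_state().
state = episode.get_state()
restored = MultiAgentEpisode.from_state(state)
assert restored.env_steps() == episode.env_steps()
```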
Attributes
agent_episode_ids: Returns ids from each agent's SingleAgentEpisode.
agent_ids: Returns the agent ids.
is_done: Whether the episode is actually done (terminated or truncated).
is_numpy: True, if the data in this episode is already stored as numpy arrays.
is_reset: Returns True if self.add_env_reset() has already been called.
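To tie the attributes together, here is a hedged end-to-end sketch that drives a rollout until is_done and then freezes the collected data. Both env (a MultiAgentEnv-style object whose step() returns per-agent dicts) and sample_action (a stand-in for a real policy) are hypothetical and not part of this API.

```python
# `env` and `sample_action` are hypothetical stand-ins.
obs, infos = env.reset()
episode = MultiAgentEpisode()
episode.add_env_reset(observations=obs, infos=infos)

while not episode.is_done:
    # Only the agents that must act at this env step send actions.
    actions = {aid: sample_action(aid) for aid in episode.get_agents_to_act()}
    obs, rewards, terminateds, truncateds, infos = env.step(actions)
    episode.add_env_step(
        observations=obs,
        actions=actions,
        rewards=rewards,
        terminateds=terminateds,
        truncateds=truncateds,
        infos=infos,
    )

episode.to_numpy()   # convert list data to numpy arrays for batching
assert episode.is_numpy
```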