ray.rllib.env.multi_agent_episode.MultiAgentEpisode.__init__

MultiAgentEpisode.__init__(id_: str | None = None, *, observations: List[Dict[Any, Any]] | None = None, observation_space: gymnasium.Space | None = None, infos: List[Dict[Any, Any]] | None = None, actions: List[Dict[Any, Any]] | None = None, action_space: gymnasium.Space | None = None, rewards: List[Dict[Any, Any]] | None = None, terminateds: Dict[Any, Any] | bool = False, truncateds: Dict[Any, Any] | bool = False, extra_model_outputs: List[Dict[Any, Any]] | None = None, env_t_started: int | None = None, agent_t_started: Dict[Any, int] | None = None, len_lookback_buffer: int | str = 'auto', agent_episode_ids: Dict[Any, str] | None = None, agent_module_ids: Dict[Any, str] | None = None, agent_to_module_mapping_fn: Callable[[Any, MultiAgentEpisode], str] | None = None)

Initializes a MultiAgentEpisode.

Parameters:
  • id_ – Optional. Either a string to identify an episode or None. If None, a hexadecimal id is created. If you provide a string, make sure that it is unique, because episodes are concatenated via this string.

  • observations – A list of dictionaries mapping agent IDs to observations. Can be None. If provided, should match all other episode data (actions, rewards, etc.) in terms of list lengths and agent IDs.

  • observation_space – An optional gym.spaces.Dict mapping agent IDs to individual agents’ spaces, which all (individual agents’) observations should abide by. If not None and this MultiAgentEpisode is numpy’ized (via the self.to_numpy() method), any data that is subsequently appended or set will be checked for correctness.

  • infos – A list of dictionaries mapping agent IDs to info dicts. Can be None. If provided, should match all other episode data (observations, rewards, etc.) in terms of list lengths and agent IDs.

  • actions – A list of dictionaries mapping agent IDs to actions. Can be None. If provided, should match all other episode data (observations, rewards, etc.) in terms of list lengths and agent IDs.

  • action_space – An optional gym.spaces.Dict mapping agent IDs to individual agents’ spaces, which all (individual agents’) actions should abide by. If not None and this MultiAgentEpisode is numpy’ized (via the self.to_numpy() method), any data that is subsequently appended or set will be checked for correctness.

  • rewards – A list of dictionaries mapping agent IDs to rewards. Can be None. If provided, should match all other episode data (observations, actions, etc.) in terms of list lengths and agent IDs.

  • terminateds – A boolean defining whether the environment has terminated OR a MultiAgentDict mapping individual agent IDs to boolean flags indicating whether individual agents have terminated. A special __all__ key in these dicts indicates whether the episode is terminated for all agents. The default is False, i.e. the episode has not been terminated.

  • truncateds – A boolean defining whether the environment has been truncated OR a MultiAgentDict mapping individual agent IDs to boolean flags indicating whether individual agents have been truncated. A special __all__ key in these dicts indicates whether the episode is truncated for all agents. The default is False, i.e. the episode has not been truncated.

  • extra_model_outputs – A list of dictionaries mapping agent IDs to their corresponding extra model outputs. Each of these “outputs” is a dict mapping keys (str) to model output values, for example for key=STATE_OUT, the values would be the internal state outputs for that agent.

  • env_t_started – The env timestep (int) that defines the starting point of the episode. This is only greater than zero if a chunk of an already ongoing episode is being created, for example by slicing an ongoing episode or by calling the cut() method on an ongoing episode.

  • agent_t_started – A dict mapping AgentIDs to the respective agent’s (local) timestep at which its SingleAgentEpisode chunk started.

  • len_lookback_buffer – The size of the lookback buffers to keep in front of this Episode for each type of data (observations, actions, etc.). If larger than 0, the first len_lookback_buffer items in each type of data are interpreted as NOT part of this actual episode chunk, but instead serve as a “historical” record that may be viewed and used to derive new data. For example, you might need a lookback buffer of four if you want to do observation frame stacking and your episode has been cut and you are now operating on a new chunk (continuing from the cut one). Then, for the first 3 items, you would have to be able to look back into the old chunk’s data. If len_lookback_buffer is “auto” (default), all data provided in the constructor is interpreted as part of the lookback buffers.

  • agent_episode_ids – An optional dict mapping AgentIDs to the IDs (str) of their corresponding SingleAgentEpisode. If None, each SingleAgentEpisode in MultiAgentEpisode.agent_episodes will generate a hexadecimal code. If a dictionary is provided, make sure that the IDs are unique, because the agents’ SingleAgentEpisode instances are concatenated or recreated via these IDs.

  • agent_module_ids – An optional dict mapping AgentIDs to their respective ModuleIDs (these mappings are always valid for an entire episode and thus won’t change during the course of this episode). If a mapping from agent to module has already been provided via this dict, the (optional) agent_to_module_mapping_fn will NOT be used again to map the same agent (agents do not change their assigned module in the course of one episode).

  • agent_to_module_mapping_fn – A callable taking an AgentID and a MultiAgentEpisode as args and returning a ModuleID. Used to map agents that have not been mapped yet (because they just entered this episode) to a ModuleID. The resulting ModuleID is only stored inside the agent’s SingleAgentEpisode object.
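
The following is a minimal pure-Python sketch of three rules described in the parameter list above: how len_lookback_buffer divides the data passed to the constructor, how a terminateds or truncateds value is read, and when agent_to_module_mapping_fn is consulted. It does not import or call RLlib. The helper names split_lookback, all_done, and resolve_module_id are hypothetical and exist only for illustration, and treating a dict without an __all__ key as "not done for all agents" is an assumption of this sketch, not documented behavior.

```python
from typing import Any, Callable, Dict, List, Tuple, Union


def split_lookback(
    data: List[Any], len_lookback_buffer: Union[int, str] = "auto"
) -> Tuple[List[Any], List[Any]]:
    """Split per-step data into (lookback, episode_chunk).

    Hypothetical helper: mirrors the documented rule, not RLlib's code.
    """
    if len_lookback_buffer == "auto":
        # "auto": all provided data counts as lookback.
        n = len(data)
    else:
        n = int(len_lookback_buffer)
    if not 0 <= n <= len(data):
        raise ValueError("len_lookback_buffer must be between 0 and len(data)")
    # The first n items are history; the rest belong to the chunk itself.
    return data[:n], data[n:]


def all_done(flags: Union[bool, Dict[Any, bool]]) -> bool:
    """Read a terminateds/truncateds argument (a bool or a per-agent dict)."""
    if isinstance(flags, bool):
        return flags
    # Assumption of this sketch: a missing "__all__" key means False.
    return bool(flags.get("__all__", False))


def resolve_module_id(
    agent_id: Any,
    agent_module_ids: Dict[Any, str],
    mapping_fn: Callable[[Any, Any], str],
    episode: Any = None,
) -> str:
    """Return the ModuleID for an agent, preferring an existing mapping."""
    if agent_id in agent_module_ids:
        # Already mapped: the mapping function is NOT called again.
        return agent_module_ids[agent_id]
    return mapping_fn(agent_id, episode)


# Four env steps of observations, each a dict mapping agent IDs to values.
observations = [
    {"a0": 0, "a1": 0},
    {"a0": 1, "a1": 1},
    {"a0": 2, "a1": 2},
    {"a0": 3, "a1": 3},
]

lookback, chunk = split_lookback(observations, 3)
auto_lookback, auto_chunk = split_lookback(observations, "auto")

partly_done = all_done({"a0": True, "a1": False, "__all__": False})
fully_done = all_done({"a0": True, "a1": True, "__all__": True})

known = resolve_module_id("a0", {"a0": "policy_0"}, lambda aid, ep: "unused")
fresh = resolve_module_id("a1", {"a0": "policy_0"}, lambda aid, ep: f"policy_{aid}")
```

With len_lookback_buffer=3, the first three observation dicts are treated as history and only the last one belongs to the new chunk. With "auto", every provided item is treated as history and the chunk starts out empty.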