ray.rllib.env.multi_agent_episode.MultiAgentEpisode.get_infos

MultiAgentEpisode.get_infos(indices: int | slice | List[int] | None = None, agent_ids: Collection[Any] | Any | None = None, *, env_steps: bool = True, neg_index_as_lookback: bool = False, fill: Any | None = None, return_list: bool = False) → Dict[Any, Any] | List[Dict[Any, Any]]

Returns agents’ info dicts, or lists (ranges) thereof, from this episode.

Parameters:
  • indices – A single int is interpreted as an index at which to return the individual info dict stored there. A list of ints is interpreted as a list of indices from which to gather individual info dicts into a list of size len(indices). A slice object is interpreted as a range of info dicts to be returned. By default, negative indices are interpreted as “before the end”. If neg_index_as_lookback=True, negative indices are instead interpreted as “before ts=0”, meaning going back into the lookback buffer. If None, returns all infos (from ts=0 to the end).

  • agent_ids – An optional collection of AgentIDs or a single AgentID to get info dicts for. If None, will return info dicts for all agents in this episode.

  • env_steps – Whether indices should be interpreted as environment time steps (True) or per-agent timesteps (False).

  • neg_index_as_lookback – If True, negative values in indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. For example, an episode with agent A’s info dicts [{"l":4}, {"l":5}, {"l":6}, {"a":7}, {"b":8}, {"c":9}], where the first 3 items are the lookback buffer (the ts=0 item is {"a":7}), will respond to get_infos(-1, agent_ids=A, neg_index_as_lookback=True) with {A: {"l":6}} and to get_infos(slice(-2, 1), agent_ids=A, neg_index_as_lookback=True) with {A: [{"l":5}, {"l":6}, {"a":7}]}.

  • fill – An optional value to use for filling up the returned results at the boundaries. Filling only happens if the requested index range’s start/stop boundaries exceed the episode’s boundaries (including the lookback buffer on the left side). This is handy if users don’t want to worry about reaching such boundaries and want to auto-fill instead. For example, an episode with agent A’s infos being [{"l":10}, {"l":11}, {"a":12}, {"b":13}, {"c":14}] and a lookback buffer size of 2 (meaning infos {"l":10} and {"l":11} are part of the lookback buffer) will respond to get_infos(slice(-7, -2), agent_ids=A, fill={"o": 0.0}) with {A: [{"o":0.0}, {"o":0.0}, {"l":10}, {"l":11}, {"a":12}]}.

  • return_list – Whether to return a list of multi-agent dicts (instead of a single multi-agent dict of lists/structs). False by default. This option can only be used when env_steps is True, because each item in such a list can only be interpreted as one env step (this would not work with agent steps).
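The index handling described for indices, neg_index_as_lookback, and fill can be sketched in plain Python for a single agent's info list. This is a toy model, not RLlib's implementation: the function name lookup, its lookback argument, the clipping of slices when fill is None, and the IndexError for out-of-range ints are all assumptions made for illustration. It also ignores slice steps and lists of indices.

```python
from typing import Any, List, Optional, Union


def lookup(
    data: List[Any],
    lookback: int,
    index: Union[int, slice],
    neg_index_as_lookback: bool = False,
    fill: Optional[Any] = None,
) -> Any:
    """Toy model of the index semantics for one agent's info list.

    `data` holds the lookback items first, followed by the items from ts=0 on.
    """

    def to_abs(i: Optional[int], default: int) -> int:
        # Map an episode-relative index to a position in `data`.
        if i is None:
            return default
        if i >= 0 or neg_index_as_lookback:
            return lookback + i  # counted from ts=0
        return len(data) + i  # counted from the end

    def at(pos: int) -> Any:
        if 0 <= pos < len(data):
            return data[pos]
        if fill is None:
            # Assumption: an out-of-range int without `fill` is an error.
            raise IndexError(pos)
        return fill

    if isinstance(index, slice):
        start = to_abs(index.start, lookback)
        stop = to_abs(index.stop, len(data))
        if fill is None:
            # Assumption: without `fill`, clip the range to the stored items.
            start, stop = max(start, 0), min(stop, len(data))
        return [at(p) for p in range(start, stop)]
    return at(to_abs(index, lookback))


# The two examples from the parameter descriptions above.
infos_a = [{"l": 4}, {"l": 5}, {"l": 6}, {"a": 7}, {"b": 8}, {"c": 9}]
print(lookup(infos_a, 3, -1, neg_index_as_lookback=True))
# {'l': 6}
print(lookup(infos_a, 3, slice(-2, 1), neg_index_as_lookback=True))
# [{'l': 5}, {'l': 6}, {'a': 7}]

infos_b = [{"l": 10}, {"l": 11}, {"a": 12}, {"b": 13}, {"c": 14}]
print(lookup(infos_b, 2, slice(-7, -2), fill={"o": 0.0}))
# [{'o': 0.0}, {'o': 0.0}, {'l': 10}, {'l': 11}, {'a': 12}]
```

In this model, a non-negative index is always counted from ts=0, so only negative indices change meaning when neg_index_as_lookback is toggled.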

Returns:

A dictionary mapping agent IDs to infos (at the given indices). If env_steps is True, only agents that have stepped (were ready) at the given env step indices are returned (i.e. not all agent IDs are necessarily in the keys). If return_list is True, returns a list of MultiAgentDicts (mapping agent IDs to infos) instead.
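The two return shapes can be illustrated with plain Python data. The agent IDs, info contents, and the helper to_dict_of_lists below are made up for this sketch and are not part of RLlib. It only shows how a per-env-step list of multi-agent dicts relates to a single dict of per-agent lists, and why the list form is tied to env steps: each list item stands for exactly one env step, and an agent that did not step there is absent from that item.

```python
from collections import defaultdict
from typing import Any, Dict, List


def to_dict_of_lists(per_step: List[Dict[Any, Any]]) -> Dict[Any, List[Any]]:
    """Group a per-env-step list of multi-agent dicts by agent ID."""
    out: Dict[Any, List[Any]] = defaultdict(list)
    for step in per_step:
        for agent_id, info in step.items():
            out[agent_id].append(info)
    return dict(out)


# Shape comparable to return_list=True: one multi-agent dict per env step.
# Hypothetical data: "a1" did not step at env step 1, "a0" not at env step 2.
per_step = [
    {"a0": {"t": 0}, "a1": {"t": 0}},
    {"a0": {"t": 1}},
    {"a1": {"t": 2}},
]

# Shape comparable to return_list=False: one dict mapping agent IDs to lists.
by_agent = to_dict_of_lists(per_step)
print(by_agent)
# {'a0': [{'t': 0}, {'t': 1}], 'a1': [{'t': 0}, {'t': 2}]}
```

Note that the grouped form no longer records which env step each info came from, so the conversion cannot be reversed in general.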