ray.rllib.env.single_agent_episode.SingleAgentEpisode.get_extra_model_outputs
- SingleAgentEpisode.get_extra_model_outputs(key: str, indices: int | slice | List[int] | None = None, *, neg_index_as_lookback: bool = False, fill: Any | None = None) → Any
Returns extra model outputs (under given key) from this episode.
- Parameters:
key – The key within self.extra_model_outputs to extract data for.

indices – A single int is interpreted as an index, from which to return the individual extra model output stored under key at that index. A list of ints is interpreted as a list of indices from which to gather individual extra model outputs in a batch of size len(indices). A slice object is interpreted as a range of extra model outputs to be returned. By default, negative indices are interpreted as “before the end”, unless the neg_index_as_lookback=True option is used, in which case negative indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. If None, returns all extra model outputs (from ts=0 to the end).

neg_index_as_lookback – If True, negative values in indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. For example, an episode with extra_model_outputs["a"] = [4, 5, 6, 7, 8, 9], where [4, 5, 6] is the lookback buffer range (the ts=0 item is 7), will respond to get_extra_model_outputs("a", -1, neg_index_as_lookback=True) with 6 and to get_extra_model_outputs("a", slice(-2, 1), neg_index_as_lookback=True) with [5, 6, 7].

fill – An optional value to use for filling up the returned results at the boundaries. This filling only happens if the requested index range’s start/stop boundaries exceed the episode’s boundaries (including the lookback buffer on the left side). This is useful if users don’t want to worry about reaching such boundaries and want to zero-pad. For example, an episode with extra_model_outputs["b"] = [10, 11, 12, 13, 14] and a lookback buffer size of 2 (meaning 10 and 11 are part of the lookback buffer) will respond to get_extra_model_outputs("b", slice(-7, -2), fill=0.0) with [0.0, 0.0, 10, 11, 12]. TODO (sven): This would require a space being provided. Maybe we can automatically infer the space from existing data?
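The indexing rules described for indices, neg_index_as_lookback, and fill can be sketched in plain Python. This is only an illustration under stated assumptions, not RLlib's implementation: the helper resolve_indices and its lookback argument are hypothetical names, slice steps are ignored, and fill=None is taken to mean "no filling". It reproduces the two worked examples from the parameter descriptions.

```python
from typing import Any, List, Optional, Union


def resolve_indices(
    data: List[Any],
    lookback: int,
    indices: Union[int, slice, List[int], None] = None,
    *,
    neg_index_as_lookback: bool = False,
    fill: Optional[Any] = None,
) -> Any:
    """Toy model of the documented indexing rules (hypothetical, not RLlib code).

    `data` holds all stored items, the first `lookback` of which form the
    lookback buffer; the item at position `lookback` is the ts=0 item.
    """
    episode_len = len(data) - lookback  # number of items at ts >= 0

    def to_ts(i: int) -> int:
        # Negative values count from the end, unless they are meant to
        # address the lookback buffer ("before ts=0").
        if i < 0 and not neg_index_as_lookback:
            return episode_len + i
        return i

    def item_at(ts: int) -> Any:
        pos = ts + lookback  # position in the underlying list
        if 0 <= pos < len(data):
            return data[pos]
        if fill is not None:
            return fill  # pad beyond the stored range
        raise IndexError(f"timestep {ts} is out of range")

    if indices is None:
        return data[lookback:]  # everything from ts=0 to the end
    if isinstance(indices, int):
        return item_at(to_ts(indices))
    if isinstance(indices, slice):
        # Slice steps are ignored in this sketch.
        start = 0 if indices.start is None else to_ts(indices.start)
        stop = episode_len if indices.stop is None else to_ts(indices.stop)
        if fill is None:
            # Without a fill value, clip the range to the stored data.
            start = max(start, -lookback)
            stop = min(stop, episode_len)
        return [item_at(ts) for ts in range(start, stop)]
    return [item_at(to_ts(i)) for i in indices]


a = [4, 5, 6, 7, 8, 9]    # [4, 5, 6] is the lookback buffer
b = [10, 11, 12, 13, 14]  # [10, 11] is the lookback buffer

lookback_item = resolve_indices(a, 3, -1, neg_index_as_lookback=True)
lookback_range = resolve_indices(a, 3, slice(-2, 1), neg_index_as_lookback=True)
padded = resolve_indices(b, 2, slice(-7, -2), fill=0.0)
```

In this sketch, lookback_item is 6, lookback_range is [5, 6, 7], and padded is [0.0, 0.0, 10, 11, 12], matching the examples in the parameter descriptions.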
Examples:
    from ray.rllib.env.single_agent_episode import SingleAgentEpisode

    episode = SingleAgentEpisode(
        extra_model_outputs={"mo": [1, 2, 3]},
        len_lookback_buffer=0,  # no lookback; all data is actually "in" episode
        # The following is needed, but not relevant for this demo.
        observations=[0, 1, 2, 3], actions=[1, 2, 3], rewards=[1, 2, 3],
    )

    # Plain usage (`indices` arg only).
    episode.get_extra_model_outputs("mo", -1)  # 3
    episode.get_extra_model_outputs("mo", 1)  # 2
    episode.get_extra_model_outputs("mo", [0, 2])  # [1, 3]
    episode.get_extra_model_outputs("mo", [-1, 0])  # [3, 1]
    episode.get_extra_model_outputs("mo", slice(None, 2))  # [1, 2]
    episode.get_extra_model_outputs("mo", slice(-2, None))  # [2, 3]

    # Using `fill=...` (requesting slices beyond the boundaries).
    # TODO (sven): This would require a space being provided. Maybe we can
    #  automatically infer the space from existing data?
    # episode.get_extra_model_outputs("mo", slice(-5, -2), fill=0)  # [0, 0, 1]
    # episode.get_extra_model_outputs("mo", slice(2, 5), fill=-1)  # [3, -1, -1]
- Returns:
The collected extra_model_outputs[key]. Returned as a 0-axis batch if indices contains several indices, is a list of exactly one index, or is a slice object. Returned as a single item (B=0 -> no additional 0-axis) if indices is a single int.
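The difference between a single int and a one-element list can be illustrated with a small stand-in. The function gather below is a hypothetical helper that mimics only the return-shape rule using plain Python lists; it is not part of RLlib and ignores lookback and fill handling.

```python
from typing import Any, List, Union


def gather(values: List[Any], indices: Union[int, slice, List[int]]) -> Any:
    """Hypothetical stand-in that mimics only the return-shape rule."""
    if isinstance(indices, int):
        return values[indices]  # single item, no batch axis
    if isinstance(indices, slice):
        return values[indices]  # batch covering the requested range
    return [values[i] for i in indices]  # batch of size len(indices)


mo = [1, 2, 3]
single = gather(mo, 1)            # the item itself
batch_of_one = gather(mo, [1])    # a batch holding one item
ranged = gather(mo, slice(0, 2))  # a batch holding two items
```

Here single is the bare value 2, whereas batch_of_one is the list [2], so callers that always expect a batch axis should pass a list or a slice even for one index.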