ray.rllib.env.single_agent_episode.SingleAgentEpisode.get_actions
- SingleAgentEpisode.get_actions(indices: int | slice | List[int] | None = None, *, neg_index_as_lookback: bool = False, fill: Any | None = None, one_hot_discrete: bool = False) -> Any
 Returns individual actions or batched ranges thereof from this episode.
- Parameters:
 indices – A single int is interpreted as an index and returns the individual action stored at that index. A list of ints is interpreted as a list of indices from which to gather individual actions into a batch of size len(indices). A slice object is interpreted as a range of actions to return. By default, negative indices are interpreted as “before the end”. With neg_index_as_lookback=True, they are instead interpreted as “before ts=0”, meaning going back into the lookback buffer. If None, returns all actions (from ts=0 to the end).
 neg_index_as_lookback – If True, negative values in indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. For example, an episode with actions [4, 5, 6, 7, 8, 9], where [4, 5, 6] is the lookback buffer range (the ts=0 item is 7), will respond to get_actions(-1, neg_index_as_lookback=True) with 6 and to get_actions(slice(-2, 1), neg_index_as_lookback=True) with [5, 6, 7].
 fill – An optional value used to fill the returned results at the boundaries. Filling only happens if the requested index range’s start/stop boundaries exceed the episode’s boundaries (including the lookback buffer on the left side). This is useful for zero-padding without having to handle the boundaries explicitly. For example, an episode with actions [10, 11, 12, 13, 14] and a lookback buffer size of 2 (meaning actions 10 and 11 are part of the lookback buffer) will respond to get_actions(slice(-7, -2), fill=0.0) with [0.0, 0.0, 10, 11, 12].
 one_hot_discrete – If True, returns one-hot vectors (instead of int values) for those sub-components of a (possibly complex) action space that are Discrete or MultiDiscrete. Note that if fill=0 and the requested indices are out of the range of the stored data, the returned one-hot vectors will actually be zero-hot (all slots zero).
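The index rules above can be mimicked with a small, pure-Python toy model. This is only a sketch of the documented semantics, not RLlib's implementation: the `gather` helper and its `data`/`lookback` arguments are hypothetical, slice steps are ignored (assumed to be 1), and clipping of unfilled slices at the data boundaries is an assumption borrowed from ordinary Python slicing.

```python
def gather(data, lookback, indices=None, *, neg_index_as_lookback=False, fill=None):
    """Toy model (hypothetical helper) of the documented index semantics.

    `data` holds ALL stored actions, lookback buffer included;
    the item at ts=0 sits at position `lookback`.
    """
    length = len(data) - lookback  # number of items at ts >= 0

    def to_ts(i):
        # Default: negative means "before the end".
        # With neg_index_as_lookback=True: negative means "before ts=0".
        if i < 0 and not neg_index_as_lookback:
            return length + i
        return i

    def lookup(ts):
        pos = ts + lookback
        if 0 <= pos < len(data):
            return data[pos]
        if fill is None:
            raise IndexError(f"ts={ts} is outside the stored data")
        return fill  # out of range on either side -> pad with `fill`

    if indices is None:
        indices = slice(None, None)  # everything from ts=0 to the end
    if isinstance(indices, int):
        return lookup(to_ts(indices))  # single item, no batch axis
    if isinstance(indices, list):
        return [lookup(to_ts(i)) for i in indices]  # batch of len(indices)

    # Slice case (step assumed to be 1).
    start = 0 if indices.start is None else to_ts(indices.start)
    stop = length if indices.stop is None else to_ts(indices.stop)
    if fill is None:
        # Assumption: without `fill`, clip to the stored range like a Python slice.
        start = max(start, -lookback)
        stop = min(stop, length)
    return [lookup(ts) for ts in range(start, stop)]


# The two examples from the parameter descriptions above.
with_lookback = [4, 5, 6, 7, 8, 9]  # [4, 5, 6] is the lookback buffer
print(gather(with_lookback, 3, -1, neg_index_as_lookback=True))
print(gather(with_lookback, 3, slice(-2, 1), neg_index_as_lookback=True))
print(gather([10, 11, 12, 13, 14], 2, slice(-7, -2), fill=0.0))
```

In this model, an int index returns a bare item while a one-element list returns a one-element batch, for example `gather([1, 2, 3], 0, 0)` versus `gather([1, 2, 3], 0, [0])`.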
Examples:
    import gymnasium as gym

    from ray.rllib.env.single_agent_episode import SingleAgentEpisode

    episode = SingleAgentEpisode(
        # Discrete(4) actions (ints between 0 and 4 (excl.))
        action_space=gym.spaces.Discrete(4),
        actions=[1, 2, 3],
        observations=[0, 1, 2, 3],
        rewards=[1, 2, 3],  # <- not relevant here
        len_lookback_buffer=0,  # no lookback; all data is actually "in" episode
    )

    # Plain usage (`indices` arg only).
    episode.get_actions(-1)  # 3
    episode.get_actions(0)  # 1
    episode.get_actions([0, 2])  # [1, 3]
    episode.get_actions([-1, 0])  # [3, 1]
    episode.get_actions(slice(None, 2))  # [1, 2]
    episode.get_actions(slice(-2, None))  # [2, 3]

    # Using `fill=...` (requesting slices beyond the boundaries).
    episode.get_actions(slice(-5, -2), fill=-9)  # [-9, -9, 1]
    episode.get_actions(slice(1, 5), fill=-7)  # [2, 3, -7, -7]

    # Using `one_hot_discrete=True`.
    episode.get_actions(1, one_hot_discrete=True)  # [0 0 1 0] (action=2)
    episode.get_actions(2, one_hot_discrete=True)  # [0 0 0 1] (action=3)
    episode.get_actions(
        slice(0, 2),
        one_hot_discrete=True,
    )  # [[0 1 0 0], [0 0 1 0]] (actions=1 and 2)

    # Special case: Using `fill=0.0` AND `one_hot_discrete=True`.
    episode.get_actions(
        -1,
        neg_index_as_lookback=True,  # -1 means one left of ts=0
        fill=0.0,
        one_hot_discrete=True,
    )  # [0 0 0 0] <- all 0s one-hot tensor (note difference to [1 0 0 0]!)
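The difference between a zero-hot vector and the one-hot encoding of action 0 can be illustrated without RLlib. The `one_hot` helper below is hypothetical and only models the documented behavior; it uses `None` to stand in for a slot that was out of range and got filled.

```python
from typing import List, Optional


def one_hot(action: Optional[int], n: int) -> List[int]:
    """Encode a Discrete(n) action as a one-hot list (hypothetical helper).

    `None` stands in for a filled, out-of-range slot and yields all zeros.
    """
    vec = [0] * n
    if action is not None:
        vec[action] = 1
    return vec


in_range = one_hot(0, 4)   # a real action 0 -> one slot set
filled = one_hot(None, 4)  # a filled slot -> no slot set ("zero-hot")
print(in_range, filled)
```

Keeping the two cases distinguishable lets downstream code tell padding apart from a genuine action 0.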
- Returns:
 The collected actions. Returned as a 0-axis batch if indices is a list of several indices, a list of exactly one index, or a slice object. Returned as a single item (B=0 -> no additional 0-axis) if indices is a single int.