ray.rllib.env.single_agent_episode.SingleAgentEpisode.set_extra_model_outputs
- SingleAgentEpisode.set_extra_model_outputs(*, key, new_data, at_indices: int | slice | List[int] | None = None, neg_index_as_lookback: bool = False) -> None
Overwrites all or some of this Episode’s extra model outputs with new_data.

Note that an episode’s extra_model_outputs data cannot be written to directly, as it is managed by an InfiniteLookbackBuffer object. Normally, individual, current extra_model_output values are added to the episode either by calling self.add_env_step or more directly (and manually) via self.extra_model_outputs[key].append|extend(). However, for certain postprocessing steps, the entirety (or a slice) of an episode’s extra_model_outputs might have to be rewritten, or a new key (a new type of extra_model_outputs) must be inserted, which is when self.set_extra_model_outputs() should be used.
- Parameters:
key – The key within self.extra_model_outputs to override data on, or to insert as a new key into self.extra_model_outputs.
new_data – The new data to overwrite existing data with. This may be a list of individual extra model output value(s) if this episode has not been numpy’ized yet. If this episode has already been numpy’ized, this should be an np.ndarray whose length exactly matches the size of the to-be-overwritten slice or segment (given by at_indices).
at_indices – A single int is interpreted as one index to overwrite with new_data (which is expected to be a single value). A list of ints is interpreted as a list of indices, all of which are overwritten with new_data (which is expected to have the same size as len(at_indices)). A slice object is interpreted as a range of indices to be overwritten with new_data (which is expected to have the same size as the provided slice). Negative indices are by default interpreted as “before the end”, unless neg_index_as_lookback=True is used, in which case negative indices are interpreted as “before ts=0”, meaning going back into the lookback buffer.
neg_index_as_lookback – If True, negative values in at_indices are interpreted as “before ts=0”, meaning going back into the lookback buffer. For example, an episode whose extra model output values under some key are [4, 5, 6, 7, 8, 9], where [4, 5, 6] is the lookback buffer range (ts=0 item is 7), will handle a call to set_extra_model_outputs(key=key, new_data=individual_value, at_indices=-1, neg_index_as_lookback=True) by overwriting the value of 6 in that buffer with the provided “individual_value”.
- Raises:
IndexError – If the provided at_indices do not match the size of new_data.
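
The index semantics described above can be illustrated with a small, self-contained sketch. It does not import or reproduce RLlib: resolve_positions and overwrite are hypothetical helper names, the buffer is modeled as a plain Python list with the lookback items stored in front of the ts=0 item, and the slice handling under neg_index_as_lookback=True is an assumption made for illustration only.

```python
from typing import Any, List, Union

Indices = Union[int, slice, List[int]]


def resolve_positions(at_indices: Indices, *, lookback: int, total_len: int,
                      neg_index_as_lookback: bool = False) -> List[int]:
    """Map at_indices to absolute positions in a flat list (toy model).

    The list holds `lookback` items first, so the ts=0 item sits at
    position `lookback`.
    """
    n_ts = total_len - lookback  # number of items at ts >= 0

    def one(i: int) -> int:
        if i < 0 and not neg_index_as_lookback:
            pos = total_len + i  # "before the end"
        else:
            pos = lookback + i   # relative to ts=0; negative goes into lookback
        if not 0 <= pos < total_len:
            raise IndexError(f"index {i} is out of range")
        return pos

    if isinstance(at_indices, int):
        return [one(at_indices)]
    if isinstance(at_indices, slice):
        if neg_index_as_lookback:
            # Assumption: slice bounds are timesteps relative to ts=0.
            start = 0 if at_indices.start is None else at_indices.start
            stop = n_ts if at_indices.stop is None else at_indices.stop
            step = 1 if at_indices.step is None else at_indices.step
            rel = range(start, stop, step)
        else:
            # Ordinary Python slicing over the ts >= 0 items.
            rel = range(n_ts)[at_indices]
        return [one(i) for i in rel]
    return [one(i) for i in at_indices]


def overwrite(buffer: List[Any], new_data: Any, at_indices: Indices, *,
              lookback: int, neg_index_as_lookback: bool = False) -> None:
    """Overwrite buffer items in place, checking that sizes match."""
    positions = resolve_positions(
        at_indices, lookback=lookback, total_len=len(buffer),
        neg_index_as_lookback=neg_index_as_lookback)
    values = [new_data] if isinstance(at_indices, int) else list(new_data)
    if len(values) != len(positions):
        raise IndexError("at_indices do not match the size of new_data")
    for pos, value in zip(positions, values):
        buffer[pos] = value


# Values [4, 5, 6] are the lookback range; the ts=0 item is 7.
buf = [4, 5, 6, 7, 8, 9]
overwrite(buf, 60, -1, lookback=3, neg_index_as_lookback=True)
# buf is now [4, 5, 60, 7, 8, 9]: -1 meant "one step before ts=0".
overwrite(buf, 90, -1, lookback=3)
# buf is now [4, 5, 60, 7, 8, 90]: -1 meant "last item".
overwrite(buf, [70, 80], [0, 1], lookback=3)
# buf is now [4, 5, 60, 70, 80, 90]: indices 0 and 1 are ts=0 and ts=1.
```

In this toy model, passing three indices with only two values raises IndexError, mirroring the documented error condition. How the real method treats edge cases (for example, at_indices=None or out-of-range slices) is not covered by this sketch.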