ray.rllib.env.single_agent_episode.SingleAgentEpisode.to_numpy

SingleAgentEpisode.to_numpy() → SingleAgentEpisode

Converts this Episode’s list attributes to numpy arrays.

In particular, this episode’s lists of (possibly complex) data (e.g. observations from a dict observation space) are converted to (possibly complex) structs whose leaves are numpy arrays. Each leaf array has the same length along its batch dimension (axis 0) as the original list.
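As a rough illustration of this list-to-struct conversion, the sketch below stacks a list of nested dict observations into a single dict whose leaves are numpy arrays. The helper `stack_structs` and the observation keys are hypothetical and exist only for this example; this is not RLlib's actual implementation.

```python
import numpy as np


def stack_structs(items):
    """Stack a list of identically structured items into one struct of arrays.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    first = items[0]
    if isinstance(first, dict):
        # Recurse per key, collecting that key's values across all timesteps.
        return {key: stack_structs([item[key] for item in items]) for key in first}
    if isinstance(first, tuple):
        # Recurse per tuple position.
        return tuple(
            stack_structs([item[i] for item in items]) for i in range(len(first))
        )
    # Leaf: one array whose axis 0 indexes the original list entries.
    return np.asarray(items)


# Three timesteps of made-up observations from a nested dict space.
observations = [
    {"pos": [0.0, 1.0], "sensors": {"temp": 20.5}},
    {"pos": [0.5, 1.5], "sensors": {"temp": 21.0}},
    {"pos": [1.0, 2.0], "sensors": {"temp": 21.5}},
]

batched = stack_structs(observations)
pos_shape = batched["pos"].shape
temp_shape = batched["sensors"]["temp"].shape
```

Every leaf array ends up with a leading axis whose size equals the length of the original list, while the nesting of the struct itself is preserved.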

Note that the data under Columns.INFOS is NEVER numpy’ized and remains a list (normally, a list of the original, env-returned dicts). This is due to the heterogeneous nature of the infos returned by envs, which would make converting them to numpy arrays unwieldy.
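To see why heterogeneous infos resist this kind of conversion, consider the small sketch below. The info dicts and their keys are invented for illustration and do not come from any particular env.

```python
# Hypothetical per-step info dicts; the keys are made up for illustration.
infos = [
    {},
    {"lives": 3},
    {"lives": 2, "bonus": "extra_time"},
]

# Collect the distinct key sets that occur across timesteps.
key_sets = {frozenset(info) for info in infos}

# More than one distinct key set means there is no shared structure
# whose leaves could be stacked into equally long arrays.
has_common_structure = len(key_sets) == 1
```

Here each step carries a different set of keys, so there is no single struct that all steps share, and keeping the data as a plain list is the simpler choice.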

After calling this method, no further data may be added to this episode via the self.add_env_step() method.

Examples:

import numpy as np

from ray.rllib.env.single_agent_episode import SingleAgentEpisode

episode = SingleAgentEpisode(
    observations=[0, 1, 2, 3],
    actions=[1, 2, 3],
    rewards=[1, 2, 3],
    # Note: terminated/truncated have nothing to do with an episode
    # being numpy'ized or not (via the `self.to_numpy()` method)!
    terminated=False,
    len_lookback_buffer=0,  # no lookback; all data is actually "in" episode
)
# Episode has not been numpy'ized yet.
assert not episode.is_numpy
# We are still operating on lists.
assert episode.get_observations([1]) == [1]
assert episode.get_observations(slice(None, 2)) == [0, 1]
# We can still add data (and even add the terminated=True flag).
episode.add_env_step(
    observation=4,
    action=4,
    reward=4,
    terminated=True,
)
# Still NOT numpy'ized.
assert not episode.is_numpy

# Numpy'ize the episode.
episode.to_numpy()
assert episode.is_numpy

# We cannot add data anymore. The following would crash.
# episode.add_env_step(observation=5, action=5, reward=5)

# Everything is now numpy arrays (with 0-axis of size
# B=[len of requested slice]).
assert isinstance(episode.get_observations([1]), np.ndarray)  # B=1
assert isinstance(episode.actions[0:2], np.ndarray)  # B=2
assert isinstance(episode.rewards[1:4], np.ndarray)  # B=3

Returns: This SingleAgentEpisode object with the converted numpy data.