ray.rllib.env.multi_agent_episode.MultiAgentEpisode.to_numpy

MultiAgentEpisode.to_numpy() → MultiAgentEpisode

Converts this Episode’s list attributes to numpy arrays.

In particular, this episode's per-agent lists of (possibly complex) data (e.g. for an agent with a dict observation space) are converted into (possibly complex) structs whose leaves are numpy arrays. Each leaf array has the same length (batch dimension) as the original list.
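
The list-to-struct conversion can be pictured with a small, self-contained sketch. This is not RLlib's implementation: `batch_list_of_structs` is a hypothetical helper that treats dicts and tuples as containers, recurses into them, and stacks every leaf into a numpy array whose 0-axis equals the length of the original list.

```python
import numpy as np


def batch_list_of_structs(items):
    """Turn a non-empty list of (possibly nested) structs into one struct of arrays.

    Hypothetical helper for illustration only: dicts and tuples are containers,
    anything else is a leaf. Every item is assumed to share the same structure.
    """
    first = items[0]
    if isinstance(first, dict):
        # Recurse per key, collecting that key's value from every timestep.
        return {k: batch_list_of_structs([it[k] for it in items]) for k in first}
    if isinstance(first, tuple):
        # Recurse per tuple position.
        return tuple(
            batch_list_of_structs([it[i] for it in items]) for i in range(len(first))
        )
    # Leaf: stack the per-timestep values along a new 0-axis (batch dimension).
    return np.asarray(items)


# Three timesteps of a dict observation for a single agent.
obs_list = [
    {"pos": np.array([0.0, 0.0]), "hp": 10},
    {"pos": np.array([1.0, 0.5]), "hp": 9},
    {"pos": np.array([2.0, 1.0]), "hp": 7},
]
batched = batch_list_of_structs(obs_list)
print(batched["pos"].shape)      # (3, 2)
print(batched["hp"].tolist())    # [10, 9, 7]
```

The list of three dicts becomes one dict of two arrays, each with batch dimension 3, which is the shape change described above for a dict observation space.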

Note that Columns.INFOS are NEVER numpy’ized and will remain a list (normally, a list of the original, env-returned dicts). This is due to the heterogeneous nature of INFOS returned by envs, which would make it unwieldy to convert this information to numpy arrays.
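
As a sketch of why infos are left alone: per-step info dicts often carry different keys, so they have no natural array layout. The hypothetical `to_numpy_columns` helper below converts each column with `np.asarray` except the one named `"infos"`, which stays a plain list; the column names are assumptions for illustration, not RLlib's `Columns` constants.

```python
import numpy as np


def to_numpy_columns(columns):
    """Hypothetical sketch: numpy'ize every column except "infos"."""
    out = {}
    for name, values in columns.items():
        if name == "infos":
            # Heterogeneous env-returned dicts: keep them as a plain list.
            out[name] = list(values)
        else:
            out[name] = np.asarray(values)
    return out


agent_data = {
    "rewards": [0.0, 1.0, 0.5],
    "actions": [0, 2, 1],
    # Each step's info dict has a different set of keys.
    "infos": [{}, {"lives": 3}, {"lives": 2, "msg": "hit"}],
}
converted = to_numpy_columns(agent_data)
```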

After calling this method, no further data may be added to this episode via the self.add_env_step() method.
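
The "no further data" rule can be illustrated with a toy class. `ToyEpisode` and `add_step` are hypothetical and are not RLlib APIs; this toy raises `RuntimeError` on a late append, whereas the exact failure mode of `add_env_step()` on a numpy'ized `MultiAgentEpisode` may differ.

```python
import numpy as np


class ToyEpisode:
    """Hypothetical single-agent episode illustrating the to_numpy() contract.

    Not RLlib code: data is appended to lists until to_numpy() is called,
    after which further appends are rejected.
    """

    def __init__(self):
        self.rewards = []
        self.is_numpy = False

    def add_step(self, reward):
        if self.is_numpy:
            raise RuntimeError("Episode already converted to numpy; cannot add data.")
        self.rewards.append(reward)

    def to_numpy(self):
        # Replace the list with a numpy array and freeze the episode.
        self.rewards = np.asarray(self.rewards)
        self.is_numpy = True
        # Return self, mirroring the documented return value.
        return self


ep = ToyEpisode()
ep.add_step(1.0)
ep.add_step(0.5)
ep.to_numpy()
try:
    ep.add_step(2.0)
except RuntimeError as err:
    print(err)
```

Once frozen, the array's length is fixed, which is why appending is disallowed rather than silently converting back to a list.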

Examples:

import numpy as np

from ray.rllib.env.multi_agent_episode import MultiAgentEpisode
from ray.rllib.env.tests.test_multi_agent_episode import (
    TestMultiAgentEpisode
)

# Create some multi-agent episode data.
(
    observations,
    actions,
    rewards,
    terminateds,
    truncateds,
    infos,
) = TestMultiAgentEpisode._mock_multi_agent_records()
# Define the agent ids.
agent_ids = ["agent_1", "agent_2", "agent_3", "agent_4", "agent_5"]

episode = MultiAgentEpisode(
    observations=observations,
    infos=infos,
    actions=actions,
    rewards=rewards,
    # Note: the terminated/truncated flags are independent of whether the
    # episode has been converted via `to_numpy()`.
    terminateds=terminateds,
    truncateds=truncateds,
    len_lookback_buffer=0,  # no lookback; all data is actually "in" episode
)

# Episode has not been numpy'ized yet.
assert not episode.is_numpy
# We are still operating on lists.
assert (
    episode.get_observations(
        indices=[1],
        agent_ids="agent_1",
    ) == {"agent_1": [1]}
)

# Numpy'ize the episode.
episode.to_numpy()
assert episode.is_numpy

# All data is now stored as numpy arrays (the 0-axis has size
# B = length of the requested slice).
obs = episode.get_observations(indices=[1], agent_ids="agent_1")
assert isinstance(obs["agent_1"], np.ndarray)

Returns:

This MultiAgentEpisode object with the converted numpy data.