ConnectorV2 and ConnectorV2 pipelines
RLlib stores and transports all trajectory data in the form of SingleAgentEpisode
or MultiAgentEpisode
objects.
Connector pipelines are the components that translate this episode data into tensor batches
readable by neural network models right before the model forward pass.
Generic ConnectorV2 pipeline: All pipelines consist of one or more ConnectorV2 pieces.
When calling the pipeline, you pass in a list of episodes, the RLModule instance,
and a batch, which initially might be an empty dict.
Each ConnectorV2 piece in the pipeline takes its predecessor's output,
starting on the left side with the batch, performs some transformations on the episodes, the batch, or both,
and passes everything on to the next piece. In this way, every ConnectorV2 piece can read from and write to the
provided episodes, add any data from these episodes to the batch, or change the data that's already in the batch.
The pipeline then returns the batch output of the last piece.
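The chaining described above can be sketched in plain Python. Note that this is an illustrative mimic of the pattern only; the class and argument names below are hypothetical and don't match RLlib's actual `ConnectorV2` API:

```python
# Illustrative sketch of the ConnectorV2 pipeline pattern (hypothetical
# classes, not RLlib's real API): each piece receives its predecessor's
# batch output, may read the episodes, and returns the (updated) batch.

class AddObservations:
    """Copies the latest observation of each episode into the batch."""
    def __call__(self, *, episodes, batch):
        batch["obs"] = [ep["observations"][-1] for ep in episodes]
        return batch


class NormalizeObservations:
    """Rescales the observations a predecessor already wrote into the batch."""
    def __call__(self, *, episodes, batch):
        batch["obs"] = [o / 255.0 for o in batch["obs"]]
        return batch


class ConnectorPipeline:
    """Chains pieces left to right, starting from a (possibly empty) batch."""
    def __init__(self, pieces):
        self.pieces = pieces

    def __call__(self, *, episodes, batch=None):
        batch = {} if batch is None else batch
        for piece in self.pieces:
            batch = piece(episodes=episodes, batch=batch)
        return batch


episodes = [{"observations": [0.0, 127.5]}, {"observations": [255.0]}]
pipeline = ConnectorPipeline([AddObservations(), NormalizeObservations()])
result = pipeline(episodes=episodes)  # {"obs": [0.5, 1.0]}
```

The pipeline returns only the last piece's batch output; intermediate batches exist solely as inputs to the next piece.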
Note
The batch output of the pipeline lives only as long as the succeeding RLModule
forward pass or Env.step() call. RLlib discards the data afterwards.
The list of episodes, however, may persist longer. For example, if an env-to-module pipeline reads an observation from an episode,
mutates that observation, and then writes it back into the episode, the subsequent module-to-env pipeline sees the changed observation.
Also, the Learner pipeline operates on the same episodes that have already passed through both the env-to-module and module-to-env pipelines
and thus might have undergone changes.
Three ConnectorV2 pipeline types
There are three different types of connector pipelines in RLlib:
Env-to-module pipeline, which creates tensor batches for action-computing forward passes.
Module-to-env pipeline, which translates a model’s output into RL environment actions.
Learner connector pipeline, which creates the train batch for a model update.
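You typically register custom pieces for each of the three pipelines through your AlgorithmConfig. The sketch below shows the wiring on RLlib's new API stack; the parameter names (`env_to_module_connector`, `module_to_env_connector`, `learner_connector`), the builder-callable signatures, and the `My...Piece` classes are assumptions that may vary across RLlib versions, so check the AlgorithmConfig reference for your release:

```python
# Hedged config sketch (new API stack); MyEnvToModulePiece, MyModuleToEnvPiece,
# and MyLearnerPiece are hypothetical ConnectorV2 subclasses.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .env_runners(
        # Add custom pieces to the env-to-module and module-to-env pipelines.
        env_to_module_connector=lambda env: MyEnvToModulePiece(),
        module_to_env_connector=lambda env: MyModuleToEnvPiece(),
    )
    .training(
        # Add a custom piece to the Learner connector pipeline.
        learner_connector=lambda obs_space, act_space: MyLearnerPiece(),
    )
)
```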
The ConnectorV2
API is an extremely powerful tool for customizing your RLlib experiments and algorithms.
It gives you full control over accessing, changing, and re-assembling the episode data collected
from your RL environments or offline RL input files, as well as over the exact nature and shape
of the tensor batches that RLlib feeds into your models for computing actions or losses.
ConnectorV2 pipelines: Connector pipelines either convert episodes into batched data that your model can process
(env-to-module and Learner), or convert your model's output into action batches that your possibly vectorized RL environment
needs for stepping (module-to-env).
The env-to-module pipeline, located on an EnvRunner, takes a list of
episodes as input and outputs a batch for an RLModule forward pass
that computes the next action. The module-to-env pipeline on the same EnvRunner
takes the output of that RLModule and converts it into actions
for the next call to your RL environment's step() method.
Lastly, a Learner connector pipeline, located on a Learner
worker, converts a list of episodes into a train batch for the next RLModule update.
The succeeding pages discuss the three pipeline types in more detail. However, all three have the following in common:
All connector pipelines are sequences of one or more ConnectorV2 pieces. You can also nest these, meaning some of the pieces may themselves be connector pipelines.
All connector pieces and pipelines are Python callables, overriding the __call__() method.
The call signatures are uniform across the different pipeline types. The main, mandatory arguments are the list of episodes, the batch to-be-built, and the RLModule instance. See the __call__() method for more details.
All connector pipelines can read from and write to the provided list of episodes as well as the batch, and thereby perform data transformations as required.
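The uniform call signature can be sketched with a stand-in base class. The names below are illustrative only (RLlib's real ConnectorV2 base class lives under ray.rllib.connectors and its exact keyword arguments may differ), but the shape is the one described above: mandatory episodes, batch, and RLModule arguments, with the batch returned:

```python
# Stand-in for the uniform ConnectorV2 call signature (illustrative names,
# not the exact RLlib API).

class ConnectorV2Sketch:
    def __call__(self, *, rl_module, episodes, batch, **kwargs):
        """Mandatory args: the RLModule instance, the list of episodes, and
        the batch under construction. Must return the (updated) batch."""
        raise NotImplementedError


class AddLastObs(ConnectorV2Sketch):
    """Reads from the episodes and writes into the batch."""
    def __call__(self, *, rl_module, episodes, batch, **kwargs):
        batch["obs"] = [ep["observations"][-1] for ep in episodes]
        return batch


piece = AddLastObs()
out = piece(rl_module=None, episodes=[{"observations": [1, 2, 3]}], batch={})
# out == {"obs": [3]}
```

Because the signature is the same for env-to-module, module-to-env, and Learner pipelines, a piece written this way can often be reused across pipeline types.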