ConnectorV2 and ConnectorV2 pipelines#

RLlib stores and transports all trajectory data in the form of SingleAgentEpisode or MultiAgentEpisode objects. Connector pipelines are the components that translate this episode data into tensor batches readable by neural network models right before the model forward pass.

../_images/generic_connector_pipeline.svg

Generic ConnectorV2 Pipeline: All pipelines consist of one or more ConnectorV2 pieces. When calling the pipeline, you pass in a list of episodes, the RLModule instance, and a batch, which initially may be an empty dict. Each ConnectorV2 piece takes its predecessor’s output, starting with the initial batch on the left, performs some transformations on the episodes, the batch, or both, and passes everything on to the next piece. In this way, every ConnectorV2 piece can read from and write to the provided episodes, add data from these episodes to the batch, or change data already in the batch. The pipeline returns the output batch of its last piece.#
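The chaining behavior described in the caption can be sketched in plain Python. This is a simplified illustration only, not RLlib's actual classes: real ConnectorV2 pieces also receive the RLModule instance and further arguments, and real episodes are SingleAgentEpisode or MultiAgentEpisode objects rather than dicts.

```python
# Simplified sketch of how a connector pipeline chains its pieces.
# All names here are illustrative, NOT RLlib's actual API.

def add_observations(episodes, batch):
    # Read the latest observation from each episode and write it into
    # the (initially empty) batch.
    batch["obs"] = [ep["observations"][-1] for ep in episodes]
    return batch

def normalize_observations(episodes, batch):
    # Transform data that a predecessor piece already wrote into the batch.
    batch["obs"] = [o / 255.0 for o in batch["obs"]]
    return batch

def run_pipeline(pieces, episodes, batch=None):
    # Each piece receives its predecessor's output batch; the pipeline
    # returns the batch produced by the last piece.
    batch = {} if batch is None else batch
    for piece in pieces:
        batch = piece(episodes, batch)
    return batch

episodes = [{"observations": [0.0, 51.0]}, {"observations": [102.0]}]
result = run_pipeline([add_observations, normalize_observations], episodes)
# result == {"obs": [0.2, 0.4]}
```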

Note

Note that the batch output of the pipeline lives only as long as the succeeding RLModule forward pass or Env.step() call. RLlib discards the data afterwards. The list of episodes, however, may persist longer. For example, if an env-to-module pipeline reads an observation from an episode, mutates that observation, and then writes it back into the episode, the subsequent module-to-env pipeline is able to see the changed observation. Also, the Learner pipeline operates on the same episodes that have already passed through both env-to-module and module-to-env pipelines and thus might have undergone changes.

Three ConnectorV2 pipeline types#

There are three different types of connector pipelines in RLlib:

  1. Env-to-module pipeline, which creates tensor batches for action-computing forward passes.

  2. Module-to-env pipeline, which translates a model’s output into RL environment actions.

  3. Learner connector pipeline, which creates the train batch for a model update.
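Recent RLlib versions expose hooks on AlgorithmConfig for prepending your own pieces to each of the three pipelines. The sketch below shows the general shape; the connector classes are hypothetical placeholders, and you should check the exact hook signatures against your installed RLlib version:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# MyEnvToModuleConnector, MyModuleToEnvConnector, and MyLearnerConnector
# are hypothetical custom ConnectorV2 subclasses, not RLlib built-ins.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Custom pieces for the two EnvRunner-side pipelines.
    .env_runners(
        env_to_module_connector=lambda env: MyEnvToModuleConnector(),
        module_to_env_connector=lambda env: MyModuleToEnvConnector(),
    )
    # Custom piece for the Learner connector pipeline.
    .training(
        learner_connector=lambda obs_space, act_space: MyLearnerConnector(),
    )
)
```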

The ConnectorV2 API is an extremely powerful tool for customizing your RLlib experiments and algorithms. It allows you to take full control over accessing, changing, and re-assembling the episode data collected from your RL environments or your offline RL input files as well as controlling the exact nature and shape of the tensor batches that RLlib feeds into your models for computing actions or losses.

../_images/location_of_connector_pipelines_in_rllib.svg

ConnectorV2 Pipelines: Connector pipelines convert episodes into batched data, which your model can process (env-to-module and Learner) or convert your model’s output into action batches, which your possibly vectorized RL environment needs for stepping (module-to-env). The env-to-module pipeline, located on an EnvRunner, takes a list of episodes as input and outputs a batch for an RLModule forward pass that computes the next action. The module-to-env pipeline on the same EnvRunner takes the output of that RLModule and converts it into actions for the next call to your RL environment’s step() method. Lastly, a Learner connector pipeline, located on a Learner worker, converts a list of episodes into a train batch for the next RLModule update.#

The succeeding pages discuss the three pipeline types in more detail; however, all three have the following in common:

  • All connector pipelines are sequences of one or more ConnectorV2 pieces. You can nest these as well, meaning some of the pieces may be connector pipelines themselves.

  • All connector pieces and pipelines are Python callables, overriding the __call__() method.

  • The call signatures are uniform across the different pipeline types. The main, mandatory arguments are the list of episodes, the batch to-be-built, and the RLModule instance. See the __call__() method for more details.

  • All connector pipelines can read from and write to the provided list of episodes as well as the batch and thereby perform data transforms as required.
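The shared properties above, a uniform keyword-only call signature and nestability, can be mimicked in a standalone mock. This is an illustrative stand-in, not RLlib's base class; the real ConnectorV2.__call__ accepts additional keyword arguments beyond those shown here:

```python
# Mock of the uniform call signature shared by connector pieces and
# pipelines. Illustrative only; NOT RLlib's actual ConnectorV2 class.

class MockConnectorV2:
    def __call__(self, *, rl_module, episodes, batch, **kwargs):
        raise NotImplementedError

class AddLastObservation(MockConnectorV2):
    # A piece that reads from the episodes and writes into the batch.
    def __call__(self, *, rl_module, episodes, batch, **kwargs):
        batch["obs"] = [ep["observations"][-1] for ep in episodes]
        return batch

class MockPipeline(MockConnectorV2):
    # A pipeline is itself a callable with the same signature, which is
    # why pipelines can nest as pieces inside other pipelines.
    def __init__(self, pieces):
        self.pieces = pieces

    def __call__(self, *, rl_module, episodes, batch, **kwargs):
        for piece in self.pieces:
            batch = piece(
                rl_module=rl_module, episodes=episodes, batch=batch, **kwargs
            )
        return batch

pipeline = MockPipeline([AddLastObservation()])
out = pipeline(rl_module=None, episodes=[{"observations": [1, 2, 3]}], batch={})
# out == {"obs": [3]}
```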