ray.rllib.offline.offline_prelearner.OfflinePreLearner#

class ray.rllib.offline.offline_prelearner.OfflinePreLearner(*, config: AlgorithmConfig, learner: Learner | list[ActorHandle], spaces: Tuple[gymnasium.Space, gymnasium.Space] | None = None, module_spec: MultiRLModuleSpec | None = None, module_state: Dict[str, Any] | None = None, **kwargs: Dict[str, Any])[source]#

Class that coordinates data transformation from dataset to learner.

This class is an essential part of the new Offline RL API of RLlib. It is a callable class that is run in ray.data.Dataset.map_batches when iterating over batches for training. It’s basic function is to convert data in batch from rows to episodes (SingleAGentEpisode`s for now) and to then run the learner connector pipeline to convert further to trainable batches. These batches are used directly in the `Learner’s update method.

The main reason to run these transformations inside of map_batches is for better performance. Batches can be pre-fetched in ray.data and therefore batch trransformation can be run highly parallelized to the Learner''s `update.

This class can be overridden to implement custom logic for transforming batches and make them ‘Learner’-ready. When deriving from this class the __call__ method and _map_to_episodes can be overridden to induce custom logic for the complete transformation pipeline (__call__) or for converting to episodes only (‘_map_to_episodes`). For an example how this class can be used to also compute values and advantages see rllib.algorithm.marwil.marwil_prelearner.MAWRILOfflinePreLearner.

Custom OfflinePreLearner classes can be passed into AlgorithmConfig.offline’s prelearner_class. The OfflineData class will then use the custom class in its data pipeline.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

Methods

Attributes

default_prelearner_buffer_class

Sets the default replay buffer.

default_prelearner_buffer_kwargs

Sets the default arguments for the replay buffer.