ray.rllib.algorithms.algorithm_config.AlgorithmConfig.offline_data
- AlgorithmConfig.offline_data(
    *,
    input_: str | typing.Callable[[ray.rllib.offline.io_context.IOContext], ray.rllib.offline.input_reader.InputReader] | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_read_method: str | typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_read_method_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_read_schema: typing.Dict[str, str] | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_read_episodes: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_read_sample_batches: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_filesystem: str | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_filesystem_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_compress_columns: typing.List[str] | None = <ray.rllib.utils.from_config._NotProvided object>,
    materialize_data: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    materialize_mapped_data: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    map_batches_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    iter_batches_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    prelearner_class: typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>,
    prelearner_buffer_class: typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>,
    prelearner_buffer_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    prelearner_module_synch_period: int | None = <ray.rllib.utils.from_config._NotProvided object>,
    dataset_num_iters_per_learner: int | None = <ray.rllib.utils.from_config._NotProvided object>,
    input_config: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    actions_in_input_normalized: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    postprocess_inputs: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    shuffle_buffer_size: int | None = <ray.rllib.utils.from_config._NotProvided object>,
    output: str | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_config: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_compress_columns: typing.List[str] | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_max_file_size: float | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_max_rows_per_file: int | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_write_method: str | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_write_method_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_filesystem: str | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_filesystem_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>,
    output_write_episodes: bool | None = <ray.rllib.utils.from_config._NotProvided object>,
    offline_sampling: str | None = <ray.rllib.utils.from_config._NotProvided object>,
  ) → AlgorithmConfig
Sets the config’s offline data settings.
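A minimal usage sketch, assuming Ray with RLlib is installed and that Parquet data exists at the hypothetical path /tmp/offline-data. The path and all values are placeholders chosen for illustration; only keyword arguments listed in the signature are used:

```python
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

# Hypothetical location of recorded experiences; replace with your own data.
DATA_PATH = "/tmp/offline-data"

config = AlgorithmConfig().offline_data(
    input_=DATA_PATH,
    # Read columnar Parquet data (the documented default read method).
    input_read_method="read_parquet",
    input_read_episodes=False,
    # Stream the data instead of materializing it in memory.
    materialize_data=False,
    materialize_mapped_data=False,
    # Illustrative tuning values, not recommendations.
    iter_batches_kwargs={"prefetch_batches": 2},
    dataset_num_iters_per_learner=1,
)
```

Because the method returns the updated config, the call can be chained with other AlgorithmConfig setters.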
- Parameters:
input_ – Specify how to generate experiences:
  - “sampler”: Generate experiences via online (env) simulation (default).
  - A local directory or file glob expression (e.g., “/tmp/*.json”).
  - A list of individual file paths/URIs (e.g., [“/tmp/1.json”, “s3://bucket/2.json”]).
  - A dict with string keys and sampling probabilities as values (e.g., {“sampler”: 0.4, “/tmp/*.json”: 0.4, “s3://bucket/expert.json”: 0.2}).
  - A callable that takes an IOContext object as its only arg and returns a ray.rllib.offline.InputReader.
  - A string key that indexes a callable with tune.registry.register_input.
input_read_method – Read method for the ray.data.Dataset to read in the offline data from input_. The default is read_parquet for Parquet files. See https://docs.ray.io/en/latest/data/api/input_output.html for more info about available read methods in ray.data.
input_read_method_kwargs – Keyword args for input_read_method. These are passed into the read method without checking. If no arguments are passed in, the default argument {'override_num_blocks': max(num_learners * 2, 2)} is used. Use these keyword args together with map_batches_kwargs and iter_batches_kwargs to tune the performance of the data pipeline.
input_read_schema – Table schema for converting offline data to episodes. This schema maps the offline data columns to ray.rllib.core.columns.Columns: {Columns.OBS: 'o_t', Columns.ACTIONS: 'a_t', ...}. Columns in the dataset that are not mapped via this schema are sorted into the episodes’ extra_model_outputs. If no schema is passed in, the default schema ray.rllib.offline.offline_data.SCHEMA is used. If your dataset already contains the names in this schema, no input_read_schema is needed.
input_read_episodes – Whether offline data is already stored in RLlib’s EpisodeType format, i.e. ray.rllib.env.SingleAgentEpisode (multi-agent is planned but not yet supported). Reading episodes directly avoids additional transform steps and is usually faster; it is therefore the recommended format when your application remains fully inside RLlib’s schema. The other format is columnar and agnostic to the RL framework used. Use the columnar format if you are unsure when the data will be used or in which RL framework. The default is to read column data, i.e. False. input_read_episodes and input_read_sample_batches cannot be True at the same time. See also output_write_episodes to define the output data format when recording.
input_read_sample_batches – Whether offline data is stored in RLlib’s old-stack SampleBatch type. This is usually the case for older data recorded with RLlib in JSON line format. Reading in SampleBatch data needs extra transforms and might not concatenate episode chunks contained in different SampleBatch objects in the data. If possible, avoid reading SampleBatch data directly and instead convert it in a controlled form into RLlib’s EpisodeType (i.e. SingleAgentEpisode or MultiAgentEpisode). The default is False. input_read_episodes and input_read_sample_batches cannot be True at the same time.
input_filesystem – A cloud filesystem to handle access to cloud storage when reading experiences. Should be either “gcs” for Google Cloud Storage, “s3” for AWS S3 buckets, or “abs” for Azure Blob Storage.
input_filesystem_kwargs – A dictionary holding the kwargs for the filesystem given by input_filesystem. See gcsfs.GCSFileSystem for GCS, pyarrow.fs.S3FileSystem for S3, and adlfs.AzureBlobFileSystem for ABS filesystem arguments.
input_compress_columns – Which input columns are LZ4-compressed in the input data. Applies if data is stored in RLlib’s SingleAgentEpisode format (MultiAgentEpisode is not supported yet). Note that rllib.core.columns.Columns.OBS will also try to decompress rllib.core.columns.Columns.NEXT_OBS.
materialize_data – Whether the raw data should be materialized in memory. This boosts performance but requires enough memory to avoid an OOM, so make sure that your cluster has the resources available. For very large data you might want to switch to streaming mode by setting this to False (default). If your algorithm does not need the RLModule in the Learner connector pipeline or all (learner) connectors are stateless, consider setting materialize_mapped_data to True instead (and set materialize_data to False). If your data does not fit into memory and your Learner connector pipeline requires an RLModule or is stateful, set both materialize_data and materialize_mapped_data to False.
materialize_mapped_data – Whether the data should be materialized after running it through the Learner connector pipeline (i.e. after running the OfflinePreLearner). This improves performance, but should only be used if the (learner) connector pipeline does not require an RLModule and is stateless. For example, MARWIL’s Learner connector pipeline requires the RLModule for value function predictions, so training batches would become stale after some iterations, causing learning degradation or divergence. Also ensure that your cluster has enough memory available to avoid an OOM. If set to True, make sure that materialize_data is set to False to avoid materializing two datasets. If your data does not fit into memory and your Learner connector pipeline requires an RLModule or is stateful, set both materialize_data and materialize_mapped_data to False.
map_batches_kwargs – Keyword args for the map_batches method. These are passed into the ray.data.Dataset.map_batches method when sampling, without checking. If no arguments are passed in, the default arguments {'concurrency': max(2, num_learners), 'zero_copy_batch': True} are used. Use these keyword args together with input_read_method_kwargs and iter_batches_kwargs to tune the performance of the data pipeline.
iter_batches_kwargs – Keyword args for the iter_batches method. These are passed into the ray.data.Dataset.iter_batches method when sampling, without checking. If no arguments are passed in, the default arguments {'prefetch_batches': 2, 'local_shuffle_buffer_size': train_batch_size_per_learner * 4} are used. Use these keyword args together with input_read_method_kwargs and map_batches_kwargs to tune the performance of the data pipeline.
prelearner_class – An optional OfflinePreLearner class that is used in ray.data.map_batches within the OfflineData class to transform data from columns to batches that can be used in the Learner.update...() methods. Override the OfflinePreLearner class and pass your derived class in here if you need further transformations specific to your data or loss. The default is None, which uses the base OfflinePreLearner defined in ray.rllib.offline.offline_prelearner.
prelearner_module_synch_period – The period (number of batches converted) after which the RLModule held by the PreLearner should sync weights. The PreLearner is used to preprocess batches for the learners. The higher this value, the more off-policy the PreLearner’s module will be. Values that are too small force the PreLearner to sync more frequently and thus might slow down the data pipeline. The default value chosen by the OfflinePreLearner is 10.
dataset_num_iters_per_learner – Number of updates to run in each learner during a single training iteration. If None, each learner runs a complete epoch over its data block (the dataset is partitioned into at least as many blocks as there are learners). The default is None.
input_config – Arguments that describe the settings for reading the input. If input is “sampler”, this will be the environment configuration, e.g. env_name and env_config, etc. See EnvContext for more info. If the input is “dataset”, this will be e.g. format, path.
actions_in_input_normalized – True if the actions in a given offline “input” are already normalized (between -1.0 and 1.0). This is usually the case when the offline file has been generated by another RLlib algorithm (e.g. PPO or SAC) while “normalize_actions” was set to True.
postprocess_inputs – Whether to run postprocess_trajectory() on the trajectory fragments from offline inputs. Note that postprocessing will be done using the current policy, not the behavior policy, which is typically undesirable for on-policy algorithms.
shuffle_buffer_size – If positive, input batches will be shuffled via a sliding window buffer of this number of batches. Use this if the input data is not in random enough order. Input is delayed until the shuffle buffer is filled.
output – Specify where experiences should be saved:
  - None: don’t save any experiences.
  - “logdir”: save to the agent log dir.
  - A path/URI to save to a custom output directory (e.g., “s3://bckt/”).
  - A function that returns a rllib.offline.OutputWriter.
output_config – Arguments accessible from the IOContext for configuring custom output.
output_compress_columns – What sample batch columns to LZ4 compress in the output data. Note that rllib.core.columns.Columns.OBS will also compress rllib.core.columns.Columns.NEXT_OBS.
output_max_file_size – Max output file size (in bytes) before rolling over to a new file.
output_max_rows_per_file – Max number of output rows per file before rolling over to a new file.
output_write_method – Write method for the ray.data.Dataset to write the offline data to output. The default is write_parquet for Parquet files. See https://docs.ray.io/en/latest/data/api/input_output.html for more info about available write methods in ray.data.
output_write_method_kwargs – Keyword args for the output_write_method. These are passed into the write method without checking.
output_filesystem – A cloud filesystem to handle access to cloud storage when writing experiences. Should be either “gcs” for Google Cloud Storage, “s3” for AWS S3 buckets, or “abs” for Azure Blob Storage.
output_filesystem_kwargs – A dictionary holding the kwargs for the filesystem given by output_filesystem. See gcsfs.GCSFileSystem for GCS, pyarrow.fs.S3FileSystem for S3, and adlfs.AzureBlobFileSystem for ABS filesystem arguments.
offline_sampling – Whether sampling for the Algorithm happens via reading from offline data. If True, EnvRunners will NOT limit the number of collected batches within the same sample() call based on the number of sub-environments within the worker (no sub-environments present).
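The default values documented for input_read_method_kwargs, map_batches_kwargs, and iter_batches_kwargs can be written out in plain Python. This is only a sketch of the stated formulas, not RLlib's internal code; the helper name default_pipeline_kwargs and its arguments are hypothetical. The shuffle buffer key follows the argument name used by ray.data.Dataset.iter_batches, local_shuffle_buffer_size:

```python
from typing import Any, Dict


def default_pipeline_kwargs(
    num_learners: int, train_batch_size_per_learner: int
) -> Dict[str, Dict[str, Any]]:
    """Return the documented default kwargs for the three pipeline stages."""
    return {
        # Defaults for the read method, e.g. ray.data.read_parquet.
        "input_read_method_kwargs": {
            "override_num_blocks": max(num_learners * 2, 2),
        },
        # Defaults for ray.data.Dataset.map_batches.
        "map_batches_kwargs": {
            "concurrency": max(2, num_learners),
            "zero_copy_batch": True,
        },
        # Defaults for ray.data.Dataset.iter_batches.
        "iter_batches_kwargs": {
            "prefetch_batches": 2,
            "local_shuffle_buffer_size": train_batch_size_per_learner * 4,
        },
    }


defaults = default_pipeline_kwargs(num_learners=4, train_batch_size_per_learner=256)
print(defaults["input_read_method_kwargs"])  # {'override_num_blocks': 8}
print(defaults["iter_batches_kwargs"]["local_shuffle_buffer_size"])  # 1024
```

Starting from these values and overriding one key at a time makes it easier to see which stage of the data pipeline is the bottleneck.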
- Returns:
This updated AlgorithmConfig object.
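Every parameter defaults to a _NotProvided sentinel rather than None, which suggests that omitted arguments leave existing settings unchanged while None stays available as an explicit value. Combined with returning the config itself, this allows chained calls. The toy class below sketches that pattern in plain Python; ToyConfig, its two fields, and the _NOT_PROVIDED object are hypothetical stand-ins, not RLlib's implementation:

```python
from typing import Optional, Union


class _NotProvidedType:
    """Sentinel type marking arguments the caller did not pass."""

    def __repr__(self) -> str:
        return "NotProvided"


_NOT_PROVIDED = _NotProvidedType()


class ToyConfig:
    """Hypothetical config with two offline-data style settings."""

    def __init__(self) -> None:
        self.output: Optional[str] = "logdir"
        self.shuffle_buffer_size: int = 0

    def offline_data(
        self,
        *,
        output: Union[Optional[str], _NotProvidedType] = _NOT_PROVIDED,
        shuffle_buffer_size: Union[int, _NotProvidedType] = _NOT_PROVIDED,
    ) -> "ToyConfig":
        # Only overwrite settings that were explicitly passed in.
        if output is not _NOT_PROVIDED:
            self.output = output
        if shuffle_buffer_size is not _NOT_PROVIDED:
            self.shuffle_buffer_size = shuffle_buffer_size
        # Returning self allows chained calls.
        return self


config = ToyConfig().offline_data(shuffle_buffer_size=100).offline_data(output=None)
print(config.output, config.shuffle_buffer_size)  # None 100
```

The second call sets output to None explicitly without resetting shuffle_buffer_size, which a plain None default could not distinguish from "argument omitted".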