ray.rllib.algorithms.algorithm_config.AlgorithmConfig.offline_data

AlgorithmConfig.offline_data(*, input_: str | typing.Callable[[ray.rllib.offline.io_context.IOContext], ray.rllib.offline.input_reader.InputReader] | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_method: str | typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_method_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_schema: typing.Dict[str, str] | None = <ray.rllib.utils.from_config._NotProvided object>, input_read_episodes: bool | None = <ray.rllib.utils.from_config._NotProvided object>, input_compress_columns: typing.List[str] | None = <ray.rllib.utils.from_config._NotProvided object>, map_batches_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, iter_batches_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, prelearner_class: typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>, prelearner_module_synch_period: int | None = <ray.rllib.utils.from_config._NotProvided object>, dataset_num_iters_per_learner: int | None = <ray.rllib.utils.from_config._NotProvided object>, input_config: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, actions_in_input_normalized: bool | None = <ray.rllib.utils.from_config._NotProvided object>, postprocess_inputs: bool | None = <ray.rllib.utils.from_config._NotProvided object>, shuffle_buffer_size: int | None = <ray.rllib.utils.from_config._NotProvided object>, output: str | None = <ray.rllib.utils.from_config._NotProvided object>, output_config: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, output_compress_columns: typing.List[str] | None = <ray.rllib.utils.from_config._NotProvided object>, output_max_file_size: float | None = <ray.rllib.utils.from_config._NotProvided object>, output_max_rows_per_file: int | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_method: str | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_method_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, output_filesystem: str | None = <ray.rllib.utils.from_config._NotProvided object>, output_filesystem_kwargs: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, output_write_episodes: bool | None = <ray.rllib.utils.from_config._NotProvided object>, offline_sampling: str | None = <ray.rllib.utils.from_config._NotProvided object>) -> AlgorithmConfig

Sets the config’s offline data settings.

Parameters:
  • input_ – Specify how to generate experiences:
      - “sampler”: Generate experiences via online (env) simulation (default).
      - A local directory or file glob expression (e.g., “/tmp/*.json”).
      - A list of individual file paths/URIs (e.g., [“/tmp/1.json”, “s3://bucket/2.json”]).
      - A dict with string keys and sampling probabilities as values (e.g., {“sampler”: 0.4, “/tmp/*.json”: 0.4, “s3://bucket/expert.json”: 0.2}).
      - A callable that takes an IOContext object as its only argument and returns a ray.rllib.offline.InputReader.
      - A string key that indexes a callable registered with tune.registry.register_input.

  • input_read_method – Read method for the ray.data.Dataset to read in the offline data from input_. The default is read_parquet for Parquet files. See https://docs.ray.io/en/latest/data/api/input_output.html for more info about available read methods in ray.data.

  • input_read_method_kwargs – Keyword arguments for the input_read_method. These are passed into the read method without checking. If no arguments are passed in, the default argument {'override_num_blocks': max(num_learners * 2, 2)} is used. Use these kwargs together with map_batches_kwargs and iter_batches_kwargs to tune the performance of the data pipeline.

  • input_read_schema – Table schema for converting offline data to episodes. This schema maps the offline data columns to ray.rllib.core.columns.Columns, e.g. {Columns.OBS: ‘o_t’, Columns.ACTIONS: ‘a_t’, …}. Columns in the data set that are not mapped via this schema are sorted into the episodes’ extra_model_outputs. If no schema is passed in, the default schema ray.rllib.offline.offline_data.SCHEMA is used. If your data set already uses the column names in this schema, no input_read_schema is needed.

  • input_read_episodes – Whether offline data is already stored in RLlib’s EpisodeType format, i.e. ray.rllib.env.SingleAgentEpisode (multi-agent is planned but not yet supported). Reading episodes directly avoids additional transform steps and is usually faster; it is therefore the recommended format when your application remains fully inside RLlib’s schema. The other format is a columnar format and is agnostic to the RL framework used. Use the columnar format if you are unsure when, or in which RL framework, the data will be used. The default is to read column data, i.e. False. See also output_write_episodes to define the output data format when recording.

  • input_compress_columns – Which input columns are compressed with LZ4 in the input data, if the data is stored in RLlib’s SingleAgentEpisode format (MultiAgentEpisode is not yet supported). Note that specifying rllib.core.columns.Columns.OBS also attempts to decompress rllib.core.columns.Columns.NEXT_OBS.

  • map_batches_kwargs – Keyword arguments for the map_batches method. These are passed into the ray.data.Dataset.map_batches method when sampling, without checking. If no arguments are passed in, the default arguments {'concurrency': max(2, num_learners), 'zero_copy_batch': True} are used. Use these kwargs together with input_read_method_kwargs and iter_batches_kwargs to tune the performance of the data pipeline.

  • iter_batches_kwargs – Keyword arguments for the iter_batches method. These are passed into the ray.data.Dataset.iter_batches method when sampling, without checking. If no arguments are passed in, the default arguments {'prefetch_batches': 2, 'local_shuffle_buffer_size': train_batch_size_per_learner * 4} are used. Use these kwargs together with input_read_method_kwargs and map_batches_kwargs to tune the performance of the data pipeline.

  • prelearner_class – An optional OfflinePreLearner class that is used in ray.data.map_batches, within the OfflineData class, to transform data from columns to batches that can be used in the Learner’s update methods. Override the OfflinePreLearner class and pass your derived class in here if you need further transformations specific to your data or loss. The default is None, which uses the base OfflinePreLearner defined in ray.rllib.offline.offline_prelearner.

  • prelearner_module_synch_period – The period (number of batches converted) after which the RLModule held by the PreLearner should sync weights. The PreLearner is used to preprocess batches for the learners. The higher this value the more off-policy the PreLearner’s module will be. Values too small will force the PreLearner to sync more frequently and thus might slow down the data pipeline. The default value chosen by the OfflinePreLearner is 10.

  • dataset_num_iters_per_learner – Number of iterations to run in each learner during a single training iteration. If None, each learner runs a complete epoch over its data block (the dataset is partitioned into as many blocks as there are learners). The default is None.

  • input_config – Arguments that describe the settings for reading the input. If input is “sampler”, this is the environment configuration, e.g. env_name and env_config, etc. See EnvContext for more info. If the input is “dataset”, this is e.g. format, path.

  • actions_in_input_normalized – True, if the actions in a given offline “input” are already normalized (between -1.0 and 1.0). This is usually the case when the offline file has been generated by another RLlib algorithm (e.g. PPO or SAC), while “normalize_actions” was set to True.

  • postprocess_inputs – Whether to run postprocess_trajectory() on the trajectory fragments from offline inputs. Note that postprocessing will be done using the current policy, not the behavior policy, which is typically undesirable for on-policy algorithms.

  • shuffle_buffer_size – If positive, input batches will be shuffled via a sliding window buffer of this number of batches. Use this if the input data is not in random enough order. Input is delayed until the shuffle buffer is filled.

  • output – Specify where experiences should be saved:
      - None: Don’t save any experiences.
      - “logdir”: Save to the agent log dir.
      - A path/URI to save to a custom output directory (e.g., “s3://bckt/”).
      - A function that returns a rllib.offline.OutputWriter.

  • output_config – Arguments accessible from the IOContext for configuring custom output.

  • output_compress_columns – What sample batch columns to LZ4 compress in the output data. Note, rllib.core.columns.Columns.OBS will also compress rllib.core.columns.Columns.NEXT_OBS.

  • output_max_file_size – Max output file size (in bytes) before rolling over to a new file.

  • output_max_rows_per_file – Max number of output rows before rolling over to a new file.

  • output_write_method – Write method for the ray.data.Dataset to write the offline data to output. The default is write_parquet for Parquet files. See https://docs.ray.io/en/latest/data/api/input_output.html for more info about available write methods in ray.data.

  • output_write_method_kwargs – Keyword arguments for the output_write_method. These are passed into the write method without checking.

  • output_filesystem – A cloud filesystem to handle access to cloud storage when writing experiences. Should be either gcs for Google Cloud Storage, s3 for AWS S3 buckets, or abs for Azure Blob Storage.

  • output_filesystem_kwargs – A dictionary holding the kwargs for the filesystem given by output_filesystem. See gcsfs.GCSFileSystem for GCS, pyarrow.fs.S3FileSystem for S3, and adlfs.AzureBlobFileSystem for ABS filesystem arguments.

  • offline_sampling – Whether sampling for the Algorithm happens via reading from offline data. If True, EnvRunners will NOT limit the number of collected batches within the same sample() call based on the number of sub-environments within the worker (no sub-environments present).
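
The fallback defaults described for input_read_method_kwargs, map_batches_kwargs, and iter_batches_kwargs can be sketched in plain Python. This is a minimal illustration and not RLlib's implementation: the helper name default_pipeline_kwargs is hypothetical, it assumes that a user-supplied dict replaces the default entirely, and the shuffle key follows the parameter name used by ray.data.Dataset.iter_batches (local_shuffle_buffer_size).

```python
from typing import Any, Dict, Optional


def default_pipeline_kwargs(
    num_learners: int,
    train_batch_size_per_learner: int,
    input_read_method_kwargs: Optional[Dict[str, Any]] = None,
    map_batches_kwargs: Optional[Dict[str, Any]] = None,
    iter_batches_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: fill in the documented defaults for unset kwargs.

    Assumption: a dict passed by the caller is used as-is (no merging).
    """
    if input_read_method_kwargs is None:
        # At least two blocks, otherwise two blocks per learner.
        input_read_method_kwargs = {"override_num_blocks": max(num_learners * 2, 2)}
    if map_batches_kwargs is None:
        map_batches_kwargs = {
            "concurrency": max(2, num_learners),
            "zero_copy_batch": True,
        }
    if iter_batches_kwargs is None:
        iter_batches_kwargs = {
            "prefetch_batches": 2,
            # Shuffle buffer holds four train batches per learner.
            "local_shuffle_buffer_size": train_batch_size_per_learner * 4,
        }
    return {
        "input_read_method_kwargs": input_read_method_kwargs,
        "map_batches_kwargs": map_batches_kwargs,
        "iter_batches_kwargs": iter_batches_kwargs,
    }


# Example: a local setup without remote learners and a batch size of 256.
kwargs = default_pipeline_kwargs(num_learners=0, train_batch_size_per_learner=256)
```

Computing the three dicts side by side makes it easier to see how the number of learners and the per-learner batch size drive read parallelism, map concurrency, and shuffle-buffer size when tuning the pipeline.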

Returns:

This updated AlgorithmConfig object.
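
To illustrate the column mapping described for input_read_schema, the following sketch renames the columns of a single data row and collects unmapped columns separately, as the parameter description says happens with extra_model_outputs. It is a simplified stand-in, not RLlib's conversion code: apply_read_schema is a hypothetical helper, the row contents are invented for the example, and the plain strings "obs" and "actions" are used in place of Columns.OBS and Columns.ACTIONS so the snippet runs without Ray installed.

```python
from typing import Any, Dict, Tuple

# Assumption: plain strings stand in for ray.rllib.core.columns.Columns members.
OBS = "obs"
ACTIONS = "actions"


def apply_read_schema(
    row: Dict[str, Any], schema: Dict[str, str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split a row into mapped and unmapped columns.

    `schema` maps RLlib column names to the column names used in the data set,
    e.g. {"obs": "o_t", "actions": "a_t"}.
    """
    # Invert the schema so data-set column names can be looked up directly.
    dataset_to_rllib = {data_col: rllib_col for rllib_col, data_col in schema.items()}
    mapped: Dict[str, Any] = {}
    extra_model_outputs: Dict[str, Any] = {}
    for name, value in row.items():
        if name in dataset_to_rllib:
            mapped[dataset_to_rllib[name]] = value
        else:
            # Columns without a schema entry are kept as extra outputs.
            extra_model_outputs[name] = value
    return mapped, extra_model_outputs


# Illustrative row with two mapped columns and one unmapped column.
row = {"o_t": [0.1, 0.2], "a_t": 1, "step_id": 7}
mapped, extra = apply_read_schema(row, {OBS: "o_t", ACTIONS: "a_t"})
```

Here mapped holds the observation and action under RLlib's column names, while step_id ends up in extra because the schema has no entry for it.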