ray.rllib.algorithms.algorithm_config.AlgorithmConfig.offline_data

AlgorithmConfig.offline_data(*, input_=<ray.rllib.utils.from_config._NotProvided object>, input_config=<ray.rllib.utils.from_config._NotProvided object>, actions_in_input_normalized=<ray.rllib.utils.from_config._NotProvided object>, input_evaluation=<ray.rllib.utils.from_config._NotProvided object>, postprocess_inputs=<ray.rllib.utils.from_config._NotProvided object>, shuffle_buffer_size=<ray.rllib.utils.from_config._NotProvided object>, output=<ray.rllib.utils.from_config._NotProvided object>, output_config=<ray.rllib.utils.from_config._NotProvided object>, output_compress_columns=<ray.rllib.utils.from_config._NotProvided object>, output_max_file_size=<ray.rllib.utils.from_config._NotProvided object>, offline_sampling=<ray.rllib.utils.from_config._NotProvided object>) → AlgorithmConfig

Sets the config’s offline data settings.

Parameters:
  • input – Specify how to generate experiences:
    - “sampler”: Generate experiences via online (env) simulation (default).
    - A local directory or file glob expression (e.g., “/tmp/*.json”).
    - A list of individual file paths/URIs (e.g., [“/tmp/1.json”, “s3://bucket/2.json”]).
    - A dict with string keys and sampling probabilities as values (e.g., {“sampler”: 0.4, “/tmp/*.json”: 0.4, “s3://bucket/expert.json”: 0.2}).
    - A callable that takes an IOContext object as its only arg and returns a ray.rllib.offline.InputReader.
    - A string key that indexes a callable registered via tune.registry.register_input.

  • input_config – Arguments that describe the settings for reading the input. If input is “sampler”, this is the environment configuration, e.g. env_name and env_config. See EnvContext for more info. If input is “dataset”, this contains e.g. format and path.

  • actions_in_input_normalized – True, if the actions in a given offline “input” are already normalized (between -1.0 and 1.0). This is usually the case when the offline file has been generated by another RLlib algorithm (e.g. PPO or SAC), while “normalize_actions” was set to True.

  • postprocess_inputs – Whether to run postprocess_trajectory() on the trajectory fragments from offline inputs. Note that postprocessing will be done using the current policy, not the behavior policy, which is typically undesirable for on-policy algorithms.

  • shuffle_buffer_size – If positive, input batches will be shuffled via a sliding window buffer of this number of batches. Use this if the input data is not in random enough order. Input is delayed until the shuffle buffer is filled.

  • output – Specify where experiences should be saved:
    - None: don’t save any experiences.
    - “logdir”: save to the agent log dir.
    - A path/URI to save to a custom output directory (e.g., “s3://bckt/”).
    - A function that returns an rllib.offline.OutputWriter.

  • output_config – Arguments accessible from the IOContext for configuring custom output.

  • output_compress_columns – What sample batch columns to LZ4 compress in the output data.

  • output_max_file_size – Max output file size (in bytes) before rolling over to a new file.

  • offline_sampling – Whether sampling for the Algorithm happens via reading from offline data. If True, EnvRunners will NOT limit the number of collected batches within the same sample() call based on the number of sub-environments within the worker (since no sub-environments are present when sampling from offline data).

Returns:

This updated AlgorithmConfig object.
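Example (a minimal, illustrative sketch rather than part of this reference: the BC algorithm, the CartPole-v1 environment, and all file paths/globs are placeholder assumptions):

    from ray.rllib.algorithms.bc import BCConfig

    config = (
        BCConfig()
        .environment(env="CartPole-v1")
        .offline_data(
            # Mix two (hypothetical) offline sources, weighted by sampling probability.
            input_={
                "/tmp/expert/*.json": 0.8,
                "/tmp/medium/*.json": 0.2,
            },
            # Shuffle incoming batches via a 100-batch sliding window.
            shuffle_buffer_size=100,
            # Also record the experiences this Algorithm samples.
            output="/tmp/recorded-experiences",
            output_compress_columns=["obs", "new_obs"],
            output_max_file_size=64 * 1024 * 1024,  # roll over output files at 64 MiB
            # Sampling happens by reading offline data, not via sub-environments.
            offline_sampling=True,
        )
    )

    algo = config.build()
    print(algo.train())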
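A similar sketch for the registered-input option: the key “my_expert_data”, the “paths” entry, and the glob are assumptions, and the use of ioctx.input_config assumes the IOContext exposes the input_config dict under that attribute.

    from ray.rllib.algorithms.bc import BCConfig
    from ray.rllib.offline import JsonReader
    from ray.tune.registry import register_input

    # Register a named input creator; RLlib calls it with an IOContext object
    # and expects an InputReader back (JsonReader is one such reader).
    register_input(
        "my_expert_data",
        lambda ioctx: JsonReader(ioctx.input_config["paths"], ioctx),  # "paths" key is hypothetical
    )

    config = (
        BCConfig()
        .environment(env="CartPole-v1")
        .offline_data(
            input_="my_expert_data",                       # string key registered above
            input_config={"paths": "/tmp/expert/*.json"},  # forwarded via the IOContext
        )
    )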