ray.train.DataConfig.__init__

DataConfig.__init__(datasets_to_split: Literal['all'] | List[str] = 'all', execution_options: ExecutionOptions | Dict[str, ExecutionOptions] | None = None, enable_shard_locality: bool = True)

Construct a DataConfig.

Parameters:
  • datasets_to_split – Which datasets to split among the Train workers. Set it to “all” or to a list of dataset names. Defaults to “all”, which splits every dataset.

  • execution_options – The execution options to pass to Ray Data. Can be either:
      1. A single ExecutionOptions object, applied to all datasets.
      2. A dict mapping dataset names to ExecutionOptions. Any dataset whose name is not in the dict uses DataConfig.default_ingest_options().
    By default, the options are optimized for data ingest. When overriding them, start from DataConfig.default_ingest_options() and modify that object rather than building options from scratch.

  • enable_shard_locality – If True, sharding datasets across Train workers takes data locality into account to minimize cross-node data transfer. Enabled by default.