ray.train.DataConfig.__init__
- DataConfig.__init__(datasets_to_split: Literal['all'] | List[str] = 'all', execution_options: ExecutionOptions | Dict[str, ExecutionOptions] | None = None, enable_shard_locality: bool = True)
Construct a DataConfig.
- Parameters:
datasets_to_split – Specifies which datasets should be split among workers. Can be set to “all” or a list of dataset names. Defaults to “all”, i.e. split all datasets.
execution_options – The execution options to pass to Ray Data. Can be either: 1. A single ExecutionOptions object that is applied to all datasets. 2. A dict mapping dataset names to ExecutionOptions. If a dataset name is not in the dict, it defaults to DataConfig.default_ingest_options(). By default, the options are optimized for data ingest. When overriding, base your options off DataConfig.default_ingest_options().
enable_shard_locality – If true, dataset sharding across Train workers will consider locality to minimize cross-node data transfer. Enabled by default.