ray.train.v2.api.data_parallel_trainer.DataParallelTrainer

class ray.train.v2.api.data_parallel_trainer.DataParallelTrainer(train_loop_per_worker: Callable[[], None] | Callable[[Dict], None], *, train_loop_config: Dict | None = None, backend_config: BackendConfig | None = None, scaling_config: ScalingConfig | None = None, run_config: RunConfig | None = None, datasets: Dict[str, Dataset | Callable[[], Dataset]] | None = None, dataset_config: DataConfig | None = None, resume_from_checkpoint: Checkpoint | None = None, metadata: Dict[str, Any] | None = None)

Base class for distributed data parallel training on Ray.

This class supports the single-program, multiple-data (SPMD) parallelization pattern: a single training function runs in parallel on multiple workers, with each worker processing its own shard of the data.
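A minimal usage sketch of this pattern follows; the worker count, the num_epochs config key, and the reported metric names are hypothetical, not part of this API:

import ray.train
from ray.train import ScalingConfig
from ray.train.v2.api.data_parallel_trainer import DataParallelTrainer

def train_loop_per_worker(config: dict):
    # SPMD: this same function runs on every worker; each worker
    # trains on its own data shard and reports metrics to the controller.
    for epoch in range(config["num_epochs"]):  # hypothetical config key
        # ... per-worker forward/backward pass on the local shard ...
        ray.train.report({"epoch": epoch})

trainer = DataParallelTrainer(
    train_loop_per_worker,
    train_loop_config={"num_epochs": 2},
    scaling_config=ScalingConfig(num_workers=4),
)
result = trainer.fit()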

DeveloperAPI: This API may change across minor Ray releases.

Methods

can_restore

[Deprecated] Checks if a Train experiment can be restored from a previously interrupted/failed run.

fit

Launches the Ray Train controller to run training on workers and returns a Result (see the sketch after this list).

restore

[Deprecated] Restores a Train experiment from a previously interrupted/failed run.
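
Continuing the sketch above, fit() blocks until training completes and returns a Result; the attributes shown here are standard, but the metric values are simply whatever the workers last reported:

result = trainer.fit()
print(result.metrics)     # last metrics reported by the workers, e.g. {"epoch": 1}
print(result.checkpoint)  # latest checkpoint, or None if none was reported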