ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit#
- DataParallelTrainer.fit() Result[source]#
Launches the Ray Train controller to run training on workers.
- Returns:
A Result object containing the training result.
- Raises:
ray.train.TrainingFailedError – This is a union of the ControllerError and WorkerGroupError. This returns a
ray.train.ControllerErrorif internal Ray Train controller logic encounters a non-retryable error or reaches the controller failure limit configured inFailureConfig. This returns aray.train.WorkerGroupErrorif one or more workers fail during training and reaches the worker group failure limit configured inFailureConfig(max_failures).