ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit
- DataParallelTrainer.fit() -> Result
Launches the Ray Train controller to run training on workers.
- Returns:
A Result object containing the training result.
- Raises:
ray.train.TrainingFailedError – This is a union of ControllerError and WorkerGroupError, so catching it handles both failure modes.
A ray.train.ControllerError is raised if the internal Ray Train controller logic encounters a non-retryable error or reaches the controller failure limit configured in FailureConfig.
A ray.train.WorkerGroupError is raised if one or more workers fail during training and the worker group failure limit configured in FailureConfig(max_failures) is reached.