ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit

DataParallelTrainer.fit() → Result

Launches the Ray Train controller to run training on workers.

Returns:

A Result object containing the training result.

Raises:

ray.train.TrainingFailedError – A union of ControllerError and WorkerGroupError. Raises ray.train.ControllerError if internal Ray Train controller logic encounters a non-retryable error or reaches the controller failure limit configured in FailureConfig. Raises ray.train.WorkerGroupError if one or more workers fail during training and the worker group failure limit configured in FailureConfig(max_failures) is reached.
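
The two failure modes can be told apart by catching each exception type separately. The following is a minimal sketch, not part of the Ray Train API: the helper name fit_and_report and its status strings are hypothetical, and the exception classes are passed in as arguments so the snippet runs without Ray installed. It assumes only that the trainer object exposes fit() as documented here.

```python
def fit_and_report(trainer, controller_error, worker_group_error):
    """Call trainer.fit() and classify the outcome.

    Hypothetical helper for illustration. `controller_error` and
    `worker_group_error` are the exception classes to catch; with Ray
    Train these would be ray.train.ControllerError and
    ray.train.WorkerGroupError.
    """
    try:
        result = trainer.fit()
    except controller_error as exc:
        # Controller logic hit a non-retryable error or its failure limit.
        return ("controller_failure", exc)
    except worker_group_error as exc:
        # Workers failed and the worker group failure limit was reached.
        return ("worker_group_failure", exc)
    # Training finished; `result` is the value returned by fit().
    return ("ok", result)


# With Ray Train installed, a call might look like this (not executed here):
#   import ray.train
#   status, payload = fit_and_report(
#       trainer, ray.train.ControllerError, ray.train.WorkerGroupError
#   )
```

Code that does not need to distinguish the two cases could instead catch ray.train.TrainingFailedError, which the description above says covers both.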