ray.train.FailureConfig
- class ray.train.FailureConfig(max_failures: int = 0, fail_fast: bool | str = False)
Configuration related to failure handling of each training/tuning run.
- Parameters:
max_failures – Tries to recover a run at least this many times. Recovery resumes from the latest checkpoint, if one is present. Set to -1 for unlimited recovery retries, or to 0 to disable retries. Defaults to 0.
fail_fast – Whether to fail on the first error. If fail_fast='raise' is provided, the original error raised during training is re-raised immediately. Use fail_fast='raise' with caution, because it can easily leak resources. Defaults to False.
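To make the `max_failures` rules concrete, here is a plain-Python model: -1 means unlimited retries, 0 means no retries, and each retry resumes from the latest checkpoint. This is a hypothetical illustration, not Ray's implementation. `run_with_recovery`, `make_flaky_train_fn`, and the dict-based "checkpoint" are invented names used only for this sketch.

```python
from typing import Callable, Dict, List, Tuple


def run_with_recovery(
    train_fn: Callable[[Dict[str, int]], int],
    max_failures: int = 0,
) -> int:
    """Hypothetical model of max_failures semantics (not Ray code).

    -1 retries forever, 0 disables retries, N > 0 allows up to N retries.
    Each retry receives the same checkpoint dict, so it resumes from the
    latest state that train_fn saved before failing.
    """
    checkpoint: Dict[str, int] = {}  # empty dict means "no checkpoint yet"
    failures = 0
    while True:
        try:
            return train_fn(checkpoint)
        except Exception:
            failures += 1
            if max_failures != -1 and failures > max_failures:
                raise  # retry budget exhausted: surface the original error
            # Otherwise loop and resume from the latest checkpoint.


def make_flaky_train_fn(
    num_epochs: int, fail_at: int
) -> Tuple[Callable[[Dict[str, int]], int], List[int]]:
    """Hypothetical training function that crashes once at epoch `fail_at`."""
    crashed = {"done": False}
    starts: List[int] = []  # records the epoch each attempt started from

    def train_fn(ckpt: Dict[str, int]) -> int:
        start = ckpt.get("epoch", 0)
        starts.append(start)
        for epoch in range(start, num_epochs):
            if epoch == fail_at and not crashed["done"]:
                crashed["done"] = True
                raise RuntimeError("simulated worker crash")
            ckpt["epoch"] = epoch + 1  # "save a checkpoint" after each epoch
        return ckpt["epoch"]

    return train_fn, starts


fn, starts = make_flaky_train_fn(num_epochs=5, fail_at=2)
print(run_with_recovery(fn, max_failures=1))  # finishes all 5 epochs
print(starts)  # second attempt resumed from epoch 2, not epoch 0
```

With `max_failures=0`, the same flaky function's `RuntimeError` propagates after the first failure, which matches "Setting to 0 will disable retries."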