ray.air.util.check_ingest.DummyTrainer

- class ray.air.util.check_ingest.DummyTrainer(*args, **kwargs)
Bases: ray.train.data_parallel_trainer.DataParallelTrainer
A Trainer that does nothing except read the data for a given number of epochs.
It prints out as many debugging statistics as possible.
This is useful for debugging data ingest problems. This trainer supports the same scaling options as any other Trainer (e.g., num_workers, use_gpu).
- Parameters
  - scaling_config – Configuration for how to scale training. This is the same as for BaseTrainer.
  - num_epochs – How many times to iterate through the datasets.
  - prefetch_blocks – The number of blocks to prefetch ahead of the current block during the scan. This is the same as in iter_batches().
  - time_preprocessing_separately – Whether to time the preprocessing separately from the actual iteration during training. If set to True, preprocessing is fully executed before training begins and the preprocessing time is printed out. Defaults to False, which mimics the actual behavior of Trainers.
DeveloperAPI: This API may change across minor Ray releases.
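To make the "read the data, train on nothing" behavior concrete, here is a minimal pure-Python sketch of the pattern. It is not Ray's implementation and has no Ray dependency; the `dummy_read` helper and its stats dictionary are hypothetical names chosen for illustration.

```python
import time

def dummy_read(batches, num_epochs=1):
    """Iterate over `batches` for `num_epochs` epochs, collecting
    throughput stats instead of doing any actual training.

    Illustrative only: mimics what an ingest-debugging trainer does
    (read everything, train on nothing)."""
    stats = {"epochs": 0, "batches": 0, "rows": 0, "seconds": 0.0}
    for _ in range(num_epochs):
        start = time.perf_counter()
        for batch in batches:
            stats["batches"] += 1
            stats["rows"] += len(batch)
        stats["seconds"] += time.perf_counter() - start
        stats["epochs"] += 1
    return stats

# Example: 3 "blocks" of 4 rows each, read for 2 epochs.
data = [list(range(4)) for _ in range(3)]
stats = dummy_read(data, num_epochs=2)
```

If the stats show that plain iteration is already slow, the bottleneck is in ingest (reading, prefetching, preprocessing), not in the training loop itself.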
- preprocess_datasets()
Called during fit() to preprocess dataset attributes with preprocessor.
Note
This method is run on a remote process.
This method is called prior to entering the training_loop.
If the Trainer has both a datasets dict and a preprocessor, the datasets dict contains a training dataset (denoted by the "train" key), and the preprocessor has not yet been fit, then the preprocessor will be fit on the train dataset. Then, all of the Trainer's datasets will be transformed by the preprocessor.
The transformed datasets will be set back in the Trainer's self.datasets attribute, to be used when overriding training_loop.
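The fit-then-transform behavior described above can be sketched in plain Python. This is a toy illustration, not Ray's Preprocessor API: `SimplePreprocessor` and the standalone `preprocess_datasets` function are hypothetical names, and "has not yet been fit" is modeled by checking a fitted attribute.

```python
class SimplePreprocessor:
    """Toy stand-in for a preprocessor: scales values by the max
    seen during fit. Hypothetical, for illustration only."""
    def __init__(self):
        self.max_ = None  # None means "not yet fit"

    def fit(self, dataset):
        self.max_ = max(dataset)
        return self

    def transform(self, dataset):
        return [x / self.max_ for x in dataset]

def preprocess_datasets(datasets, preprocessor):
    """Sketch of the documented behavior: fit on the "train" dataset
    if the preprocessor has not yet been fit, then transform all
    of the datasets with it."""
    if preprocessor is not None and "train" in datasets:
        if preprocessor.max_ is None:  # not yet fit
            preprocessor.fit(datasets["train"])
        datasets = {k: preprocessor.transform(v) for k, v in datasets.items()}
    return datasets

datasets = {"train": [1.0, 2.0, 4.0], "valid": [2.0, 8.0]}
out = preprocess_datasets(datasets, SimplePreprocessor())
# "train" is scaled by its own max (4.0); "valid" is scaled by that same max,
# mirroring how the preprocessor is fit only on the "train" dataset.
```

Note that only the "train" dataset influences fitting; every dataset in the dict, including validation data, is transformed with the resulting state.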