ray.air.util.check_ingest.DummyTrainer#

class ray.air.util.check_ingest.DummyTrainer(*args, **kwargs)[source]#

Bases: ray.train.data_parallel_trainer.DataParallelTrainer

A Trainer that does nothing except read the data for a given number of epochs.

It prints out as many debugging statistics as possible.

This is useful for debugging data ingest problems. This trainer supports the same scaling options as any other Trainer (e.g., num_workers, use_gpu).

Parameters
  • scaling_config – Configuration for how to scale training. This is the same as for BaseTrainer.

  • num_epochs – How many times to iterate through the datasets.

  • prefetch_blocks – The number of blocks to prefetch ahead of the current block during the scan. This is the same as for iter_batches().

  • time_preprocessing_separately – Whether to time the preprocessing separately from the actual iteration during training. If set to True, preprocessing is fully executed before training begins and the preprocessing time is printed out. Defaults to False, which mimics the actual behavior of Trainers.

DeveloperAPI: This API may change across minor Ray releases.
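For illustration, a minimal usage sketch is shown below; the synthetic dataset, worker count, and import paths are assumptions chosen for this example rather than requirements of the API:

    import ray
    from ray.air.config import ScalingConfig
    from ray.air.util.check_ingest import DummyTrainer

    # Synthetic dataset used only to exercise the ingest path.
    ds = ray.data.range(10_000)

    trainer = DummyTrainer(
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
        datasets={"train": ds},
        num_epochs=2,
    )
    result = trainer.fit()  # Prints ingest throughput and debugging statistics.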

preprocess_datasets()[source]#

Called during fit() to preprocess dataset attributes with the preprocessor.

Note

This method is run on a remote process.

This method is called prior to entering the training_loop.

If the Trainer has both a datasets dict and a preprocessor, the datasets dict contains a training dataset (denoted by the “train” key), and the preprocessor has not yet been fit, then the preprocessor is fit on the train dataset.

Then, all Trainer’s datasets will be transformed by the preprocessor.

The transformed datasets will be set back in the self.datasets attribute of the Trainer to be used when overriding training_loop.
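The behavior described above can be summarized with a rough sketch; the helper below illustrates the documented fit-then-transform logic under the assumption of a Preprocessor exposing fit(), transform(), and a fitted-state check, and is not the method’s actual implementation:

    def preprocess_datasets_sketch(trainer):
        # Only act when both a datasets dict and a preprocessor are present.
        if trainer.preprocessor and trainer.datasets:
            train_ds = trainer.datasets.get("train")
            # Fit the preprocessor on the "train" dataset if it has not been fit yet.
            if train_ds is not None and not trainer.preprocessor.check_is_fitted():
                trainer.preprocessor.fit(train_ds)
            # Transform all datasets and store them back on the Trainer.
            trainer.datasets = {
                name: trainer.preprocessor.transform(ds)
                for name, ds in trainer.datasets.items()
            }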

static make_train_loop(num_epochs: int, prefetch_blocks: int, batch_size: Optional[int])[source]#

Make a debug train loop that runs for the given number of epochs.
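As a hedged sketch, a loop of this kind might look like the following on each training worker; the session and iterator calls shown here are assumptions for illustration and are not taken from the function’s actual body:

    import time
    from ray.air import session

    def make_train_loop_sketch(num_epochs, prefetch_blocks, batch_size):
        def train_loop_per_worker():
            # Each worker reads its shard of the "train" dataset.
            ds = session.get_dataset_shard("train")
            for epoch in range(num_epochs):
                start = time.perf_counter()
                num_batches = 0
                for _ in ds.iter_batches(
                    prefetch_blocks=prefetch_blocks, batch_size=batch_size
                ):
                    num_batches += 1
                print(f"Epoch {epoch}: read {num_batches} batches "
                      f"in {time.perf_counter() - start:.2f}s")
        return train_loop_per_worker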