ray.train.torch.prepare_data_loader(data_loader: torch.utils.data.DataLoader, add_dist_sampler: bool = True, move_to_device: bool = True, auto_transfer: bool = True) → torch.utils.data.DataLoader

Prepares DataLoader for distributed execution.

This allows you to use the exact same code regardless of the number of workers or the device type (CPU or GPU).


This method adds a DistributedSampler to the DataLoader when the number of training workers is greater than 1. If shuffling is enabled on the original DataLoader, shuffle=True is also passed to the DistributedSampler constructor; if shuffle=False on the original DataLoader, shuffling is disabled on the sampler as well.
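The sampler's core behavior can be illustrated with a plain-Python sketch (no torch required, and deliberately simplified: the real DistributedSampler also pads the dataset so every worker receives the same number of samples, and can shuffle before sharding). Each worker iterates over a strided, disjoint slice of the dataset indices:

```python
def shard_indices(num_samples, num_workers, rank):
    """Strided sharding in the style of DistributedSampler: worker
    `rank` sees indices rank, rank + num_workers, rank + 2*num_workers, ...
    so the workers' slices are disjoint and together cover the dataset."""
    return list(range(rank, num_samples, num_workers))

# Example: 8 samples split across 2 workers.
print(shard_indices(8, 2, 0))  # [0, 2, 4, 6]
print(shard_indices(8, 2, 1))  # [1, 3, 5, 7]
```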

With more than 1 worker, you must call the DistributedSampler.set_epoch method at the beginning of each epoch, before creating the DataLoader iterator, for shuffling to work properly across multiple epochs. Otherwise, the same ordering is always used. See: https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler
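Why set_epoch matters can be shown with a small stdlib-only sketch that mimics the sampler's seeding scheme (the real DistributedSampler seeds its shuffle with seed + epoch): if the epoch number is never updated, every epoch replays the same permutation.

```python
import random

def epoch_shuffled_indices(num_samples, seed, epoch):
    """Mimics DistributedSampler's shuffling: the permutation is
    seeded with seed + epoch, so a new ordering only appears if the
    epoch counter is actually advanced (i.e. set_epoch is called)."""
    g = random.Random(seed + epoch)
    indices = list(range(num_samples))
    g.shuffle(indices)
    return indices

# Forgetting set_epoch is equivalent to reusing epoch=0 every time:
# the ordering repeats. Advancing the epoch changes the ordering.
first = epoch_shuffled_indices(10, seed=42, epoch=0)
repeat = epoch_shuffled_indices(10, seed=42, epoch=0)
second = epoch_shuffled_indices(10, seed=42, epoch=1)
print(first == repeat)   # True  (same ordering replayed)
print(first == second)   # False (new ordering after set_epoch)
```

In a training loop written against the prepared DataLoader, the equivalent fix is to call `dataloader.sampler.set_epoch(epoch)` at the top of each epoch, before iterating.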


  • data_loader (torch.utils.data.DataLoader) – The DataLoader to prepare.

  • add_dist_sampler – Whether to add a DistributedSampler to the provided DataLoader.

  • move_to_device – If set, automatically move the data returned by the data loader to the correct device.

  • auto_transfer – If set and the device is a GPU, a separate CUDA stream is created to automatically copy data from host (CPU) memory to device (GPU) memory, while the default CUDA stream runs the training procedure. If the device is a CPU, this is disabled regardless of the setting. This setting is ignored if move_to_device is False.
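Conceptually, move_to_device=True wraps the loader so that every batch passes through a device-transfer step before it reaches the training loop. The sketch below is a stdlib-only illustration of that wrapping, with a stand-in transfer function instead of the real `Tensor.to(device)` call (which, with auto_transfer=True, would run on a side CUDA stream):

```python
class MoveToDeviceLoader:
    """Conceptual sketch of move_to_device=True: wrap an iterable of
    batches and apply a transfer function to each batch as it is
    yielded. The real implementation moves tensors with .to(device)."""

    def __init__(self, loader, transfer):
        self.loader = loader
        self.transfer = transfer

    def __iter__(self):
        for batch in self.loader:
            yield self.transfer(batch)

# Stand-in "device transfer": tag each batch instead of calling .to("cuda").
batches = [[1, 2], [3, 4]]
wrapped = MoveToDeviceLoader(batches, lambda b: ("on_device", b))
print(list(wrapped))  # [('on_device', [1, 2]), ('on_device', [3, 4])]
```

The upside of this design is that the training-loop body never mentions a device at all, which is what lets the same loop run unchanged on CPU and GPU workers.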