# Ray Train: Scalable Model Training
Ray Train is a scalable machine learning library for distributed training and fine-tuning. It lets you scale model training code from a single machine to a cluster of machines in the cloud, and it abstracts away the complexities of distributed computing. Whether you have large models or large datasets, Ray Train is the simplest solution for distributed training.
Ray Train provides support for many frameworks:
| PyTorch Ecosystem | More Frameworks |
|---|---|
| PyTorch | TensorFlow |
| PyTorch Lightning | Keras |
| Hugging Face Transformers | Horovod |
| Hugging Face Accelerate | XGBoost |
| DeepSpeed | LightGBM |
## Install Ray Train
To install Ray Train, run:
```shell
pip install -U "ray[train]"
```
To learn more about installing Ray and its libraries, see Installing Ray.
## Get started
- **Overview**: Understand the key concepts for distributed training with Ray Train.
- **PyTorch**: Get started with distributed model training using Ray Train and PyTorch.
- **PyTorch Lightning**: Get started with distributed model training using Ray Train and Lightning.
- **Hugging Face Transformers**: Get started with distributed model training using Ray Train and Transformers.
## Learn more
- **More Frameworks**: Don't see your framework? See these guides.
- **User Guides**: Get how-to instructions for common training tasks with Ray Train.
- **Examples**: Browse end-to-end code examples for different use cases.
- **API**: Consult the API Reference for full descriptions of the Ray Train API.