Ray Use Cases#
This page indexes common Ray use cases for scaling ML. It contains highlighted references to blogs, examples, and tutorials that also appear elsewhere in the Ray documentation.
Batch Inference#
Batch inference refers to generating model predictions over a set of input observations. The model could be a regression model, a neural network, or simply a Python function. Ray can scale batch inference from a single GPU machine to large clusters.
Many Model Training#
Many model training is common in ML use cases such as time series forecasting, which requires fitting a separate model for each data subset corresponding to a location, product, etc. Here, the focus is on training many models on subsets of a dataset, in contrast to training a single model on the entire dataset.
How do I do many model training on Ray?#
There are three ways of expressing this workload with Ray:

- If you have a large amount of data, use Ray Data (Tutorial).
- If you have a small amount of data (<10GB), want to integrate with experiment-tracking tools such as wandb and mlflow, and have fewer than 20,000 models, use Ray Tune (Tutorial).
- If your use case does not fit either of the above categories, for example if you need to scale up to 1 million models, use Ray Core (Tutorial), which gives you finer-grained control over the application. Note that this path is for advanced users and requires an understanding of Ray Core design patterns and anti-patterns.
Model Serving#
Ray’s official serving solution is Ray Serve. Ray Serve is particularly well suited for model composition, enabling you to build a complex inference service consisting of multiple ML models and business logic all in Python code.
Hyperparameter Tuning#
Ray’s Tune library enables any parallel Ray workload to be run under a hyperparameter tuning algorithm. Learn more about the Tune library with the following talks and user guides.
Distributed Training#
Ray’s Train library integrates many distributed training frameworks under a simple Trainer API, providing distributed orchestration and management capabilities out of the box. Learn more about the Train library with the following talks and user guides.
Reinforcement Learning#
RLlib is an open-source library for reinforcement learning (RL), offering support for production-level, highly distributed RL workloads while maintaining unified and simple APIs for a large variety of industry applications. RLlib is used by industry leaders in many different verticals, such as climate control, industrial control, manufacturing and logistics, finance, gaming, automotive, robotics, boat design, and many others.
ML Platform#
The following highlights feature companies leveraging Ray’s unified API to build simpler, more flexible ML platforms.
End-to-End ML Workflows#
The following highlighted examples use Ray AIR to implement end-to-end ML workflows.
Large Scale Workload Orchestration#
The following highlights feature projects leveraging Ray Core’s distributed APIs to simplify the orchestration of large scale workloads.