ML workloads with Ray#

Explore the ways that Ray helps to build AI applications.

Batch inference on CPUs and GPUs#

Inference on incoming batches of data can be parallelized by exporting the architecture and weights of a trained model to Ray's shared object store. Workers then read replicas of the model from the store, and Ray scales predictions on the batches across those workers.

../_images/batch_inference.png

Using Ray AIR’s BatchPredictor for batch inference.
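The pattern can be sketched with Ray's core task API rather than BatchPredictor. This is a minimal illustration under stated assumptions, not the library's batch inference implementation: `LinearModel`, `split_into_batches`, `predict_batch`, and `run_batch_inference` are hypothetical names, and the "trained model" is a stand-in with fixed weights. Ray is imported inside the driver function so the helpers can be exercised without a running cluster.

```python
from typing import List


class LinearModel:
    """Hypothetical stand-in for a trained model with fixed weights."""

    def __init__(self, weight: float, bias: float) -> None:
        self.weight = weight
        self.bias = bias

    def predict(self, batch: List[float]) -> List[float]:
        return [self.weight * x + self.bias for x in batch]


def split_into_batches(data: List[float], batch_size: int) -> List[List[float]]:
    """Cut the input into consecutive batches of at most batch_size items."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def predict_batch(model: LinearModel, batch: List[float]) -> List[float]:
    """Work done by one task: run the model on a single batch."""
    return model.predict(batch)


def run_batch_inference(data: List[float], batch_size: int = 4) -> List[float]:
    """Driver: store the model once, then fan batches out as Ray tasks."""
    import ray  # lazy import; assumes Ray is installed

    ray.init(ignore_reinit_error=True)
    # Put the model in the shared object store once; tasks receive a reference.
    model_ref = ray.put(LinearModel(weight=2.0, bias=1.0))
    remote_predict = ray.remote(predict_batch)
    futures = [
        remote_predict.remote(model_ref, batch)
        for batch in split_into_batches(data, batch_size)
    ]
    # ray.get returns results in the same order as the submitted futures.
    return [y for batch_result in ray.get(futures) for y in batch_result]
```

Passing `model_ref` instead of the model object avoids serializing the model once per task; Ray resolves the reference to the stored object when each task starts.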

See batch inference examples for use cases.

Model serving#

Ray Serve supports complex model deployment patterns requiring the orchestration of multiple Ray actors, where different actors provide inference for different models. Serve handles both batch and online inference and can scale to thousands of models in production.

../_images/multi_model_serve.png

Deployment patterns with Ray Serve.
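The idea of different actors providing inference for different models can be sketched with plain Ray actors. This deliberately does not use the Ray Serve API, which adds HTTP ingress, autoscaling, and deployment management on top; it only shows the underlying actor pattern. `ModelReplica` and `serve_with_actors` are hypothetical names, and the "models" are toy scoring functions.

```python
from typing import Dict, List, Tuple


class ModelReplica:
    """Hypothetical model wrapper; a real one would load weights in __init__."""

    def __init__(self, name: str, scale: float) -> None:
        self.name = name
        self.scale = scale

    def predict(self, features: List[float]) -> Dict[str, object]:
        return {"model": self.name, "score": self.scale * sum(features)}


def serve_with_actors(requests: List[Tuple[str, List[float]]]) -> List[Dict[str, object]]:
    """Route each (model_name, features) request to the actor hosting that model."""
    import ray  # lazy import; assumes Ray is installed

    ray.init(ignore_reinit_error=True)
    actor_cls = ray.remote(ModelReplica)
    # One long-lived actor per model; each keeps its own state between calls.
    actors = {
        "ranker": actor_cls.remote("ranker", 0.5),
        "classifier": actor_cls.remote("classifier", 2.0),
    }
    futures = [actors[name].predict.remote(features) for name, features in requests]
    return ray.get(futures)
```

Calls to different actors run concurrently, while calls to the same actor are processed one at a time by default, which is why production setups typically run several replicas per model.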

See model serving examples for use cases.

Parallel training of many models#

When each model you want to train fits on a single GPU, Ray can assign each training run to a separate Ray Task. This way, all available workers run independent training jobs in parallel, rather than one worker running the jobs sequentially.

../_images/training_small_models.png

Data parallelism pattern for distributed training on large datasets.
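A minimal sketch of the one-task-per-model pattern follows, assuming a toy "training" routine (a least-squares slope through the origin) in place of real model training. `train_one_model` and `train_all` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Tuple

Points = List[Tuple[float, float]]


def train_one_model(name: str, points: Points) -> Dict[str, object]:
    """Toy training run: fit y = slope * x by least squares through the origin."""
    numerator = sum(x * y for x, y in points)
    denominator = sum(x * x for x, _ in points)
    return {"name": name, "slope": numerator / denominator}


def train_all(datasets: Dict[str, Points]) -> List[Dict[str, object]]:
    """Submit one Ray task per dataset so the runs execute independently."""
    import ray  # lazy import; assumes Ray is installed

    ray.init(ignore_reinit_error=True)
    remote_train = ray.remote(train_one_model)
    futures = [remote_train.remote(name, points) for name, points in datasets.items()]
    # Blocks until every training run has finished.
    return ray.get(futures)
```

On a GPU cluster, submitting with `remote_train.options(num_gpus=1).remote(name, points)` would ask Ray to reserve one GPU per training run.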

See many model examples for use cases.

Distributed training of large models#

In contrast to training many models, model parallelism partitions a large model across many machines for training. Ray Train has built-in abstractions for distributing shards of models and running training in parallel.

../_images/model_parallelism.png

Model parallelism pattern for distributed large model training.
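The partitioning idea can be illustrated with a toy forward pass in which each shard of layers lives in its own Ray actor. This is a sketch under strong assumptions: it uses plain actors rather than Ray Train's abstractions, it covers only the forward pass and not training, and the "layers" are scalar weights followed by a ReLU. `partition_layers`, `Stage`, and `forward_across_stages` are hypothetical names.

```python
from typing import List


def partition_layers(weights: List[float], num_shards: int) -> List[List[float]]:
    """Split the layer weights into contiguous shards of near-equal size."""
    base, extra = divmod(len(weights), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        shards.append(weights[start:start + size])
        start += size
    return shards


class Stage:
    """Holds one shard of the model and applies its layers in order."""

    def __init__(self, weights: List[float]) -> None:
        self.weights = weights

    def forward(self, x: float) -> float:
        for w in self.weights:
            x = max(0.0, w * x)  # scalar linear layer followed by ReLU
        return x


def forward_across_stages(weights: List[float], num_shards: int, x: float) -> float:
    """Place each shard in its own actor and chain the activations through them."""
    import ray  # lazy import; assumes Ray is installed

    ray.init(ignore_reinit_error=True)
    stage_cls = ray.remote(Stage)
    stages = [stage_cls.remote(shard) for shard in partition_layers(weights, num_shards)]
    out = x
    for stage in stages:
        # The returned ObjectRef is passed straight to the next stage;
        # Ray resolves it to the activation value before calling forward.
        out = stage.forward.remote(out)
    return ray.get(out)
```

No single actor ever holds the full set of weights, which is the property that lets a model too large for one machine be spread across several.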

See distributed training examples for use cases.

Parallel hyperparameter tuning experiments#

Running multiple hyperparameter tuning experiments is a pattern well suited to distributed computing because the experiments are independent of one another. Ray Tune handles the hard part of distributing hyperparameter optimization and provides key features such as checkpointing the best result, optimizing scheduling, and specifying search patterns.

../_images/tuning_use_case.png

Distributed tuning with distributed training per trial.
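A small grid search with Ray Tune might look like the following sketch. It assumes a Ray 2.x installation where `tune.Tuner`, `tune.TuneConfig`, and `tune.grid_search` are available; the `objective` function and its quadratic loss are hypothetical stand-ins for a real training trial, and `run_search` is an illustrative name.

```python
from typing import Dict


def objective(config: Dict[str, float]) -> Dict[str, float]:
    """Toy trial: the loss is smallest when the learning rate equals 0.1."""
    return {"loss": (config["lr"] - 0.1) ** 2}


def run_search() -> Dict[str, float]:
    """Run one independent trial per grid point and return the best config."""
    from ray import tune  # lazy import; assumes Ray Tune is installed

    tuner = tune.Tuner(
        objective,
        # Search pattern: try every listed learning rate once.
        param_space={"lr": tune.grid_search([0.001, 0.01, 0.1, 1.0])},
        tune_config=tune.TuneConfig(metric="loss", mode="min"),
    )
    results = tuner.fit()
    return results.get_best_result().config
```

Because each trial only reads its own `config`, Tune is free to schedule the trials on whichever workers are available.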

See hyperparameter tuning examples for use cases.

Reinforcement learning#

Ray RLlib offers support for production-level, distributed reinforcement learning workloads while maintaining unified and simple APIs for a large variety of industry applications.

../_images/rllib_use_case.png

Decentralized distributed proximal policy optimization (DD-PPO) architecture.

See reinforcement learning examples for use cases.

ML platform#

Merlin is Shopify’s ML platform built on Ray. It enables fast iteration and scaling of distributed applications such as product categorization and recommendations.

../_images/shopify-workload.png

Shopify’s Merlin architecture built on Ray.

Spotify uses Ray for advanced applications that include personalizing content recommendations for home podcasts, and personalizing Spotify Radio track sequencing.

../_images/spotify.png

How the Ray ecosystem empowers ML scientists and engineers at Spotify.

See ML platform examples for use cases.