Ray Data: Scalable Datasets for ML
Ray Data is a scalable data processing library for ML workloads. It provides flexible and performant APIs for scaling two kinds of workloads: offline batch inference, and data preprocessing and ingest for ML training. Ray Data uses streaming execution to process large datasets efficiently.
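To make the idea of streaming execution more concrete, here is a conceptual sketch in plain Python. It is not Ray Data's API or implementation: the functions `read_blocks`, `preprocess`, `predict`, and `run_pipeline` are hypothetical names chosen for illustration, and the "model" is a toy threshold. The sketch only shows the general pattern of passing data through pipeline stages one block at a time rather than materializing the whole dataset between stages.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_blocks(n_rows: int, block_size: int) -> Iterator[List[int]]:
    """Yield rows in fixed-size blocks instead of loading them all at once."""
    rows = iter(range(n_rows))
    while True:
        block = list(islice(rows, block_size))
        if not block:
            return
        yield block


def preprocess(blocks: Iterable[List[int]]) -> Iterator[List[float]]:
    """Scale each block as it arrives (stand-in for a preprocessing step)."""
    for block in blocks:
        yield [x / 10 for x in block]


def predict(blocks: Iterable[List[float]]) -> Iterator[List[int]]:
    """Apply a toy threshold 'model' per block (stand-in for batch inference)."""
    for block in blocks:
        yield [1 if x >= 0.5 else 0 for x in block]


def run_pipeline(n_rows: int = 10, block_size: int = 4) -> List[int]:
    """Chain the stages lazily and collect the predictions."""
    results: List[int] = []
    # Generators are pulled one block at a time, so each block passes
    # through preprocess and predict before the next block is read.
    for block in predict(preprocess(read_blocks(n_rows, block_size))):
        results.extend(block)
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

Because the stages are chained generators, peak memory in this sketch depends on the block size rather than on the total number of rows. A real distributed system also has to handle scheduling and parallelism across a cluster, which this single-process sketch does not attempt.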
Install Ray Data
To install Ray Data, run:
$ pip install -U 'ray[data]'
To learn more about installing Ray and its libraries, see Installing Ray.
Learn more
Ray Data Overview
Get an overview of Ray Data, the workloads that it supports, and how it compares to alternatives.
Quickstart
Understand the key concepts behind Ray Data. Learn what Datasets are and how they’re used.
User Guides
Learn how to use Ray Data, from basic usage to end-to-end guides.
Examples
Find both simple and scaled-out examples of using Ray Data.
API
Get more in-depth information about the Ray Data API.
Ray Blogs
Get the latest engineering updates from the Ray team and learn how companies are using Ray Data.