Getting Started Guide#
This guide gives a quick tour of Ray’s features.
Starting a local Ray cluster#
To get started, install, import, and initialize Ray. Most of the examples in this guide are in Python; some examples also use Ray Core in Java.
Python
To use Ray in Python, install it with
pip install ray
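You can then start a local Ray runtime directly from Python (a minimal sketch; ray.init() starts a local cluster when no address is given):

import ray

# Start (or connect to) a local Ray runtime.
ray.init()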
Java
To use Ray in Java, first add the ray-api and ray-runtime dependencies to your project.
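For example, with Maven the dependencies might look like this (a sketch; the version property is a placeholder for the Ray release you are using):

<dependency>
  <groupId>io.ray</groupId>
  <artifactId>ray-api</artifactId>
  <version>${ray.version}</version>
</dependency>
<dependency>
  <groupId>io.ray</groupId>
  <artifactId>ray-runtime</artifactId>
  <version>${ray.version}</version>
</dependency>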
To build Ray from source or with Docker, see the detailed installation instructions.
Ray AI Runtime Quick Start#
To use Ray’s AI Runtime, install Ray with the optional air packages:
pip install "ray[air]"
Efficiently process your data into features.
Load data into a Dataset.
import ray
# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
# Create a test dataset by dropping the target column.
test_dataset = valid_dataset.drop_columns(cols=["target"])
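As a quick sanity check, you can inspect the splits with standard Dataset methods (a sketch):

# Confirm the schema and the size of each split.
print(train_dataset.schema())
print(train_dataset.count(), valid_dataset.count(), test_dataset.count())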
Preprocess your data with a Preprocessor.
# Create a preprocessor to scale some columns.
from ray.data.preprocessors import StandardScaler
preprocessor = StandardScaler(columns=["mean radius", "mean texture"])
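When passed to the trainer below, the preprocessor is fit on the training dataset and applied to all datasets. As a standalone sketch, you could also fit and apply it directly:

# Fit the scaler on the training set and transform it in one step.
scaled_train = preprocessor.fit_transform(train_dataset)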
Scale out model training.
This example uses XGBoost to train a machine learning model, so install Ray’s XGBoost wrapper library, xgboost_ray:
pip install xgboost_ray
Train a model with an XGBoostTrainer.
from ray.air.config import ScalingConfig
from ray.train.xgboost import XGBoostTrainer
trainer = XGBoostTrainer(
    scaling_config=ScalingConfig(
        # Number of workers to use for data parallelism.
        num_workers=2,
        # Whether to use GPU acceleration.
        use_gpu=False,
        # Make sure to leave some CPUs free for Ray Data operations.
        _max_cpu_fraction_per_node=0.9,
    ),
    label_column="target",
    num_boost_round=20,
    params={
        # XGBoost specific params
        "objective": "binary:logistic",
        # "tree_method": "gpu_hist",  # uncomment this to use GPUs.
        "eval_metric": ["logloss", "error"],
    },
    datasets={"train": train_dataset, "valid": valid_dataset},
    preprocessor=preprocessor,
)
result = trainer.fit()
print(result.metrics)
Tune the hyperparameters to find the best model with Ray Tune.
Configure the parameters for tuning:
from ray import tune
param_space = {"params": {"max_depth": tune.randint(1, 9)}}
metric = "train-logloss"
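The search space nests under "params" because its entries override the XGBoost parameters passed to the trainer. You can widen it with other sampling primitives, for example (a sketch; the extra parameter and its range are illustrative assumptions):

# An illustrative, wider search space (hypothetical extension).
wider_param_space = {
    "params": {
        "max_depth": tune.randint(1, 9),
        # Sample the learning rate on a log scale.
        "eta": tune.loguniform(1e-4, 1e-1),
    }
}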
Run hyperparameter tuning with Ray Tune to find the best model:
from ray.tune.tuner import Tuner, TuneConfig
tuner = Tuner(
    trainer,
    param_space=param_space,
    tune_config=TuneConfig(num_samples=5, metric=metric, mode="min"),
)
result_grid = tuner.fit()
best_result = result_grid.get_best_result()
print("Best result:", best_result)
Use the trained model for batch prediction.
Make predictions with a BatchPredictor.
from ray.train.batch_predictor import BatchPredictor
from ray.train.xgboost import XGBoostPredictor
# You can also create a checkpoint from a trained model using
# `XGBoostCheckpoint.from_model`.
checkpoint = best_result.checkpoint
batch_predictor = BatchPredictor.from_checkpoint(checkpoint, XGBoostPredictor)
predicted_probabilities = batch_predictor.predict(test_dataset)
predicted_probabilities.show()
# {'predictions': 0.9970690608024597}
# {'predictions': 0.9943051934242249}
# {'predictions': 0.00334902573376894}
# ...
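To turn these probabilities into class labels, you could threshold them (a sketch; the 0.5 cutoff is an assumption for this binary task):

# Map predicted probabilities to 0/1 labels at a 0.5 cutoff.
def to_label(batch):
    batch["label"] = (batch["predictions"] > 0.5).astype(int)
    return batch

predicted_labels = predicted_probabilities.map_batches(to_label, batch_format="pandas")
predicted_labels.show()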
Ray Libraries Quick Start#
Ray has a rich ecosystem of libraries and frameworks built on top of it. The sections below introduce the most popular ones with short examples.
Ray Core Quick Start#
Ray Core provides simple primitives for building and running distributed applications. It lets you turn your functions and classes into Ray tasks and actors with little effort, in both Python and Java.
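For example, the canonical pattern in Python looks like this (a minimal sketch):

import ray

ray.init()

# A regular function becomes a distributed task with @ray.remote.
@ray.remote
def square(x):
    return x * x

# Tasks run asynchronously; futures are resolved with ray.get.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]

# A class becomes a stateful actor the same way.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1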
Ray Cluster Quick Start#
You can deploy your applications on Ray clusters, often with minimal changes to your existing code. See an example below.
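As a sketch, once a cluster is up you can point the same script at it instead of a local runtime (the host below is a placeholder for your head node; 10001 is the default Ray Client port):

import ray

# Connect to a running Ray cluster via Ray Client.
ray.init(address="ray://<head-node-host>:10001")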
Debugging and Monitoring Quick Start#
You can use built-in observability tools to monitor and debug Ray applications and clusters.
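For example, the Ray Dashboard is served when Ray starts (by default at http://127.0.0.1:8265), and the CLI can summarize cluster state (a sketch):

# Print current cluster resource usage and autoscaler status.
ray status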
Learn More#
Here are some talks, papers, and press coverage involving Ray and its libraries. Please raise an issue if any of the links below are broken, or if you’d like to add your own talk!
Talks (Videos)#
Unifying Large Scale Data Preprocessing and Machine Learning Pipelines with Ray Datasets | PyData 2021 (slides)
Programming at any Scale with Ray | SF Python Meetup Sept 2019
Ray: A Cluster Computing Engine for Reinforcement Learning Applications | Spark Summit
Enabling Composition in Distributed Reinforcement Learning | Spark Summit 2018
