Configuring Hyperparameter Tuning#
The Ray AIR Tuner is the recommended way to tune hyperparameters in Ray AIR.
The Tuner will take in a Trainer and execute multiple training runs, each with different hyperparameter configurations.
As part of Ray Tune, the Tuner provides an interface that works with AIR Trainers to perform distributed hyperparameter tuning. It provides a variety of state-of-the-art hyperparameter tuning algorithms for optimizing model performance.
The rest of this guide covers what a Tuner is and how to use it in basic examples. For more detail, see the Ray Tune documentation.
Key Concepts#
There are a number of key concepts that dictate proper use of a Tuner:
A set of hyperparameters you want to tune in a search space.
A search algorithm to effectively optimize your parameters, and optionally a scheduler to stop searches early and speed up your experiments.
The search space, search algorithm, scheduler, and Trainer are passed to a Tuner, which runs the hyperparameter tuning workload by evaluating multiple hyperparameters in parallel.
Each individual hyperparameter evaluation run is called a trial.
The Tuner returns its results in a ResultGrid.
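To make these terms concrete, here is a minimal pure-Python sketch of the same workflow. It is not how Ray Tune is implemented: search_space, sample_config, run_trial, and the toy loss are hypothetical names made up for illustration, and the trials run one after another rather than in parallel.

```python
import random

# Hypothetical search space: each entry maps a hyperparameter name to
# a function that draws one candidate value.
search_space = {
    "max_depth": lambda rng: rng.choice([4, 5, 6]),
    "learning_rate": lambda rng: rng.uniform(0.01, 0.3),
}


def sample_config(space, rng):
    """Draw one hyperparameter configuration from the search space."""
    return {name: draw(rng) for name, draw in space.items()}


def run_trial(config):
    """Stand-in for a training run: returns a made-up loss for the config."""
    return abs(config["max_depth"] - 5) + abs(config["learning_rate"] - 0.1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
num_samples = 4

# Each evaluated configuration is one "trial".
results = []
for trial_id in range(num_samples):
    config = sample_config(search_space, rng)
    results.append(
        {"trial_id": trial_id, "config": config, "loss": run_trial(config)}
    )

# Analogous to asking a result grid for the best result with mode="min".
best = min(results, key=lambda r: r["loss"])
print("Best config:", best["config"], "loss:", best["loss"])
```

A real Tuner adds what this toy leaves out: smarter search algorithms, early stopping through schedulers, and parallel execution across a cluster.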
Note
Tuners can also be used to launch hyperparameter tuning without using Ray AIR Trainers. See the Ray Tune documentation for more guides and examples.
Basic usage#
Below, we demonstrate how you can use a Trainer object with a Tuner.
import ray
from ray import tune
from ray.tune import Tuner
from ray.train.xgboost import XGBoostTrainer

dataset = ray.data.read_csv("s3://[email protected]/breast_cancer.csv")

trainer = XGBoostTrainer(
    label_column="target",
    params={
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": 4,
    },
    datasets={"train": dataset},
)

# Create Tuner
tuner = Tuner(
    trainer,
    # Add some parameters to tune
    param_space={"params": {"max_depth": tune.choice([4, 5, 6])}},
    # Specify tuning behavior
    tune_config=tune.TuneConfig(metric="train-logloss", mode="min", num_samples=2),
)

# Run tuning job
tuner.fit()
How to configure a search space?#
A Tuner takes in a param_space argument where you can define the search space from which hyperparameter configurations will be sampled.
Depending on the model and dataset, you may want to tune:
The training batch size
The learning rate for deep learning training (e.g., image classification)
The maximum depth for tree-based models (e.g., XGBoost)
The following examples show how to specify the param_space, first for an XGBoostTrainer and then for a TorchTrainer.
import ray
from ray import tune
from ray.tune import Tuner
from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig, RunConfig

dataset = ray.data.read_csv("s3://[email protected]/breast_cancer.csv")

# Create an XGBoost trainer
trainer = XGBoostTrainer(
    label_column="target",
    params={
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": 4,
    },
    num_boost_round=10,
    datasets={"train": dataset},
)

param_space = {
    # Tune parameters directly passed into the XGBoostTrainer
    "num_boost_round": tune.randint(5, 20),
    # `params` will be merged with the `params` defined in the above XGBoostTrainer
    "params": {
        "min_child_weight": tune.uniform(0.8, 1.0),
        # Below will overwrite the XGBoostTrainer setting
        "max_depth": tune.randint(1, 5),
    },
    # Tune the number of distributed workers
    "scaling_config": ScalingConfig(num_workers=tune.grid_search([1, 2])),
}

tuner = Tuner(
    trainable=trainer,
    run_config=RunConfig(name="test_tuner"),
    param_space=param_space,
    tune_config=tune.TuneConfig(
        mode="min", metric="train-logloss", num_samples=2, max_concurrent_trials=2
    ),
)
result_grid = tuner.fit()
# The same pattern with a TorchTrainer
from ray import tune
from ray.tune import Tuner
from ray.air.config import RunConfig, ScalingConfig
from ray.train.examples.pytorch.torch_linear_example import (
    train_func as linear_train_func,
)
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=linear_train_func,
    train_loop_config={"lr": 1e-2, "batch_size": 4, "epochs": 10},
    scaling_config=ScalingConfig(num_workers=1, use_gpu=False),
)

param_space = {
    # The params will be merged with the ones defined in the TorchTrainer
    "train_loop_config": {
        # This is a parameter that hasn't been set in the TorchTrainer
        "hidden_size": tune.randint(1, 4),
        # This will overwrite whatever was set when TorchTrainer was instantiated
        "batch_size": tune.choice([4, 8]),
    },
    # Tune the number of distributed workers
    "scaling_config": ScalingConfig(num_workers=tune.grid_search([1, 2])),
}

tuner = Tuner(
    trainable=trainer,
    run_config=RunConfig(name="test_tuner", local_dir="~/ray_results"),
    param_space=param_space,
    tune_config=tune.TuneConfig(
        mode="min", metric="loss", num_samples=2, max_concurrent_trials=2
    ),
)
result_grid = tuner.fit()
Read more about Tune search spaces in the Ray Tune documentation.
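One consequence of mixing tune.grid_search with randomly sampled parameters is worth spelling out: in Ray Tune, num_samples repeats the whole grid, so a two-value grid with num_samples=2 (as in both examples above) launches four trials. The sketch below reproduces that counting in plain Python. The expand_trials helper and the sampler lambdas are hypothetical and are not Ray APIs.

```python
import itertools
import random


def expand_trials(grid_params, random_params, num_samples, seed=0):
    """Repeat the full grid num_samples times, re-drawing random values each time."""
    rng = random.Random(seed)
    grid_names = list(grid_params)
    configs = []
    for _ in range(num_samples):
        for combo in itertools.product(*(grid_params[n] for n in grid_names)):
            config = dict(zip(grid_names, combo))
            # Randomly sampled parameters are drawn afresh for every trial.
            for name, draw in random_params.items():
                config[name] = draw(rng)
            configs.append(config)
    return configs


configs = expand_trials(
    grid_params={"num_workers": [1, 2]},
    random_params={"batch_size": lambda rng: rng.choice([4, 8])},
    num_samples=2,
)
print(len(configs))  # 2 grid values x 2 samples = 4 trials
```

Keeping this multiplication in mind helps when budgeting cluster time, since adding a second grid_search dimension multiplies the trial count again.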
You can use a Tuner to tune most arguments and configurations in Ray AIR, including but not limited to:
Ray Datasets
Preprocessors
Scaling configurations
and other hyperparameters.
There are a couple of gotchas about parameter specification when using Tuners with Trainers:
By default, configuration dictionaries and config objects will be deep-merged.
Parameters that are duplicated in the Trainer and Tuner will be overwritten by the Tuner param_space.
Exception: all arguments of the RunConfig and TuneConfig are inherently un-tunable.
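The deep-merge behavior can be pictured with a small pure-Python sketch. This is an illustration of the merge semantics described above and not Ray's internal code: deep_merge, trainer_kwargs, and the sampled values are all hypothetical.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with override merged into base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Non-dict values (and mismatched types) from override win.
            merged[key] = copy.deepcopy(value)
    return merged


# Arguments given when the Trainer was constructed.
trainer_kwargs = {
    "num_boost_round": 10,
    "params": {"objective": "binary:logistic", "max_depth": 4},
}
# One concrete sample drawn from a param_space (values are illustrative).
sampled = {"params": {"max_depth": 2, "min_child_weight": 0.9}}

effective = deep_merge(trainer_kwargs, sampled)
print(effective)
```

Here max_depth is overwritten by the sampled value, min_child_weight is added, and objective and num_boost_round are carried over unchanged from the Trainer.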
See Getting Data in and out of Tune for an example.
How to configure a Tuner?#
There are two main configuration objects that can be passed into a Tuner: the TuneConfig and the RunConfig.
The TuneConfig contains tuning-specific settings, including:
the tuning algorithm to use
the metric and mode to rank results
the amount of parallelism to use
Here are some common configurations for TuneConfig:
from ray.tune import TuneConfig
from ray.tune.search.bayesopt import BayesOptSearch

tune_config = TuneConfig(
    metric="loss",
    mode="min",
    max_concurrent_trials=10,
    num_samples=100,
    search_alg=BayesOptSearch(),
)
See the TuneConfig API reference for more details.
The RunConfig contains configurations that are more generic than tuning-specific settings.
This may include:
failure/retry configurations
verbosity levels
the name of the experiment
the logging directory
checkpoint configurations
custom callbacks
integration with cloud storage
Below we showcase some common configurations of RunConfig.
from ray import air, tune
from ray.air.config import RunConfig

run_config = RunConfig(
    name="MyExperiment",
    local_dir="./your_log_directory/",
    verbose=2,
    sync_config=tune.SyncConfig(upload_dir="s3://..."),
    checkpoint_config=air.CheckpointConfig(checkpoint_frequency=2),
)
See the RunConfig API reference for more details.
How to specify parallelism?#
You can specify parallelism via the TuneConfig by setting the following flags:
num_samples, which specifies the number of trials to run in total
max_concurrent_trials, which specifies the max number of trials to run concurrently
Note that actual parallelism can be less than max_concurrent_trials and will be determined by how many trials can fit in the cluster at once. For example, if you have a trial that requires 16 GPUs, your cluster has 32 GPUs, and max_concurrent_trials=10, the Tuner can only run 2 trials concurrently.
from ray.tune import TuneConfig

config = TuneConfig(
    # ...
    num_samples=100,
    max_concurrent_trials=10,
)
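The arithmetic behind the 16-GPU scenario can be written down as a short sketch. It assumes GPUs are the only limiting resource, which is a simplification, and effective_concurrency is a hypothetical helper, not part of Ray.

```python
def effective_concurrency(num_samples, max_concurrent_trials, cluster_gpus, gpus_per_trial):
    """Upper bound on trials running at once under a GPU-only resource model."""
    fits_in_cluster = cluster_gpus // gpus_per_trial
    return min(num_samples, max_concurrent_trials, fits_in_cluster)


# The scenario from the text: 16 GPUs per trial on a 32-GPU cluster.
concurrency = effective_concurrency(
    num_samples=100, max_concurrent_trials=10, cluster_gpus=32, gpus_per_trial=16
)
print(concurrency)  # 2
```

In practice CPUs, memory, and custom resources can bound concurrency as well, so the smallest of all such limits applies.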
Read more about this in the "A Guide To Parallelism and Resources for Ray Tune" section.
How to specify an optimization algorithm?#
You can specify your hyperparameter optimization method via the TuneConfig by setting the following flags:
search_alg, which provides an optimizer for selecting the optimal hyperparameters
scheduler, which provides a scheduling/resource allocation algorithm for accelerating the search process
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.schedulers import HyperBandScheduler
from ray.tune import TuneConfig

config = TuneConfig(
    # ...
    search_alg=BayesOptSearch(),
    scheduler=HyperBandScheduler(),
)
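To give a feel for what a scheduler does, the following is a simplified sketch of successive halving, the early-stopping idea that HyperBand-style schedulers build on. It is a toy and not Ray's HyperBandScheduler: successive_halving, toy_loss, and the halving factor eta=2 are assumptions chosen for illustration.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly train survivors longer and keep the best 1/eta of them.

    evaluate(config, budget) must return a loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving config at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Stop the worst performers early; at least one config survives.
        survivors = scored[: max(1, len(scored) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(config, budget):
    """Made-up loss: improves with budget, best when lr is close to 0.1."""
    return abs(config["lr"] - 0.1) + 1.0 / budget


candidates = [{"lr": lr} for lr in (0.001, 0.01, 0.1, 0.5)]
winner = successive_halving(candidates, toy_loss)
print(winner)
```

The point of the sketch is the budget allocation: poor configurations are dropped after a small budget, so most of the compute goes to the promising ones.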
Read more about this in the Search Algorithm and Scheduler section.
How to analyze results?#
Tuner.fit() generates a ResultGrid object.
This object contains metrics, results, and checkpoints of each trial.
Below is a simple example:
from ray.tune import Tuner, TuneConfig

tuner = Tuner(
    trainable=trainer,
    param_space=param_space,
    tune_config=TuneConfig(mode="min", metric="loss", num_samples=5),
)
result_grid = tuner.fit()

num_results = len(result_grid)

# Check if there have been errors
if result_grid.errors:
    print("At least one trial failed.")

# Get the best result
best_result = result_grid.get_best_result()

# And the best checkpoint
best_checkpoint = best_result.checkpoint

# And the best metrics
best_metric = best_result.metrics

# Or a dataframe for further analysis
results_df = result_grid.get_dataframe()
print("Shortest training time:", results_df["time_total_s"].min())

# Iterate over results
for result in result_grid:
    if result.error:
        print("The trial had an error:", result.error)
        continue

    print("The trial finished successfully with the metrics:", result.metrics["loss"])
See Analyzing Tune Experiment Results for more usage examples.
Advanced Tuning#
Tuners also offer the ability to tune different data preprocessing steps, as shown in the following snippet.
from ray import tune
from ray.data.preprocessors import StandardScaler
from ray.tune import Tuner

prep_v1 = StandardScaler(["worst radius", "worst area"])
prep_v2 = StandardScaler(["worst concavity", "worst smoothness"])

tuner = Tuner(
    trainer,
    param_space={
        "preprocessor": tune.grid_search([prep_v1, prep_v2]),
        # Your other parameters go here
    },
)
Additionally, you can sample different train/validation datasets:
import ray
from ray import tune


def get_dataset():
    return ray.data.read_csv("s3://[email protected]/breast_cancer.csv")


def get_another_dataset():
    # imagine this is a different dataset
    return ray.data.read_csv("s3://[email protected]/breast_cancer.csv")


dataset_1 = get_dataset()
dataset_2 = get_another_dataset()

tuner = tune.Tuner(
    trainer,
    param_space={
        "datasets": {
            "train": tune.grid_search([dataset_1, dataset_2]),
        },
        # Your other parameters go here
    },
)
Restoring and resuming#
A Tuner regularly saves its state, so that a tuning run can be resumed after being interrupted.
Additionally, if trials fail during a tuning run, they can be retried - either from scratch or from the latest available checkpoint.
To restore the Tuner state, pass the path to the experiment directory as an argument to Tuner.restore(...).
This path is obtained from the output of a tuning run, namely “Result logdir”.
However, if you specify a name in the RunConfig, it is located under ~/ray_results/<name>.
tuner = Tuner.restore("~/ray_results/test_tuner", restart_errored=True)
tuner.fit()
For more resume options, please see the documentation of Tuner.restore().