Tuning XGBoost hyperparameters with Ray Tune
XGBoost is currently one of the most popular machine learning algorithms. It performs very well on a large selection of tasks, and was the key to success in many Kaggle competitions.

This tutorial gives you a quick introduction to XGBoost, shows you how to train an XGBoost model, and then guides you through optimizing XGBoost parameters with Tune to get the best performance.
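Note: To run this tutorial, you will need to install the following:
$ pip install xgboost
The code also imports Ray Tune and scikit-learn. If they are not installed yet, run:
$ pip install "ray[tune]" scikit-learn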
What is XGBoost
XGBoost is an acronym for eXtreme Gradient Boosting. Internally, XGBoost uses decision trees. Instead of training just one large decision tree, XGBoost and other related algorithms train many small decision trees. The intuition behind this is that even though single decision trees can be inaccurate and suffer from high variance, combining the output of a large number of these weak learners can lead to a strong learner, resulting in better predictions and less variance.
A single decision tree (left) might be able to get to an accuracy of 70% for a binary classification task. By combining the output of several small decision trees, an ensemble learner (right) might end up with a higher accuracy of 90%.
Boosting algorithms start with a single small decision tree and evaluate how well it predicts the given examples. When building the next tree, the examples that the current ensemble predicts poorly get more emphasis: in gradient boosting, which XGBoost implements, each new tree is fit to the gradient of the loss, and that gradient is largest for the poorly predicted examples. This is useful because it avoids overfitting to samples that can be easily classified and instead tries to come up with models that are able to classify hard examples, too. Please see here for a more thorough introduction to bagging and boosting algorithms.
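To make the "next tree focuses on what is still wrong" idea concrete, here is a minimal from-scratch sketch. It is not XGBoost's algorithm: it assumes squared-error loss, a single input feature, and depth-1 trees ("stumps"), and the helper names `fit_stump` and `gradient_boost` are hypothetical. For squared error the negative gradient is simply the residual, so each new stump is trained on the residuals of the ensemble so far.

```python
# Minimal gradient boosting sketch (illustration only, not XGBoost's algorithm):
# squared-error loss, one input feature, depth-1 trees ("stumps").


def fit_stump(xs, residuals):
    """Return the one-split tree that best fits `residuals` in the least-squares sense."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = sum((r - left_value) ** 2 for r in left)
        sse += sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, rounds=20, learning_rate=0.3):
    """Add one stump per round, each fit to what the current ensemble still gets wrong."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    stumps = []

    def predict(x):
        return base + sum(learning_rate * stump(x) for stump in stumps)

    for _ in range(rounds):
        # For squared error, the negative gradient is the residual y - F(x):
        # badly predicted examples have the largest residuals and dominate the next stump.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print(round(model(2), 3), round(model(5), 3))
```

With a single round the prediction is still far from the targets; after 20 rounds the stumps together recover them closely.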
There are many boosting algorithms. At their core, they are all very similar. XGBoost uses second-order derivatives (the gradient and the Hessian of the loss) to find the splits that maximize the gain, i.e. the reduction in loss. In practice, XGBoost is a strong default choice among boosting algorithms and often shows the best performance.
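The split gain used by XGBoost can be written in terms of the summed gradients G and Hessians H of the examples that land in each child: gain = 1/2 [G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ, where λ and γ correspond to XGBoost's `lambda` (alias `reg_lambda`) and `gamma` parameters. The sketch below evaluates that formula for the log loss, whose gradient and Hessian with respect to the raw score are p − y and p(1 − p). The helper names are hypothetical, and this is a hand calculation, not XGBoost's split-finding code.

```python
import math


def logistic_grad_hess(raw_score, label):
    """Gradient and Hessian of the log loss w.r.t. the raw (pre-sigmoid) score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - label, p * (1.0 - p)


def split_gain(left, right, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into `left` and `right`, each a list of (g, h) pairs."""

    def score(pairs):
        g = sum(gi for gi, _ in pairs)
        h = sum(hi for _, hi in pairs)
        return g * g / (h + reg_lambda)

    return 0.5 * (score(left) + score(right) - score(left + right)) - gamma


# Four examples, all currently predicted with raw score 0 (probability 0.5).
stats = [logistic_grad_hess(0.0, y) for y in [0, 0, 1, 1]]
clean_split = split_gain(stats[:2], stats[2:])  # separates the two classes
mixed_split = split_gain([stats[0], stats[2]], [stats[1], stats[3]])  # does not
print(clean_split, mixed_split)
```

The split that separates the classes has a positive gain, while the split that mixes them gains nothing.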
Training a simple XGBoost classifier
Let’s first see how a simple XGBoost classifier can be trained. We’ll use the breast_cancer dataset included in the sklearn dataset collection. This is a binary classification dataset. Given 30 different input features, our task is to learn to identify subjects with breast cancer and those without.
Here is the full code to train a simple XGBoost model:
```python
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split
import xgboost as xgb


def train_breast_cancer(config):
    # Load dataset
    data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
    # Split into train and test set
    train_x, test_x, train_y, test_y = train_test_split(data, labels, test_size=0.25)
    # Build input matrices for XGBoost
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set = xgb.DMatrix(test_x, label=test_y)
    # Train the classifier
    results = {}
    bst = xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        evals_result=results,
        verbose_eval=False,
    )
    return results


if __name__ == "__main__":
    results = train_breast_cancer(
        {"objective": "binary:logistic", "eval_metric": ["logloss", "error"]}
    )
    accuracy = 1.0 - results["eval"]["error"][-1]
    print(f"Accuracy: {accuracy:.4f}")
```
Accuracy: 0.9650
As you can see, the code is quite simple. First, the dataset is loaded and split into a test and train set. The XGBoost model is trained with xgb.train().
XGBoost automatically evaluates the metrics we specified on the test set. In our case it calculates the logloss and the prediction error, which is the fraction of misclassified examples. To calculate the accuracy, we just have to subtract the error from 1.0. Even in this simple example, most runs result in a good accuracy of over 0.90.
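As a quick check of the error-to-accuracy arithmetic, the sketch below recomputes the error by hand. It follows the XGBoost documentation's description of the `error` metric, which counts a prediction as positive when the predicted probability is larger than 0.5. The probabilities and the `error_rate` helper are made up for illustration.

```python
def error_rate(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction disagrees with the label."""
    wrong = sum(1 for p, y in zip(probabilities, labels) if (p > threshold) != bool(y))
    return wrong / len(labels)


probabilities = [0.9, 0.2, 0.6, 0.4]  # hypothetical model outputs
labels = [1, 0, 0, 0]
error = error_rate(probabilities, labels)
accuracy = 1.0 - error
print(f"error={error:.2f}, accuracy={accuracy:.2f}")
```

Only the third example is misclassified, so one out of four predictions is wrong.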
Maybe you have noticed the config parameter we pass to the XGBoost algorithm. This is a dict in which you can specify parameters for the XGBoost algorithm. In this simple example, the only parameters we passed are the objective and eval_metric parameters.
The value binary:logistic tells XGBoost that we aim to train a logistic regression model for a binary classification task. You can find an overview of all valid objectives here in the XGBoost documentation.
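With binary:logistic the model's raw score is passed through the sigmoid function to obtain a probability, and the logloss metric averages the negative log-likelihood of the true labels. A small sketch of both, written from the standard definitions (the helper names are our own):

```python
import math


def sigmoid(raw_score):
    """Map a raw model score to a probability, as binary:logistic does for its output."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def log_loss(probabilities, labels):
    """Mean negative log-likelihood of binary labels (lower is better)."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# A model that always predicts 0.5 scores ln(2), roughly 0.693.
print(log_loss([sigmoid(0.0)] * 4, [0, 1, 1, 0]))
# Confident, correct predictions push the loss towards 0.
print(log_loss([sigmoid(-4.0), sigmoid(4.0)], [0, 1]))
```

A logloss close to 0.693 therefore means the model is barely better than guessing 0.5 for every example.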
XGBoost Hyperparameters
Even with the default settings, XGBoost was able to get to a good accuracy on the breast cancer dataset. However, as in many machine learning algorithms, there are many knobs to tune which might lead to even better performance. Let’s explore some of them below.
Maximum tree depth
Remember that XGBoost internally uses many decision tree models to come up with predictions. When training a decision tree, we need to tell the algorithm how large the tree may get. The parameter for this is called the tree depth (max_depth in XGBoost).
In this image, the left tree has a depth of 2, and the right tree a depth of 3. Note that each level \(d\) (counting the root as level 1) adds up to \(2^{d-1}\) splits.
Tree depth is a property that concerns the model complexity. If you only allow shallow trees, the models are likely not very precise: they underfit the data. If you allow very deep trees, the single models are likely to overfit to the data. In practice, a number between 2 and 6 is often a good starting point for this parameter.
XGBoost’s default value is 6.
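Because every level can double the number of nodes, the maximum tree size grows quickly with depth. The sketch below counts the upper bounds on splits and leaves for a full binary tree; `max_splits_and_leaves` is a hypothetical helper, and real XGBoost trees are usually smaller because not every node is split.

```python
def max_splits_and_leaves(depth):
    """Upper bounds for a binary tree whose longest path has `depth` splits."""
    # Level d (counting from 1 at the root) can add at most 2 ** (d - 1) splits.
    splits = sum(2 ** (level - 1) for level in range(1, depth + 1))  # = 2 ** depth - 1
    leaves = 2 ** depth
    return splits, leaves


for depth in (2, 3, 6):
    print(depth, max_splits_and_leaves(depth))
```

At the default depth of 6, a single tree can already have up to 64 leaves.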
Minimum child weight
When a decision tree creates new leaves, it splits up the remaining data at one node into two groups. If there are only a few samples in one of these groups, it often doesn’t make sense to split it further. One reason for this is that splits learned from only a few samples are unreliable and prone to overfitting.
In this example, we start with 100 examples. At the first node, they are split into 4 and 96 samples, respectively. In the next step, our model might find that it doesn’t make sense to split the 4 examples more. It thus only continues to add leaves on the right side.
The parameter used by the model to decide if it makes sense to split a node is called the minimum child weight (min_child_weight). For regression with squared error loss, this is just the minimum number of samples required in each child. For other objectives, it is the minimum sum of the second derivatives (Hessians) of the loss over the samples in a child, which act as sample weights, hence the name.
The larger the value, the more constrained the trees are and the less deep they will be. This parameter thus also affects the model complexity. Values can range between 0 and infinity and depend on the sample size. For the 569 examples in the breast cancer dataset, values between 0 and 10 should be sensible.
XGBoost’s default value is 1.
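For the log loss, each example contributes p(1 − p) to the Hessian sum, so examples the model is already confident about carry little weight. The sketch below checks whether a child would pass a min_child_weight threshold. It assumes the rule "create the child only if its Hessian sum is at least min_child_weight"; the helper names are hypothetical and the exact comparison inside XGBoost may differ.

```python
import math


def hessian_sum(raw_scores):
    """Sum of log-loss second derivatives p * (1 - p) over the examples in a child."""
    total = 0.0
    for score in raw_scores:
        p = 1.0 / (1.0 + math.exp(-score))
        total += p * (1.0 - p)
    return total


def split_allowed(child_raw_scores, min_child_weight=1.0):
    """Assumed rule: a child is only created if its Hessian sum reaches min_child_weight."""
    return hessian_sum(child_raw_scores) >= min_child_weight


uncertain = [0.0] * 4  # four examples predicted at probability 0.5
confident = [3.0] * 4  # four examples predicted at probability ~0.95
print(hessian_sum(uncertain), split_allowed(uncertain))
print(round(hessian_sum(confident), 3), split_allowed(confident))
```

Both children hold four examples, but only the one with uncertain predictions reaches the default threshold of 1.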
Subsample size
Each decision tree we add can be trained on a random subsample of the training dataset. By default, XGBoost samples the rows uniformly, and the subsample parameter decides which fraction of the samples each decision tree is trained on.
Setting this value to 0.7 would mean that we randomly sample 70% of the training dataset before each boosting iteration.
XGBoost’s default value is 1, i.e. no subsampling.
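A sketch of what row subsampling does in each boosting round, assuming each row is kept independently with probability `subsample`. XGBoost's exact sampling routine may differ, and `rows_for_round` is a hypothetical helper.

```python
import random


def rows_for_round(n_rows, subsample, rng):
    """Keep each row independently with probability `subsample` (uniform sampling)."""
    return [i for i in range(n_rows) if rng.random() < subsample]


rng = random.Random(0)
for round_number in range(3):
    rows = rows_for_round(500, 0.7, rng)
    print(round_number, len(rows), rows[:5])
```

Each round sees a different, roughly 70% sized subset of the data, which adds randomness between trees and can reduce overfitting.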
Learning rate / Eta
Remember that XGBoost sequentially trains many decision trees, and that later trees focus more on data that has been poorly predicted by prior trees. In effect, this means that earlier trees make decisions for easy samples (i.e. those samples that can easily be classified) and later trees make decisions for harder samples. It is then sensible to assume that the later trees are less accurate than earlier trees.
To address this, XGBoost uses a parameter called eta, which is sometimes called the learning rate. Don’t confuse this with learning rates from gradient descent! The original paper on stochastic gradient boosting introduces this parameter like so:

\[F_m(x) = F_{m-1}(x) + \eta \cdot \gamma_{lm} \textbf{1}(x \in R_{lm})\]

This is just a complicated way to say that when we train a new decision tree, represented by \(\gamma_{lm} \textbf{1}(x \in R_{lm})\), we want to dampen its effect on the previous prediction \(F_{m-1}(x)\) with a factor \(\eta\).
Typical values for this parameter are between 0.01 and 0.3.
XGBoost’s default value is 0.3.
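The effect of the damping factor can be seen in an idealized setting. The sketch assumes every new tree predicts the full remaining residual exactly, which real trees do not, and applies the update F_m = F_{m-1} + eta * tree. The helper name is hypothetical.

```python
def shrinkage_path(target, rounds, eta):
    """Predictions after each round when every new tree outputs the full residual."""
    prediction = 0.0
    history = []
    for _ in range(rounds):
        tree_output = target - prediction  # idealized tree: fits the residual exactly
        prediction = prediction + eta * tree_output  # F_m(x) = F_{m-1}(x) + eta * tree
        history.append(round(prediction, 4))
    return history


print(shrinkage_path(1.0, 5, eta=1.0))
print(shrinkage_path(1.0, 5, eta=0.3))
```

With eta=1.0 the first tree jumps straight to the target, while eta=0.3 approaches it gradually, so no single tree dominates the ensemble.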
Number of boost rounds
Lastly, we can decide on how many boosting rounds we perform, which means how many decision trees we ultimately train. When we do heavy subsampling or use a small learning rate, it might make sense to increase the number of boosting rounds. Unlike the other parameters, this is not set in the config dict but through the num_boost_round argument of xgb.train().
XGBoost’s default value is 10.
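To see why a smaller learning rate calls for more rounds, consider the same idealized assumption as above: every tree exactly fits the remaining residual, so after m rounds a fraction (1 − eta)^m of the initial residual is left. The hypothetical helper below solves for the number of rounds needed to get below a tolerance.

```python
import math


def rounds_needed(eta, tolerance):
    """Smallest m with (1 - eta) ** m <= tolerance, for 0 < eta < 1."""
    return math.ceil(math.log(tolerance) / math.log(1.0 - eta))


for eta in (0.3, 0.1, 0.01):
    print(eta, rounds_needed(eta, tolerance=0.01))
```

Going from eta=0.3 to eta=0.01 increases the required number of rounds by more than an order of magnitude in this idealized model.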
Putting it together
Let’s see what this looks like in code! We just need to adjust our config dict:
```python
if __name__ == "__main__":
    config = {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": 2,
        "min_child_weight": 0,
        "subsample": 0.8,
        "eta": 0.2,
    }
    results = train_breast_cancer(config)
    accuracy = 1.0 - results["eval"]["error"][-1]
    print(f"Accuracy: {accuracy:.4f}")
```
Accuracy: 0.9790
The rest stays the same. Please note that we do not adjust the number of boosting rounds (the num_boost_round argument of xgb.train()) here.
The result should also show a high accuracy of over 90%.
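If you do want to treat the number of boosting rounds as a tunable parameter, one option is to keep it in the config dict and pull it out before calling xgb.train(), which takes it as its own num_boost_round keyword. The helper `split_config` below is hypothetical, a sketch of that pattern rather than part of XGBoost or Tune.

```python
def split_config(config, default_rounds=10):
    """Hypothetical helper: pull num_boost_round out of a config dict.

    xgb.train() takes the number of rounds as its own keyword argument
    (num_boost_round), so it should not be left inside the params dict.
    """
    params = dict(config)  # copy, so the caller's dict stays unchanged
    num_boost_round = params.pop("num_boost_round", default_rounds)
    return params, num_boost_round


config = {"objective": "binary:logistic", "eta": 0.2, "num_boost_round": 50}
params, num_boost_round = split_config(config)
print(params, num_boost_round)
# Inside the training function you would then call, e.g.:
# bst = xgb.train(params, train_set, num_boost_round=num_boost_round, evals=[(test_set, "eval")])
```

The default of 10 mirrors xgb.train()'s own default, so configs without the key behave as before.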
Tuning the configuration parameters
XGBoost’s default parameters already lead to a good accuracy, and even our guesses in the last section should result in accuracies well above 90%. However, our guesses were just that: guesses. Often we do not know what combination of parameters would actually lead to the best results on a machine learning task.
Unfortunately, there are infinitely many combinations of hyperparameters we could try out. Should we combine max_depth=3 with subsample=0.8 or with subsample=0.9? What about the other parameters?
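Even a coarse grid over just four of these parameters multiplies quickly. The candidate values below are hypothetical, chosen only to show the arithmetic of an exhaustive search.

```python
import itertools

# Hypothetical candidate values for four of the parameters discussed above.
candidates = {
    "max_depth": list(range(1, 9)),  # 8 values
    "min_child_weight": [1, 2, 3],  # 3 values
    "subsample": [0.5, 0.6, 0.7, 0.8, 0.9, 1.0],  # 6 values
    "eta": [1e-4, 1e-3, 1e-2, 1e-1, 0.3],  # 5 values
}
grid = [dict(zip(candidates, values)) for values in itertools.product(*candidates.values())]
print(len(grid))  # 8 * 3 * 6 * 5 = 720 full training runs
```

Adding one more parameter with five candidate values would already push this to 3,600 runs.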
This is where hyperparameter tuning comes into play. By using tuning libraries such as Ray Tune we can try out combinations of hyperparameters. Using sophisticated search strategies, these parameters can be selected so that they are likely to lead to good results (avoiding an expensive exhaustive search). Also, trials that do not perform well can be preemptively stopped to reduce waste of computing resources. Lastly, Ray Tune also takes care of training these runs in parallel, greatly increasing search speed.
Let’s start with a basic example of how to use Tune for this. We just need to make a few changes to our code:
```python
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split
import xgboost as xgb

from ray import train, tune


def train_breast_cancer(config):
    # Load dataset
    data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
    # Split into train and test set
    train_x, test_x, train_y, test_y = train_test_split(data, labels, test_size=0.25)
    # Build input matrices for XGBoost
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set = xgb.DMatrix(test_x, label=test_y)
    # Train the classifier
    results = {}
    xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        evals_result=results,
        verbose_eval=False,
    )
    # Report the prediction accuracy back to Tune
    accuracy = 1.0 - results["eval"]["error"][-1]
    train.report({"mean_accuracy": accuracy, "done": True})


if __name__ == "__main__":
    config = {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": tune.randint(1, 9),
        "min_child_weight": tune.choice([1, 2, 3]),
        "subsample": tune.uniform(0.5, 1.0),
        "eta": tune.loguniform(1e-4, 1e-1),
    }
    tuner = tune.Tuner(
        train_breast_cancer,
        tune_config=tune.TuneConfig(
            num_samples=10,
        ),
        param_space=config,
    )
    results = tuner.fit()
```
2022-07-22 15:52:52,004 INFO services.py:1483 -- View the Ray dashboard at http://127.0.0.1:8268
Current time: 2022-07-22 15:53:04 (running for 00:00:07.77)
Memory usage on this node: 10.5/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.57 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/train_breast_cancer_2022-07-22_15-52-48
Number of trials: 10/10 (10 TERMINATED)
Trial name | status | loc | eta | max_depth | min_child_weight | subsample | acc | iter | total time (s) |
---|---|---|---|---|---|---|---|---|---|
train_breast_cancer_f8669_00000 | TERMINATED | 127.0.0.1:48852 | 0.0069356 | 5 | 3 | 0.823504 | 0.944056 | 1 | 0.0316169 |
train_breast_cancer_f8669_00001 | TERMINATED | 127.0.0.1:48857 | 0.00145619 | 6 | 3 | 0.832947 | 0.958042 | 1 | 0.0328588 |
train_breast_cancer_f8669_00002 | TERMINATED | 127.0.0.1:48858 | 0.00108208 | 7 | 3 | 0.987319 | 0.944056 | 1 | 0.0319381 |
train_breast_cancer_f8669_00003 | TERMINATED | 127.0.0.1:48859 | 0.00530429 | 8 | 2 | 0.615691 | 0.923077 | 1 | 0.028388 |
train_breast_cancer_f8669_00004 | TERMINATED | 127.0.0.1:48860 | 0.000721843 | 8 | 1 | 0.650973 | 0.958042 | 1 | 0.0299618 |
train_breast_cancer_f8669_00005 | TERMINATED | 127.0.0.1:48861 | 0.0074509 | 1 | 1 | 0.738341 | 0.874126 | 1 | 0.0193682 |
train_breast_cancer_f8669_00006 | TERMINATED | 127.0.0.1:48862 | 0.0879882 | 8 | 2 | 0.671576 | 0.944056 | 1 | 0.0267372 |
train_breast_cancer_f8669_00007 | TERMINATED | 127.0.0.1:48863 | 0.0765404 | 7 | 2 | 0.708157 | 0.965035 | 1 | 0.0276129 |
train_breast_cancer_f8669_00008 | TERMINATED | 127.0.0.1:48864 | 0.000627649 | 6 | 1 | 0.81121 | 0.951049 | 1 | 0.0310998 |
train_breast_cancer_f8669_00009 | TERMINATED | 127.0.0.1:48865 | 0.000383711 | 2 | 3 | 0.990579 | 0.93007 | 1 | 0.0274954 |
2022-07-22 15:53:04,846 INFO tune.py:738 -- Total run time: 8.99 seconds (7.74 seconds for the tuning loop).
As you can see, the changes in the actual training function are minimal. Instead of returning the accuracy value, we report it back to Tune using train.report().
Our config dictionary only changed slightly. Instead of passing hard-coded parameters, we tell Tune to choose values from a range of valid options. There are a number of options we have here, all of which are explained in the Tune docs.
For a brief explanation, this is what they do:
- tune.randint(min, max) chooses a random integer value between min and max. Note that max is exclusive, so it will not be sampled.
- tune.choice([a, b, c]) chooses one of the items of the list at random. Each item has the same chance to be sampled.
- tune.uniform(min, max) samples a floating point number between min and max. Note that max is exclusive here, too.
- tune.loguniform(min, max, base=10) samples a floating point number between min and max, but applies a logarithmic transformation to these boundaries first. This makes it easy to sample values from different orders of magnitude.
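The sampling semantics described in the list can be imitated with Python's standard `random` module. This is only an illustration of the behavior, not Tune's implementation; the `*_like` helper names are hypothetical. Note that the base of the logarithm does not change the resulting distribution, because switching bases rescales log(min) and log(max) by the same factor.

```python
import math
import random

rng = random.Random(42)


def randint_like(low, high):
    """Integer in [low, high): like tune.randint, the upper bound is never drawn."""
    return rng.randrange(low, high)


def loguniform_like(low, high):
    """Sample uniformly in log space and map back, so every decade is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


depths = [randint_like(1, 9) for _ in range(1000)]
etas = [loguniform_like(1e-4, 1e-1) for _ in range(3000)]
print(min(depths), max(depths))
# Roughly a third of the samples fall into each decade: 1e-4..1e-3, 1e-3..1e-2, 1e-2..1e-1.
print(sum(e < 1e-3 for e in etas) / len(etas))
```

A plain uniform sampler over the same range would put about 99% of the samples above 1e-3, which is why loguniform is the usual choice for eta.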
The num_samples=10 option we pass to the TuneConfig() means that we sample 10 different hyperparameter configurations from this search space.
The output of our training run could look like this:
Number of trials: 10/10 (10 TERMINATED)
+---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------+
| Trial name | status | loc | eta | max_depth | min_child_weight | subsample | acc | iter | total time (s) |
|---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------|
| train_breast_cancer_b63aa_00000 | TERMINATED | | 0.000117625 | 2 | 2 | 0.616347 | 0.916084 | 1 | 0.0306492 |
| train_breast_cancer_b63aa_00001 | TERMINATED | | 0.0382954 | 8 | 2 | 0.581549 | 0.937063 | 1 | 0.0357082 |
| train_breast_cancer_b63aa_00002 | TERMINATED | | 0.000217926 | 1 | 3 | 0.528428 | 0.874126 | 1 | 0.0264609 |
| train_breast_cancer_b63aa_00003 | TERMINATED | | 0.000120929 | 8 | 1 | 0.634508 | 0.958042 | 1 | 0.036406 |
| train_breast_cancer_b63aa_00004 | TERMINATED | | 0.00839715 | 5 | 1 | 0.730624 | 0.958042 | 1 | 0.0389378 |
| train_breast_cancer_b63aa_00005 | TERMINATED | | 0.000732948 | 8 | 2 | 0.915863 | 0.958042 | 1 | 0.0382841 |
| train_breast_cancer_b63aa_00006 | TERMINATED | | 0.000856226 | 4 | 1 | 0.645209 | 0.916084 | 1 | 0.0357089 |
| train_breast_cancer_b63aa_00007 | TERMINATED | | 0.00769908 | 7 | 1 | 0.729443 | 0.909091 | 1 | 0.0390737 |
| train_breast_cancer_b63aa_00008 | TERMINATED | | 0.00186339 | 5 | 3 | 0.595744 | 0.944056 | 1 | 0.0343912 |
| train_breast_cancer_b63aa_00009 | TERMINATED | | 0.000950272 | 3 | 2 | 0.835504 | 0.965035 | 1 | 0.0348201 |
+---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+----------+--------+------------------+
The best configuration we found used eta=0.000950272, max_depth=3, min_child_weight=2, subsample=0.835504 and reached an accuracy of 0.965035.
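With Tune, you would typically let the ResultGrid returned by tuner.fit() pick the best trial, e.g. `results.get_best_result(metric="mean_accuracy", mode="max")`; metric and mode have to be given here because this example's TuneConfig does not set them. The sketch below does the same selection by hand on the rows copied from the table above, just to show what "best" means in this run.

```python
# (trial, eta, max_depth, min_child_weight, subsample, acc), copied from the table above.
trials = [
    ("00000", 0.000117625, 2, 2, 0.616347, 0.916084),
    ("00001", 0.0382954, 8, 2, 0.581549, 0.937063),
    ("00002", 0.000217926, 1, 3, 0.528428, 0.874126),
    ("00003", 0.000120929, 8, 1, 0.634508, 0.958042),
    ("00004", 0.00839715, 5, 1, 0.730624, 0.958042),
    ("00005", 0.000732948, 8, 2, 0.915863, 0.958042),
    ("00006", 0.000856226, 4, 1, 0.645209, 0.916084),
    ("00007", 0.00769908, 7, 1, 0.729443, 0.909091),
    ("00008", 0.00186339, 5, 3, 0.595744, 0.944056),
    ("00009", 0.000950272, 3, 2, 0.835504, 0.965035),
]
best = max(trials, key=lambda row: row[-1])  # highest accuracy wins
print(best)
```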
Early stopping
Currently, Tune samples 10 different hyperparameter configurations and trains a full XGBoost model on each of them. In our small example, training is very fast. However, if training takes longer, a significant amount of compute resources is spent on trials that will eventually show a bad performance, e.g. a low accuracy. It would be good if we could identify these trials early and stop them, so we don’t waste any resources.
This is where Tune’s Schedulers shine. A Tune TrialScheduler is responsible for starting and stopping trials. Tune implements a number of different schedulers, each described in the Tune documentation. For our example, we will use the AsyncHyperBandScheduler or ASHAScheduler.
The basic idea of this scheduler: We sample a number of hyperparameter configurations. Each of these configurations is trained for a specific number of iterations. After these iterations, only the best performing hyperparameters are retained. These are selected according to some loss metric, usually an evaluation loss. This cycle is repeated until we end up with the best configuration.
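The cycle described above is known as successive halving. The sketch below is a simplified, synchronous version: ASHA itself is asynchronous and promotes trials as soon as results arrive, so treat this as an illustration of the idea rather than Tune's scheduler. The loss curve `toy_loss` is made up.

```python
def successive_halving(configs, evaluate, reduction_factor=2, min_iters=1, max_iters=8):
    """Synchronous successive halving; `evaluate(config, iters)` returns a loss."""
    survivors = list(configs)
    iters = min_iters
    while True:
        # Train every survivor for `iters` iterations and rank by loss (lower is better).
        ranked = sorted(survivors, key=lambda config: evaluate(config, iters))
        if iters >= max_iters or len(ranked) == 1:
            return ranked[0]
        # Keep only the best 1 / reduction_factor of the configurations.
        survivors = ranked[: max(1, len(ranked) // reduction_factor)]
        iters *= reduction_factor


def toy_loss(eta, iters):
    # Hypothetical loss curve: improves with more iterations, best at eta = 0.3.
    return (eta - 0.3) ** 2 + 1.0 / iters


candidates = [round(0.1 * i, 1) for i in range(10)]  # 0.0, 0.1, ..., 0.9
print(successive_halving(candidates, toy_loss))
```

Ten configurations start, but only five, then two, then one receive the longer training budgets, which is where the compute savings come from.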
The ASHAScheduler needs to know three things:
- Which metric should be used to identify badly performing trials?
- Should this metric be maximized or minimized?
- How many iterations does each trial train for?
There are more parameters, which are explained in the documentation.
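Two of those extra parameters, grace_period and reduction_factor, together with max_t determine the iterations ("rungs") at which trials are compared. Our reading is that the rungs sit at grace_period * reduction_factor^k up to max_t, and that at each rung a trial only continues if its result is in roughly the top 1/reduction_factor of the results recorded there so far. The hypothetical helper below computes the rungs; for max_t=10, grace_period=1 and reduction_factor=2 it yields the Iter 1, 2, 4 and 8 brackets that appear in the scheduler output later in this tutorial.

```python
def rung_milestones(max_t, grace_period, reduction_factor):
    """Iterations at which trials are compared: grace_period * reduction_factor ** k <= max_t."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


print(rung_milestones(max_t=10, grace_period=1, reduction_factor=2))
```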
Lastly, we have to report the loss metric to Tune. We do this with a Callback that XGBoost accepts and calls after each evaluation round. Ray Tune comes with two XGBoost callbacks we can use for this. The TuneReportCallback just reports the evaluation metrics back to Tune. The TuneReportCheckpointCallback also saves checkpoints after each evaluation round. We will just use the latter in this example so that we can retrieve the saved model later.
The metrics listed in the eval_metric configuration setting are then automatically reported to Tune via the callback, prefixed with the name of the evaluation set (here eval-logloss and eval-error). Note that the raw error is reported, not the accuracy. To display the best accuracy reached, we convert it later with 1.0 - error.
We will also load the best checkpointed model so that we can use it for predictions. The best model is selected with respect to the metric and mode parameters we pass to TuneConfig().
```python
import sklearn.datasets
import sklearn.metrics
import os
from ray.tune.schedulers import ASHAScheduler
from sklearn.model_selection import train_test_split
import xgboost as xgb

from ray import train, tune
from ray.tune.integration.xgboost import TuneReportCheckpointCallback


def train_breast_cancer(config: dict):
    # This is a simple training function to be passed into Tune
    # Load dataset
    data, labels = sklearn.datasets.load_breast_cancer(return_X_y=True)
    # Split into train and test set
    train_x, test_x, train_y, test_y = train_test_split(data, labels, test_size=0.25)
    # Build input matrices for XGBoost
    train_set = xgb.DMatrix(train_x, label=train_y)
    test_set = xgb.DMatrix(test_x, label=test_y)
    # Train the classifier, using the Tune callback
    xgb.train(
        config,
        train_set,
        evals=[(test_set, "eval")],
        verbose_eval=False,
        callbacks=[TuneReportCheckpointCallback(filename="model.xgb")],
    )


def get_best_model_checkpoint(results):
    best_bst = xgb.Booster()
    best_result = results.get_best_result()

    with best_result.checkpoint.as_directory() as best_checkpoint_dir:
        best_bst.load_model(os.path.join(best_checkpoint_dir, "model.xgb"))
    accuracy = 1.0 - best_result.metrics["eval-error"]
    print(f"Best model parameters: {best_result.config}")
    print(f"Best model total accuracy: {accuracy:.4f}")
    return best_bst


def tune_xgboost(smoke_test=False):
    search_space = {
        # You can mix constants with search space objects.
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "max_depth": tune.randint(1, 9),
        "min_child_weight": tune.choice([1, 2, 3]),
        "subsample": tune.uniform(0.5, 1.0),
        "eta": tune.loguniform(1e-4, 1e-1),
    }
    # This will enable aggressive early stopping of bad trials.
    scheduler = ASHAScheduler(
        max_t=10, grace_period=1, reduction_factor=2  # 10 training iterations
    )

    tuner = tune.Tuner(
        train_breast_cancer,
        tune_config=tune.TuneConfig(
            metric="eval-logloss",
            mode="min",
            scheduler=scheduler,
            num_samples=1 if smoke_test else 10,
        ),
        param_space=search_space,
    )
    results = tuner.fit()

    return results


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--smoke-test", action="store_true", help="Finish quickly for testing"
    )
    args, _ = parser.parse_known_args()

    results = tune_xgboost(smoke_test=args.smoke_test)

    # Load the best model checkpoint.
    best_bst = get_best_model_checkpoint(results)

    # You could now do further predictions with
    # best_bst.predict(...)
```
Current time: 2022-07-22 16:56:01 (running for 00:00:10.38)
Memory usage on this node: 10.3/16.0 GiB
Using AsyncHyperBand: num_stopped=10 Bracket: Iter 8.000: -0.5107275277792991 | Iter 4.000: -0.5876629346317344 | Iter 2.000: -0.6544494184997531 | Iter 1.000: -0.6859214191253369
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/4.57 GiB heap, 0.0/2.0 GiB objects
Current best trial: c28a3_00003 with eval-logloss=0.38665050018083796 and parameters={'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'max_depth': 2, 'min_child_weight': 3, 'subsample': 0.782626252548841, 'eta': 0.06385952388342125}
Result logdir: /Users/kai/ray_results/train_breast_cancer_2022-07-22_16-55-50
Number of trials: 10/10 (10 TERMINATED)
Trial name | status | loc | eta | max_depth | min_child_weight | subsample | iter | total time (s) | eval-logloss | eval-error |
---|---|---|---|---|---|---|---|---|---|---|
train_breast_cancer_c28a3_00000 | TERMINATED | 127.0.0.1:54416 | 0.0186954 | 2 | 2 | 0.516916 | 10 | 0.22218 | 0.571496 | 0.0629371 |
train_breast_cancer_c28a3_00001 | TERMINATED | 127.0.0.1:54440 | 0.0304404 | 8 | 2 | 0.745969 | 2 | 0.135674 | 0.650353 | 0.0629371 |
train_breast_cancer_c28a3_00002 | TERMINATED | 127.0.0.1:54441 | 0.0217157 | 8 | 3 | 0.764138 | 2 | 0.173076 | 0.658545 | 0.041958 |
train_breast_cancer_c28a3_00003 | TERMINATED | 127.0.0.1:54442 | 0.0638595 | 2 | 3 | 0.782626 | 10 | 0.281865 | 0.386651 | 0.041958 |
train_breast_cancer_c28a3_00004 | TERMINATED | 127.0.0.1:54443 | 0.00442794 | 7 | 2 | 0.792359 | 1 | 0.0270212 | 0.689577 | 0.0699301 |
train_breast_cancer_c28a3_00005 | TERMINATED | 127.0.0.1:54444 | 0.00222624 | 3 | 1 | 0.536331 | 1 | 0.0238512 | 0.691446 | 0.0839161 |
train_breast_cancer_c28a3_00006 | TERMINATED | 127.0.0.1:54445 | 0.000825129 | 1 | 1 | 0.82472 | 1 | 0.015312 | 0.692624 | 0.118881 |
train_breast_cancer_c28a3_00007 | TERMINATED | 127.0.0.1:54446 | 0.000770826 | 7 | 2 | 0.947268 | 1 | 0.0175898 | 0.692598 | 0.132867 |
train_breast_cancer_c28a3_00008 | TERMINATED | 127.0.0.1:54447 | 0.000429759 | 7 | 1 | 0.88524 | 1 | 0.0193739 | 0.692785 | 0.0559441 |
train_breast_cancer_c28a3_00009 | TERMINATED | 127.0.0.1:54448 | 0.0149863 | 2 | 1 | 0.722738 | 1 | 0.0165932 | 0.682266 | 0.111888 |
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 2
node_ip: 127.0.0.1
pid: 54440
time_since_restore: 0.13567376136779785
time_this_iter_s: 0.11850380897521973
time_total_s: 0.13567376136779785
timestamp: 1658505361
timesteps_since_restore: 0
training_iteration: 2
trial_id: c28a3_00001
warmup_time: 0.006204843521118164
Result for train_breast_cancer_c28a3_00004:
date: 2022-07-22_16-56-01
done: true
eval-error: 0.06993006993006994
eval-logloss: 0.689577207281873
experiment_id: ef4fdc645c444112985b4957ab8a84e9
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 1
node_ip: 127.0.0.1
pid: 54443
time_since_restore: 0.027021169662475586
time_this_iter_s: 0.027021169662475586
time_total_s: 0.027021169662475586
timestamp: 1658505361
timesteps_since_restore: 0
training_iteration: 1
trial_id: c28a3_00004
warmup_time: 0.0063669681549072266
Result for train_breast_cancer_c28a3_00002:
date: 2022-07-22_16-56-01
done: true
eval-error: 0.04195804195804196
eval-logloss: 0.658545415301423
experiment_id: a3645fc2d43145d88a1f5b7cc94df703
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 2
node_ip: 127.0.0.1
pid: 54441
time_since_restore: 0.17307591438293457
time_this_iter_s: 0.1457080841064453
time_total_s: 0.17307591438293457
timestamp: 1658505361
timesteps_since_restore: 0
training_iteration: 2
trial_id: c28a3_00002
warmup_time: 0.0062830448150634766
Result for train_breast_cancer_c28a3_00003:
date: 2022-07-22_16-56-01
done: true
eval-error: 0.04195804195804196
eval-logloss: 0.38665050018083796
experiment_id: 7ff6133237404b4ea4755b9f8cd114f2
hostname: Kais-MacBook-Pro.local
iterations_since_restore: 10
node_ip: 127.0.0.1
pid: 54442
time_since_restore: 0.28186488151550293
time_this_iter_s: 0.03063178062438965
time_total_s: 0.28186488151550293
timestamp: 1658505361
timesteps_since_restore: 0
training_iteration: 10
trial_id: c28a3_00003
warmup_time: 0.006722211837768555
2022-07-22 16:56:01,498 INFO tune.py:738 -- Total run time: 10.53 seconds (10.37 seconds for the tuning loop).
Best model parameters: {'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'max_depth': 2, 'min_child_weight': 3, 'subsample': 0.782626252548841, 'eta': 0.06385952388342125}
Best model total accuracy: 0.9580
The output of our run could look like this:
Number of trials: 10/10 (10 TERMINATED)
+---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+--------+------------------+----------------+--------------+
| Trial name | status | loc | eta | max_depth | min_child_weight | subsample | iter | total time (s) | eval-logloss | eval-error |
|---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+--------+------------------+----------------+--------------|
| train_breast_cancer_ba275_00000 | TERMINATED | | 0.00205087 | 2 | 1 | 0.898391 | 10 | 0.380619 | 0.678039 | 0.090909 |
| train_breast_cancer_ba275_00001 | TERMINATED | | 0.000183834 | 4 | 3 | 0.924939 | 1 | 0.0228798 | 0.693009 | 0.111888 |
| train_breast_cancer_ba275_00002 | TERMINATED | | 0.0242721 | 7 | 2 | 0.501551 | 10 | 0.376154 | 0.54472 | 0.06993 |
| train_breast_cancer_ba275_00003 | TERMINATED | | 0.000449692 | 5 | 3 | 0.890212 | 1 | 0.0234981 | 0.692811 | 0.090909 |
| train_breast_cancer_ba275_00004 | TERMINATED | | 0.000376393 | 7 | 2 | 0.883609 | 1 | 0.0231569 | 0.692847 | 0.062937 |
| train_breast_cancer_ba275_00005 | TERMINATED | | 0.00231942 | 3 | 3 | 0.877464 | 2 | 0.104867 | 0.689541 | 0.083916 |
| train_breast_cancer_ba275_00006 | TERMINATED | | 0.000542326 | 1 | 2 | 0.578584 | 1 | 0.0213971 | 0.692765 | 0.083916 |
| train_breast_cancer_ba275_00007 | TERMINATED | | 0.0016801 | 1 | 2 | 0.975302 | 1 | 0.02226 | 0.691999 | 0.083916 |
| train_breast_cancer_ba275_00008 | TERMINATED | | 0.000595756 | 8 | 3 | 0.58429 | 1 | 0.0221152 | 0.692657 | 0.06993 |
| train_breast_cancer_ba275_00009 | TERMINATED | | 0.000357845 | 8 | 1 | 0.637776 | 1 | 0.022635 | 0.692859 | 0.090909 |
+---------------------------------+------------+-------+-------------+-------------+--------------------+-------------+--------+------------------+----------------+--------------+
Best model parameters: {'objective': 'binary:logistic', 'eval_metric': ['logloss', 'error'], 'max_depth': 7, 'min_child_weight': 2, 'subsample': 0.5015513240240503, 'eta': 0.024272050872920895}
Best model total accuracy: 0.9301
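The two summary lines match a simple rule: choose the trial with the lowest eval-logloss, then report its accuracy as 1 - eval-error. The sketch below recomputes them from the table values. This rule is inferred from the numbers and not stated here. Choosing by eval-error instead would pick trial 00004 rather than 00002.

```python
# (trial name, eval-logloss, eval-error) for each row of the table above.
rows = [
    ("train_breast_cancer_ba275_00000", 0.678039, 0.090909),
    ("train_breast_cancer_ba275_00001", 0.693009, 0.111888),
    ("train_breast_cancer_ba275_00002", 0.54472, 0.06993),
    ("train_breast_cancer_ba275_00003", 0.692811, 0.090909),
    ("train_breast_cancer_ba275_00004", 0.692847, 0.062937),
    ("train_breast_cancer_ba275_00005", 0.689541, 0.083916),
    ("train_breast_cancer_ba275_00006", 0.692765, 0.083916),
    ("train_breast_cancer_ba275_00007", 0.691999, 0.083916),
    ("train_breast_cancer_ba275_00008", 0.692657, 0.06993),
    ("train_breast_cancer_ba275_00009", 0.692859, 0.090909),
]

# Pick the trial with the lowest validation log loss.
best_name, best_logloss, best_error = min(rows, key=lambda row: row[1])
# Classification accuracy is the complement of the error rate.
best_accuracy = 1 - best_error
print(best_name)  # train_breast_cancer_ba275_00002
print(f"Best model total accuracy: {best_accuracy:.4f}")  # Best model total accuracy: 0.9301
```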
As you can see, most trials were stopped after only one or two iterations. Only the two most promising trials ran for the full 10 iterations.
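A trial scheduler is responsible for stopping unpromising trials early. Tune's ASHAScheduler (asynchronous successive halving) is one example. The sketch below is a simplified version of this kind of promotion rule, not Tune's exact implementation. When a trial reaches a milestone (a "rung"), it may continue only if its metric is among the best 1/reduction_factor of all results recorded at that rung so far. The function name and the loss values are hypothetical.

```python
import math


def should_stop(recorded, value, reduction_factor=4):
    """Decide whether a trial stops at a rung (lower metric is better).

    recorded: metric values that earlier trials reported at this rung.
    value: the metric the current trial just reported at this rung.
    Only the best 1/reduction_factor of the trials seen so far at this
    rung (but always at least one) may continue.
    """
    values = sorted(recorded + [value])
    keep = max(1, math.floor(len(values) / reduction_factor))
    cutoff = values[keep - 1]  # worst value that is still promoted
    return value > cutoff


# Hypothetical eval-logloss values reported at the first rung, in arrival order.
rung = []
decisions = []
for loss in [0.69, 0.68, 0.55, 0.70, 0.60]:
    decisions.append("stop" if should_stop(rung, loss) else "continue")
    rung.append(loss)
print(decisions)  # ['continue', 'continue', 'continue', 'stop', 'stop']
```

Decisions are made asynchronously, as each result arrives. The first trials to reach a rung are therefore promoted easily, while later trials must beat a growing pool of earlier results.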
You can also make sure that all available resources stay in use as the scheduler terminates trials and frees up their resources. This can be done through the ResourceChangingScheduler. An example of this can be found here: XGBoost Dynamic Resources Example.
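The sketch below is a rough illustration of the idea. It is not the ResourceChangingScheduler's actual allocation logic or API. It splits a fixed pool of CPUs evenly among the trials that are still running. When the scheduler stops a trial, recomputing the split hands that trial's CPUs to the remaining ones. The function and trial names are hypothetical. A real trial could, for example, pass its new CPU count to XGBoost's nthread parameter.

```python
def distribute_cpus(total_cpus, running_trials):
    """Split total_cpus as evenly as possible across the running trials.

    Trials earlier in sorted order receive any leftover CPUs.
    """
    if not running_trials:
        return {}
    if total_cpus < len(running_trials):
        raise ValueError("need at least one CPU per running trial")
    share, extra = divmod(total_cpus, len(running_trials))
    return {
        trial: share + (1 if i < extra else 0)
        for i, trial in enumerate(sorted(running_trials))
    }


# Eight CPUs shared by three trials; then trial "b" is stopped by the scheduler.
print(distribute_cpus(8, ["a", "b", "c"]))  # {'a': 3, 'b': 3, 'c': 2}
print(distribute_cpus(8, ["a", "c"]))       # {'a': 4, 'c': 4}
```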
Using fractional GPUs
You can often accelerate your training by using GPUs in addition to CPUs. However, you usually don’t have as many GPUs as you have trials to run. For instance, if you run 10 Tune trials in parallel, you usually don’t have access to 10 separate GPUs.
Tune supports fractional GPUs. This means that each trial is assigned a fraction of a GPU, so several trials can share the same device. For 10 trials, this could look like this:
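from ray import tune

# train_breast_cancer is the training function defined earlier in this tutorial.
config = {
    "objective": "binary:logistic",
    "eval_metric": ["logloss", "error"],
    # Train on the GPU. XGBoost 2.0 and later deprecate "gpu_hist" in favor of
    # "tree_method": "hist" combined with "device": "cuda".
    "tree_method": "gpu_hist",
    "max_depth": tune.randint(1, 9),
    "min_child_weight": tune.choice([1, 2, 3]),
    "subsample": tune.uniform(0.5, 1.0),
    "eta": tune.loguniform(1e-4, 1e-1),
}
tuner = tune.Tuner(
    # Each trial requests 1 CPU and a tenth of a GPU.
    tune.with_resources(train_breast_cancer, resources={"cpu": 1, "gpu": 0.1}),
    tune_config=tune.TuneConfig(
        num_samples=10,
    ),
    param_space=config,
)
results = tuner.fit()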
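Each trial therefore requests 10% of a GPU, so ten trials can share a single GPU. Ray uses this fraction only for scheduling. It does not partition or limit GPU memory, so you have to make sure that the trials sharing a GPU fit into its memory together. You also have to tell XGBoost to use the gpu_hist tree method so that it actually trains on the GPU.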
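The sketch below estimates how many trials can run at the same time on a single node when each trial requests some CPUs and a fraction of a GPU. It shows how the requested fraction translates into parallelism. It is a simplified model with three assumptions: there is one node, only CPUs and GPUs are counted, and GPU memory is not checked. The function name is hypothetical and is not part of Tune's API.

```python
import math
from fractions import Fraction


def max_concurrent_trials(num_cpus, num_gpus, cpus_per_trial, gpus_per_trial):
    """Upper bound on the number of trials that fit on one node at once.

    Amounts are converted to exact fractions so that fractional requests
    such as 0.1 GPU divide without binary floating-point rounding.
    """
    by_cpu = Fraction(str(num_cpus)) / Fraction(str(cpus_per_trial))
    if gpus_per_trial == 0:
        # No GPU requested: only CPUs limit concurrency.
        return math.floor(by_cpu)
    by_gpu = Fraction(str(num_gpus)) / Fraction(str(gpus_per_trial))
    return math.floor(min(by_cpu, by_gpu))


print(max_concurrent_trials(8, 1, 1, 0.1))   # 8: CPUs are the bottleneck
print(max_concurrent_trials(16, 1, 1, 0.1))  # 10: ten trials share the GPU
```

With the config above (1 CPU and 0.1 GPU per trial) on a machine with one GPU, all 10 trials can only run at the same time if at least 10 CPUs are available.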
Conclusion
You should now have a basic understanding of how to train XGBoost models and how to tune their hyperparameters to yield the best results. In our simple example, tuning the parameters didn’t make a huge difference to the accuracy. But in larger applications, intelligent hyperparameter tuning can make the difference between a model that doesn’t seem to learn at all and a model that outperforms all the other ones.
More XGBoost Examples
XGBoost Dynamic Resources Example: Trains a basic XGBoost model with Tune using the class-based API and a ResourceChangingScheduler, ensuring that all resources are used at all times.