Using MLflow with Tune#
Warning
If you are using these MLflow integrations with Ray Client (see Ray Client: Interactive Development), it is recommended that you set up a remote MLflow tracking server instead of one that is backed by the local filesystem.
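For example, you can start a tracking server on a machine that all Ray workers can reach and point the integrations at its URL instead of a local path. Here is a minimal sketch of the client side; the host, port, and backend store below are placeholders for your own setup:
# Start the server somewhere reachable by all workers, e.g.:
#   mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db
import mlflow

# Point MLflow (and the Tune integrations below) at the server's URL
# instead of a filesystem path such as ./mlruns:
mlflow.set_tracking_uri("http://<tracking-server-host>:5000")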
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. It currently offers four components, one of which is MLflow Tracking, used to record and query experiments, including code, data, config, and results.

Ray Tune currently offers two lightweight integrations for MLflow Tracking. One is the MLflowLoggerCallback, which automatically logs metrics reported to Tune to the MLflow Tracking API.
The other is the @mlflow_mixin decorator, which can be used with the function API. It automatically initializes the MLflow API with Tune's training information and creates a run for each Tune trial. Within your training function, you can then use the MLflow API as you normally would, e.g. calling mlflow.log_metrics() or even mlflow.autolog() to log your training process.
Running an MLflow Example#
In the following example, we're going to use both of the above methods, namely the MLflowLoggerCallback and the mlflow_mixin decorator, to log metrics.
Let’s start with a few crucial imports:
import os
import tempfile
import time
import mlflow
from ray import air, tune
from ray.air import session
from ray.air.integrations.mlflow import MLflowLoggerCallback
from ray.tune.integration.mlflow import mlflow_mixin
Next, let's define an easy objective function (a Tune Trainable) that iteratively computes steps and evaluates intermediate scores that we report to Tune.
def evaluation_fn(step, width, height):
    return (0.1 + width * step / 100) ** (-1) + height * 0.1


def easy_objective(config):
    width, height = config["width"], config["height"]

    for step in range(config.get("steps", 100)):
        # Iterative training function - can be any arbitrary training procedure
        intermediate_score = evaluation_fn(step, width, height)
        # Feed the score back to Tune.
        session.report({"iterations": step, "mean_loss": intermediate_score})
        time.sleep(0.1)
Given an MLflow tracking URI, you can now simply use the MLflowLoggerCallback as a callback argument to your RunConfig():
def tune_function(mlflow_tracking_uri, finish_fast=False):
    tuner = tune.Tuner(
        easy_objective,
        tune_config=tune.TuneConfig(num_samples=5),
        run_config=air.RunConfig(
            name="mlflow",
            callbacks=[
                MLflowLoggerCallback(
                    tracking_uri=mlflow_tracking_uri,
                    experiment_name="example",
                    save_artifact=True,
                )
            ],
        ),
        param_space={
            "width": tune.randint(10, 100),
            "height": tune.randint(0, 100),
            "steps": 5 if finish_fast else 100,
        },
    )
    results = tuner.fit()
To use the mlflow_mixin decorator, you can simply decorate the objective function from earlier. Note that we also use mlflow.log_metrics(...) to log metrics to MLflow. Otherwise, the decorated version of our objective is identical to the original.
@mlflow_mixin
def decorated_easy_objective(config):
    # Hyperparameters
    width, height = config["width"], config["height"]

    for step in range(config.get("steps", 100)):
        # Iterative training function - can be any arbitrary training procedure
        intermediate_score = evaluation_fn(step, width, height)
        # Log the metrics to mlflow
        mlflow.log_metrics(dict(mean_loss=intermediate_score), step=step)
        # Feed the score back to Tune.
        session.report({"iterations": step, "mean_loss": intermediate_score})
        time.sleep(0.1)
With this new objective function ready, you can now create a Tune run with it as follows:
def tune_decorated(mlflow_tracking_uri, finish_fast=False):
    # Set the experiment, or create a new one if it does not exist yet.
    mlflow.set_tracking_uri(mlflow_tracking_uri)
    mlflow.set_experiment(experiment_name="mixin_example")

    tuner = tune.Tuner(
        decorated_easy_objective,
        tune_config=tune.TuneConfig(num_samples=5),
        run_config=air.RunConfig(
            name="mlflow",
        ),
        param_space={
            "width": tune.randint(10, 100),
            "height": tune.randint(0, 100),
            "steps": 5 if finish_fast else 100,
            "mlflow": {
                "experiment_name": "mixin_example",
                "tracking_uri": mlflow.get_tracking_uri(),
            },
        },
    )
    results = tuner.fit()
If you happen to have an MLflow tracking URI, you can set it below in the mlflow_tracking_uri variable and set smoke_test=False. Otherwise, you can just run a quick test of the tune_function and tune_decorated functions without using MLflow.
smoke_test = True

if smoke_test:
    mlflow_tracking_uri = os.path.join(tempfile.gettempdir(), "mlruns")
else:
    mlflow_tracking_uri = "<MLFLOW_TRACKING_URI>"

tune_function(mlflow_tracking_uri, finish_fast=smoke_test)
if not smoke_test:
    df = mlflow.search_runs(
        [mlflow.get_experiment_by_name("example").experiment_id]
    )
    print(df)

tune_decorated(mlflow_tracking_uri, finish_fast=smoke_test)
if not smoke_test:
    df = mlflow.search_runs(
        [mlflow.get_experiment_by_name("mixin_example").experiment_id]
    )
    print(df)
2022-07-22 16:27:41,371 INFO services.py:1483 -- View the Ray dashboard at http://127.0.0.1:8271
2022-07-22 16:27:43,768 WARNING function_trainable.py:619 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
Current time: 2022-07-22 16:27:50 (running for 00:00:06.29)
Memory usage on this node: 10.1/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.63 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/mlflow
Number of trials: 5/5 (5 TERMINATED)
| Trial name | status | loc | height | width | loss | iter | total time (s) | iterations | neg_mean_loss |
|---|---|---|---|---|---|---|---|---|---|
| easy_objective_d4e29_00000 | TERMINATED | 127.0.0.1:52551 | 38 | 23 | 4.78039 | 5 | 0.549093 | 4 | -4.78039 |
| easy_objective_d4e29_00001 | TERMINATED | 127.0.0.1:52561 | 86 | 88 | 8.87624 | 5 | 0.548692 | 4 | -8.87624 |
| easy_objective_d4e29_00002 | TERMINATED | 127.0.0.1:52562 | 22 | 95 | 2.45641 | 5 | 0.587558 | 4 | -2.45641 |
| easy_objective_d4e29_00003 | TERMINATED | 127.0.0.1:52563 | 11 | 81 | 1.3994 | 5 | 0.560393 | 4 | -1.3994 |
| easy_objective_d4e29_00004 | TERMINATED | 127.0.0.1:52564 | 21 | 27 | 2.94746 | 5 | 0.534 | 4 | -2.94746 |
2022-07-22 16:27:44,945 INFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].
Result for easy_objective_d4e29_00000:
date: 2022-07-22_16-27-47
done: false
experiment_id: 421feb6ca1cb40969430bd0ab995fe37
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 13.8
neg_mean_loss: -13.8
node_ip: 127.0.0.1
pid: 52551
time_since_restore: 0.00015282630920410156
time_this_iter_s: 0.00015282630920410156
time_total_s: 0.00015282630920410156
timestamp: 1658503667
timesteps_since_restore: 0
training_iteration: 1
trial_id: d4e29_00000
warmup_time: 0.0036253929138183594
Result for easy_objective_d4e29_00000:
date: 2022-07-22_16-27-48
done: true
experiment_id: 421feb6ca1cb40969430bd0ab995fe37
experiment_tag: 0_height=38,width=23
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 4.780392156862745
neg_mean_loss: -4.780392156862745
node_ip: 127.0.0.1
pid: 52551
time_since_restore: 0.5490927696228027
time_this_iter_s: 0.12111282348632812
time_total_s: 0.5490927696228027
timestamp: 1658503668
timesteps_since_restore: 0
training_iteration: 5
trial_id: d4e29_00000
warmup_time: 0.0036253929138183594
Result for easy_objective_d4e29_00001:
date: 2022-07-22_16-27-50
done: false
experiment_id: 40ac54d80e854437b4126dca98a7f995
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 18.6
neg_mean_loss: -18.6
node_ip: 127.0.0.1
pid: 52561
time_since_restore: 0.00013113021850585938
time_this_iter_s: 0.00013113021850585938
time_total_s: 0.00013113021850585938
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 1
trial_id: d4e29_00001
warmup_time: 0.002991914749145508
Result for easy_objective_d4e29_00002:
date: 2022-07-22_16-27-50
done: false
experiment_id: 23f2d0c4631e4a2abb5449ba68f80e8b
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 12.2
neg_mean_loss: -12.2
node_ip: 127.0.0.1
pid: 52562
time_since_restore: 0.0001289844512939453
time_this_iter_s: 0.0001289844512939453
time_total_s: 0.0001289844512939453
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 1
trial_id: d4e29_00002
warmup_time: 0.002949953079223633
Result for easy_objective_d4e29_00003:
date: 2022-07-22_16-27-50
done: false
experiment_id: 7cb23325d6044f0f995b338d2e15f31e
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 11.1
neg_mean_loss: -11.1
node_ip: 127.0.0.1
pid: 52563
time_since_restore: 0.00010609626770019531
time_this_iter_s: 0.00010609626770019531
time_total_s: 0.00010609626770019531
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 1
trial_id: d4e29_00003
warmup_time: 0.0026869773864746094
Result for easy_objective_d4e29_00004:
date: 2022-07-22_16-27-50
done: false
experiment_id: fc3b1add717842f4ae0b4882a1292f93
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 12.1
neg_mean_loss: -12.1
node_ip: 127.0.0.1
pid: 52564
time_since_restore: 0.00011801719665527344
time_this_iter_s: 0.00011801719665527344
time_total_s: 0.00011801719665527344
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 1
trial_id: d4e29_00004
warmup_time: 0.0028209686279296875
Result for easy_objective_d4e29_00001:
date: 2022-07-22_16-27-50
done: true
experiment_id: 40ac54d80e854437b4126dca98a7f995
experiment_tag: 1_height=86,width=88
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 8.876243093922652
neg_mean_loss: -8.876243093922652
node_ip: 127.0.0.1
pid: 52561
time_since_restore: 0.548691987991333
time_this_iter_s: 0.12308692932128906
time_total_s: 0.548691987991333
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 5
trial_id: d4e29_00001
warmup_time: 0.002991914749145508
Result for easy_objective_d4e29_00004:
date: 2022-07-22_16-27-50
done: true
experiment_id: fc3b1add717842f4ae0b4882a1292f93
experiment_tag: 4_height=21,width=27
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 2.947457627118644
neg_mean_loss: -2.947457627118644
node_ip: 127.0.0.1
pid: 52564
time_since_restore: 0.5339996814727783
time_this_iter_s: 0.12359499931335449
time_total_s: 0.5339996814727783
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 5
trial_id: d4e29_00004
warmup_time: 0.0028209686279296875
Result for easy_objective_d4e29_00003:
date: 2022-07-22_16-27-50
done: true
experiment_id: 7cb23325d6044f0f995b338d2e15f31e
experiment_tag: 3_height=11,width=81
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 1.3994011976047904
neg_mean_loss: -1.3994011976047904
node_ip: 127.0.0.1
pid: 52563
time_since_restore: 0.5603930950164795
time_this_iter_s: 0.12318706512451172
time_total_s: 0.5603930950164795
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 5
trial_id: d4e29_00003
warmup_time: 0.0026869773864746094
Result for easy_objective_d4e29_00002:
date: 2022-07-22_16-27-50
done: true
experiment_id: 23f2d0c4631e4a2abb5449ba68f80e8b
experiment_tag: 2_height=22,width=95
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 2.4564102564102566
neg_mean_loss: -2.4564102564102566
node_ip: 127.0.0.1
pid: 52562
time_since_restore: 0.5875582695007324
time_this_iter_s: 0.12340712547302246
time_total_s: 0.5875582695007324
timestamp: 1658503670
timesteps_since_restore: 0
training_iteration: 5
trial_id: d4e29_00002
warmup_time: 0.002949953079223633
2022-07-22 16:27:51,033 INFO tune.py:738 -- Total run time: 7.27 seconds (6.28 seconds for the tuning loop).
2022/07/22 16:27:51 INFO mlflow.tracking.fluent: Experiment with name 'mixin_example' does not exist. Creating a new experiment.
Current time: 2022-07-22 16:27:58 (running for 00:00:07.03)
Memory usage on this node: 10.4/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.63 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/mlflow
Number of trials: 5/5 (5 TERMINATED)
| Trial name | status | loc | height | width | loss | iter | total time (s) | iterations | neg_mean_loss |
|---|---|---|---|---|---|---|---|---|---|
| decorated_easy_objective_d93b6_00000 | TERMINATED | 127.0.0.1:52581 | 45 | 51 | 4.96729 | 5 | 0.460993 | 4 | -4.96729 |
| decorated_easy_objective_d93b6_00001 | TERMINATED | 127.0.0.1:52598 | 44 | 94 | 4.65907 | 5 | 0.434945 | 4 | -4.65907 |
| decorated_easy_objective_d93b6_00002 | TERMINATED | 127.0.0.1:52599 | 93 | 25 | 10.2091 | 5 | 0.471808 | 4 | -10.2091 |
| decorated_easy_objective_d93b6_00003 | TERMINATED | 127.0.0.1:52600 | 40 | 26 | 4.87719 | 5 | 0.437302 | 4 | -4.87719 |
| decorated_easy_objective_d93b6_00004 | TERMINATED | 127.0.0.1:52601 | 16 | 65 | 1.97037 | 5 | 0.468027 | 4 | -1.97037 |
Result for decorated_easy_objective_d93b6_00000:
date: 2022-07-22_16-27-54
done: false
experiment_id: 2d0d9fbc13c64acfa27153a5fb9aeb68
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 14.5
neg_mean_loss: -14.5
node_ip: 127.0.0.1
pid: 52581
time_since_restore: 0.001725912094116211
time_this_iter_s: 0.001725912094116211
time_total_s: 0.001725912094116211
timestamp: 1658503674
timesteps_since_restore: 0
training_iteration: 1
trial_id: d93b6_00000
warmup_time: 0.20471811294555664
Result for decorated_easy_objective_d93b6_00000:
date: 2022-07-22_16-27-54
done: true
experiment_id: 2d0d9fbc13c64acfa27153a5fb9aeb68
experiment_tag: 0_height=45,width=51
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 4.9672897196261685
neg_mean_loss: -4.9672897196261685
node_ip: 127.0.0.1
pid: 52581
time_since_restore: 0.46099305152893066
time_this_iter_s: 0.10984206199645996
time_total_s: 0.46099305152893066
timestamp: 1658503674
timesteps_since_restore: 0
training_iteration: 5
trial_id: d93b6_00000
warmup_time: 0.20471811294555664
Result for decorated_easy_objective_d93b6_00001:
date: 2022-07-22_16-27-57
done: false
experiment_id: 4bec5377a38a47d7bae57f7502ff0312
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 14.4
neg_mean_loss: -14.4
node_ip: 127.0.0.1
pid: 52598
time_since_restore: 0.0016498565673828125
time_this_iter_s: 0.0016498565673828125
time_total_s: 0.0016498565673828125
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 1
trial_id: d93b6_00001
warmup_time: 0.18288898468017578
Result for decorated_easy_objective_d93b6_00003:
date: 2022-07-22_16-27-57
done: false
experiment_id: 6868d31636df4c4a8e9ed91927120269
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 14.0
neg_mean_loss: -14.0
node_ip: 127.0.0.1
pid: 52600
time_since_restore: 0.0016481876373291016
time_this_iter_s: 0.0016481876373291016
time_total_s: 0.0016481876373291016
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 1
trial_id: d93b6_00003
warmup_time: 0.17208290100097656
Result for decorated_easy_objective_d93b6_00004:
date: 2022-07-22_16-27-57
done: false
experiment_id: f021ddc2dc164413931c17cb593dfa12
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 11.6
neg_mean_loss: -11.6
node_ip: 127.0.0.1
pid: 52601
time_since_restore: 0.0015459060668945312
time_this_iter_s: 0.0015459060668945312
time_total_s: 0.0015459060668945312
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 1
trial_id: d93b6_00004
warmup_time: 0.1808018684387207
Result for decorated_easy_objective_d93b6_00002:
date: 2022-07-22_16-27-57
done: false
experiment_id: a341941781824ea9b1a072b587e42a84
hostname: Kais-MacBook-Pro.local
iterations: 0
iterations_since_restore: 1
mean_loss: 19.3
neg_mean_loss: -19.3
node_ip: 127.0.0.1
pid: 52599
time_since_restore: 0.0015799999237060547
time_this_iter_s: 0.0015799999237060547
time_total_s: 0.0015799999237060547
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 1
trial_id: d93b6_00002
warmup_time: 0.1837329864501953
Result for decorated_easy_objective_d93b6_00001:
date: 2022-07-22_16-27-57
done: true
experiment_id: 4bec5377a38a47d7bae57f7502ff0312
experiment_tag: 1_height=44,width=94
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 4.659067357512954
neg_mean_loss: -4.659067357512954
node_ip: 127.0.0.1
pid: 52598
time_since_restore: 0.43494510650634766
time_this_iter_s: 0.10719513893127441
time_total_s: 0.43494510650634766
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 5
trial_id: d93b6_00001
warmup_time: 0.18288898468017578
Result for decorated_easy_objective_d93b6_00003:
date: 2022-07-22_16-27-57
done: true
experiment_id: 6868d31636df4c4a8e9ed91927120269
experiment_tag: 3_height=40,width=26
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 4.87719298245614
neg_mean_loss: -4.87719298245614
node_ip: 127.0.0.1
pid: 52600
time_since_restore: 0.4373021125793457
time_this_iter_s: 0.10880899429321289
time_total_s: 0.4373021125793457
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 5
trial_id: d93b6_00003
warmup_time: 0.17208290100097656
Result for decorated_easy_objective_d93b6_00004:
date: 2022-07-22_16-27-57
done: true
experiment_id: f021ddc2dc164413931c17cb593dfa12
experiment_tag: 4_height=16,width=65
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 1.9703703703703703
neg_mean_loss: -1.9703703703703703
node_ip: 127.0.0.1
pid: 52601
time_since_restore: 0.46802687644958496
time_this_iter_s: 0.1077277660369873
time_total_s: 0.46802687644958496
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 5
trial_id: d93b6_00004
warmup_time: 0.1808018684387207
Result for decorated_easy_objective_d93b6_00002:
date: 2022-07-22_16-27-57
done: true
experiment_id: a341941781824ea9b1a072b587e42a84
experiment_tag: 2_height=93,width=25
hostname: Kais-MacBook-Pro.local
iterations: 4
iterations_since_restore: 5
mean_loss: 10.209090909090909
neg_mean_loss: -10.209090909090909
node_ip: 127.0.0.1
pid: 52599
time_since_restore: 0.47180795669555664
time_this_iter_s: 0.10791492462158203
time_total_s: 0.47180795669555664
timestamp: 1658503677
timesteps_since_restore: 0
training_iteration: 5
trial_id: d93b6_00002
warmup_time: 0.1837329864501953
2022-07-22 16:27:58,211 INFO tune.py:738 -- Total run time: 7.15 seconds (7.01 seconds for the tuning loop).
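If you logged to a real tracking URI (i.e. with smoke_test=False), you can also rank the finished trials directly from the MLflow side. The snippet below is a small follow-up sketch rather than part of the example above; it assumes the reported metric shows up as the metrics.mean_loss column in the DataFrame returned by mlflow.search_runs:
import mlflow

# Fetch all runs of the "example" experiment and sort them by the logged loss.
experiment = mlflow.get_experiment_by_name("example")
df = mlflow.search_runs([experiment.experiment_id])
print(df.sort_values("metrics.mean_loss")[["run_id", "metrics.mean_loss"]].head())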
This completes our Tune and MLflow walk-through. In the following sections you can find more details on the API of the Tune-MLflow integration.
MLflow AutoLogging#
You can also check out the MLflow PyTorch Lightning Example (see More MLflow Examples below) for how you can leverage MLflow auto-logging, in this case with PyTorch Lightning.
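As a quick illustration, here is a minimal sketch of combining auto-logging with the mlflow_mixin decorator. It is not the linked example: it uses scikit-learn instead of PyTorch Lightning to stay self-contained, and the mlflow key still has to be passed via param_space as in the tune_decorated example above:
import mlflow
from ray.air import session
from ray.tune.integration.mlflow import mlflow_mixin
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor


@mlflow_mixin
def autolog_objective(config):
    # autolog() records parameters, metrics, and the fitted model of supported
    # frameworks (here scikit-learn) to the MLflow run created for this trial.
    mlflow.autolog()

    X, y = load_diabetes(return_X_y=True)
    model = RandomForestRegressor(
        n_estimators=config["n_estimators"], max_depth=config["max_depth"]
    )
    model.fit(X, y)

    # Report a score back to Tune as well.
    session.report({"score": model.score(X, y)})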
MLflow Logger API#
- class ray.air.integrations.mlflow.MLflowLoggerCallback(tracking_uri: Optional[str] = None, registry_uri: Optional[str] = None, experiment_name: Optional[str] = None, tags: Optional[Dict] = None, save_artifact: bool = False)[source]
MLflow Logger to automatically log Tune results and config to MLflow.
MLflow (https://mlflow.org) Tracking is an open source library for recording and querying experiments. This Ray Tune LoggerCallback sends information (config parameters, training results & metrics, and artifacts) to MLflow for automatic experiment tracking.
- Parameters
tracking_uri – The tracking URI for where to manage experiments and runs. This can either be a local file path or a remote server. This arg gets passed directly to mlflow initialization. When using Tune in a multi-node setting, make sure to set this to a remote server and not a local file path.
registry_uri – The registry URI that gets passed directly to mlflow initialization.
experiment_name – The experiment name to use for this Tune run. If an experiment with this name already exists in MLflow, it will be reused. If not, a new experiment will be created with that name.
tags – An optional dictionary of string keys and values to set as tags on the run.
save_artifact – If set to True, automatically save the entire contents of the Tune local_dir as an artifact to the corresponding run in MLflow.
Example:
from ray.air.integrations.mlflow import MLflowLoggerCallback

tags = {"user_name": "John", "git_commit_hash": "abc123"}

tune.run(
    train_fn,
    config={
        # define search space here
        "parameter_1": tune.choice([1, 2, 3]),
        "parameter_2": tune.choice([4, 5, 6]),
    },
    callbacks=[
        MLflowLoggerCallback(
            experiment_name="experiment1",
            tags=tags,
            save_artifact=True,
        )
    ],
)
MLflow Mixin API#
- ray.tune.integration.mlflow.mlflow_mixin(func: Callable)[source]
MLflow (https://mlflow.org) Tracking is an open source library for recording and querying experiments. This Ray Tune Trainable mixin helps initialize the MLflow API for use with the Trainable class or the @mlflow_mixin function API. This mixin automatically configures MLflow and creates a run in the same process as each Tune trial. You can then use the mlflow API inside your training function, and it will automatically get reported to the correct run.
For basic usage, just prepend your training function with the @mlflow_mixin decorator:
from ray.tune.integration.mlflow import mlflow_mixin


@mlflow_mixin
def train_fn(config):
    ...
    mlflow.log_metric(...)
You can also use MLflow's autologging feature if using a training framework like PyTorch Lightning, XGBoost, etc. More information can be found here (https://mlflow.org/docs/latest/tracking.html#automatic-logging).
from ray.tune.integration.mlflow import mlflow_mixin


@mlflow_mixin
def train_fn(config):
    mlflow.autolog()
    xgboost_results = xgb.train(config, ...)
The MLflow configuration is done by passing an mlflow key to the param_space parameter of tune.Tuner() (see example below).
The content of the mlflow config entry is used to configure MLflow. Here are the keys you can pass in to this config entry:
- Parameters
tracking_uri – The tracking URI for MLflow tracking. If using Tune in a multi-node setting, make sure to use a remote server for tracking.
experiment_id – The id of an already created MLflow experiment. All logs from all trials in tune.Tuner() will be reported to this experiment. If this is not provided or the experiment with this id does not exist, you must provide an experiment_name. This parameter takes precedence over experiment_name.
experiment_name – The name of an already existing MLflow experiment. All logs from all trials in tune.Tuner() will be reported to this experiment. If this is not provided, you must provide a valid experiment_id.
token – A token to use for HTTP authentication when logging to a remote tracking server. This is useful when you want to log to a Databricks server, for example. This value will be used to set the MLFLOW_TRACKING_TOKEN environment variable on all the remote training processes.
Example:
from ray import tune
from ray.tune.integration.mlflow import mlflow_mixin

import mlflow

# Create the MLflow experiment.
mlflow.create_experiment("my_experiment")


@mlflow_mixin
def train_fn(config):
    for i in range(10):
        loss = config["a"] + config["b"]
        mlflow.log_metric(key="loss", value=loss)
    tune.report(loss=loss, done=True)


tuner = tune.Tuner(
    train_fn,
    param_space={
        # define search space here
        "a": tune.choice([1, 2, 3]),
        "b": tune.choice([4, 5, 6]),
        # mlflow configuration
        "mlflow": {
            "experiment_name": "my_experiment",
            "tracking_uri": mlflow.get_tracking_uri(),
        },
    },
)
tuner.fit()
More MLflow Examples#
MLflow PyTorch Lightning Example: Example for using MLflow and PyTorch Lightning with Ray Tune.