Using MLflow with Tune

Warning

If you are using these MLflow integrations with Ray Client: Interactive Development, it is recommended that you set up a remote MLflow tracking server instead of one that is backed by the local filesystem.

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. It currently offers four components, among them MLflow Tracking, which records and queries experiments, covering code, data, config, and results.
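MLflow Tracking can log to a remote server or, by default, to a file-backed store. As a rough sketch (the exact layout varies by MLflow version), a file-backed store organizes runs like this:

```
mlruns/
└── <experiment_id>/
    └── <run_id>/
        ├── meta.yaml     # run metadata
        ├── metrics/      # one file per logged metric
        ├── params/       # one file per logged parameter
        ├── tags/
        └── artifacts/    # saved files, e.g. checkpoints
```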


Ray Tune currently offers two lightweight integrations for MLflow Tracking. One is the MLflowLoggerCallback, which automatically logs metrics reported to Tune to the MLflow Tracking API.

The other one is the @mlflow_mixin decorator, which can be used with the function API. It automatically initializes the MLflow API with Tune's training information and creates a run for each Tune trial. Then, within your training function, you can use the MLflow API as you normally would, e.g. calling mlflow.log_metrics() or even mlflow.autolog() to log your training process.

Running an MLflow Example

In the following example, we're going to use both of the above methods, namely the MLflowLoggerCallback and the mlflow_mixin decorator, to log metrics. Let's start with a few crucial imports:

import os
import tempfile
import time

import mlflow

from ray import air, tune
from ray.air import session
from ray.air.integrations.mlflow import MLflowLoggerCallback
from ray.tune.integration.mlflow import mlflow_mixin

Next, let’s define an easy objective function (a Tune Trainable) that iteratively computes steps and evaluates intermediate scores that we report to Tune.

def evaluation_fn(step, width, height):
    return (0.1 + width * step / 100) ** (-1) + height * 0.1


def easy_objective(config):
    width, height = config["width"], config["height"]

    for step in range(config.get("steps", 100)):
        # Iterative training function - can be any arbitrary training procedure
        intermediate_score = evaluation_fn(step, width, height)
        # Feed the score back to Tune.
        session.report({"iterations": step, "mean_loss": intermediate_score})
        time.sleep(0.1)
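Before wiring this into Tune, it can help to sanity-check the score computation on its own. This snippet just calls the evaluation_fn defined above; at step 0 the width term vanishes, so the loss starts at 10 + 0.1 * height and then decays toward 0.1 * height:

```python
def evaluation_fn(step, width, height):
    return (0.1 + width * step / 100) ** (-1) + height * 0.1

# At step 0 the width term is zero, so the loss is 10 + 0.1 * height.
print(evaluation_fn(0, 23, 38))  # close to 13.8

# By the last step of a short 5-step run, the loss has decayed
# most of the way toward 0.1 * height = 3.8.
print(evaluation_fn(4, 23, 38))  # close to 4.78
```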

Given an MLflow tracking URI, you can now pass the MLflowLoggerCallback to the callbacks argument of your RunConfig():

def tune_function(mlflow_tracking_uri, finish_fast=False):
    tuner = tune.Tuner(
        easy_objective,
        tune_config=tune.TuneConfig(
            num_samples=5
        ),
        run_config=air.RunConfig(
            name="mlflow",
            callbacks=[
                MLflowLoggerCallback(
                    tracking_uri=mlflow_tracking_uri,
                    experiment_name="example",
                    save_artifact=True,
                )
            ],
        ),
        param_space={
            "width": tune.randint(10, 100),
            "height": tune.randint(0, 100),
            "steps": 5 if finish_fast else 100,
        },
    )
    results = tuner.fit()

To use the mlflow_mixin decorator, simply decorate the objective function from earlier. Note that we also call mlflow.log_metrics(...) to log metrics to MLflow. Otherwise, the decorated version of our objective is identical to the original.

@mlflow_mixin
def decorated_easy_objective(config):
    # Hyperparameters
    width, height = config["width"], config["height"]

    for step in range(config.get("steps", 100)):
        # Iterative training function - can be any arbitrary training procedure
        intermediate_score = evaluation_fn(step, width, height)
        # Log the metrics to mlflow
        mlflow.log_metrics(dict(mean_loss=intermediate_score), step=step)
        # Feed the score back to Tune.
        session.report({"iterations": step, "mean_loss": intermediate_score})
        time.sleep(0.1)

With this new objective function ready, you can now create a Tune run with it as follows:

def tune_decorated(mlflow_tracking_uri, finish_fast=False):
    # Set the experiment, or create a new one if it does not exist yet.
    mlflow.set_tracking_uri(mlflow_tracking_uri)
    mlflow.set_experiment(experiment_name="mixin_example")
    
    tuner = tune.Tuner(
        decorated_easy_objective,
        tune_config=tune.TuneConfig(
            num_samples=5
        ),
        run_config=air.RunConfig(
            name="mlflow",
        ),
        param_space={
            "width": tune.randint(10, 100),
            "height": tune.randint(0, 100),
            "steps": 5 if finish_fast else 100,
            "mlflow": {
                "experiment_name": "mixin_example",
                "tracking_uri": mlflow.get_tracking_uri(),
            },
        },
    )
    results = tuner.fit()

If you happen to have an MLflow tracking URI, you can set it below in the mlflow_tracking_uri variable and set smoke_test=False. Otherwise, you can run a quick test of the tune_function and tune_decorated functions without using MLflow.

smoke_test = True

if smoke_test:
    mlflow_tracking_uri = os.path.join(tempfile.gettempdir(), "mlruns")
else:
    mlflow_tracking_uri = "<MLFLOW_TRACKING_URI>"
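Instead of hardcoding the URI, you could also derive it from the MLFLOW_TRACKING_URI environment variable, which MLflow itself reads. This is a sketch that falls back to a local temporary directory when the variable is unset:

```python
import os
import tempfile

# Use a remote tracking server if MLFLOW_TRACKING_URI is set;
# otherwise fall back to a local, file-backed store for a smoke test.
mlflow_tracking_uri = os.environ.get(
    "MLFLOW_TRACKING_URI",
    os.path.join(tempfile.gettempdir(), "mlruns"),
)
smoke_test = "MLFLOW_TRACKING_URI" not in os.environ
```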

tune_function(mlflow_tracking_uri, finish_fast=smoke_test)
if not smoke_test:
    df = mlflow.search_runs(
        [mlflow.get_experiment_by_name("example").experiment_id]
    )
    print(df)

tune_decorated(mlflow_tracking_uri, finish_fast=smoke_test)
if not smoke_test:
    df = mlflow.search_runs(
        [mlflow.get_experiment_by_name("mixin_example").experiment_id]
    )
    print(df)
2022-07-22 16:27:41,371	INFO services.py:1483 -- View the Ray dashboard at http://127.0.0.1:8271
2022-07-22 16:27:43,768	WARNING function_trainable.py:619 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Current time: 2022-07-22 16:27:50 (running for 00:00:06.29)
Memory usage on this node: 10.1/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.63 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/mlflow
Number of trials: 5/5 (5 TERMINATED)
Trial name                  status      loc              height   width     loss   iter   total time (s)   iterations   neg_mean_loss
easy_objective_d4e29_00000  TERMINATED  127.0.0.1:52551      38      23  4.78039      5         0.549093            4        -4.78039
easy_objective_d4e29_00001  TERMINATED  127.0.0.1:52561      86      88  8.87624      5         0.548692            4        -8.87624
easy_objective_d4e29_00002  TERMINATED  127.0.0.1:52562      22      95  2.45641      5         0.587558            4        -2.45641
easy_objective_d4e29_00003  TERMINATED  127.0.0.1:52563      11      81  1.3994       5         0.560393            4        -1.3994
easy_objective_d4e29_00004  TERMINATED  127.0.0.1:52564      21      27  2.94746      5         0.534               4        -2.94746


2022-07-22 16:27:44,945	INFO plugin_schema_manager.py:52 -- Loading the default runtime env schemas: ['/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/working_dir_schema.json', '/Users/kai/coding/ray/python/ray/_private/runtime_env/../../runtime_env/schemas/pip_schema.json'].
Result for easy_objective_d4e29_00000:
  date: 2022-07-22_16-27-47
  done: false
  experiment_id: 421feb6ca1cb40969430bd0ab995fe37
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 13.8
  neg_mean_loss: -13.8
  node_ip: 127.0.0.1
  pid: 52551
  time_since_restore: 0.00015282630920410156
  time_this_iter_s: 0.00015282630920410156
  time_total_s: 0.00015282630920410156
  timestamp: 1658503667
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d4e29_00000
  warmup_time: 0.0036253929138183594
  
Result for easy_objective_d4e29_00000:
  date: 2022-07-22_16-27-48
  done: true
  experiment_id: 421feb6ca1cb40969430bd0ab995fe37
  experiment_tag: 0_height=38,width=23
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 4.780392156862745
  neg_mean_loss: -4.780392156862745
  node_ip: 127.0.0.1
  pid: 52551
  time_since_restore: 0.5490927696228027
  time_this_iter_s: 0.12111282348632812
  time_total_s: 0.5490927696228027
  timestamp: 1658503668
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d4e29_00000
  warmup_time: 0.0036253929138183594
  
Result for easy_objective_d4e29_00001:
  date: 2022-07-22_16-27-50
  done: false
  experiment_id: 40ac54d80e854437b4126dca98a7f995
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 18.6
  neg_mean_loss: -18.6
  node_ip: 127.0.0.1
  pid: 52561
  time_since_restore: 0.00013113021850585938
  time_this_iter_s: 0.00013113021850585938
  time_total_s: 0.00013113021850585938
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d4e29_00001
  warmup_time: 0.002991914749145508
  
Result for easy_objective_d4e29_00002:
  date: 2022-07-22_16-27-50
  done: false
  experiment_id: 23f2d0c4631e4a2abb5449ba68f80e8b
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 12.2
  neg_mean_loss: -12.2
  node_ip: 127.0.0.1
  pid: 52562
  time_since_restore: 0.0001289844512939453
  time_this_iter_s: 0.0001289844512939453
  time_total_s: 0.0001289844512939453
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d4e29_00002
  warmup_time: 0.002949953079223633
  
Result for easy_objective_d4e29_00003:
  date: 2022-07-22_16-27-50
  done: false
  experiment_id: 7cb23325d6044f0f995b338d2e15f31e
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 11.1
  neg_mean_loss: -11.1
  node_ip: 127.0.0.1
  pid: 52563
  time_since_restore: 0.00010609626770019531
  time_this_iter_s: 0.00010609626770019531
  time_total_s: 0.00010609626770019531
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d4e29_00003
  warmup_time: 0.0026869773864746094
  
Result for easy_objective_d4e29_00004:
  date: 2022-07-22_16-27-50
  done: false
  experiment_id: fc3b1add717842f4ae0b4882a1292f93
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 12.1
  neg_mean_loss: -12.1
  node_ip: 127.0.0.1
  pid: 52564
  time_since_restore: 0.00011801719665527344
  time_this_iter_s: 0.00011801719665527344
  time_total_s: 0.00011801719665527344
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d4e29_00004
  warmup_time: 0.0028209686279296875
  
Result for easy_objective_d4e29_00001:
  date: 2022-07-22_16-27-50
  done: true
  experiment_id: 40ac54d80e854437b4126dca98a7f995
  experiment_tag: 1_height=86,width=88
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 8.876243093922652
  neg_mean_loss: -8.876243093922652
  node_ip: 127.0.0.1
  pid: 52561
  time_since_restore: 0.548691987991333
  time_this_iter_s: 0.12308692932128906
  time_total_s: 0.548691987991333
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d4e29_00001
  warmup_time: 0.002991914749145508
  
Result for easy_objective_d4e29_00004:
  date: 2022-07-22_16-27-50
  done: true
  experiment_id: fc3b1add717842f4ae0b4882a1292f93
  experiment_tag: 4_height=21,width=27
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 2.947457627118644
  neg_mean_loss: -2.947457627118644
  node_ip: 127.0.0.1
  pid: 52564
  time_since_restore: 0.5339996814727783
  time_this_iter_s: 0.12359499931335449
  time_total_s: 0.5339996814727783
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d4e29_00004
  warmup_time: 0.0028209686279296875
  
Result for easy_objective_d4e29_00003:
  date: 2022-07-22_16-27-50
  done: true
  experiment_id: 7cb23325d6044f0f995b338d2e15f31e
  experiment_tag: 3_height=11,width=81
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 1.3994011976047904
  neg_mean_loss: -1.3994011976047904
  node_ip: 127.0.0.1
  pid: 52563
  time_since_restore: 0.5603930950164795
  time_this_iter_s: 0.12318706512451172
  time_total_s: 0.5603930950164795
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d4e29_00003
  warmup_time: 0.0026869773864746094
  
Result for easy_objective_d4e29_00002:
  date: 2022-07-22_16-27-50
  done: true
  experiment_id: 23f2d0c4631e4a2abb5449ba68f80e8b
  experiment_tag: 2_height=22,width=95
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 2.4564102564102566
  neg_mean_loss: -2.4564102564102566
  node_ip: 127.0.0.1
  pid: 52562
  time_since_restore: 0.5875582695007324
  time_this_iter_s: 0.12340712547302246
  time_total_s: 0.5875582695007324
  timestamp: 1658503670
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d4e29_00002
  warmup_time: 0.002949953079223633
  
2022-07-22 16:27:51,033	INFO tune.py:738 -- Total run time: 7.27 seconds (6.28 seconds for the tuning loop).
2022/07/22 16:27:51 INFO mlflow.tracking.fluent: Experiment with name 'mixin_example' does not exist. Creating a new experiment.
== Status ==
Current time: 2022-07-22 16:27:58 (running for 00:00:07.03)
Memory usage on this node: 10.4/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/16 CPUs, 0/0 GPUs, 0.0/5.63 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/kai/ray_results/mlflow
Number of trials: 5/5 (5 TERMINATED)
Trial name                            status      loc              height   width      loss   iter   total time (s)   iterations   neg_mean_loss
decorated_easy_objective_d93b6_00000  TERMINATED  127.0.0.1:52581      45      51   4.96729      5         0.460993            4        -4.96729
decorated_easy_objective_d93b6_00001  TERMINATED  127.0.0.1:52598      44      94   4.65907      5         0.434945            4        -4.65907
decorated_easy_objective_d93b6_00002  TERMINATED  127.0.0.1:52599      93      25  10.2091       5         0.471808            4       -10.2091
decorated_easy_objective_d93b6_00003  TERMINATED  127.0.0.1:52600      40      26   4.87719      5         0.437302            4        -4.87719
decorated_easy_objective_d93b6_00004  TERMINATED  127.0.0.1:52601      16      65   1.97037      5         0.468027            4        -1.97037


Result for decorated_easy_objective_d93b6_00000:
  date: 2022-07-22_16-27-54
  done: false
  experiment_id: 2d0d9fbc13c64acfa27153a5fb9aeb68
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 14.5
  neg_mean_loss: -14.5
  node_ip: 127.0.0.1
  pid: 52581
  time_since_restore: 0.001725912094116211
  time_this_iter_s: 0.001725912094116211
  time_total_s: 0.001725912094116211
  timestamp: 1658503674
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d93b6_00000
  warmup_time: 0.20471811294555664
  
Result for decorated_easy_objective_d93b6_00000:
  date: 2022-07-22_16-27-54
  done: true
  experiment_id: 2d0d9fbc13c64acfa27153a5fb9aeb68
  experiment_tag: 0_height=45,width=51
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 4.9672897196261685
  neg_mean_loss: -4.9672897196261685
  node_ip: 127.0.0.1
  pid: 52581
  time_since_restore: 0.46099305152893066
  time_this_iter_s: 0.10984206199645996
  time_total_s: 0.46099305152893066
  timestamp: 1658503674
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d93b6_00000
  warmup_time: 0.20471811294555664
  
Result for decorated_easy_objective_d93b6_00001:
  date: 2022-07-22_16-27-57
  done: false
  experiment_id: 4bec5377a38a47d7bae57f7502ff0312
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 14.4
  neg_mean_loss: -14.4
  node_ip: 127.0.0.1
  pid: 52598
  time_since_restore: 0.0016498565673828125
  time_this_iter_s: 0.0016498565673828125
  time_total_s: 0.0016498565673828125
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d93b6_00001
  warmup_time: 0.18288898468017578
  
Result for decorated_easy_objective_d93b6_00003:
  date: 2022-07-22_16-27-57
  done: false
  experiment_id: 6868d31636df4c4a8e9ed91927120269
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 14.0
  neg_mean_loss: -14.0
  node_ip: 127.0.0.1
  pid: 52600
  time_since_restore: 0.0016481876373291016
  time_this_iter_s: 0.0016481876373291016
  time_total_s: 0.0016481876373291016
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d93b6_00003
  warmup_time: 0.17208290100097656
  
Result for decorated_easy_objective_d93b6_00004:
  date: 2022-07-22_16-27-57
  done: false
  experiment_id: f021ddc2dc164413931c17cb593dfa12
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 11.6
  neg_mean_loss: -11.6
  node_ip: 127.0.0.1
  pid: 52601
  time_since_restore: 0.0015459060668945312
  time_this_iter_s: 0.0015459060668945312
  time_total_s: 0.0015459060668945312
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d93b6_00004
  warmup_time: 0.1808018684387207
  
Result for decorated_easy_objective_d93b6_00002:
  date: 2022-07-22_16-27-57
  done: false
  experiment_id: a341941781824ea9b1a072b587e42a84
  hostname: Kais-MacBook-Pro.local
  iterations: 0
  iterations_since_restore: 1
  mean_loss: 19.3
  neg_mean_loss: -19.3
  node_ip: 127.0.0.1
  pid: 52599
  time_since_restore: 0.0015799999237060547
  time_this_iter_s: 0.0015799999237060547
  time_total_s: 0.0015799999237060547
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: d93b6_00002
  warmup_time: 0.1837329864501953
  
Result for decorated_easy_objective_d93b6_00001:
  date: 2022-07-22_16-27-57
  done: true
  experiment_id: 4bec5377a38a47d7bae57f7502ff0312
  experiment_tag: 1_height=44,width=94
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 4.659067357512954
  neg_mean_loss: -4.659067357512954
  node_ip: 127.0.0.1
  pid: 52598
  time_since_restore: 0.43494510650634766
  time_this_iter_s: 0.10719513893127441
  time_total_s: 0.43494510650634766
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d93b6_00001
  warmup_time: 0.18288898468017578
  
Result for decorated_easy_objective_d93b6_00003:
  date: 2022-07-22_16-27-57
  done: true
  experiment_id: 6868d31636df4c4a8e9ed91927120269
  experiment_tag: 3_height=40,width=26
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 4.87719298245614
  neg_mean_loss: -4.87719298245614
  node_ip: 127.0.0.1
  pid: 52600
  time_since_restore: 0.4373021125793457
  time_this_iter_s: 0.10880899429321289
  time_total_s: 0.4373021125793457
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d93b6_00003
  warmup_time: 0.17208290100097656
  
Result for decorated_easy_objective_d93b6_00004:
  date: 2022-07-22_16-27-57
  done: true
  experiment_id: f021ddc2dc164413931c17cb593dfa12
  experiment_tag: 4_height=16,width=65
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 1.9703703703703703
  neg_mean_loss: -1.9703703703703703
  node_ip: 127.0.0.1
  pid: 52601
  time_since_restore: 0.46802687644958496
  time_this_iter_s: 0.1077277660369873
  time_total_s: 0.46802687644958496
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d93b6_00004
  warmup_time: 0.1808018684387207
  
Result for decorated_easy_objective_d93b6_00002:
  date: 2022-07-22_16-27-57
  done: true
  experiment_id: a341941781824ea9b1a072b587e42a84
  experiment_tag: 2_height=93,width=25
  hostname: Kais-MacBook-Pro.local
  iterations: 4
  iterations_since_restore: 5
  mean_loss: 10.209090909090909
  neg_mean_loss: -10.209090909090909
  node_ip: 127.0.0.1
  pid: 52599
  time_since_restore: 0.47180795669555664
  time_this_iter_s: 0.10791492462158203
  time_total_s: 0.47180795669555664
  timestamp: 1658503677
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: d93b6_00002
  warmup_time: 0.1837329864501953
  
2022-07-22 16:27:58,211	INFO tune.py:738 -- Total run time: 7.15 seconds (7.01 seconds for the tuning loop).

This completes our Tune and MLflow walk-through. In the following sections you can find more details on the API of the Tune-MLflow integration.

MLflow AutoLogging

You can also check out here for an example of how to leverage MLflow auto-logging, in this case with PyTorch Lightning.

MLflow Logger API

class ray.air.integrations.mlflow.MLflowLoggerCallback(tracking_uri: Optional[str] = None, registry_uri: Optional[str] = None, experiment_name: Optional[str] = None, tags: Optional[Dict] = None, save_artifact: bool = False)[source]

MLflow Logger to automatically log Tune results and config to MLflow.

MLflow (https://mlflow.org) Tracking is an open source library for recording and querying experiments. This Ray Tune LoggerCallback sends information (config parameters, training results & metrics, and artifacts) to MLflow for automatic experiment tracking.

Parameters
  • tracking_uri – The tracking URI for where to manage experiments and runs. This can either be a local file path or a remote server. This arg gets passed directly to mlflow initialization. When using Tune in a multi-node setting, make sure to set this to a remote server and not a local file path.

  • registry_uri – The registry URI that gets passed directly to mlflow initialization.

  • experiment_name – The experiment name to use for this Tune run. If the experiment with the name already exists with MLflow, it will be reused. If not, a new experiment will be created with that name.

  • tags – An optional dictionary of string keys and values to set as tags on the run.

  • save_artifact – If set to True, automatically save the entire contents of the Tune local_dir as an artifact to the corresponding run in MLflow.

Example:

from ray.air.integrations.mlflow import MLflowLoggerCallback

tags = { "user_name" : "John",
         "git_commit_hash" : "abc123"}

tune.run(
    train_fn,
    config={
        # define search space here
        "parameter_1": tune.choice([1, 2, 3]),
        "parameter_2": tune.choice([4, 5, 6]),
    },
    callbacks=[MLflowLoggerCallback(
        experiment_name="experiment1",
        tags=tags,
        save_artifact=True)])

MLflow Mixin API

ray.tune.integration.mlflow.mlflow_mixin(func: Callable)[source]

MLflow (https://mlflow.org) Tracking is an open source library for recording and querying experiments. This Ray Tune Trainable mixin helps initialize the MLflow API for use with the Trainable class or the @mlflow_mixin function API. This mixin automatically configures MLflow and creates a run in the same process as each Tune trial. You can then use the MLflow API inside your training function, and it will automatically be reported to the correct run.

For basic usage, just prepend your training function with the @mlflow_mixin decorator:

from ray.tune.integration.mlflow import mlflow_mixin

@mlflow_mixin
def train_fn(config):
    ...
    mlflow.log_metric(...)

You can also use MLflow's autologging feature if using a training framework like PyTorch Lightning, XGBoost, etc. More information can be found here (https://mlflow.org/docs/latest/tracking.html#automatic-logging).

from ray.tune.integration.mlflow import mlflow_mixin

@mlflow_mixin
def train_fn(config):
    mlflow.autolog()
    xgboost_results = xgb.train(config, ...)

The MLflow configuration is done by passing an mlflow key to the config parameter of tune.Tuner() (see example below).

The content of the mlflow config entry is used to configure MLflow. Here are the keys you can pass in to this config entry:

Parameters
  • tracking_uri – The tracking URI for MLflow tracking. If using Tune in a multi-node setting, make sure to use a remote server for tracking.

  • experiment_id – The id of an already created MLflow experiment. All logs from all trials in tune.Tuner() will be reported to this experiment. If this is not provided or the experiment with this id does not exist, you must provide an experiment_name. This parameter takes precedence over experiment_name.

  • experiment_name – The name of an already existing MLflow experiment. All logs from all trials in tune.Tuner() will be reported to this experiment. If this is not provided, you must provide a valid experiment_id.

  • token – A token to use for HTTP authentication when logging to a remote tracking server. This is useful when you want to log to a Databricks server, for example. This value will be used to set the MLFLOW_TRACKING_TOKEN environment variable on all the remote training processes.

Example:

from ray import tune
from ray.tune.integration.mlflow import mlflow_mixin

import mlflow

# Create the MLflow experiment.
mlflow.create_experiment("my_experiment")

@mlflow_mixin
def train_fn(config):
    for i in range(10):
        loss = config["a"] + config["b"]
        mlflow.log_metric(key="loss", value=loss)
    tune.report(loss=loss, done=True)

tuner = tune.Tuner(
    train_fn,
    param_space={
        # define search space here
        "a": tune.choice([1, 2, 3]),
        "b": tune.choice([4, 5, 6]),
        # mlflow configuration
        "mlflow": {
            "experiment_name": "my_experiment",
            "tracking_uri": mlflow.get_tracking_uri()
        }
    })

tuner.fit()

More MLflow Examples