ray.air.integrations.mlflow.setup_mlflow
- ray.air.integrations.mlflow.setup_mlflow(config: Dict | None = None, tracking_uri: str | None = None, registry_uri: str | None = None, experiment_id: str | None = None, experiment_name: str | None = None, tracking_token: str | None = None, artifact_location: str | None = None, run_name: str | None = None, create_experiment_if_not_exists: bool = False, tags: Dict | None = None, rank_zero_only: bool = True) -> ModuleType | _NoopModule
Set up an MLflow session.
This function can be used to initialize an MLflow session in a (distributed) training or tuning run. The session will be created on the trainable.
By default, the MLflow experiment ID is the Ray trial ID and the MLflow experiment name is the Ray trial name. These settings can be overridden by passing the respective keyword arguments.
The config dict is automatically logged as the run parameters (excluding the mlflow settings).
In distributed training with Ray Train, only the zero-rank worker will initialize mlflow. All other workers will return a noop client, so that logging is not duplicated in a distributed run. This can be disabled by passing rank_zero_only=False, which will then initialize mlflow in every training worker.
This function will return the mlflow module, or a noop module for non-rank-zero workers if rank_zero_only=True. By using mlflow = setup_mlflow(config) you can ensure that only the rank zero worker calls the mlflow API.
- Parameters:
config – Configuration dict to be logged to mlflow as parameters.
tracking_uri – The tracking URI for MLflow tracking. If using Tune in a multi-node setting, make sure to use a remote server for tracking.
registry_uri – The registry URI for the MLflow model registry.
experiment_id – The id of an already created MLflow experiment. All logs from all trials in tune.Tuner() will be reported to this experiment. If this is not provided or the experiment with this id does not exist, you must provide an experiment_name. This parameter takes precedence over experiment_name.
experiment_name – The name of an already existing MLflow experiment. All logs from all trials in tune.Tuner() will be reported to this experiment. If this is not provided, you must provide a valid experiment_id.
tracking_token – A token to use for HTTP authentication when logging to a remote tracking server. This is useful when you want to log to a Databricks server, for example. This value will be used to set the MLFLOW_TRACKING_TOKEN environment variable on all the remote training processes.
artifact_location – The location to store run artifacts. If not provided, MLflow picks an appropriate default. Ignored if the experiment already exists.
run_name – Name of the new MLflow run that will be created. If not set, will default to the experiment_name.
create_experiment_if_not_exists – Whether to create an experiment with the provided name if it does not already exist. Defaults to False.
tags – Tags to set for the new run.
rank_zero_only – If True, will return an initialized session only for the rank 0 worker in distributed training. If False, will initialize a session for all workers. Defaults to True.
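As a minimal sketch of how several of these parameters might be combined, the snippet below collects the MLflow settings in one dict and forwards them as keyword arguments. The tracking URI, experiment name, and tag values are placeholders, and MLFLOW_SETTINGS and train_fn are illustrative names, not part of the Ray API. It assumes an MLflow tracking server is reachable at the given address.

```python
# Hypothetical settings; every key is a keyword argument from the signature above.
MLFLOW_SETTINGS = {
    "tracking_uri": "http://mlflow.example.com:5000",  # placeholder server address
    "experiment_name": "my_experiment",  # placeholder experiment name
    "create_experiment_if_not_exists": True,
    "tags": {"team": "research"},  # placeholder tag
}


def train_fn(config):
    # Imported inside the function so this sketch only needs Ray
    # when the training function actually runs.
    from ray.air.integrations.mlflow import setup_mlflow

    # config is logged as run parameters; the settings are passed separately.
    mlflow = setup_mlflow(config, **MLFLOW_SETTINGS)
    mlflow.log_metric(key="loss", value=0.123, step=0)
```

Keeping the settings separate from config matches the description above: config holds the parameters to log, while the MLflow settings control where the run is recorded.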
Example
By default, you can just call setup_mlflow and continue to use MLflow as you normally would:

    from ray.air.integrations.mlflow import setup_mlflow

    def training_loop(config):
        mlflow = setup_mlflow(config)
        # ...
        mlflow.log_metric(key="loss", value=0.123, step=0)
In distributed data parallel training, you can use the return value of setup_mlflow. This makes sure the MLflow API is only invoked on the first worker in distributed training runs:

    from ray.air.integrations.mlflow import setup_mlflow

    def training_loop(config):
        mlflow = setup_mlflow(config)
        # ...
        mlflow.log_metric(key="loss", value=0.123, step=0)
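To illustrate why the same logging line is safe to run on every worker, here is a minimal sketch of the no-op idea in plain Python. It does not use Ray or MLflow and is not Ray's actual _NoopModule implementation; NoopClient, make_recording_client, and client_for_worker are hypothetical names introduced only for this illustration.

```python
import types


class NoopClient:
    """Hypothetical stand-in that accepts any call and does nothing."""

    def __getattr__(self, name):
        # Any attribute lookup yields a function that ignores its arguments.
        def _noop(*args, **kwargs):
            return None

        return _noop


def make_recording_client(log):
    """Hypothetical 'real' client that records metric calls in a list."""
    client = types.SimpleNamespace()
    client.log_metric = lambda key, value, step=None: log.append((key, value, step))
    return client


def client_for_worker(rank, log, rank_zero_only=True):
    """Return the recording client on rank 0 and a no-op client elsewhere."""
    if rank_zero_only and rank != 0:
        return NoopClient()
    return make_recording_client(log)


calls = []
for rank in range(4):
    client = client_for_worker(rank, calls)
    # Every worker runs the same line; only rank 0 records anything.
    client.log_metric(key="loss", value=0.123, step=0)

print(calls)
```

With rank_zero_only=True in this sketch, four simulated workers produce a single recorded metric; passing rank_zero_only=False would record one entry per worker, which mirrors the duplication the default is meant to avoid.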
You can also use MLflow's autologging feature if using a training framework like PyTorch Lightning, XGBoost, etc. More information can be found here (https://mlflow.org/docs/latest/tracking.html#automatic-logging).

    import xgboost as xgb

    from ray.air.integrations.mlflow import setup_mlflow

    def train_fn(config):
        mlflow = setup_mlflow(config)
        # Enable MLflow autologging for supported frameworks.
        mlflow.autolog()
        xgboost_results = xgb.train(config, ...)
PublicAPI (alpha): This API is in alpha and may change before becoming stable.