Execution (Tuner, tune.Experiment)#


ray.tune.Tuner(trainable: Optional[Union[str, Callable, Type[ray.tune.trainable.trainable.Trainable], BaseTrainer]] = None, *, param_space: Optional[Dict[str, Any]] = None, tune_config: Optional[ray.tune.tune_config.TuneConfig] = None, run_config: Optional[ray.air.config.RunConfig] = None, _tuner_kwargs: Optional[Dict] = None, _tuner_internal: Optional[ray.tune.impl.tuner_internal.TunerInternal] = None)[source]#

Tuner is the recommended way of launching hyperparameter tuning jobs with Ray Tune.

  • trainable – The trainable to be tuned.

  • param_space – Search space of the tuning job. One thing to note is that both preprocessor and dataset can be tuned here.

  • tune_config – Tuning algorithm specific configs. Refer to ray.tune.tune_config.TuneConfig for more info.

  • run_config – Runtime configuration that is specific to individual trials. If passed, this will overwrite the run config passed to the Trainer, if applicable. Refer to ray.air.config.RunConfig for more info.

Usage pattern:

from sklearn.datasets import load_breast_cancer

from ray import tune
from ray.data import from_pandas
from ray.air.config import RunConfig, ScalingConfig
from ray.train.xgboost import XGBoostTrainer
from ray.tune.tuner import Tuner

def get_dataset():
    data_raw = load_breast_cancer(as_frame=True)
    dataset_df = data_raw["data"]
    dataset_df["target"] = data_raw["target"]
    dataset = from_pandas(dataset_df)
    return dataset

trainer = XGBoostTrainer(
    datasets={"train": get_dataset()},

param_space = {
    "scaling_config": ScalingConfig(
        num_workers=tune.grid_search([2, 4]),
            "CPU": tune.grid_search([1, 2]),
    # You can even grid search various datasets in Tune.
    # "datasets": {
    #     "train": tune.grid_search(
    #         [ds1, ds2]
    #     ),
    # },
    "params": {
        "objective": "binary:logistic",
        "tree_method": "approx",
        "eval_metric": ["logloss", "error"],
        "eta": tune.loguniform(1e-4, 1e-1),
        "subsample": tune.uniform(0.5, 1.0),
        "max_depth": tune.randint(1, 9),
tuner = Tuner(trainable=trainer, param_space=param_space,
analysis = tuner.fit()

To retry a failed tune run, you can then do

tuner = Tuner.restore(experiment_checkpoint_dir)

experiment_checkpoint_dir can be easily located near the end of the console output of your first failed run.

PublicAPI (beta): This API is in beta and may change before becoming stable.


ray.tune.run_experiments(experiments: Union[ray.tune.experiment.experiment.Experiment, Mapping, Sequence[Union[ray.tune.experiment.experiment.Experiment, Mapping]]], scheduler: Optional[ray.tune.schedulers.trial_scheduler.TrialScheduler] = None, server_port: Optional[int] = None, verbose: Union[int, ray.tune.utils.log.Verbosity] = Verbosity.V3_TRIAL_DETAILS, progress_reporter: Optional[ray.tune.progress_reporter.ProgressReporter] = None, resume: Union[bool, str] = False, reuse_actors: Optional[bool] = None, trial_executor: Optional[ray.tune.execution.ray_trial_executor.RayTrialExecutor] = None, raise_on_failed_trial: bool = True, concurrent: bool = True, callbacks: Optional[Sequence[ray.tune.callback.Callback]] = None, _remote: Optional[bool] = None)[source]#

Runs and blocks until all trials finish.


>>> from ray.tune.experiment import Experiment
>>> from ray.tune.tune import run_experiments
>>> def my_func(config): return {"score": 0}
>>> experiment_spec = Experiment("experiment", my_func) 
>>> run_experiments(experiments=experiment_spec) 
>>> experiment_spec = {"experiment": {"run": my_func}} 
>>> run_experiments(experiments=experiment_spec) 

List of Trial objects, holding data for each executed trial.

PublicAPI: This API is stable across Ray releases.


ray.tune.Experiment(name: str, run: Union[str, Callable, Type], *, stop: Optional[Union[Mapping, ray.tune.stopper.stopper.Stopper, Callable[[str, Mapping], bool]]] = None, time_budget_s: Optional[Union[int, float, datetime.timedelta]] = None, config: Optional[Dict[str, Any]] = None, resources_per_trial: Union[None, Mapping[str, Union[float, int, Mapping]], PlacementGroupFactory] = None, num_samples: int = 1, local_dir: Optional[str] = None, _experiment_checkpoint_dir: Optional[str] = None, sync_config: Optional[Union[ray.tune.syncer.SyncConfig, dict]] = None, checkpoint_config: Optional[Union[ray.air.config.CheckpointConfig, dict]] = None, trial_name_creator: Optional[Callable[[Trial], str]] = None, trial_dirname_creator: Optional[Callable[[Trial], str]] = None, log_to_file: bool = False, export_formats: Optional[Sequence] = None, max_failures: int = 0, restore: Optional[str] = None)[source]#

Tracks experiment specifications.

Implicitly registers the Trainable if needed. The args here take the same meaning as the arguments defined tune.py:run.

experiment_spec = Experiment(
    stop={"mean_accuracy": 100},
        "alpha": tune.grid_search([0.2, 0.4, 0.6]),
        "beta": tune.grid_search([1, 2]),
        "cpu": 1,
        "gpu": 0
  • TODO (xwjiang) – Add the whole list.

  • _experiment_checkpoint_dir – Internal use only. If present, use this as the root directory for experiment checkpoint. If not present, the directory path will be deduced from trainable name instead.

DeveloperAPI: This API may change across minor Ray releases.


ray.tune.SyncConfig(upload_dir: Optional[str] = None, syncer: Optional[Union[str, ray.tune.syncer.Syncer]] = 'auto', sync_on_checkpoint: bool = True, sync_period: int = 300, sync_timeout: int = 1800) None[source]#

Configuration object for syncing.

If an upload_dir is specified, both experiment and trial checkpoints will be stored on remote (cloud) storage. Synchronization then only happens via this remote storage.

  • upload_dir – Optional URI to sync training results and checkpoints to (e.g. s3://bucket, gs://bucket or hdfs://path). Specifying this will enable cloud-based checkpointing.

  • syncer – Syncer class to use for synchronizing checkpoints to/from cloud storage. If set to None, no syncing will take place. Defaults to "auto" (auto detect).

  • sync_on_checkpoint – Force sync-down of trial checkpoint to driver (only non cloud-storage). If set to False, checkpoint syncing from worker to driver is asynchronous and best-effort. This does not affect persistent storage syncing. Defaults to True.

  • sync_period – Syncing period for syncing between nodes.

  • sync_timeout – Timeout after which running sync processes are aborted. Currently only affects trial-to-cloud syncing.

PublicAPI: This API is stable across Ray releases.