Execution (Tuner, tune.Experiment)

Tuner

ray.tune.Tuner(trainable: Optional[Union[str, Callable, Type[ray.tune.trainable.trainable.Trainable], BaseTrainer]] = None, *, param_space: Optional[Dict[str, Any]] = None, tune_config: Optional[ray.tune.tune_config.TuneConfig] = None, run_config: Optional[ray.air.config.RunConfig] = None, _tuner_kwargs: Optional[Dict] = None, _tuner_internal: Optional[ray.tune.impl.tuner_internal.TunerInternal] = None)[source]

Tuner is the recommended way of launching hyperparameter tuning jobs with Ray Tune.

Parameters
  • trainable – The trainable to be tuned.

  • param_space – Search space of the tuning job. Note that both the preprocessor and the dataset can be tuned here.

  • tune_config – Tuning-algorithm-specific configuration. Refer to ray.tune.tune_config.TuneConfig for more info.

  • run_config – Runtime configuration that is specific to individual trials. If passed, this will overwrite the run config passed to the Trainer, if applicable. Refer to ray.air.config.RunConfig for more info.

Usage pattern:

from sklearn.datasets import load_breast_cancer

from ray import tune
from ray.data import from_pandas
from ray.air.config import RunConfig, ScalingConfig
from ray.train.xgboost import XGBoostTrainer
from ray.tune.tuner import Tuner

def get_dataset():
    data_raw = load_breast_cancer(as_frame=True)
    dataset_df = data_raw["data"]
    dataset_df["target"] = data_raw["target"]
    dataset = from_pandas(dataset_df)
    return dataset

trainer = XGBoostTrainer(
    label_column="target",
    params={},
    datasets={"train": get_dataset()},
)

param_space = {
    "scaling_config": ScalingConfig(
        num_workers=tune.grid_search([2, 4]),
        resources_per_worker={
            "CPU": tune.grid_search([1, 2]),
        },
    ),
    # You can even grid search various datasets in Tune.
    # "datasets": {
    #     "train": tune.grid_search(
    #         [ds1, ds2]
    #     ),
    # },
    "params": {
        "objective": "binary:logistic",
        "tree_method": "approx",
        "eval_metric": ["logloss", "error"],
        "eta": tune.loguniform(1e-4, 1e-1),
        "subsample": tune.uniform(0.5, 1.0),
        "max_depth": tune.randint(1, 9),
    },
}
tuner = Tuner(
    trainable=trainer,
    param_space=param_space,
    run_config=RunConfig(name="my_tune_run"),
)
analysis = tuner.fit()
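
tuner.fit() returns a ResultGrid, which can be inspected for the best trial. A minimal sketch (the metric key "train-logloss" is an assumption based on the "train" dataset and "logloss" eval metric above; adjust it to whatever your trainable reports):

# Pick the best trial by the reported training logloss. The key assumes
# XGBoost reports eval metrics as "<dataset>-<metric>".
best_result = analysis.get_best_result(metric="train-logloss", mode="min")
print("Best config:", best_result.config)
print("Best metrics:", best_result.metrics)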

To retry a failed Tune run, you can then do:

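# experiment_checkpoint_dir is printed near the end of the console output
# of the failed run; for the example above it would typically be
# "~/ray_results/my_tune_run" (assuming the default local_dir).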
tuner = Tuner.restore(experiment_checkpoint_dir)
tuner.fit()

The experiment_checkpoint_dir can be found near the end of the console output of your first (failed) run.

PublicAPI (beta): This API is in beta and may change before becoming stable.

tune.run_experiments

ray.tune.run_experiments(experiments: Union[ray.tune.experiment.experiment.Experiment, Mapping, Sequence[Union[ray.tune.experiment.experiment.Experiment, Mapping]]], scheduler: Optional[ray.tune.schedulers.trial_scheduler.TrialScheduler] = None, server_port: Optional[int] = None, verbose: Union[int, ray.tune.utils.log.Verbosity] = Verbosity.V3_TRIAL_DETAILS, progress_reporter: Optional[ray.tune.progress_reporter.ProgressReporter] = None, resume: Union[bool, str] = False, reuse_actors: Optional[bool] = None, trial_executor: Optional[ray.tune.execution.ray_trial_executor.RayTrialExecutor] = None, raise_on_failed_trial: bool = True, concurrent: bool = True, callbacks: Optional[Sequence[ray.tune.callback.Callback]] = None, _remote: Optional[bool] = None)[source]

Runs and blocks until all trials finish.

Example

>>> from ray.tune.experiment import Experiment
>>> from ray.tune.tune import run_experiments
>>> def my_func(config): return {"score": 0}
>>> experiment_spec = Experiment("experiment", my_func) 
>>> run_experiments(experiments=experiment_spec) 
>>> experiment_spec = {"experiment": {"run": my_func}} 
>>> run_experiments(experiments=experiment_spec) 

Returns

List of Trial objects, holding data for each executed trial.
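
A minimal runnable sketch that iterates over the returned Trial objects (the trainable, the "score" metric, and the "x" config key are illustrative):

from ray import tune
from ray.tune.experiment import Experiment

def my_func(config):
    # Report a single metric; "score" is just an example name.
    return {"score": config["x"] ** 2}

experiment_spec = Experiment(
    "squares",
    my_func,
    config={"x": tune.grid_search([1, 2, 3])},
)

trials = tune.run_experiments(experiments=experiment_spec)
for trial in trials:
    # Trial.last_result holds the last metrics reported by the trial.
    print(trial.trial_id, trial.last_result["score"])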

PublicAPI: This API is stable across Ray releases.

tune.Experiment

ray.tune.Experiment(name, run, stop=None, time_budget_s=None, config=None, resources_per_trial=None, num_samples=1, local_dir=None, _experiment_checkpoint_dir: Optional[str] = None, sync_config=None, trial_name_creator=None, trial_dirname_creator=None, log_to_file=False, checkpoint_freq=0, checkpoint_at_end=False, keep_checkpoints_num=None, checkpoint_score_attr=None, export_formats=None, max_failures=0, restore=None)[source]

Tracks experiment specifications.

Implicitly registers the Trainable if needed. The arguments here have the same meaning as the corresponding arguments of tune.run.

experiment_spec = Experiment(
    "my_experiment_name",
    my_func,
    stop={"mean_accuracy": 100},
    config={
        "alpha": tune.grid_search([0.2, 0.4, 0.6]),
        "beta": tune.grid_search([1, 2]),
    },
    resources_per_trial={
        "cpu": 1,
        "gpu": 0
    },
    num_samples=10,
    local_dir="~/ray_results",
    checkpoint_freq=10,
    max_failures=2)
Parameters
  • TODO (xwjiang) – Add the whole list.

  • _experiment_checkpoint_dir – Internal use only. If present, use this as the root directory for experiment checkpoint. If not present, the directory path will be deduced from trainable name instead.

DeveloperAPI: This API may change across minor Ray releases.

tune.SyncConfig

ray.tune.SyncConfig(upload_dir: Optional[str] = None, syncer: Optional[Union[str, ray.tune.syncer.Syncer]] = 'auto', sync_on_checkpoint: bool = True, sync_period: int = 300, sync_timeout: int = 1800) -> None[source]

Configuration object for syncing.

If an upload_dir is specified, both experiment and trial checkpoints will be stored on remote (cloud) storage. Synchronization then only happens via this remote storage.

Parameters
  • upload_dir – Optional URI to sync training results and checkpoints to (e.g. s3://bucket, gs://bucket or hdfs://path). Specifying this will enable cloud-based checkpointing.

  • syncer – Syncer class to use for synchronizing checkpoints to/from cloud storage. If set to None, no syncing will take place. Defaults to "auto" (auto detect).

  • sync_on_checkpoint – Force sync-down of trial checkpoints to the driver (only applies when not using cloud storage). If set to False, checkpoint syncing from worker to driver is asynchronous and best-effort. This does not affect persistent storage syncing. Defaults to True.

  • sync_period – How often, in seconds, to sync between nodes. Defaults to 300.

  • sync_timeout – Timeout in seconds after which running sync processes are aborted. Currently only affects trial-to-cloud syncing.
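
For example, a minimal sketch of enabling cloud checkpointing for a Tune run (the bucket URI is a placeholder; the SyncConfig is passed to the run via RunConfig's sync_config argument):

from ray import tune
from ray.air.config import RunConfig
from ray.tune import SyncConfig
from ray.tune.tuner import Tuner

def train_fn(config):
    # Report a single metric; "score" is just an example name.
    return {"score": config["x"]}

tuner = Tuner(
    train_fn,
    param_space={"x": tune.uniform(0.0, 1.0)},
    run_config=RunConfig(
        name="synced_run",
        sync_config=SyncConfig(
            upload_dir="s3://my-bucket/tune-results",  # placeholder bucket URI
            sync_period=300,  # sync between nodes at most once every 300 seconds
        ),
    ),
)
results = tuner.fit()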

PublicAPI: This API is stable across Ray releases.