Tune Execution (Tuner, tune.Experiment)#

Tuner#

ray.tune.Tuner(trainable: Optional[Union[str, Callable, Type[ray.tune.trainable.trainable.Trainable], BaseTrainer]] = None, *, param_space: Optional[Dict[str, Any]] = None, tune_config: Optional[ray.tune.tune_config.TuneConfig] = None, run_config: Optional[ray.air.config.RunConfig] = None, _tuner_kwargs: Optional[Dict] = None, _tuner_internal: Optional[ray.tune.impl.tuner_internal.TunerInternal] = None)[source]#

Tuner is the recommended way of launching hyperparameter tuning jobs with Ray Tune.

Parameters
  • trainable – The trainable to be tuned.

  • param_space – Search space of the tuning job. Note that both the preprocessor and the datasets can be tuned here.

  • tune_config – Configs specific to the tuning algorithm. Refer to ray.tune.tune_config.TuneConfig for more info.

  • run_config – Runtime configuration that is specific to individual trials. If passed, this will overwrite the run config passed to the Trainer, if applicable. Refer to ray.air.config.RunConfig for more info.
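
To illustrate the tune_config parameter, here is a minimal sketch with a plain function trainable (the objective function and search space below are illustrative, not part of the original example):

from ray import tune
from ray.air import session
from ray.tune import TuneConfig, Tuner

def objective(config):
    # Report a single score back to Tune (illustrative objective).
    session.report({"score": config["x"] ** 2})

tuner = Tuner(
    objective,
    param_space={"x": tune.uniform(-1.0, 1.0)},
    tune_config=TuneConfig(num_samples=10, metric="score", mode="min"),
)
results = tuner.fit()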

Usage pattern:

from sklearn.datasets import load_breast_cancer

from ray import tune
from ray.data import from_pandas
from ray.air.config import RunConfig, ScalingConfig
from ray.train.xgboost import XGBoostTrainer
from ray.tune.tuner import Tuner

def get_dataset():
    data_raw = load_breast_cancer(as_frame=True)
    dataset_df = data_raw["data"]
    dataset_df["target"] = data_raw["target"]
    dataset = from_pandas(dataset_df)
    return dataset

trainer = XGBoostTrainer(
    label_column="target",
    params={},
    datasets={"train": get_dataset()},
)

param_space = {
    "scaling_config": ScalingConfig(
        num_workers=tune.grid_search([2, 4]),
        resources_per_worker={
            "CPU": tune.grid_search([1, 2]),
        },
    ),
    # You can even grid search various datasets in Tune.
    # "datasets": {
    #     "train": tune.grid_search(
    #         [ds1, ds2]
    #     ),
    # },
    "params": {
        "objective": "binary:logistic",
        "tree_method": "approx",
        "eval_metric": ["logloss", "error"],
        "eta": tune.loguniform(1e-4, 1e-1),
        "subsample": tune.uniform(0.5, 1.0),
        "max_depth": tune.randint(1, 9),
    },
}
tuner = Tuner(
    trainable=trainer,
    param_space=param_space,
    run_config=RunConfig(name="my_tune_run"),
)
results = tuner.fit()
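
fit() returns a ResultGrid, which can then be queried for the best trial. A short follow-up sketch (the "train-logloss" metric name is an assumption based on the eval_metric configured above):

# Pick the best trial from the ResultGrid returned by fit().
best_result = results.get_best_result(metric="train-logloss", mode="min")
print(best_result.config)   # hyperparameters of the best trial
print(best_result.metrics)  # final reported metrics of the best trial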

To retry a failed Tune run, you can then do:

tuner = Tuner.restore(experiment_checkpoint_dir)
tuner.fit()

The experiment_checkpoint_dir is printed near the end of the console output of your first (failed) run.
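
With the default local_dir, this directory typically looks like ~/ray_results/<run name>. For example, for the run named above (hypothetical path; copy the actual one from your console output):

# Hypothetical path, assuming the default local_dir of ~/ray_results.
tuner = Tuner.restore("~/ray_results/my_tune_run")
results = tuner.fit()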

PublicAPI (beta): This API is in beta and may change before becoming stable.

tune.run_experiments#

ray.tune.run_experiments(experiments: Union[ray.tune.experiment.experiment.Experiment, Mapping, Sequence[Union[ray.tune.experiment.experiment.Experiment, Mapping]]], scheduler: Optional[ray.tune.schedulers.trial_scheduler.TrialScheduler] = None, server_port: Optional[int] = None, verbose: Union[int, ray.tune.utils.log.Verbosity] = Verbosity.V3_TRIAL_DETAILS, progress_reporter: Optional[ray.tune.progress_reporter.ProgressReporter] = None, resume: Union[bool, str] = False, reuse_actors: Optional[bool] = None, trial_executor: Optional[ray.tune.execution.ray_trial_executor.RayTrialExecutor] = None, raise_on_failed_trial: bool = True, concurrent: bool = True, callbacks: Optional[Sequence[ray.tune.callback.Callback]] = None, _remote: Optional[bool] = None)[source]#

Runs and blocks until all trials finish.

Example

>>> from ray.tune.experiment import Experiment
>>> from ray.tune.tune import run_experiments
>>> def my_func(config): return {"score": 0}
>>> experiment_spec = Experiment("experiment", my_func) 
>>> run_experiments(experiments=experiment_spec) 
>>> experiment_spec = {"experiment": {"run": my_func}} 
>>> run_experiments(experiments=experiment_spec) 
Returns

List of Trial objects, holding data for each executed trial.
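
The returned trials can be inspected directly once the call completes, for example (a sketch assuming the my_func experiment above, so each trial reports a "score" metric):

trials = run_experiments(experiments=experiment_spec)
for trial in trials:
    # Each Trial holds its config and the last reported result dict.
    print(trial.config, trial.last_result["score"])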

PublicAPI: This API is stable across Ray releases.

tune.Experiment#

ray.tune.Experiment(name: str, run: Union[str, Callable, Type], *, stop: Optional[Union[Mapping, ray.tune.stopper.stopper.Stopper, Callable[[str, Mapping], bool]]] = None, time_budget_s: Optional[Union[int, float, datetime.timedelta]] = None, config: Optional[Dict[str, Any]] = None, resources_per_trial: Union[None, Mapping[str, Union[float, int, Mapping]], PlacementGroupFactory] = None, num_samples: int = 1, local_dir: Optional[str] = None, _experiment_checkpoint_dir: Optional[str] = None, sync_config: Optional[Union[ray.tune.syncer.SyncConfig, dict]] = None, checkpoint_config: Optional[Union[ray.air.config.CheckpointConfig, dict]] = None, trial_name_creator: Optional[Callable[[Trial], str]] = None, trial_dirname_creator: Optional[Callable[[Trial], str]] = None, log_to_file: bool = False, export_formats: Optional[Sequence] = None, max_failures: int = 0, restore: Optional[str] = None)[source]#

Tracks experiment specifications.

Implicitly registers the Trainable if needed. The arguments here take the same meaning as the arguments to tune.run().

experiment_spec = Experiment(
    "my_experiment_name",
    my_func,
    stop={"mean_accuracy": 100},
    config={
        "alpha": tune.grid_search([0.2, 0.4, 0.6]),
        "beta": tune.grid_search([1, 2]),
    },
    resources_per_trial={
        "cpu": 1,
        "gpu": 0
    },
    num_samples=10,
    local_dir="~/ray_results",
    # CheckpointConfig comes from ray.air.config; checkpoint every 10 iterations.
    checkpoint_config=CheckpointConfig(checkpoint_frequency=10),
    max_failures=2)
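
The resulting spec can then be passed to tune.run_experiments (documented above), for example:

trials = tune.run_experiments(experiment_spec)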
Parameters
  • _experiment_checkpoint_dir – Internal use only. If present, use this as the root directory for the experiment checkpoint. If not present, the directory path will be deduced from the trainable name instead.

DeveloperAPI: This API may change across minor Ray releases.

tune.SyncConfig#

ray.tune.SyncConfig(upload_dir: Optional[str] = None, syncer: Optional[Union[str, ray.tune.syncer.Syncer]] = 'auto', sync_period: int = 300, sync_timeout: int = 1800, sync_on_checkpoint: bool = True) → None[source]#

Configuration object for Tune syncing.

See Appendix: Types of Tune Experiment Data for an overview of what data is synchronized.

If an upload_dir is specified, both experiment and trial checkpoints will be stored on remote (cloud) storage. Synchronization then only happens via uploading/downloading from this remote storage – no syncing will happen between nodes.

There are a few scenarios where syncing takes place:

  1. The Tune driver (on the head node) syncing the experiment directory to the cloud (which includes experiment state such as searcher state, the list of trials and their statuses, and trial metadata)

  2. Workers directly syncing trial checkpoints to the cloud

  3. Workers syncing their trial directories to the head node (this is the default option when no cloud storage is used)

See How to Configure Storage Options for a Distributed Tune Experiment? for more details and examples.
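
A minimal configuration sketch, assuming a hypothetical S3 bucket and that the SyncConfig is passed to the Tuner through RunConfig(sync_config=...):

from ray import tune
from ray.air.config import RunConfig

# Hypothetical bucket URI; setting upload_dir enables cloud checkpointing
# (scenarios 1 and 2 above) and disables worker-to-head-node syncing (3).
sync_config = tune.SyncConfig(upload_dir="s3://my-bucket/tune-results")

run_config = RunConfig(name="my_tune_run", sync_config=sync_config)
# Pass run_config to Tuner(..., run_config=run_config) as in the Tuner example above.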

Parameters
  • upload_dir – Optional URI to sync training results and checkpoints to (e.g. s3://bucket, gs://bucket or hdfs://path). Specifying this will enable cloud-based checkpointing.

  • syncer – If upload_dir is specified, then this config accepts a custom syncer subclassing Syncer which will be used to synchronize checkpoints to/from cloud storage. If no upload_dir is specified, this config can be set to None, which disables the default worker-to-head-node syncing. Defaults to "auto" (auto detect), which assigns a default syncer that uses pyarrow to handle cloud storage syncing when upload_dir is provided.

  • sync_period – Minimum time in seconds to wait between two sync operations. A smaller sync_period will have more up-to-date data at the sync location but introduces more syncing overhead. Defaults to 5 minutes. Note: This applies to (1) and (3). Trial checkpoints are uploaded to the cloud synchronously on every checkpoint.

  • sync_timeout – Maximum time in seconds to wait for a sync process to finish running. This is used to catch hanging sync operations so that experiment execution can continue and the syncs can be retried. Defaults to 30 minutes. Note: Currently, this timeout only affects cloud syncing: (1) and (2).

  • sync_on_checkpoint – If True, a sync from a worker’s remote trial directory to the head node will be forced on every trial checkpoint, regardless of the sync_period. Defaults to True. Note: This is ignored if upload_dir is specified, since this only applies to worker-to-head-node syncing (3).

PublicAPI: This API is stable across Ray releases.