ray.tune.schedulers.ResourceChangingScheduler#

class ray.tune.schedulers.ResourceChangingScheduler(base_scheduler: Optional[ray.tune.schedulers.trial_scheduler.TrialScheduler] = None, resources_allocation_function: Optional[Callable[[ray.tune.execution.trial_runner.TrialRunner, ray.tune.experiment.trial.Trial, Dict[str, Any], ray.tune.schedulers.resource_changing_scheduler.ResourceChangingScheduler], Optional[ray.tune.execution.placement_groups.PlacementGroupFactory]]] = <ray.tune.schedulers.resource_changing_scheduler.DistributeResources object>)[source]#

Bases: ray.tune.schedulers.trial_scheduler.TrialScheduler

A utility scheduler to dynamically change resources of live trials.

New in version 1.5.0.

Note

Experimental. API may change in future releases.

The ResourceChangingScheduler wraps any other scheduler and, through a user-specified resources_allocation_function, adjusts the resource requirements of live trials in response to the decisions of the wrapped scheduler.

An example of such a function can be found in XGBoost Dynamic Resources Example.

If the functional API is used, the current trial resources can be obtained by calling tune.get_trial_resources() inside the training function. The function should be able to save and load checkpoints (saving preferably every iteration), because applying new resources requires restarting the trial from its latest checkpoint.
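
For example, a training function might re-query its allocation on every iteration and checkpoint after each one. This is a minimal sketch; the loop, the metric name, and how num_cpus is used are hypothetical:

from ray import tune

def train_fn(config):
    for step in range(100):
        # Resources may change between iterations, so re-query them
        # on every step. The returned object is assumed here to be a
        # PlacementGroupFactory.
        trial_resources = tune.get_trial_resources()
        num_cpus = int(trial_resources.required_resources.get("CPU", 1))
        # ... train one step using num_cpus workers/threads and save
        # a checkpoint so the trial can resume with new resources ...
        tune.report(score=0.0)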

If the Trainable (class) API is used, you can obtain the current trial resources through the Trainable.trial_resources property.
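
With the class API, the same information is available inside step() (a minimal sketch with a hypothetical metric):

from ray.tune import Trainable

class MyTrainable(Trainable):
    def step(self):
        # trial_resources reflects the trial's current allocation and
        # may change between calls to step(); it is assumed here to be
        # a PlacementGroupFactory.
        pgf = self.trial_resources
        num_cpus = int(pgf.required_resources.get("CPU", 1))
        # ... run one training iteration using num_cpus ...
        return {"score": 0.0}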

Cannot be used if reuse_actors is True in tune.TuneConfig(). A ValueError will be raised in that case.
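
A minimal sketch of wiring the scheduler into a run (train_fn is the hypothetical training function from above):

from ray import tune
from ray.tune.schedulers import ResourceChangingScheduler

tuner = tune.Tuner(
    train_fn,
    tune_config=tune.TuneConfig(
        scheduler=ResourceChangingScheduler(),
        reuse_actors=False,  # True would raise a ValueError
    ),
)
results = tuner.fit()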

Parameters
  • base_scheduler – The scheduler to provide decisions about trials. If None, a default FIFOScheduler will be used.

  • resources_allocation_function – The callable used to change live trial resource requirements during tuning. It is called on each trial as it finishes one step of training. The callable must take four arguments: the TrialRunner, the current Trial, the current result dict, and the ResourceChangingScheduler calling it. It must return a PlacementGroupFactory or None (signifying that no update is needed). If resources_allocation_function is None, resource requirements will never be changed. By default, DistributeResources is used, distributing available CPUs and GPUs over all running trials in a robust way, without any prioritization.

Warning

If the resources_allocation_function sets trial resource requirements to values larger than the cluster can provide, the trial will not run. Ensure that your callable accounts for that possibility by setting upper limits. Consult DistributeResources to see how that may be done.

Example

from typing import Any, Dict, Optional, Union

from ray.tune.execution import trial_runner
from ray.tune.execution.placement_groups import PlacementGroupFactory
from ray.tune.experiment.trial import Trial
from ray.tune.schedulers import ASHAScheduler, ResourceChangingScheduler

base_scheduler = ASHAScheduler(max_t=16)

def my_resources_allocation_function(
    trial_runner: "trial_runner.TrialRunner",
    trial: Trial,
    result: Dict[str, Any],
    scheduler: "ResourceChangingScheduler",
) -> Optional[Union[PlacementGroupFactory, dict]]:
    # logic here
    # usage of PlacementGroupFactory is strongly preferred
    return PlacementGroupFactory(...)

scheduler = ResourceChangingScheduler(
    base_scheduler,
    my_resources_allocation_function,
)
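
To make this concrete, here is a hedged sketch of an allocation function that honors the warning above by capping requests at a fixed budget. The one-CPU-per-iteration policy and the 8-CPU cap are hypothetical, and trial.placement_group_factory is assumed to describe the trial's current allocation:

def capped_allocation_function(
    trial_runner: "trial_runner.TrialRunner",
    trial: Trial,
    result: Dict[str, Any],
    scheduler: "ResourceChangingScheduler",
) -> Optional[PlacementGroupFactory]:
    # Hypothetical policy: one CPU per completed training iteration,
    # capped at max_cpus so the request cannot outgrow the cluster.
    max_cpus = 8
    new_cpus = min(result.get("training_iteration", 1), max_cpus)
    current_cpus = trial.placement_group_factory.required_resources.get("CPU", 1)
    if new_cpus == current_cpus:
        # Returning None signals that no update is needed.
        return None
    return PlacementGroupFactory([{"CPU": new_cpus}])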

See XGBoost Dynamic Resources Example for a more detailed example.

PublicAPI (beta): This API is in beta and may change before becoming stable.

set_search_properties(metric: Optional[str], mode: Optional[str], **spec) bool[source]#

Pass search properties to scheduler.

This method acts as an alternative to instantiating schedulers that react to metrics with their own metric and mode parameters.

Parameters
  • metric – Metric to optimize

  • mode – One of ["min", "max"]. Direction to optimize.

  • **spec – Any kwargs for forward compatibility. Information such as Experiment.PUBLIC_KEYS is provided through here.

on_trial_add(trial_runner: ray.tune.execution.trial_runner.TrialRunner, trial: ray.tune.experiment.trial.Trial, **kwargs)[source]#

Called when a new trial is added to the trial runner.

on_trial_error(trial_runner: ray.tune.execution.trial_runner.TrialRunner, trial: ray.tune.experiment.trial.Trial, **kwargs)[source]#

Notification that a trial has errored.

This will only be called when the trial is in the RUNNING state.

on_trial_result(trial_runner: ray.tune.execution.trial_runner.TrialRunner, trial: ray.tune.experiment.trial.Trial, result: Dict) str[source]#

Called on each intermediate result returned by a trial.

At this point, the trial scheduler can make a decision by returning one of CONTINUE, PAUSE, or STOP. This will only be called when the trial is in the RUNNING state.

on_trial_complete(trial_runner: ray.tune.execution.trial_runner.TrialRunner, trial: ray.tune.experiment.trial.Trial, result: Dict, **kwargs)[source]#

Notification that a trial has completed.

This will only be called when the trial is in the RUNNING state and either completes naturally or is terminated manually.

on_trial_remove(trial_runner: ray.tune.execution.trial_runner.TrialRunner, trial: ray.tune.experiment.trial.Trial, **kwargs)[source]#

Called to remove a trial.

This is called when the trial is in the PAUSED or PENDING state. Otherwise, use on_trial_complete.

choose_trial_to_run(trial_runner: ray.tune.execution.trial_runner.TrialRunner, **kwargs) Optional[ray.tune.experiment.trial.Trial][source]#

Called to choose a new trial to run.

This should return one of the trials in trial_runner that is in the PENDING or PAUSED state. This function must be idempotent.

If no trial is ready, return None.

debug_string() str[source]#

Returns a human readable message for printing to the console.

save(checkpoint_path: str)[source]#

Save the trial scheduler to a checkpoint.

restore(checkpoint_path: str)[source]#

Restore the trial scheduler from a checkpoint.

set_trial_resources(trial: ray.tune.experiment.trial.Trial, new_resources: Union[Dict, ray.tune.execution.placement_groups.PlacementGroupFactory]) bool[source]#

Sets the trial's resource requirements to new_resources. Returns True if new_resources were set.

reallocate_trial_resources_if_needed(trial_runner: ray.tune.execution.trial_runner.TrialRunner, trial: ray.tune.experiment.trial.Trial, result: Dict) Optional[Union[dict, ray.tune.execution.placement_groups.PlacementGroupFactory]][source]#

Calls the user-defined resources_allocation_function. If the returned resources are not None and differ from the trial's current resources, returns them. Otherwise, returns None.