ray.tune.schedulers.ResourceChangingScheduler

class ray.tune.schedulers.ResourceChangingScheduler(base_scheduler: ray.tune.schedulers.trial_scheduler.TrialScheduler | None = None, resources_allocation_function: typing.Callable[[TuneController, ray.tune.experiment.trial.Trial, typing.Dict[str, typing.Any], ResourceChangingScheduler], ray.tune.execution.placement_groups.PlacementGroupFactory | None] | None = <ray.tune.schedulers.resource_changing_scheduler.DistributeResources object>)

Bases: TrialScheduler

A utility scheduler to dynamically change resources of live trials.

Added in version 1.5.0.

Note

Experimental. API may change in future releases.

The ResourceChangingScheduler wraps another scheduler and adjusts the resource requirements of live trials in response to the wrapped scheduler's decisions, using a user-specified resources_allocation_function.

An example of such a function can be found in XGBoost Dynamic Resources Example.

If the functional API is used, the current trial resources can be obtained by calling tune.get_trial_resources() inside the training function. The function should be able to load and save checkpoints (the latter preferably every iteration).

If the Trainable (class) API is used, you can obtain the current trial resources through the Trainable.trial_resources property.

This scheduler cannot be used if reuse_actors is True in tune.TuneConfig(). A ValueError will be raised in that case.

Parameters:
  • base_scheduler – The scheduler to provide decisions about trials. If None, a default FIFOScheduler will be used.

  • resources_allocation_function – The callable used to change live trial resource requirements during tuning. This callable will be called on each trial as it finishes one step of training. The callable must take four arguments: the TuneController, the current Trial, the current result dict, and the ResourceChangingScheduler calling it. The callable must return a PlacementGroupFactory or None (signifying no need for an update). If resources_allocation_function is None, no resource requirements will be changed at any time. By default, DistributeResources will be used, distributing available CPUs and GPUs over all running trials in a robust way, without any prioritization.
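The return contract described above (a new resource request, or None when nothing needs to change) can be sketched without Ray. The following is a minimal, hypothetical illustration that uses plain integers in place of a PlacementGroupFactory. The name cpus_for_trial and its parameters are assumptions made for this example; they are not part of the Ray API, and this is not how DistributeResources is implemented.

```python
from typing import Optional


def cpus_for_trial(
    total_cpus: int, num_running_trials: int, current_cpus: int
) -> Optional[int]:
    """Return a new CPU count for one trial, or None if nothing should change.

    Hypothetical helper: every running trial gets the same share of CPUs
    (no prioritization), with a floor of one CPU per trial.
    """
    if num_running_trials <= 0:
        return None
    fair_share = max(1, total_cpus // num_running_trials)
    if fair_share == current_cpus:
        # Mirrors the "return None" case: no update is needed.
        return None
    return fair_share


# Eight CPUs shared by four running trials, one of which currently holds 1 CPU.
print(cpus_for_trial(total_cpus=8, num_running_trials=4, current_cpus=1))
```

In a real resources_allocation_function, the integer result would be turned into a PlacementGroupFactory, and None would be returned unchanged.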

Warning

If the resources_allocation_function sets trial resource requirements to values larger than the cluster can provide, the trial will not run. Ensure that your callable accounts for that possibility by setting upper limits. Consult DistributeResources to see how that may be done.
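One way to set such upper limits is to clamp each requested amount to the available capacity before building the resource request. This is a minimal sketch, assuming requests and capacity are plain dictionaries keyed by resource name. clamp_to_capacity is a hypothetical helper, not a Ray function, and how the capacity is obtained is left out.

```python
from typing import Dict


def clamp_to_capacity(
    requested: Dict[str, float], capacity: Dict[str, float]
) -> Dict[str, float]:
    """Cap each requested resource at the amount the cluster can provide.

    Resources missing from `capacity` are treated as unavailable (0.0).
    """
    return {
        name: min(amount, capacity.get(name, 0.0))
        for name, amount in requested.items()
    }


# A request for 16 CPUs and 2 GPUs on a cluster with 8 CPUs and 1 GPU.
print(clamp_to_capacity({"CPU": 16, "GPU": 2}, {"CPU": 8, "GPU": 1}))
# → {'CPU': 8, 'GPU': 1}
```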

Example

from typing import Any, Dict, Optional

from ray.tune.execution.placement_groups import PlacementGroupFactory
from ray.tune.experiment.trial import Trial
from ray.tune.schedulers import ASHAScheduler, ResourceChangingScheduler

base_scheduler = ASHAScheduler(max_t=16)

def my_resources_allocation_function(
    tune_controller: "TuneController",
    trial: Trial,
    result: Dict[str, Any],
    scheduler: "ResourceChangingScheduler",
) -> Optional[PlacementGroupFactory]:
    # logic here
    # return None to leave the trial's resources unchanged
    return PlacementGroupFactory(...)

scheduler = ResourceChangingScheduler(
    base_scheduler,
    my_resources_allocation_function,
)

See XGBoost Dynamic Resources Example for a more detailed example.

PublicAPI (beta): This API is in beta and may change before becoming stable.

Methods

reallocate_trial_resources_if_needed

Calls the user-defined resources_allocation_function.

set_trial_resources

Returns True if new_resources were set.

Attributes

CONTINUE

Status for continuing trial execution

NOOP

PAUSE

Status for pausing trial execution

STOP

Status for stopping trial execution

base_trial_resources

metric

supports_buffered_results