Schedules API

Schedules compute values (in Python, PyTorch, or TensorFlow) from an int64 timestep input. The computed values are usually of type float32.

Base Schedule class (ray.rllib.utils.schedules.schedule.Schedule)

class ray.rllib.utils.schedules.schedule.Schedule(framework)

Schedule classes implement various time-dependent scheduling schemes, such as:

  • Constant behavior.

  • Linear decay.

  • Piecewise decay.

  • Exponential decay.

Useful for backend-agnostic rate/weight changes such as learning rates, exploration epsilons, beta parameters for prioritized replay, loss-weight decay, etc.

Each schedule can be called directly with the absolute time step t and returns the value that the schedule's logic assigns to that time step.

value(t: Union[int, Any]) → Any

Generates the value given a timestep (based on schedule’s logic).

Parameters

t – The time step. This could be a tf.Tensor.

Returns

The calculated value depending on the schedule and t.
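The calling convention described above can be sketched in plain Python. This is a minimal illustration, not RLlib code: the names SketchSchedule and SketchConstant are hypothetical, and the sketch assumes that calling the object simply delegates to value(t).

```python
from abc import ABC, abstractmethod


class SketchSchedule(ABC):
    """Hypothetical stand-in for a schedule base class (not RLlib code)."""

    @abstractmethod
    def value(self, t: int) -> float:
        """Return the scheduled value at the absolute time step t."""

    def __call__(self, t: int) -> float:
        # Assumption: calling the object is shorthand for value(t).
        return self.value(t)


class SketchConstant(SketchSchedule):
    """Toy subclass that ignores t and always returns the same value."""

    def __init__(self, constant: float):
        self.constant = constant

    def value(self, t: int) -> float:
        return self.constant


eps = SketchConstant(0.05)
# Both call styles give the same result for any time step.
same = eps(0) == eps.value(1_000_000) == 0.05
```

Subclasses only need to implement value; the direct-call form comes for free from the base class.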

All built-in Schedule classes

class ray.rllib.utils.schedules.constant_schedule.ConstantSchedule(value: float, framework: Optional[str] = None)

A Schedule where the value remains constant over time.

__init__(value: float, framework: Optional[str] = None)

Initializes a ConstantSchedule instance.

Parameters
  • value – The constant value to return, independently of time.

  • framework – The framework descriptor string, e.g. “tf”, “torch”, or None.

class ray.rllib.utils.schedules.linear_schedule.LinearSchedule(**kwargs)

Linear interpolation between initial_p and final_p.

Uses PolynomialSchedule with power=1.0.

The formula is: value = final_p + (initial_p - final_p) * (1 - t/t_max), where t_max is schedule_timesteps.

__init__(**kwargs)

Initializes a LinearSchedule instance.
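As a worked example of the linear formula, the following plain-Python sketch evaluates it at a few time steps. The function name linear_value is hypothetical and the code does not call RLlib. It assumes t is clamped to t_max, in line with the PolynomialSchedule description that final_p is returned once the schedule has run out.

```python
def linear_value(t: int, t_max: int, initial_p: float, final_p: float) -> float:
    """Evaluate value = final_p + (initial_p - final_p) * (1 - t / t_max)."""
    # Assumption: clamp t so the output stays at final_p after t_max steps.
    t = min(t, t_max)
    return final_p + (initial_p - final_p) * (1.0 - t / t_max)


# Anneal an exploration epsilon from 1.0 to 0.1 over 10,000 time steps.
start = linear_value(0, 10_000, initial_p=1.0, final_p=0.1)        # about 1.0
middle = linear_value(5_000, 10_000, initial_p=1.0, final_p=0.1)   # about 0.55
end = linear_value(20_000, 10_000, initial_p=1.0, final_p=0.1)     # about 0.1
```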

class ray.rllib.utils.schedules.polynomial_schedule.PolynomialSchedule(schedule_timesteps: int, final_p: float, framework: Optional[str], initial_p: float = 1.0, power: float = 2.0)

Polynomial interpolation between initial_p and final_p over schedule_timesteps. After this many time steps, always returns final_p.

__init__(schedule_timesteps: int, final_p: float, framework: Optional[str], initial_p: float = 1.0, power: float = 2.0)

Initializes a PolynomialSchedule instance.

Parameters
  • schedule_timesteps – Number of time steps over which to anneal initial_p to final_p.

  • final_p – Final output value.

  • framework – The framework descriptor string, e.g. “tf”, “torch”, or None.

  • initial_p – Initial output value.

  • power – The exponent to use (default: quadratic).
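The effect of the power parameter can be illustrated with a short plain-Python sketch. The function polynomial_value is hypothetical and is not the library implementation. It assumes the linear formula generalizes by raising the (1 - t/t_max) term to power, which reduces to the LinearSchedule formula for power=1.0.

```python
def polynomial_value(
    t: int,
    schedule_timesteps: int,
    final_p: float,
    initial_p: float = 1.0,
    power: float = 2.0,
) -> float:
    """Assumed form: final_p + (initial_p - final_p) * (1 - t / t_max) ** power."""
    # After schedule_timesteps, the remaining fraction is 0 and final_p is returned.
    t = min(t, schedule_timesteps)
    remaining = 1.0 - t / schedule_timesteps
    return final_p + (initial_p - final_p) * remaining ** power


# Halfway through the schedule, the quadratic default has decayed further
# than the linear case (power=1.0).
quadratic_mid = polynomial_value(5_000, 10_000, final_p=0.1)           # about 0.325
linear_mid = polynomial_value(5_000, 10_000, final_p=0.1, power=1.0)   # about 0.55
after_end = polynomial_value(50_000, 10_000, final_p=0.1)              # about 0.1
```

Under this assumed form, powers above 1.0 front-load the decay, while powers below 1.0 keep the output near initial_p for longer.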

class ray.rllib.utils.schedules.exponential_schedule.ExponentialSchedule(schedule_timesteps: int, framework: Optional[str] = None, initial_p: float = 1.0, decay_rate: float = 0.1)

Exponential decay schedule starting at initial_p.

Reduces the output over schedule_timesteps. After this many time steps, the output has decayed to decay_rate times initial_p.

__init__(schedule_timesteps: int, framework: Optional[str] = None, initial_p: float = 1.0, decay_rate: float = 0.1)

Initializes an ExponentialSchedule instance.

Parameters
  • schedule_timesteps – Number of time steps over which the output decays from initial_p to decay_rate times initial_p.

  • framework – The framework descriptor string, e.g. “tf”, “torch”, or None.

  • initial_p – Initial output value.

  • decay_rate – The fraction of the original value that remains after 100% of the time (schedule_timesteps) has elapsed. Must be > 0.0: the smaller the decay rate, the stronger the decay. 1.0 means no decay at all.
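The meaning of decay_rate can be made concrete with a small plain-Python sketch. The function exponential_value is hypothetical. The formula it uses, initial_p * decay_rate ** (t / schedule_timesteps), is an assumption inferred from the parameter description (the output equals decay_rate times initial_p once 100% of the time has elapsed) and is not quoted from the library.

```python
def exponential_value(
    t: int,
    schedule_timesteps: int,
    initial_p: float = 1.0,
    decay_rate: float = 0.1,
) -> float:
    """Assumed form: initial_p * decay_rate ** (t / schedule_timesteps)."""
    return initial_p * decay_rate ** (t / schedule_timesteps)


at_start = exponential_value(0, 1_000)                        # initial_p, i.e. 1.0
at_end = exponential_value(1_000, 1_000)                      # decay_rate * initial_p
halfway = exponential_value(500, 1_000)                       # sqrt(0.1), about 0.316
no_decay = exponential_value(500, 1_000, decay_rate=1.0)      # stays at initial_p
```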

class ray.rllib.utils.schedules.piecewise_schedule.PiecewiseSchedule(endpoints: List[Tuple[int, float]], framework: Optional[str] = None, interpolation: Callable[[Any, Any, Any], Any] = <function _linear_interpolation>, outside_value: Optional[float] = None)

__init__(endpoints: List[Tuple[int, float]], framework: Optional[str] = None, interpolation: Callable[[Any, Any, Any], Any] = <function _linear_interpolation>, outside_value: Optional[float] = None)

Initializes a PiecewiseSchedule instance.

Parameters
  • endpoints – A list of tuples (t, value) such that the output is an interpolation (given by the interpolation callable) between two values. E.g., for t=400 and endpoints=[(0, 20.0), (500, 30.0)], the output is 20.0 + 0.8 * (30.0 - 20.0) = 28.0. NOTE: All time values must be sorted in increasing order.

  • framework – The framework descriptor string, e.g. “tf”, “torch”, or None.

  • interpolation – A function that takes the left-value, the right-value and an alpha interpolation parameter (0.0=only left value, 1.0=only right value), which is the fraction of distance from left endpoint to right endpoint.

  • outside_value – The value returned when t, as passed to value, lies outside all the intervals in endpoints. If None, an AssertionError is raised when a value outside the intervals is requested.
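The interplay of endpoints, interpolation, and outside_value can be sketched in plain Python. The functions piecewise_value and linear_interpolation are hypothetical and do not come from RLlib. The sketch assumes each interval is half-open, so that left_t <= t < right_t, which the parameter descriptions do not state.

```python
from typing import Callable, List, Optional, Tuple


def linear_interpolation(left: float, right: float, alpha: float) -> float:
    """alpha=0.0 gives only the left value, alpha=1.0 only the right value."""
    return left + alpha * (right - left)


def piecewise_value(
    t: int,
    endpoints: List[Tuple[int, float]],
    interpolation: Callable[[float, float, float], float] = linear_interpolation,
    outside_value: Optional[float] = None,
) -> float:
    # endpoints must be sorted by time in increasing order.
    for (left_t, left_v), (right_t, right_v) in zip(endpoints[:-1], endpoints[1:]):
        # Assumption: intervals are half-open, [left_t, right_t).
        if left_t <= t < right_t:
            # Fraction of the distance from the left to the right endpoint.
            alpha = (t - left_t) / (right_t - left_t)
            return interpolation(left_v, right_v, alpha)
    # t is outside all intervals: fall back to outside_value or fail.
    assert outside_value is not None, "t is outside all intervals"
    return outside_value


endpoints = [(0, 20.0), (500, 30.0)]
inside = piecewise_value(400, endpoints)                        # 20.0 + 0.8 * 10.0
outside = piecewise_value(600, endpoints, outside_value=30.0)   # falls back to 30.0
```

Passing a different callable for interpolation, for example one that always returns the left value, would turn the same endpoints into a step function.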