Schedules API
Schedules are used to compute values (in Python, PyTorch, or TensorFlow) based on an int64 timestep input. The computed values are usually of type float32.
Base Schedule class (ray.rllib.utils.schedules.schedule.Schedule)
- class ray.rllib.utils.schedules.schedule.Schedule(framework)
Schedule classes implement various time-dependent scheduling schemas:
- Constant behavior.
- Linear decay.
- Piecewise decay.
- Exponential decay.
Useful for backend-agnostic rate/weight changes for learning rates, exploration epsilons, beta parameters for prioritized replay, loss weight decay, etc.
Each schedule can be called directly with the t (absolute time step) value and returns the value dependent on the Schedule and the passed time.
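The calling convention described above can be illustrated with a small standalone sketch. It does not import RLlib; `SketchSchedule` and `SketchConstant` are hypothetical names made up for this example, and the only assumption taken from the text is that calling the object with t returns the same result as its value method.

```python
from abc import ABC, abstractmethod


class SketchSchedule(ABC):
    """Hypothetical stand-in for a schedule base class (not RLlib's code)."""

    @abstractmethod
    def value(self, t: int) -> float:
        """Return the scheduled value at absolute time step t."""

    def __call__(self, t: int) -> float:
        # Calling the object directly is shorthand for value(t).
        return self.value(t)


class SketchConstant(SketchSchedule):
    """Returns the same value for every time step."""

    def __init__(self, value: float):
        self._v = float(value)

    def value(self, t: int) -> float:
        return self._v


epsilon = SketchConstant(0.05)
print(epsilon(0), epsilon(10_000))  # 0.05 0.05
```

Keeping the time dependence behind a single callable lets training code ask for "the current value at step t" without knowing which kind of schedule is configured.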
All built-in Schedule classes
- class ray.rllib.utils.schedules.constant_schedule.ConstantSchedule(value: float, framework: Optional[str] = None)
A Schedule where the value remains constant over time.
- class ray.rllib.utils.schedules.linear_schedule.LinearSchedule(**kwargs)
Linear interpolation between initial_p and final_p. Uses PolynomialSchedule with power=1.0.
The formula is: value = final_p + (initial_p - final_p) * (1 - t / t_max)
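As a worked example of the linear formula, the following standalone function evaluates it in plain Python. The name `linear_value` is hypothetical, and clamping t at t_max is an assumption of this sketch, chosen so that the value stays at final_p once the schedule has run out.

```python
def linear_value(t: int, t_max: int, initial_p: float, final_p: float) -> float:
    """Evaluate value = final_p + (initial_p - final_p) * (1 - t / t_max)."""
    # Assumption of this sketch: clamp t so the value stays at final_p
    # once t_max has been reached.
    t = min(t, t_max)
    return final_p + (initial_p - final_p) * (1.0 - t / t_max)


# Anneal an exploration epsilon from 1.0 to 0.1 over 10,000 steps.
for t in (0, 2_500, 5_000, 10_000, 20_000):
    print(t, linear_value(t, t_max=10_000, initial_p=1.0, final_p=0.1))
```

At t=0 the second factor is 1, so the result is initial_p; at t=t_max it is 0, so the result is final_p.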
- class ray.rllib.utils.schedules.polynomial_schedule.PolynomialSchedule(schedule_timesteps: int, final_p: float, framework: Optional[str], initial_p: float = 1.0, power: float = 2.0)
Polynomial interpolation between initial_p and final_p over schedule_timesteps. After this many time steps, always returns final_p.
- __init__(schedule_timesteps: int, final_p: float, framework: Optional[str], initial_p: float = 1.0, power: float = 2.0)
Initializes a PolynomialSchedule instance.
- Parameters
schedule_timesteps – Number of time steps over which to anneal initial_p to final_p.
final_p – Final output value.
framework – The framework descriptor string, e.g. “tf”, “torch”, or None.
initial_p – Initial output value.
power – The exponent to use (default: quadratic).
- class ray.rllib.utils.schedules.exponential_schedule.ExponentialSchedule(schedule_timesteps: int, framework: Optional[str] = None, initial_p: float = 1.0, decay_rate: float = 0.1)
Exponential decay schedule from initial_p to final_p.
Reduces output over schedule_timesteps. After this many time steps always returns final_p.
- __init__(schedule_timesteps: int, framework: Optional[str] = None, initial_p: float = 1.0, decay_rate: float = 0.1)
Initializes an ExponentialSchedule instance.
- Parameters
schedule_timesteps – Number of time steps over which to anneal initial_p to final_p.
framework – The framework descriptor string, e.g. “tf”, “torch”, or None.
initial_p – Initial output value.
decay_rate – The fraction of the initial value that remains after 100% of schedule_timesteps has elapsed. Must be > 0.0: the smaller the decay rate, the stronger the decay. 1.0 means no decay at all.
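The decay_rate description suggests a formula of the form initial_p * decay_rate ** (t / schedule_timesteps): at t=0 the exponent is 0 and the value is initial_p, and at t=schedule_timesteps the value is initial_p * decay_rate. The sketch below assumes that form. It is inferred from the parameter description, not quoted from the library; `exponential_value` is a hypothetical name, and behavior past schedule_timesteps is deliberately not modeled.

```python
def exponential_value(
    t: int,
    schedule_timesteps: int,
    initial_p: float = 1.0,
    decay_rate: float = 0.1,
) -> float:
    """Assumed form: initial_p * decay_rate ** (t / schedule_timesteps)."""
    assert decay_rate > 0.0, "decay_rate must be positive"
    # This sketch only covers the range 0 <= t <= schedule_timesteps.
    assert 0 <= t <= schedule_timesteps
    return initial_p * decay_rate ** (t / schedule_timesteps)


for t in (0, 250, 500, 1_000):
    print(t, exponential_value(t, schedule_timesteps=1_000))
```

Under this assumed form, decay_rate=1.0 leaves the value unchanged for every t, which matches the "no decay at all" case in the parameter description.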
- class ray.rllib.utils.schedules.piecewise_schedule.PiecewiseSchedule(endpoints: List[Tuple[int, float]], framework: Optional[str] = None, interpolation: Callable[[Union[numpy.array, tf.Tensor, torch.Tensor], Union[numpy.array, tf.Tensor, torch.Tensor], Union[numpy.array, tf.Tensor, torch.Tensor]], Union[numpy.array, tf.Tensor, torch.Tensor]] = <function _linear_interpolation>, outside_value: Optional[float] = None)
- __init__(endpoints: List[Tuple[int, float]], framework: Optional[str] = None, interpolation: Callable[[Union[numpy.array, tf.Tensor, torch.Tensor], Union[numpy.array, tf.Tensor, torch.Tensor], Union[numpy.array, tf.Tensor, torch.Tensor]], Union[numpy.array, tf.Tensor, torch.Tensor]] = <function _linear_interpolation>, outside_value: Optional[float] = None)
Initializes a PiecewiseSchedule instance.
- Parameters
endpoints – A list of tuples (t, value) such that the output is an interpolation (given by the interpolation callable) between two values. E.g. for t=400 and endpoints=[(0, 20.0), (500, 30.0)], output = 20.0 + 0.8 * (30.0 - 20.0) = 28.0. NOTE: All the values for time must be sorted in increasing order.
framework – The framework descriptor string, e.g. “tf”, “torch”, or None.
interpolation – A function that takes the left-value, the right-value and an alpha interpolation parameter (0.0=only left value, 1.0=only right value), which is the fraction of distance from left endpoint to right endpoint.
outside_value – If t in a call to value is outside of all the intervals in endpoints, this value is returned. If None, an AssertionError is raised when an outside value is requested.
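The parameters above can be tied together in a small standalone sketch that reproduces the worked example (t=400 between (0, 20.0) and (500, 30.0)). The names `lerp` and `piecewise_value` are hypothetical, and treating each interval as half-open (left endpoint included, right endpoint excluded) is an assumption of this sketch, not something stated in the text.

```python
from typing import Callable, List, Optional, Tuple


def lerp(left: float, right: float, alpha: float) -> float:
    """Linear interpolation: alpha=0.0 gives left, alpha=1.0 gives right."""
    return left + alpha * (right - left)


def piecewise_value(
    t: int,
    endpoints: List[Tuple[int, float]],
    interpolation: Callable[[float, float, float], float] = lerp,
    outside_value: Optional[float] = None,
) -> float:
    # Endpoint times must be sorted in increasing order.
    times = [e[0] for e in endpoints]
    assert times == sorted(times), "endpoints must be sorted by time"
    for (l_t, l_v), (r_t, r_v) in zip(endpoints[:-1], endpoints[1:]):
        # Assumption: intervals are half-open, [l_t, r_t).
        if l_t <= t < r_t:
            alpha = (t - l_t) / (r_t - l_t)
            return interpolation(l_v, r_v, alpha)
    # t lies outside every interval.
    assert outside_value is not None, "t is outside all intervals"
    return outside_value


endpoints = [(0, 20.0), (500, 30.0)]
print(piecewise_value(400, endpoints))  # 28.0
print(piecewise_value(600, endpoints, outside_value=30.0))  # 30.0
# A custom interpolation that holds the left value gives a step function.
print(piecewise_value(400, endpoints, interpolation=lambda l, r, a: l))  # 20.0
```

Passing a different interpolation callable changes only how values between two endpoints are blended; the endpoint lookup and the outside_value fallback stay the same.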