ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training#

AlgorithmConfig.training(*, gamma: float | None = <ray.rllib.utils.from_config._NotProvided object>, lr: float | ~typing.List[~typing.List[int | float]] | None = <ray.rllib.utils.from_config._NotProvided object>, grad_clip: float | None = <ray.rllib.utils.from_config._NotProvided object>, grad_clip_by: str | None = <ray.rllib.utils.from_config._NotProvided object>, train_batch_size: int | None = <ray.rllib.utils.from_config._NotProvided object>, train_batch_size_per_learner: int | None = <ray.rllib.utils.from_config._NotProvided object>, model: dict | None = <ray.rllib.utils.from_config._NotProvided object>, optimizer: dict | None = <ray.rllib.utils.from_config._NotProvided object>, max_requests_in_flight_per_sampler_worker: int | None = <ray.rllib.utils.from_config._NotProvided object>, learner_class: ~typing.Type[Learner] | None = <ray.rllib.utils.from_config._NotProvided object>, learner_connector: ~typing.Callable[[RLModule], ConnectorV2 | ~typing.List[ConnectorV2]] | None = <ray.rllib.utils.from_config._NotProvided object>, add_default_connectors_to_learner_pipeline: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _enable_learner_api: bool | None = <ray.rllib.utils.from_config._NotProvided object>) AlgorithmConfig[source]#

Sets the training related configuration.

Parameters:
  • gamma – Float specifying the discount factor of the Markov Decision process.

  • lr – The learning rate (float) or learning rate schedule in the format of [[timestep, lr-value], [timestep, lr-value], …] In case of a schedule, intermediary timesteps will be assigned to linearly interpolated learning rate values. A schedule config’s first entry must start with timestep 0, i.e.: [[0, initial_value], […]]. Note: If you require a) more than one optimizer (per RLModule), b) optimizer types that are not Adam, c) a learning rate schedule that is not a linearly interpolated, piecewise schedule as described above, or d) specifying c’tor arguments of the optimizer that are not the learning rate (e.g. Adam’s epsilon), then you must override your Learner’s configure_optimizer_for_module() method and handle lr-scheduling yourself.

  • grad_clip – If None, no gradient clipping will be applied. Otherwise, depending on the setting of grad_clip_by, the (float) value of grad_clip will have the following effect: If grad_clip_by=value: Will clip all computed gradients individually inside the interval [-grad_clip, +`grad_clip`]. If grad_clip_by=norm, will compute the L2-norm of each weight/bias gradient tensor individually and then clip all gradients such that these L2-norms do not exceed grad_clip. The L2-norm of a tensor is computed via: sqrt(SUM(w0^2, w1^2, ..., wn^2)) where w[i] are the elements of the tensor (no matter what the shape of this tensor is). If grad_clip_by=global_norm, will compute the square of the L2-norm of each weight/bias gradient tensor individually, sum up all these squared L2-norms across all given gradient tensors (e.g. the entire module to be updated), square root that overall sum, and then clip all gradients such that this global L2-norm does not exceed the given value. The global L2-norm over a list of tensors (e.g. W and V) is computed via: sqrt[SUM(w0^2, w1^2, ..., wn^2) + SUM(v0^2, v1^2, ..., vm^2)], where w[i] and v[j] are the elements of the tensors W and V (no matter what the shapes of these tensors are).

  • grad_clip_by – See grad_clip for the effect of this setting on gradient clipping. Allowed values are value, norm, and global_norm.

  • train_batch_size_per_learner – Train batch size per individual Learner worker. This setting only applies to the new API stack. The number of Learner workers can be set via config.resources( num_learner_workers=...). The total effective batch size is then num_learner_workers x train_batch_size_per_learner and can be accessed via the property AlgorithmConfig.total_train_batch_size.

  • train_batch_size – Training batch size, if applicable. When on the new API stack, this setting should no longer be used. Instead, use train_batch_size_per_learner (in combination with num_learner_workers).

  • model – Arguments passed into the policy model. See models/catalog.py for a full list of the available model options. TODO: Provide ModelConfig objects instead of dicts.

  • optimizer – Arguments to pass to the policy optimizer. This setting is not used when _enable_new_api_stack=True.

  • max_requests_in_flight_per_sampler_worker – Max number of inflight requests to each sampling worker. See the FaultTolerantActorManager class for more details. Tuning these values is important when running experimens with large sample batches, where there is the risk that the object store may fill up, causing spilling of objects to disk. This can cause any asynchronous requests to become very slow, making your experiment run slow as well. You can inspect the object store during your experiment via a call to ray memory on your headnode, and by using the ray dashboard. If you’re seeing that the object store is filling up, turn down the number of remote requests in flight, or enable compression in your experiment of timesteps.

  • learner_class – The Learner class to use for (distributed) updating of the RLModule. Only used when _enable_new_api_stack=True.

  • learner_connector – A callable taking an env observation space and an env action space as inputs and returning a learner ConnectorV2 (might be a pipeline) object.

  • add_default_connectors_to_learner_pipeline – If True (default), RLlib’s Learners will automatically add the default Learner ConnectorV2 pieces to the LearnerPipeline. These automatically perform: a) adding observations from episodes to the train batch, if this has not already been done by a user-provided connector piece b) if RLModule is stateful, add a time rank to the train batch, zero-pad the data, and add the correct state inputs, if this has not already been done by a user-provided connector piece. c) add all other information (actions, rewards, terminateds, etc..) to the train batch, if this has not already been done by a user-provided connector piece. Only if you know exactly what you are doing, you should set this setting to False. Note that this setting is only relevant if the new API stack is used (including the new EnvRunner classes).

Returns:

This updated AlgorithmConfig object.