AlgorithmConfig.training(gamma: Optional[float] = NotProvided, lr: Optional[float] = NotProvided, train_batch_size: Optional[int] = NotProvided, model: Optional[dict] = NotProvided, optimizer: Optional[dict] = NotProvided, max_requests_in_flight_per_sampler_worker: Optional[int] = NotProvided, _enable_learner_api: Optional[bool] = NotProvided, learner_class: Optional[Type[Learner]] = NotProvided) -> AlgorithmConfig

Sets the training-related configuration.

  • gamma – Float specifying the discount factor of the Markov Decision Process.

  • lr – The default learning rate.

  • train_batch_size – Training batch size, if applicable.

  • model – Arguments passed into the policy model. See models/catalog.py for a full list of the available model options. TODO: Provide ModelConfig objects instead of dicts.

  • optimizer – Arguments to pass to the policy optimizer.

  • max_requests_in_flight_per_sampler_worker – Max number of in-flight requests to each sampling worker. See the FaultTolerantActorManager class for more details. Tuning this value is important when running experiments with large sample batches, where there is the risk that the object store may fill up, causing objects to spill to disk. This can make any asynchronous requests very slow, slowing down your experiment as well. You can inspect the object store during your experiment via a call to ray memory on your head node, and by using the Ray dashboard. If you see that the object store is filling up, turn down the number of remote requests in flight, or enable compression of timesteps in your experiment.

  • _enable_learner_api – Whether to enable the LearnerGroup and Learner APIs for training. These APIs use ray.train to run the training loop, which allows for more flexible distributed training.

  • learner_class – The Learner class to use for (distributed) updating of the RLModule. Only used when _enable_learner_api=True.


Returns: This updated AlgorithmConfig object.