ray.rllib.algorithms.algorithm_config.AlgorithmConfig.reporting#

AlgorithmConfig.reporting(*, keep_per_episode_custom_metrics: bool | None = <ray.rllib.utils.from_config._NotProvided object>, metrics_episode_collection_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, metrics_num_episodes_for_smoothing: int | None = <ray.rllib.utils.from_config._NotProvided object>, min_time_s_per_iteration: float | None = <ray.rllib.utils.from_config._NotProvided object>, min_train_timesteps_per_iteration: int | None = <ray.rllib.utils.from_config._NotProvided object>, min_sample_timesteps_per_iteration: int | None = <ray.rllib.utils.from_config._NotProvided object>, log_gradients: bool | None = <ray.rllib.utils.from_config._NotProvided object>) AlgorithmConfig[source]#

Sets the config’s reporting settings.

Parameters:
  • keep_per_episode_custom_metrics – Store raw custom metrics without calculating max, min, or mean.

  • metrics_episode_collection_timeout_s – Wait for metric batches for at most this many seconds. Those that have not returned in time are collected in the next train iteration.

  • metrics_num_episodes_for_smoothing – Smooth rollout metrics over this many episodes, if possible. In case rollouts (sample collection) just started, there may be fewer than this many episodes in the buffer and we’ll compute metrics over this smaller number of available episodes. In case there are more than this many episodes collected in a single training iteration, use all of these episodes for metrics computation, meaning don’t ever cut any “excess” episodes. Set this to 1 to disable smoothing and to always report only the most recently collected episode’s return.

  • min_time_s_per_iteration – Minimum time (in sec) to accumulate within a single Algorithm.train() call. This value does not affect learning, only the number of times Algorithm.training_step() is called by Algorithm.train(). If, after one such step attempt, the time taken has not reached min_time_s_per_iteration, RLlib performs n more Algorithm.training_step() calls until the minimum time has been consumed. Set to 0 or None for no minimum time.

  • min_train_timesteps_per_iteration – Minimum training timesteps to accumulate within a single train() call. This value does not affect learning, only the number of times Algorithm.training_step() is called by Algorithm.train(). If, after one such step attempt, the training timestep count has not been reached, RLlib performs n more training_step() calls until the minimum timesteps have been executed. Set to 0 or None for no minimum timesteps.

  • min_sample_timesteps_per_iteration – Minimum env sampling timesteps to accumulate within a single train() call. This value does not affect learning, only the number of times Algorithm.training_step() is called by Algorithm.train(). If, after one such step attempt, the env sampling timestep count has not been reached, RLlib performs n more training_step() calls until the minimum timesteps have been executed. Set to 0 or None for no minimum timesteps.

  • log_gradients – Log gradients to results. If True, the global norm of the gradient dictionary for each optimizer is logged to the results. The default is True.

Returns:

This updated AlgorithmConfig object.
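
Example (a minimal sketch of how these settings might be used; PPOConfig, environment(), build(), and train() come from the broader RLlib API rather than this method, and the parameter values are illustrative only):

    from ray.rllib.algorithms.ppo import PPOConfig

    # Illustrative values; tune them for your own workload.
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .reporting(
            metrics_num_episodes_for_smoothing=50,    # smooth returns over up to 50 episodes
            min_time_s_per_iteration=10,              # each train() call accumulates at least 10s
            min_sample_timesteps_per_iteration=1000,  # ... and at least 1000 env sampling steps
            log_gradients=True,                       # log global gradient norms per optimizer
        )
    )

    algo = config.build()
    results = algo.train()

Because reporting() returns the updated AlgorithmConfig, it can be chained with the other config methods as shown above.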