ray.rllib.algorithms.algorithm_config.AlgorithmConfig.reporting

AlgorithmConfig.reporting(
    *,
    keep_per_episode_custom_metrics: bool | None = NotProvided,
    metrics_episode_collection_timeout_s: float | None = NotProvided,
    metrics_num_episodes_for_smoothing: int | None = NotProvided,
    min_time_s_per_iteration: float | None = NotProvided,
    min_train_timesteps_per_iteration: int | None = NotProvided,
    min_sample_timesteps_per_iteration: int | None = NotProvided,
    log_gradients: bool | None = NotProvided,
    custom_stats_cls_lookup: Dict[str, Type[StatsBase]] | None = NotProvided,
) → Self

Sets the config’s reporting settings.

Parameters:
  • keep_per_episode_custom_metrics – Store the raw custom metrics without calculating their max, min, and mean.

  • metrics_episode_collection_timeout_s – Wait for metric batches for at most this many seconds. Batches that have not returned in time are collected in the next train iteration.

  • metrics_num_episodes_for_smoothing – Smooth rollout metrics over this many episodes, if possible. If rollouts (sample collection) have only just started, the buffer may hold fewer than this many episodes, in which case metrics are computed over the episodes that are available. If more than this many episodes are collected in a single training iteration, all of them are used for metrics computation; “excess” episodes are never cut. Set this to 1 to disable smoothing and to always report only the most recently collected episode’s return.

  • min_time_s_per_iteration – Minimum time (in seconds) to accumulate within a single Algorithm.train() call. This value does not affect learning, only the number of times Algorithm.training_step() is called by Algorithm.train(). If, after one such step attempt, the time taken has not reached min_time_s_per_iteration, Algorithm.train() performs further Algorithm.training_step() calls until the minimum time has been consumed. Set to 0 or None for no minimum time.

  • min_train_timesteps_per_iteration – Minimum number of training timesteps to accumulate within a single Algorithm.train() call. This value does not affect learning, only the number of times Algorithm.training_step() is called by Algorithm.train(). If, after one such step attempt, the training timestep count has not been reached, Algorithm.train() performs further Algorithm.training_step() calls until the minimum number of timesteps has been executed. Set to 0 or None for no minimum timesteps.

  • min_sample_timesteps_per_iteration – Minimum number of env sampling timesteps to accumulate within a single Algorithm.train() call. This value does not affect learning, only the number of times Algorithm.training_step() is called by Algorithm.train(). If, after one such step attempt, the env sampling timestep count has not been reached, Algorithm.train() performs further Algorithm.training_step() calls until the minimum number of timesteps has been executed. Set to 0 or None for no minimum timesteps.

  • log_gradients – Log gradients to results. If this is True, the global norm of the gradients dictionary for each optimizer is logged to results. The default is False.

  • custom_stats_cls_lookup – A dictionary mapping stat names (keys) to their corresponding Stats classes (values). The Stats classes should be subclasses of StatsBase. This allows you to use your own Stats classes for logging metrics. You can replace existing values to override some of RLlib’s behaviour, or add key-value pairs to make new Stats classes available when logging values with the MetricsLogger throughout RLlib.
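
The smoothing and minimum-per-iteration rules described above can be illustrated with a small pure-Python sketch. This is not RLlib's implementation: the helper names episodes_for_metrics and count_training_steps, the (sampled, trained) return value of the fake training step, and the injectable clock are all hypothetical. The sketch also assumes that every configured minimum must be satisfied before an iteration ends.

```python
import time
from typing import Callable, List, Optional, Tuple


def episodes_for_metrics(
    history: List[float], new_episodes: List[float], smoothing_window: int
) -> List[float]:
    """Pick the episode returns that one reported iteration averages over."""
    missing = smoothing_window - len(new_episodes)
    if missing <= 0:
        # Enough (or more than enough) new episodes: use all of them.
        return list(new_episodes)
    # Too few new episodes: pad with the most recent older ones, if any exist.
    return history[-missing:] + list(new_episodes)


def count_training_steps(
    training_step: Callable[[], Tuple[int, int]],
    *,
    min_time_s: Optional[float] = None,
    min_train_timesteps: Optional[int] = None,
    min_sample_timesteps: Optional[int] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Call training_step() until all minimums are met; return the call count.

    training_step is a hypothetical stand-in that returns the number of
    (sampled, trained) timesteps it processed.
    """
    # 0 and None both mean "no minimum".
    min_time_s = min_time_s or 0.0
    min_train = min_train_timesteps or 0
    min_sample = min_sample_timesteps or 0

    start = clock()
    sampled = trained = calls = 0
    while True:
        # At least one step attempt always happens.
        step_sampled, step_trained = training_step()
        sampled += step_sampled
        trained += step_trained
        calls += 1
        if (
            clock() - start >= min_time_s
            and trained >= min_train
            and sampled >= min_sample
        ):
            return calls


# A fake step that samples and trains on 100 timesteps per call.
calls = count_training_steps(lambda: (100, 100), min_sample_timesteps=250)
print(calls)  # 3
```

With no minimums set, the loop exits after a single step, which matches the "Set to 0 or None" wording in the parameter descriptions.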

Returns:

This updated AlgorithmConfig object.