AlgorithmConfig.evaluation(*, evaluation_interval: int | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_duration: int | str | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_duration_unit: str | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_sample_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_parallel_to_training: bool | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_config: ray.rllib.algorithms.algorithm_config.AlgorithmConfig | dict | None = <ray.rllib.utils.from_config._NotProvided object>, off_policy_estimation_methods: typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, ope_split_batch_by_episode: bool | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_num_workers: int | None = <ray.rllib.utils.from_config._NotProvided object>, custom_evaluation_function: typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>, always_attach_evaluation_results: bool | None = <ray.rllib.utils.from_config._NotProvided object>, enable_async_evaluation: bool | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_num_episodes=-1) → AlgorithmConfig

Sets the config’s evaluation settings.

  • evaluation_interval – Evaluate every evaluation_interval training iterations. The evaluation stats are reported under the “evaluation” metric key. Note that for Ape-X, metrics are already reported only for the lowest-epsilon workers (the least random workers). Set to None (or 0) to disable evaluation.

  • evaluation_duration – Duration for which to run evaluation each evaluation_interval. The unit for the duration can be set via evaluation_duration_unit to either “episodes” (default) or “timesteps”. If using multiple evaluation workers (evaluation_num_workers > 1), the load is split among them. If the value is “auto”:
      - For evaluation_parallel_to_training=True: Runs as many episodes/timesteps as fit into the (parallel) training step.
      - For evaluation_parallel_to_training=False: Error.

  • evaluation_duration_unit – The unit in which to count the evaluation duration. Either “episodes” (default) or “timesteps”.

  • evaluation_sample_timeout_s – The timeout (in seconds) for the ray.get call to the remote evaluation worker(s) sample() method. After this time, the user receives a warning and instructions on how to fix the issue. The fix could be making sure the episode ends, increasing the timeout, or switching to evaluation_duration_unit=timesteps.

  • evaluation_parallel_to_training – Whether to run evaluation in parallel to an Algorithm.train() call using threading. Default=False. E.g. evaluation_interval=2 -> For every other training iteration, the Algorithm.train() and Algorithm.evaluate() calls run in parallel. Note: This is experimental. Possible pitfalls include race conditions for weight syncing at the beginning of the evaluation loop.

  • evaluation_config – Typical usage is to pass extra args to the evaluation env creator and to disable exploration by computing deterministic actions. IMPORTANT NOTE: Policy gradient algorithms are able to find the optimal policy, even if this is a stochastic one. Setting “explore=False” here makes the evaluation workers compute deterministic actions, so they will not use this optimal policy!

  • off_policy_estimation_methods – Specify how to evaluate the current policy, along with any optional config parameters. This only has an effect when reading offline experiences (“input” is not “sampler”). Expected format: {ope_method_name: {“type”: ope_type, …}} where ope_method_name is a user-defined string under which to save the OPE results, and ope_type can be any subclass of OffPolicyEstimator, e.g. ray.rllib.offline.estimators.is::ImportanceSampling or your own custom subclass, or the full class path to the subclass. You can also add config arguments to the dict to be passed to the OffPolicyEstimator, e.g. {“qreg_dr”: {“type”: DoublyRobust, “q_model_type”: “qreg”, “k”: 5}}

  • ope_split_batch_by_episode – Whether to use SampleBatch.split_by_episode() to split the input batch into episodes before estimating the OPE metrics. For bandits, set this to False to speed up OPE evaluation; not splitting by episode is fine there, since each record is already one timestep. The default is True.

  • evaluation_num_workers – Number of parallel workers to use for evaluation. Note that this is set to zero by default, which means evaluation will be run in the algorithm process (only if evaluation_interval is not None). If you increase this, it will increase the Ray resource usage of the algorithm since evaluation workers are created separately from rollout workers (used to sample data for training).

  • custom_evaluation_function – Customize the evaluation method. This must be a function of signature (algo: Algorithm, eval_workers: WorkerSet) -> metrics: dict. See the Algorithm.evaluate() method for the default implementation. The Algorithm guarantees all eval workers have the latest policy state before this function is called.

  • always_attach_evaluation_results – Make sure the latest available evaluation results are always attached to a step result dict. This may be useful if Tune or some other meta controller needs access to evaluation metrics all the time.

  • enable_async_evaluation – If True, use an AsyncRequestsManager for the evaluation workers and use this manager to send sample() requests to the evaluation workers. This way, the Algorithm becomes more robust against long running episodes and/or failing (and restarting) workers.
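
The interaction between evaluation_interval, evaluation_duration, and evaluation_num_workers described above can be sketched in plain Python. This is an illustrative model only, not RLlib's implementation: the helper names (should_evaluate, check_duration, split_duration) are hypothetical, the 1-based iteration counting is an assumption, and the even split is just one plausible way to divide the load among workers.

```python
from typing import List, Optional, Union


def should_evaluate(iteration: int, evaluation_interval: Optional[int]) -> bool:
    """Return True if evaluation is due at the given training iteration.

    Assumption: iterations are counted from 1, and evaluation happens on
    every iteration that is a multiple of the interval.
    """
    # None or 0 disables evaluation entirely.
    if not evaluation_interval:
        return False
    return iteration % evaluation_interval == 0


def check_duration(
    evaluation_duration: Union[int, str],
    evaluation_parallel_to_training: bool,
) -> None:
    """Reject "auto" unless evaluation runs in parallel to training."""
    if evaluation_duration == "auto" and not evaluation_parallel_to_training:
        raise ValueError(
            'evaluation_duration="auto" requires '
            "evaluation_parallel_to_training=True"
        )


def split_duration(evaluation_duration: int, evaluation_num_workers: int) -> List[int]:
    """Split a fixed duration as evenly as possible across workers.

    Assumption: an even static split. With 0 workers, everything runs in
    the algorithm process, modeled here as a single share.
    """
    n = max(evaluation_num_workers, 1)
    base, extra = divmod(evaluation_duration, n)
    # The first `extra` workers each take one additional unit.
    return [base + (1 if i < extra else 0) for i in range(n)]


# With evaluation_interval=2, every other iteration triggers evaluation.
eval_iterations = [i for i in range(1, 7) if should_evaluate(i, 2)]
print(eval_iterations)  # [2, 4, 6]

# 10 episodes shared among 3 evaluation workers.
print(split_duration(10, 3))  # [4, 3, 3]
```

The shares always sum to the requested duration, which matches the description of the load being split among the evaluation workers without committing to how RLlib schedules individual episodes.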


Returns this updated AlgorithmConfig object.
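
As a rough usage sketch, the evaluation settings can be collected in a plain dictionary and checked against the documented parameter names before being passed on. The values below are arbitrary examples, and EVALUATION_PARAMS is a hypothetical helper set copied from the parameter list above (it deliberately leaves out the undocumented evaluation_num_episodes argument). The commented-out lines at the end assume RLlib is installed and that PPOConfig is the algorithm config in use.

```python
# Parameter names documented for AlgorithmConfig.evaluation() above.
# Hypothetical helper set, used only to catch typos in keyword names.
EVALUATION_PARAMS = {
    "evaluation_interval",
    "evaluation_duration",
    "evaluation_duration_unit",
    "evaluation_sample_timeout_s",
    "evaluation_parallel_to_training",
    "evaluation_config",
    "off_policy_estimation_methods",
    "ope_split_batch_by_episode",
    "evaluation_num_workers",
    "custom_evaluation_function",
    "always_attach_evaluation_results",
    "enable_async_evaluation",
}

# Example settings: evaluate every 2nd training iteration for 10 episodes,
# using 2 dedicated evaluation workers.
eval_kwargs = {
    "evaluation_interval": 2,
    "evaluation_duration": 10,
    "evaluation_duration_unit": "episodes",
    "evaluation_num_workers": 2,
    "evaluation_parallel_to_training": False,
    # Deterministic actions during evaluation; see the note on
    # evaluation_config regarding stochastic optimal policies.
    "evaluation_config": {"explore": False},
}

# Fail early on misspelled or unsupported keyword names.
unknown = set(eval_kwargs) - EVALUATION_PARAMS
if unknown:
    raise TypeError(f"Unknown evaluation settings: {sorted(unknown)}")

# With RLlib installed, the settings could then be applied roughly like this
# (the method returns the updated config, so calls can be chained):
#   from ray.rllib.algorithms.ppo import PPOConfig
#   config = PPOConfig().evaluation(**eval_kwargs)
```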