ray.rllib.algorithms.algorithm_config.AlgorithmConfig.experimental

AlgorithmConfig.experimental(*, _enable_new_api_stack: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _tf_policy_handles_more_than_one_loss: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_preprocessor_api: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_action_flattening: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_initialize_loss_from_dummy_batch: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_execution_plan_api=None) → AlgorithmConfig

Sets the config’s experimental settings.

Parameters:
  • _enable_new_api_stack – Enables the new API stack, which will use RLModule (instead of ModelV2) as well as the multi-GPU capable Learner API (instead of using Policy to compute loss and update the model).

  • _tf_policy_handles_more_than_one_loss – Experimental flag. If True, TFPolicy will handle more than one loss/optimizer. Set this to True if you would like to return more than one loss term from your loss_fn and an equal number of optimizers from your optimizer_fn. In the future, the default for this will be True.

  • _disable_preprocessor_api – Experimental flag. If True, no (observation) preprocessor will be created, and observations will arrive in the model as returned by the env. In the future, the default for this will be True.

  • _disable_action_flattening – Experimental flag. If True, RLlib will no longer flatten the policy-computed actions into a single tensor (for storage in SampleCollectors/output files/etc.), but leave (possibly nested) actions as-is. Disabling flattening affects:
      - SampleCollectors: Have to store possibly nested action structs.
      - Models that have the previous action(s) as part of their input.
      - Algorithms reading from offline files (incl. action information).

Returns:

This updated AlgorithmConfig object.
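A minimal usage sketch (assuming an RLlib 2.x release in which PPOConfig, environment(), and build() are available; the flag values below are illustrative only, not recommended defaults):

from ray.rllib.algorithms.ppo import PPOConfig

# Chain experimental() onto an AlgorithmConfig subclass; it returns the same
# (updated) config object, so further builder calls can follow it.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .experimental(
        # Illustrative flag values; adjust for your setup and RLlib version.
        _enable_new_api_stack=True,      # RLModule + Learner API instead of ModelV2/Policy
        _disable_preprocessor_api=True,  # feed raw env observations to the model
    )
)

algo = config.build()  # build the Algorithm from the updated config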