ray.rllib.algorithms.algorithm_config.AlgorithmConfig.experimental
- AlgorithmConfig.experimental(*, _torch_grad_scaler_class: ~typing.Type | None = <ray.rllib.utils.from_config._NotProvided object>, _torch_lr_scheduler_classes: ~typing.List[~typing.Type] | ~typing.Dict[str, ~typing.List[~typing.Type]] | None = <ray.rllib.utils.from_config._NotProvided object>, _tf_policy_handles_more_than_one_loss: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_preprocessor_api: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_action_flattening: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _disable_initialize_loss_from_dummy_batch: bool | None = <ray.rllib.utils.from_config._NotProvided object>, _enable_new_api_stack=-1) -> AlgorithmConfig [source]
Sets the config’s experimental settings.
- Parameters:
  - _torch_grad_scaler_class – Class to use for torch loss scaling (and gradient unscaling). The class must implement the following methods to be compatible with a TorchLearner. These methods/APIs match exactly those of torch's own torch.amp.GradScaler (see https://pytorch.org/docs/stable/amp.html#gradient-scaling for more details; a minimal compatible class is sketched in the first example below):
    - scale([loss]) to scale the loss by some factor.
    - get_scale() to get the current scale factor value.
    - step([optimizer]) to unscale the grads (divide by the scale factor) and step the given optimizer.
    - update() to update the scaler after an optimizer step (for example, to adjust the scale factor).
  - _torch_lr_scheduler_classes – A list of torch.lr_scheduler.LRScheduler classes (see https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate for more details), or a dictionary mapping module IDs to such a list of scheduler classes. Multiple scheduler classes can be applied in sequence and are stepped in the order defined here. Note that most learning rate schedulers require constructor arguments, so you might have to partially initialize them in the list(s) using functools.partial (see the second example below).
  - _tf_policy_handles_more_than_one_loss – Experimental flag. If True, TFPolicy handles more than one loss or optimizer. Set this to True if you would like to return more than one loss term from your loss_fn and an equal number of optimizers from your optimizer_fn.
  - _disable_preprocessor_api – Experimental flag. If True, no (observation) preprocessor is created and observations arrive in the model exactly as they are returned by the env (see the third example below).
  - _disable_action_flattening – Experimental flag. If True, RLlib doesn't flatten the policy-computed actions into a single tensor (for storage in SampleCollectors/output files/etc.) but leaves the (possibly nested) actions as-is. Disabling flattening affects:
    - SampleCollectors: Have to store possibly nested action structs.
    - Models that have the previous action(s) as part of their input.
    - Algorithms reading from offline files (incl. action information).
- Returns:
This updated AlgorithmConfig object.
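First example: a minimal sketch of a scaler class satisfying the interface described above, passed in via experimental(). The class NoOpGradScaler is hypothetical (it performs no real scaling), its constructor signature is an assumption about how RLlib instantiates it, and PPO is used only as an arbitrary example algorithm:

```python
from ray.rllib.algorithms.ppo import PPOConfig


class NoOpGradScaler:
    """Hypothetical scaler implementing the four methods described above.

    It keeps the scale factor fixed at 1.0 (i.e. no actual scaling); a real
    implementation (or torch.amp.GradScaler itself) would adjust the factor
    dynamically.
    """

    def __init__(self, *args, **kwargs):
        # Assumption: accept (and ignore) whatever construction args RLlib passes.
        self._scale = 1.0

    def scale(self, loss):
        # Scale the loss by the current factor.
        return loss * self._scale

    def get_scale(self):
        # Return the current scale factor value.
        return self._scale

    def step(self, optimizer):
        # Unscale the grads (a no-op for a factor of 1.0) and step the optimizer.
        optimizer.step()

    def update(self):
        # Adjust the scale factor after the optimizer step (no-op here).
        pass


config = PPOConfig().experimental(_torch_grad_scaler_class=NoOpGradScaler)
```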
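Second example: a sketch of partially initializing torch LR schedulers with functools.partial before handing them to _torch_lr_scheduler_classes. The particular schedulers and hyperparameters (warmup length, decay rate) are arbitrary choices for illustration, and PPO again stands in for any algorithm:

```python
from functools import partial

import torch
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().experimental(
    _torch_lr_scheduler_classes=[
        # Linear warmup over the first 100 scheduler steps ...
        partial(
            torch.optim.lr_scheduler.LinearLR,
            start_factor=0.1,
            total_iters=100,
        ),
        # ... followed by exponential decay; schedulers are stepped in this order.
        partial(torch.optim.lr_scheduler.ExponentialLR, gamma=0.999),
    ]
    # A dict mapping module IDs to such lists is accepted as well (see above).
)
```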
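Third example: a sketch combining the remaining boolean flags in a single experimental() call (again with PPO purely as a placeholder algorithm):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().experimental(
    # Hand env observations to the model as-is (no preprocessor).
    _disable_preprocessor_api=True,
    # Keep (possibly nested) action structs instead of flattening them.
    _disable_action_flattening=True,
)
```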