ray.rllib.algorithms.algorithm_config.AlgorithmConfig.framework

AlgorithmConfig.framework(framework: str | None = NotProvided, *, eager_tracing: bool | None = NotProvided, eager_max_retraces: int | None = NotProvided, tf_session_args: Dict[str, Any] | None = NotProvided, local_tf_session_args: Dict[str, Any] | None = NotProvided, torch_compile_learner: bool | None = NotProvided, torch_compile_learner_what_to_compile: str | None = NotProvided, torch_compile_learner_dynamo_mode: str | None = NotProvided, torch_compile_learner_dynamo_backend: str | None = NotProvided, torch_compile_worker: bool | None = NotProvided, torch_compile_worker_dynamo_backend: str | None = NotProvided, torch_compile_worker_dynamo_mode: str | None = NotProvided, torch_ddp_kwargs: Dict[str, Any] | None = NotProvided, torch_skip_nan_gradients: bool | None = NotProvided) -> AlgorithmConfig

Sets the config’s DL framework settings.

Parameters:
  • framework – The DL framework to use: torch (PyTorch), tf2 (TensorFlow 2.x; eager execution, or traced if eager_tracing=True), or tf (TensorFlow, static-graph mode).

  • eager_tracing – Enable tracing in eager mode. This greatly improves performance (~2x speedup), but makes debugging slightly harder, since Python code isn't evaluated after the initial eager pass. Only possible if framework=tf2. See the first example below.

  • eager_max_retraces – Maximum number of tf.function re-traces before a runtime error is raised. This prevents unnoticed retraces of methods inside the ..._eager_traced Policy, which could slow down execution by a factor of 4 without the user noticing the root cause of the slowdown. Only necessary for framework=tf2. Set to None to ignore the re-trace count and never throw an error.

  • tf_session_args – Configures TF for single-process operation by default.

  • local_tf_session_args – Overrides the tf_session_args on the local worker.

  • torch_compile_learner – If True, the forward_train methods of TorchRLModules on the learner are compiled. If not specified, the default is to compile forward_train on the learner. See the torch.compile sketch below.

  • torch_compile_learner_what_to_compile – A TorchCompileWhatToCompile mode specifying what to compile on the learner side if torch_compile_learner is True. See TorchCompileWhatToCompile for details and advice on its usage.

  • torch_compile_learner_dynamo_backend – The torch dynamo backend to use on the learner.

  • torch_compile_learner_dynamo_mode – The torch dynamo mode to use on the learner.

  • torch_compile_worker – If True, forward exploration and inference methods on TorchRLModules on the workers are compiled. If not specified, the default is to not compile forward methods on the workers because retracing can be expensive.

  • torch_compile_worker_dynamo_backend – The torch dynamo backend to use on the workers.

  • torch_compile_worker_dynamo_mode – The torch dynamo mode to use on the workers.

  • torch_ddp_kwargs – The kwargs to pass into torch.nn.parallel.DistributedDataParallel when using num_learners > 1. This is specifically helpful when searching for unused parameters that don't receive gradients in the backward pass. It can give hints for errors in custom models where some parameters aren't touched in the backward pass although they should be. See the last example below.

  • torch_skip_nan_gradients – Whether updates containing NaN gradients should be skipped entirely. If True, an optimizer update is skipped completely whenever it contains any NaN gradient. This can help avoid biasing moving-average-based optimizers, such as Adam, during training phases where policy updates are highly unstable, for example early in training or with highly exploratory policies. In such phases many gradients might turn NaN, and setting them to zero could corrupt the optimizer's internal state. The default is False, in which case NaN gradients are set to zero. If many NaN gradients are encountered, consider (a) monitoring gradients by setting log_gradients in AlgorithmConfig to True, (b) using proper weight initialization (e.g., Xavier, Kaiming) via the model_config_dict in AlgorithmConfig.rl_module, and/or (c) applying gradient clipping via grad_clip in AlgorithmConfig.training.

Returns:

This updated AlgorithmConfig object.
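Examples:

Below is a minimal sketch of selecting the DL framework. PPOConfig is used here only as a concrete AlgorithmConfig subclass for illustration; any algorithm config works the same way.

    from ray.rllib.algorithms.ppo import PPOConfig

    # Use PyTorch (the most common choice).
    config = PPOConfig().framework("torch")

    # Or: TensorFlow 2.x with eager tracing for a ~2x speedup
    # (slightly harder to debug, since Python code only runs in the
    # initial eager pass).
    config = PPOConfig().framework(
        framework="tf2",
        eager_tracing=True,
    )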
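A sketch of enabling torch.compile on the learner side. The backend and mode values shown ("inductor", "default") are standard torch.compile options, not RLlib-specific defaults; adjust them to your installed torch version.

    from ray.rllib.algorithms.ppo import PPOConfig

    config = PPOConfig().framework(
        framework="torch",
        # Compile forward_train on the learner.
        torch_compile_learner=True,
        torch_compile_learner_dynamo_backend="inductor",
        torch_compile_learner_dynamo_mode="default",
        # Leave the workers uncompiled; retracing the exploration and
        # inference paths can be expensive.
        torch_compile_worker=False,
    )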
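Finally, a sketch combining the multi-learner DDP and NaN-gradient settings. find_unused_parameters is a standard torch.nn.parallel.DistributedDataParallel kwarg; the learners() and training() calls are assumed to be available in your RLlib version.

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .framework(
            framework="torch",
            # Have DDP report parameters that receive no gradient in the
            # backward pass; useful when debugging custom models.
            torch_ddp_kwargs={"find_unused_parameters": True},
            # Skip optimizer updates that contain any NaN gradient instead
            # of zeroing them out (the default behavior).
            torch_skip_nan_gradients=True,
        )
        # torch_ddp_kwargs only takes effect with more than one learner.
        .learners(num_learners=2)
        # Gradient clipping, one of the remedies suggested above.
        .training(grad_clip=40.0)
    )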