ray.rllib.algorithms.algorithm_config.AlgorithmConfig.framework
- AlgorithmConfig.framework(framework: str | None = <ray.rllib.utils.from_config._NotProvided object>, *, eager_tracing: bool | None = <ray.rllib.utils.from_config._NotProvided object>, eager_max_retraces: int | None = <ray.rllib.utils.from_config._NotProvided object>, tf_session_args: Dict[str, Any] | None = <ray.rllib.utils.from_config._NotProvided object>, local_tf_session_args: Dict[str, Any] | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_learner: bool | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_learner_what_to_compile: str | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_learner_dynamo_mode: str | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_learner_dynamo_backend: str | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_worker: bool | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_worker_dynamo_backend: str | None = <ray.rllib.utils.from_config._NotProvided object>, torch_compile_worker_dynamo_mode: str | None = <ray.rllib.utils.from_config._NotProvided object>, torch_ddp_kwargs: Dict[str, Any] | None = <ray.rllib.utils.from_config._NotProvided object>, torch_skip_nan_gradients: bool | None = <ray.rllib.utils.from_config._NotProvided object>) → AlgorithmConfig
Sets the config’s DL framework settings.
- Parameters:
framework – torch: PyTorch; tf2: TensorFlow 2.x (eager execution, or traced if eager_tracing=True); tf: TensorFlow (static-graph). See the usage sketch after this parameter list.
eager_tracing – Enable tracing in eager mode. This greatly improves performance (speedup ~2x), but makes it slightly harder to debug since Python code won’t be evaluated after the initial eager pass. Only possible if framework=tf2.
eager_max_retraces – Maximum number of tf.function re-traces before a runtime error is raised. This prevents unnoticed retraces of methods inside the ..._eager_traced Policy, which could slow down execution by a factor of 4 without the user noticing the root cause of the slowdown. Only relevant for framework=tf2. Set to None to ignore the re-trace count and never throw an error.
tf_session_args – Configures TF for single-process operation by default.
local_tf_session_args – Overrides for tf_session_args, applied only on the local worker.
torch_compile_learner – If True, forward_train methods on TorchRLModules on the learner are compiled. If not specified, the default is to compile forward_train on the learner.
torch_compile_learner_what_to_compile – A TorchCompileWhatToCompile mode specifying what to compile on the learner side if torch_compile_learner is True. See TorchCompileWhatToCompile for details and advice on its usage.
torch_compile_learner_dynamo_backend – The torch dynamo backend to use on the learner.
torch_compile_learner_dynamo_mode – The torch dynamo mode to use on the learner.
torch_compile_worker – If True, forward exploration and inference methods on TorchRLModules on the workers are compiled. If not specified, the default is to not compile forward methods on the workers because retracing can be expensive.
torch_compile_worker_dynamo_backend – The torch dynamo backend to use on the workers.
torch_compile_worker_dynamo_mode – The torch dynamo mode to use on the workers.
torch_ddp_kwargs – The kwargs to pass into torch.nn.parallel.DistributedDataParallel when num_learners > 1. This is specifically helpful when searching for unused parameters that are not used in the backward pass. This can give hints for errors in custom models where some parameters do not get touched in the backward pass although they should.
torch_skip_nan_gradients – Whether updates with nan gradients should be entirely skipped. If True, an update is skipped entirely in the optimizer if it contains any nan gradient. This can help avoid biasing moving-average-based optimizers, like Adam, in training phases where policy updates can be highly unstable, such as during the early stages of training or with highly exploratory policies: in such phases many gradients might turn nan, and setting them to zero could corrupt the optimizer's internal state. The default is False, which turns nan gradients to zero. If many nan gradients are encountered, consider (a) monitoring gradients by setting log_gradients in AlgorithmConfig to True, (b) using proper weight initialization (e.g. Xavier, Kaiming) via the model_config_dict in AlgorithmConfig.rl_module, and/or (c) gradient clipping via grad_clip in AlgorithmConfig.training. See the torch-specific sketch at the end of this section.
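For illustration, a minimal usage sketch (not part of the official reference) of selecting the framework and eager-tracing behavior. PPOConfig and the numeric values are illustrative assumptions; any AlgorithmConfig subclass works the same way.

from ray.rllib.algorithms.ppo import PPOConfig

# PyTorch backend.
config = PPOConfig().framework("torch")

# TensorFlow 2.x with eager tracing (~2x speedup, but harder to debug,
# since Python code only runs during the initial eager pass).
config = PPOConfig().framework(
    "tf2",
    eager_tracing=True,
    # Raise a runtime error after 20 tf.function re-traces, to surface
    # accidental retracing early; None disables the check.
    eager_max_retraces=20,
)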
- Returns:
This updated AlgorithmConfig object.
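A second hedged sketch covering the torch-specific options above (compilation, DDP kwargs, and nan-gradient skipping). The dynamo backend/mode strings and the num_learners value are illustrative assumptions, not required settings.

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .framework(
        "torch",
        # Compile forward_train on the learner via torch.compile.
        torch_compile_learner=True,
        torch_compile_learner_dynamo_backend="inductor",
        torch_compile_learner_dynamo_mode="default",
        # Passed to torch.nn.parallel.DistributedDataParallel when
        # num_learners > 1; find_unused_parameters=True helps locate
        # parameters that never receive gradients in the backward pass.
        torch_ddp_kwargs={"find_unused_parameters": True},
        # Skip optimizer updates whose gradients contain any nan,
        # instead of zeroing them (the default, False, zeroes them).
        torch_skip_nan_gradients=True,
    )
    # Assumption: request two learner workers so the DDP kwargs
    # actually take effect.
    .learners(num_learners=2)
)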