Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The team is currently transitioning algorithms, example scripts, and documentation to the new code base throughout the subsequent minor releases leading up to Ray 3.0.

See here for more details on how to activate and use the new API stack.

Building Custom Policy Classes

Warning

As of Ray 1.9, using the build_policy_class() or build_tf_policy() utility functions to create custom Policy sub-classes is no longer recommended. Instead, follow the guidelines here and sub-class directly from one of the built-in types: EagerTFPolicyV2 or TorchPolicyV2.

To create a custom Policy, sub-class Policy (for a generic, framework-agnostic policy), TorchPolicyV2 (for a PyTorch-specific policy), or EagerTFPolicyV2 (for a TensorFlow-specific policy) and override one or more of its methods. The most commonly overridden methods are:

compute_actions_from_input_dict()
postprocess_trajectory()
loss()

See here for an example of how to override TorchPolicy.
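
The override pattern for the three methods listed above can be sketched without any RLlib dependency. In the minimal sketch below, PolicyStandIn is a hypothetical stand-in written only for illustration, not RLlib's actual Policy class; the method signatures are modeled on the ones RLlib documents and should be checked against the installed Ray version. ThresholdPolicy, the plain-dict batches, and the "returns" key are likewise assumptions. In real code, the sub-class would inherit from Policy, TorchPolicyV2, or EagerTFPolicyV2 and operate on RLlib's own batch and tensor types.

```python
from typing import Any, Dict, Optional


class PolicyStandIn:
    """Hypothetical stand-in for a Policy base class (NOT the RLlib class)."""

    def __init__(self, config: Optional[Dict[str, Any]] = None):
        self.config = config or {}

    def compute_actions_from_input_dict(
        self, input_dict, explore=None, timestep=None, **kwargs
    ):
        raise NotImplementedError

    def postprocess_trajectory(
        self, sample_batch, other_agent_batches=None, episode=None
    ):
        # Default behavior: return the batch unchanged.
        return sample_batch

    def loss(self, model, dist_class, train_batch):
        raise NotImplementedError


class ThresholdPolicy(PolicyStandIn):
    """Picks action 1 when the first observation component is positive."""

    def compute_actions_from_input_dict(
        self, input_dict, explore=None, timestep=None, **kwargs
    ):
        actions = [1 if obs[0] > 0 else 0 for obs in input_dict["obs"]]
        # Return (actions, RNN state outputs, extra fetches).
        return actions, [], {}

    def postprocess_trajectory(
        self, sample_batch, other_agent_batches=None, episode=None
    ):
        # Add discounted returns, computed backwards over the trajectory.
        gamma = self.config.get("gamma", 0.99)
        returns, running = [], 0.0
        for reward in reversed(sample_batch["rewards"]):
            running = reward + gamma * running
            returns.append(running)
        returns.reverse()
        out = dict(sample_batch)  # shallow copy; the input batch is not mutated
        out["returns"] = returns
        return out

    def loss(self, model, dist_class, train_batch):
        # `model` is assumed to be a callable mapping one observation to a
        # scalar value prediction; the loss is the mean squared error
        # against the discounted returns.
        preds = [model(obs) for obs in train_batch["obs"]]
        errors = [(p - g) ** 2 for p, g in zip(preds, train_batch["returns"])]
        return sum(errors) / len(errors)


if __name__ == "__main__":
    policy = ThresholdPolicy({"gamma": 0.5})
    batch = {"obs": [[0.2], [-1.0], [3.0]], "rewards": [1.0, 0.0, 2.0]}
    actions, state_outs, extra_fetches = policy.compute_actions_from_input_dict(batch)
    batch = policy.postprocess_trajectory(batch)
    print(actions, batch["returns"])
```

Each override keeps the argument list and return shape of the base method, so code that calls the base interface can use the sub-class without changes. That is the property to preserve when sub-classing the real RLlib types.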