Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base, thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

Note, however, that so far only PPO (single- and multi-agent) and SAC (single-agent only) support the “new API stack”, and even these still run on the old APIs by default. You can continue to use your existing custom (old stack) classes.

See here for more details on how to use the new API stack.

Building Custom Policy Classes

Warning

As of Ray 1.9, using the build_policy_class() or build_tf_policy() utility functions to create custom Policy sub-classes is no longer recommended. Instead, follow the guidelines here and sub-class directly from one of the built-in types: EagerTFPolicyV2 or TorchPolicyV2.

To create a custom Policy, sub-class Policy (for a generic, framework-agnostic policy), TorchPolicyV2 (for a PyTorch-specific policy), or EagerTFPolicyV2 (for a TensorFlow-specific policy) and override one or more of their methods. Those are in particular:

See here for an example of how to override TorchPolicy.