Catalog (Alpha)#

Note
This doc is related to the RLModule API and is therefore experimental.

Note
Interacting with Catalogs mainly covers advanced use cases.
Catalogs are where RLModules primarily get their models and action distributions from.
Each RLModule has its own default Catalog. For example, PPOTorchRLModule has the PPOCatalog.
You can override a Catalog's methods to alter the behavior of existing RLModules.
This makes Catalogs a means of configuration for RLModules.
You interact with Catalogs when making deeper customizations to the Models and Distributions that RLlib creates by default.
Note
If you simply want to modify a Model by changing its default values, have a look at the model config dict.
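For instance, the following sketch shrinks PPO's default fully connected network purely through the model config dict, without touching any Catalog. The keys used here ("fcnet_hiddens", "fcnet_activation") are standard MODEL_DEFAULTS keys; the concrete values are placeholders.

from ray.rllib.algorithms.ppo import PPOConfig

# Shrink the default fully connected network via the model config dict.
# "fcnet_hiddens" and "fcnet_activation" are standard MODEL_DEFAULTS keys.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(model={"fcnet_hiddens": [32, 32], "fcnet_activation": "relu"})
)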
While Catalogs have a base class Catalog, you mostly interact with Algorithm-specific Catalogs.
Therefore, this doc also includes examples around PPO from which you can extrapolate to other algorithms.
A prerequisite for this user guide is a rough understanding of RLModules.
This user guide covers the following topics:
Catalog and AlgorithmConfig
Basic usage
What are Catalogs
Inject your custom model or action distributions into RLModules
Catalog and AlgorithmConfig#
Since Catalogs effectively control which models and distributions RLlib uses under the hood, they are also part of RLlib's configuration.
As the primary entry point for configuring RLlib, AlgorithmConfig is the place where you can configure the Catalogs of the RLModules that are created.
You set the catalog class by going through the SingleAgentRLModuleSpec or MultiAgentRLModuleSpec of an AlgorithmConfig.
For example, in heterogeneous multi-agent cases, you modify the MultiAgentRLModuleSpec, as sketched below.
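The following is only a minimal sketch of that multi-agent case. It assumes MultiAgentRLModuleSpec from ray.rllib.core.rl_module.marl_module and its module_specs argument; the module IDs ("agent_a", "agent_b") and the two catalog subclasses are placeholders, and `config` stands for an AlgorithmConfig such as the PPOConfig built below.

from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.core.rl_module.marl_module import MultiAgentRLModuleSpec
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec


# Placeholder catalogs; in practice these would override build_* methods.
class CatalogForAgentA(PPOCatalog):
    pass


class CatalogForAgentB(PPOCatalog):
    pass


# Each module ID gets its own spec and therefore its own catalog class.
spec = MultiAgentRLModuleSpec(
    module_specs={
        "agent_a": SingleAgentRLModuleSpec(catalog_class=CatalogForAgentA),
        "agent_b": SingleAgentRLModuleSpec(catalog_class=CatalogForAgentB),
    }
)
# `config` is an AlgorithmConfig, for example the PPOConfig built below:
# config = config.rl_module(rl_module_spec=spec)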
The following example shows how to configure the Catalog of an RLModule created by PPO.
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec


class MyPPOCatalog(PPOCatalog):
    def __init__(self, *args, **kwargs):
        print("Hi from within PPORLModule!")
        super().__init__(*args, **kwargs)


config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")
    .rl_module(_enable_rl_module_api=True)
    .training(_enable_learner_api=True)
)

# Specify the catalog to use for the PPORLModule.
config = config.rl_module(
    rl_module_spec=SingleAgentRLModuleSpec(catalog_class=MyPPOCatalog)
)
# This is how RLlib constructs a PPORLModule.
# It prints "Hi from within PPORLModule!".
ppo = config.build()
Basic usage#
The following three examples illustrate three basic usage patterns of Catalogs.
The first example showcases the general API for interacting with Catalogs.
import gymnasium as gym

from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog

env = gym.make("CartPole-v1")

catalog = PPOCatalog(env.observation_space, env.action_space, model_config_dict={})
# Build an encoder that fits CartPole's observation space.
encoder = catalog.build_actor_critic_encoder(framework="torch")
policy_head = catalog.build_pi_head(framework="torch")
# We expect a categorical distribution for CartPole.
action_dist_class = catalog.get_action_dist_cls(framework="torch")
The second example showcases how to use the PPOCatalog to create a model and an action distribution.
This is closer to what RLlib does internally.
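Such an interaction could look roughly like the following sketch. The batch keys (SampleBatch.OBS, STATE_IN, SampleBatch.SEQ_LENS) and the nested ENCODER_OUT/ACTOR output structure are assumptions about RLlib's default actor-critic encoder; verify them against your RLlib version.

import gymnasium as gym
import torch

from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.core.models.base import ACTOR, ENCODER_OUT, STATE_IN
from ray.rllib.policy.sample_batch import SampleBatch

env = gym.make("CartPole-v1")

catalog = PPOCatalog(env.observation_space, env.action_space, model_config_dict={})
# Build PPO's actor-critic encoder and its policy head.
encoder = catalog.build_actor_critic_encoder(framework="torch")
pi_head = catalog.build_pi_head(framework="torch")
action_dist_class = catalog.get_action_dist_cls(framework="torch")

obs, info = env.reset()
# Default encoders are not recurrent, so state and sequence lengths stay None.
input_batch = {
    SampleBatch.OBS: torch.Tensor([obs]),
    STATE_IN: None,
    SampleBatch.SEQ_LENS: None,
}
# The encoder output holds separate actor and critic branches; take the actor one.
encoding = encoder(input_batch)[ENCODER_OUT][ACTOR]
action_dist_inputs = pi_head(encoding)
action_dist = action_dist_class.from_logits(action_dist_inputs)
action = action_dist.sample()[0].numpy()
env.step(action)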
The third example showcases how to use the base Catalog to create an encoder and an action distribution.
In addition, it builds a head network that fits these two by hand, to show how you can combine RLlib's ModelConfig API with Catalogs.
Extending Catalog to also build this head is how Catalog is meant to be extended, which we cover later in this guide.
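A rough sketch of this pattern follows. For brevity, it builds the head with plain torch instead of RLlib's ModelConfig classes, and it assumes that the base Catalog exposes build_encoder and latent_dims and that the default encoder consumes and produces the same batch keys as in the sketch above.

import gymnasium as gym
import torch

from ray.rllib.core.models.base import ENCODER_OUT, STATE_IN
from ray.rllib.core.models.catalog import Catalog
from ray.rllib.policy.sample_batch import SampleBatch

env = gym.make("CartPole-v1")

catalog = Catalog(env.observation_space, env.action_space, model_config_dict={})
# Build an encoder and fetch the action distribution class from the catalog.
encoder = catalog.build_encoder(framework="torch")
action_dist_class = catalog.get_action_dist_cls(framework="torch")
# Build a head by hand that maps the encoder's latent space to the
# distribution's required inputs (one logit per discrete action here).
head = torch.nn.Linear(catalog.latent_dims[0], env.action_space.n)

obs, info = env.reset()
input_batch = {
    SampleBatch.OBS: torch.Tensor([obs]),
    STATE_IN: None,
    SampleBatch.SEQ_LENS: None,
}
encoding = encoder(input_batch)[ENCODER_OUT]
action_dist = action_dist_class.from_logits(head(encoding))
env.step(action_dist.sample()[0].numpy())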
What are Catalogs#
Catalogs have two primary roles: choosing the right Model and choosing the right Distribution.
By default, all Catalogs implement decision trees that decide on a model architecture based on a combination of input configurations.
These inputs mainly include the observation space and action space of the RLModule, the model config dict, and the deep learning framework backend.
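For example, the same PPOCatalog picks a different action distribution class depending on the action space of the environment. The following small sketch reuses the calls from the first basic-usage example; the printed classes are whatever your RLlib version chooses by default, typically a categorical distribution for discrete and a diagonal Gaussian for continuous action spaces.

import gymnasium as gym

from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog

# Discrete action space (CartPole).
cartpole = gym.make("CartPole-v1")
catalog = PPOCatalog(
    cartpole.observation_space, cartpole.action_space, model_config_dict={}
)
print(catalog.get_action_dist_cls(framework="torch"))

# Continuous action space (Pendulum) yields a different distribution class.
pendulum = gym.make("Pendulum-v1")
catalog = PPOCatalog(
    pendulum.observation_space, pendulum.action_space, model_config_dict={}
)
print(catalog.get_action_dist_cls(framework="torch"))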
The following diagram shows the breakdown of the information flow towards models and distributions within an RLModule.
An RLModule creates an instance of the Catalog class it receives as part of its constructor.
It then creates its internal models and distributions with the help of this Catalog.
Note
You can also modify Models or Distributions in an RLModule directly by overriding the RLModule's constructor!

The following diagram shows a concrete case in more detail.
Inject your custom model or action distributions into RLModules#
You can make a Catalog build custom models by overriding the Catalog's methods that RLModules use to build models.
Have a look at these lines from the constructor of the PPOTorchRLModule to see how an RLModule uses its Catalog:
catalog = self.config.get_catalog()
# Build models from catalog
self.encoder = catalog.build_actor_critic_encoder(framework=self.framework)
self.pi = catalog.build_pi_head(framework=self.framework)
self.vf = catalog.build_vf_head(framework=self.framework)
self.action_dist_cls = catalog.get_action_dist_cls(framework=self.framework)
Consequently, in order to build a custom Model compatible with a PPORLModule, you can override these methods by inheriting from PPOCatalog, or write a Catalog that implements them from scratch.
The following example showcases two such modifications:
How to write a custom Distribution
How to inject a custom action distribution into a Catalog
import torch
import gymnasium as gym

from ray.rllib.algorithms.ppo.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
from ray.rllib.models.distributions import Distribution
from ray.rllib.models.torch.torch_distributions import TorchDeterministic


# Define a simple categorical distribution that can be used for PPO.
class CustomTorchCategorical(Distribution):
    def __init__(self, logits):
        # Keep the logits around so that to_deterministic() can use them.
        self.logits = logits
        self.torch_dist = torch.distributions.categorical.Categorical(logits=logits)

    def sample(self, sample_shape=torch.Size()):
        return self.torch_dist.sample(sample_shape)

    def rsample(self, sample_shape=torch.Size()):
        return self.torch_dist.rsample(sample_shape)

    def logp(self, value):
        return self.torch_dist.log_prob(value)

    def entropy(self):
        return self.torch_dist.entropy()

    def kl(self, other):
        return torch.distributions.kl.kl_divergence(self.torch_dist, other.torch_dist)

    @staticmethod
    def required_input_dim(space):
        return int(space.n)

    # This method is used to create distributions from logits inside RLModules.
    # You can use it to inject arguments into the constructor of this distribution
    # that are not the logits themselves.
    @classmethod
    def from_logits(cls, logits):
        return CustomTorchCategorical(logits=logits)

    # This method is used to create a deterministic distribution for
    # PPORLModule.forward_inference.
    def to_deterministic(self):
        return TorchDeterministic(loc=torch.argmax(self.logits, dim=-1))


# Check that we can create this distribution and sample from it to interact
# with our target environment.
env = gym.make("CartPole-v1")
dummy_logits = torch.randn([env.action_space.n])
dummy_dist = CustomTorchCategorical.from_logits(dummy_logits)
action = dummy_dist.sample()
env.reset()
env.step(action.numpy())


# Define a simple catalog that returns our custom distribution when
# get_action_dist_cls is called.
class CustomPPOCatalog(PPOCatalog):
    def get_action_dist_cls(self, framework):
        # The distribution we wrote only works with torch.
        assert framework == "torch"
        return CustomTorchCategorical


# Train with our custom action distribution.
algo = (
    PPOConfig()
    .environment("CartPole-v1")
    # The custom distribution above only supports the torch framework.
    .framework("torch")
    .rl_module(rl_module_spec=SingleAgentRLModuleSpec(catalog_class=CustomPPOCatalog))
    .build()
)
results = algo.train()
print(results)
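Injecting a custom model follows the same pattern: override the corresponding build_* method instead. The following is only a hypothetical sketch, assuming that a plain torch.nn.Module mapping the encoder output to action distribution inputs is accepted wherever the default pi head is used, and that the Catalog exposes latent_dims and action_space; check the PPOCatalog source for the exact contract.

import torch.nn as nn

from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog


class CustomHeadPPOCatalog(PPOCatalog):
    def build_pi_head(self, framework):
        # This sketch only covers torch and discrete action spaces.
        assert framework == "torch"
        # Map the encoder's latent space to one logit per discrete action.
        return nn.Sequential(
            nn.Linear(self.latent_dims[0], 64),
            nn.ReLU(),
            nn.Linear(64, self.action_space.n),
        )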
Notable TODOs#
Add cross references to Model and Distribution API docs
Add example that shows how to inject own model
Add more instructions on how to write a catalog from scratch
Add section “Extend RLlib’s selection of Models and Distributions with your own”
Add section “Write a Catalog from scratch”