ray.rllib.core.rl_module.default_model_config.DefaultModelConfig

class ray.rllib.core.rl_module.default_model_config.DefaultModelConfig(fcnet_hiddens: List[int] = <factory>, fcnet_activation: str = 'tanh', fcnet_kernel_initializer: str | Callable | None = None, fcnet_kernel_initializer_kwargs: dict | None = None, fcnet_bias_initializer: str | Callable | None = None, fcnet_bias_initializer_kwargs: dict | None = None, conv_filters: List[Tuple[int, int | Tuple[int, int], int | Tuple[int, int]]] | None = None, conv_activation: str = 'relu', conv_kernel_initializer: str | Callable | None = None, conv_kernel_initializer_kwargs: dict | None = None, conv_bias_initializer: str | Callable | None = None, conv_bias_initializer_kwargs: dict | None = None, head_fcnet_hiddens: List[int] = <factory>, head_fcnet_activation: str = 'relu', head_fcnet_kernel_initializer: str | Callable | None = None, head_fcnet_kernel_initializer_kwargs: dict | None = None, head_fcnet_bias_initializer: str | Callable | None = None, head_fcnet_bias_initializer_kwargs: dict | None = None, free_log_std: bool = False, log_std_clip_param: float = 20.0, vf_share_layers: bool = True, use_lstm: bool = False, max_seq_len: int = 20, lstm_cell_size: int = 256, lstm_use_prev_action: bool = False, lstm_use_prev_reward: bool = False, lstm_kernel_initializer: str | Callable | None = None, lstm_kernel_initializer_kwargs: dict | None = None, lstm_bias_initializer: str | Callable | None = None, lstm_bias_initializer_kwargs: dict | None = None)

Dataclass to configure all default RLlib RLModules.

Do NOT use this class to configure your own custom RLModules. Instead, pass a custom model_config dict with arbitrary (str) keys into the RLModuleSpec that defines the custom RLModule. For example:

import gymnasium as gym
import numpy as np
from ray.rllib.core.rl_module.rl_module import RLModuleSpec
from ray.rllib.examples.rl_modules.classes.tiny_atari_cnn_rlm import (
    TinyAtariCNN
)

my_rl_module = RLModuleSpec(
    module_class=TinyAtariCNN,
    observation_space=gym.spaces.Box(-1.0, 1.0, (64, 64, 4), np.float32),
    action_space=gym.spaces.Discrete(7),
    # DreamerV3-style stack working on a 64x64, color or 4x-grayscale-stacked,
    # normalized image.
    model_config={
        "conv_filters": [[16, 4, 2], [32, 4, 2], [64, 4, 2], [128, 4, 2]],
    },
).build()
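
Each entry in the `conv_filters` list above follows the [num_out_channels, kernel, stride] format. As a rough sketch of how such a stack shrinks a 64x64 input, the helper below tracks the spatial shape layer by layer. It assumes "same" padding, where the output size is ceil(input / stride) and the kernel size does not change it; the padding that RLlib or TinyAtariCNN actually applies is not stated here. `conv_output_shapes` is a hypothetical helper for illustration, not an RLlib function.

```python
import math


def conv_output_shapes(input_hw, conv_filters):
    """Return (height, width, channels) after each conv layer.

    Hypothetical helper, not part of RLlib. Assumes "same" padding, so the
    output size per dimension is ceil(input / stride).
    """
    height, width = input_hw
    shapes = []
    for num_out_channels, _kernel, stride in conv_filters:
        # A stride may be a single int or a (stride_h, stride_w) pair.
        if isinstance(stride, int):
            stride_h, stride_w = stride, stride
        else:
            stride_h, stride_w = stride
        height = math.ceil(height / stride_h)
        width = math.ceil(width / stride_w)
        shapes.append((height, width, num_out_channels))
    return shapes


conv_filters = [[16, 4, 2], [32, 4, 2], [64, 4, 2], [128, 4, 2]]
shapes = conv_output_shapes((64, 64), conv_filters)
for spec, shape in zip(conv_filters, shapes):
    print(spec, "->", shape)
```

Under this padding assumption, each stride-2 layer halves the height and width while the channel count follows the first element of each entry.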

Only RLlib’s default RLModules (defined by the various algorithms) should use this dataclass. Pass an instance of it into your algorithm config like so:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig

config = (
    PPOConfig()
    .rl_module(
        model_config=DefaultModelConfig(fcnet_hiddens=[32, 32]),
    )
)
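
Because `DefaultModelConfig` is a dataclass, fields shown as `<factory>` in the signature get a fresh default object per instance, and any field can be overridden by keyword as in the PPO example. The sketch below illustrates this behavior with a small stand-in dataclass so that it runs without Ray installed. `StandInModelConfig` is hypothetical, and its `[256, 256]` default for `fcnet_hiddens` is an assumption; the other defaults are copied from the signature above.

```python
from dataclasses import asdict, dataclass, field, replace
from typing import List


@dataclass
class StandInModelConfig:
    """Hypothetical stand-in mirroring a few DefaultModelConfig fields."""

    # Assumption: the real factory default is not shown in the signature.
    fcnet_hiddens: List[int] = field(default_factory=lambda: [256, 256])
    fcnet_activation: str = "tanh"
    vf_share_layers: bool = True
    use_lstm: bool = False
    max_seq_len: int = 20


# Override a single field and keep every other default.
small_cfg = StandInModelConfig(fcnet_hiddens=[32, 32])

# Derive a variant without mutating the original instance.
lstm_cfg = replace(small_cfg, use_lstm=True)

# A default_factory builds a new list per instance, so mutating one
# instance's list does not leak into later instances.
first = StandInModelConfig()
first.fcnet_hiddens.append(64)
second = StandInModelConfig()

print(asdict(lstm_cfg))
print(second.fcnet_hiddens)
```

Using `dataclasses.asdict` this way is a convenient means of inspecting which settings differ from the defaults before passing a config to an algorithm.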

DeveloperAPI: This API may change across minor Ray releases.

Attributes

`conv_activation`	Activation function descriptor for the stack configured by `conv_filters`.
`conv_bias_initializer`	Initializer function or class descriptor for the bias vectors in the stack configured by `conv_filters`.
`conv_bias_initializer_kwargs`	Kwargs passed into the initializer function defined through `conv_bias_initializer`.
`conv_filters`	List of lists of format [num_out_channels, kernel, stride] defining a Conv2D stack if the input space is 2D.
`conv_kernel_initializer`	Initializer function or class descriptor for the weight/kernel matrices in the stack configured by `conv_filters`.
`conv_kernel_initializer_kwargs`	Kwargs passed into the initializer function defined through `conv_kernel_initializer`.
`fcnet_activation`	Activation function descriptor for the stack configured by `fcnet_hiddens`.
`fcnet_bias_initializer`	Initializer function or class descriptor for the bias vectors in the stack configured by `fcnet_hiddens`.
`fcnet_bias_initializer_kwargs`	Kwargs passed into the initializer function defined through `fcnet_bias_initializer`.
`fcnet_kernel_initializer`	Initializer function or class descriptor for the weight/kernel matrices in the stack configured by `fcnet_hiddens`.
`fcnet_kernel_initializer_kwargs`	Kwargs passed into the initializer function defined through `fcnet_kernel_initializer`.
`free_log_std`	If True, for DiagGaussian action distributions (or any other continuous control distribution), make the second half of the policy's outputs a "free" bias parameter, rather than state-/NN-dependent nodes.
`head_fcnet_activation`	Activation function descriptor for the stack configured by `head_fcnet_hiddens`.
`head_fcnet_bias_initializer`	Initializer function or class descriptor for the bias vectors in the stack configured by `head_fcnet_hiddens`.
`head_fcnet_bias_initializer_kwargs`	Kwargs passed into the initializer function defined through `head_fcnet_bias_initializer`.
`head_fcnet_kernel_initializer`	Initializer function or class descriptor for the weight/kernel matrices in the stack configured by `head_fcnet_hiddens`.
`head_fcnet_kernel_initializer_kwargs`	Kwargs passed into the initializer function defined through `head_fcnet_kernel_initializer`.
`log_std_clip_param`	Clip parameter (a float, default 20.0) for the log(stddev) when using a DiagGaussian action distribution (or any other continuous control distribution).
`lstm_bias_initializer`	Initializer function or class descriptor for the bias vectors in the LSTM layer.
`lstm_bias_initializer_kwargs`	Kwargs passed into the initializer function defined through `lstm_bias_initializer`.
`lstm_cell_size`	The size of the LSTM cell.
`lstm_kernel_initializer`	Initializer function or class descriptor for the weight/kernel matrices in the LSTM layer.
`lstm_kernel_initializer_kwargs`	Kwargs passed into the initializer function defined through `lstm_kernel_initializer`.
`lstm_use_prev_action`
`lstm_use_prev_reward`
`max_seq_len`	The maximum sequence length for building the train batch for an LSTM model.
`use_lstm`	Whether to wrap the encoder component (defined by `fcnet_hiddens` or `conv_filters`) with an LSTM.
`vf_share_layers`	Whether encoder layers (defined by `fcnet_hiddens` or `conv_filters`) should be shared between the policy and the value function.
`fcnet_hiddens`	List containing the sizes (number of nodes) of a fully connected (MLP) stack.
`head_fcnet_hiddens`	List containing the sizes (number of nodes) of a fully connected (MLP) head (ex.
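
To make `free_log_std` and `log_std_clip_param` more concrete, here is a minimal sketch of how the outputs of a DiagGaussian head might be turned into means and standard deviations. It assumes that the first half of the outputs holds the means, that the second half holds the log(stddev) values, and that clipping is symmetric, to [-log_std_clip_param, log_std_clip_param]. `split_and_clip` and its `free_log_std_values` argument are hypothetical names for illustration, not RLlib's implementation.

```python
import math


def split_and_clip(outputs, log_std_clip_param=20.0, free_log_std_values=None):
    """Split head outputs into means and clipped log(stddev) values.

    Hypothetical helper. If `free_log_std_values` is given, it replaces the
    second half of `outputs`, mimicking a state-independent "free" parameter.
    """
    half = len(outputs) // 2
    means = list(outputs[:half])
    if free_log_std_values is not None:
        log_stds = list(free_log_std_values)
    else:
        log_stds = list(outputs[half:])
    # Assumed symmetric clipping to [-clip, +clip].
    clipped = [
        max(-log_std_clip_param, min(log_std_clip_param, value))
        for value in log_stds
    ]
    stds = [math.exp(value) for value in clipped]
    return means, clipped, stds


# State-dependent log(stddev): the out-of-range value 30.0 is clipped to 20.0.
means, clipped, stds = split_and_clip([0.5, -1.0, 30.0, -0.5])
print(means, clipped)

# "Free" log(stddev): the second half of the outputs is ignored.
_, free_clipped, free_stds = split_and_clip(
    [0.5, -1.0, 30.0, -0.5], free_log_std_values=[0.0, 0.0]
)
print(free_clipped, free_stds)
```

Clipping before exponentiating keeps the standard deviation within a bounded range, which is the kind of numerical safeguard a clip parameter like this typically provides.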