MetricsLogger API#
Note
Ray 2.40 uses RLlib’s new API stack by default. The Ray team has mostly completed transitioning algorithms, example scripts, and documentation to the new code base.
If you’re still using the old API stack, see New API stack migration guide for details on how to migrate.
The RLlib team designed the MetricsLogger API to unify and make accessible the logging and processing of stats and metrics during reinforcement learning (RL) experiments. RLlib's Algorithm class and all of its subcomponents each hold one MetricsLogger instance that manages the metrics and statistics for that component. When a subcomponent reports back to its parent component, it "reduces" its logged results and sends them upstream.
The RLlib team recommends this API for all your custom code, for example in EnvRunner-based callbacks, in custom loss functions, or in custom training_step() implementations.
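As a quick orientation, the following minimal sketch shows the typical places where custom code gets hold of a MetricsLogger; the callback signature and the self.metrics attribute used here are the ones shown in the examples further down on this page.

from ray.rllib.callbacks.callbacks import RLlibCallback


class MyCallbacks(RLlibCallback):
    def on_episode_end(self, *, episode, metrics_logger, **kwargs):
        # The EnvRunner passes its own MetricsLogger in as `metrics_logger`.
        metrics_logger.log_value("my_custom_metric", 1.0)


# Inside a custom Learner loss function or a custom Algorithm.training_step(),
# the component's own MetricsLogger is available as `self.metrics`, for example:
# self.metrics.log_value("my_custom_metric", 1.0)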
Note
So far, the RLlib components that own a MetricsLogger instance are Algorithm, EnvRunner, Learner, all ConnectorV2 classes, and all EpisodeReplayBuffer classes. The Ray team is considering expanding access to this API to other components as well.
Features of MetricsLogger#
The MetricsLogger API offers the following functionality:
- Log scalar values over time, such as losses, individual rewards, or episode returns.
- Configure different reduction types, in particular mean, min, max, or sum. You can also choose not to reduce at all through the reduce=None setting, leaving the logged values untouched. A separate clear_on_reduce=True setting automatically clears all logged values on each reduce event.
- Specify sliding windows over which reductions take place, for example window=100 to average over the last 100 logged values, or specify exponential moving average (EMA) coefficients that determine how quickly the weight of older values in the computed mean decays over time.
- Merge n result dicts from n parallel subcomponents into the local MetricsLogger. Each of these n dicts is the result of a reduce operation on the respective subcomponent's own MetricsLogger instance.
- Log execution times for distinct code blocks through convenient with ... blocks.
- Add up lifetime counts and automatically compute the corresponding throughput metrics per second along the way.
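The following minimal sketch combines several of these options in one place; each is covered in detail in the sections below. The ema_coeff argument to log_value() is assumed here to mirror the one shown for log_time() in the Timers section.

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

logger = MetricsLogger()

# Mean over a sliding window of the last 10 logged values.
logger.log_value("episode_return", 100.0, reduce="mean", window=10)

# Lifetime sum (a counter), never cleared on reduce by default.
logger.log_value("env_steps", 200, reduce="sum", window=None)

# Mean through an EMA coefficient instead of a sliding window (assumed to
# mirror the `ema_coeff` argument of `log_time()` shown further below).
logger.log_value("sample_time_s", 0.25, reduce="mean", ema_coeff=0.05)

# Collect raw values without reducing; clear them on every reduce event
# to avoid leaking memory.
logger.log_value("rendered_frames", "frame_0", reduce=None, clear_on_reduce=True)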
Built-in usages of MetricsLogger#
RLlib uses the MetricsLogger API extensively in the existing code base. The following is an overview of the typical information flow that results from this:
1. The Algorithm sends parallel sample requests to its n EnvRunner actors.
2. Each EnvRunner collects training data by stepping through its RL environment and logs standard stats to its MetricsLogger, such as episode return or episode length.
3. Each EnvRunner calls reduce() on its own MetricsLogger instance and returns the resulting stats dict.
4. The Algorithm merges the n received stats dicts into its own MetricsLogger instance under the top-level key "env_runners", thereby keeping all log settings chosen by the EnvRunner actors.
5. The Algorithm sends parallel update requests to its m Learner actors.
6. Each Learner performs a model update by computing losses and gradients and logs standard stats to its MetricsLogger, such as the total loss or mean gradients.
7. Each Learner calls reduce() on its own MetricsLogger instance and returns the resulting stats dict.
8. The Algorithm merges the m received stats dicts into its own MetricsLogger instance under the top-level key "learners", thereby keeping all log settings chosen by the Learner actors.
9. The Algorithm may add standard stats to its own MetricsLogger instance, for example the average time of a parallel sample request.
10. The Algorithm calls reduce() on its own MetricsLogger instance, compiling and returning a complete, final stats dict to the user or Ray Tune.
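The nesting described above is reflected in the final results dict. The following is a rough, illustrative sketch of that structure; apart from the "env_runners" and "learners" keys and episode_return_mean, which appear elsewhere on this page, the concrete keys are examples and depend on the algorithm and RLlib version.

# Illustrative shape of the final, reduced results dict. The concrete keys
# depend on the algorithm and RLlib version.
results = {
    "env_runners": {
        # Stats merged from the n EnvRunner actors, for example:
        "episode_return_mean": 123.4,
        "episode_len_mean": 210.0,
    },
    "learners": {
        # Stats merged from the m Learner actors, for example per-module losses.
    },
    # Plus stats the Algorithm logs itself, for example timers for the
    # parallel sample requests.
}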
The MetricsLogger APIs in detail#
Before using MetricsLogger in your custom code, familiarize yourself with how its individual APIs work.
Logging scalar values#
To log a scalar value under some string key in your MetricsLogger, use the log_value() method:
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger
logger = MetricsLogger()
# Log a scalar float value under the `loss` key. By default, all logged
# values under that key are averaged, once `reduce()` is called.
logger.log_value("loss", 0.01, reduce="mean", window=2)
By default, MetricsLogger reduces values by averaging them (reduce="mean"). Other available reduce types are reduce="min", reduce="max", and reduce="sum". Specifying a window causes the reduction to take place over the last window logged values. For example, you can continue logging new values under the loss key:
logger.log_value("loss", 0.02) # don't have to repeat `reduce` or `window` args,
# because the key already exists.
logger.log_value("loss", 0.03)
logger.log_value("loss", 0.04)
logger.log_value("loss", 0.05)
Because you specified a window of 2, MetricsLogger only uses the last two values to compute the reduced result. You can look at the currently reduced result through the peek() method:
# Peek at the current, reduced value.
# Note that in the underlying structure, the internal values list still
# contains all logged values: 0.01, 0.02, 0.03, 0.04, and 0.05.
print(logger.peek("loss")) # Expect: 0.045, which is the average over the last 2 values
The peek() method allows you to check the current reduced result for some key without actually having to call reduce().
Warning
Don't call the reduce() method yourself on any MetricsLogger from your custom code. The only time RLlib invokes this API is at the end of a task cycle. RLlib controls all of these "hand over" points entirely, so unless you write your own subcomponent that reports to a parent component, such as Algorithm, refrain from calling the reduce() method. To get the current reduced results, use the peek() method instead, which doesn't alter any underlying values.
Instead of providing a flat key, you can also log a value under a nested key by passing in a tuple:
# Log a value under a deeper nested key.
logger.log_value(("some", "nested", "key"), -1.0)
print(logger.peek(("some", "nested", "key"))) # expect: -1.0
To use reduce methods other than "mean", specify the reduce argument in log_value():
# Log a maximum value.
logger.log_value(key="max_value", value=0.0, reduce="max")
Because you didn't specify a window and are using reduce="max", RLlib uses an infinite window, meaning MetricsLogger reports the lifetime maximum value whenever a reduction takes place or you peek at the current value:
for i in range(1000, 0, -1):
    logger.log_value(key="max_value", value=float(i))

logger.peek("max_value")  # Expect: 1000.0, which is the lifetime max (infinite window)
You can also choose not to reduce at all, but simply collect individual values, for example a set of images you receive from your environment over time, for which it doesn't make sense to compute any reduction. Use the reduce=None argument to achieve this. In that case, you should also set the clear_on_reduce=True flag, because keeping all logged values around indefinitely may otherwise cause a memory leak. This flag ensures that MetricsLogger clears out the underlying list of values after every reduce() handover operation, for example from EnvRunner to Algorithm:
logger.log_value("some_items", value="a", reduce=None, clear_on_reduce=True)
logger.log_value("some_items", value="b")
logger.log_value("some_items", value="c")
logger.log_value("some_items", value="d")
logger.peek("some_items") # expect a list: ["a", "b", "c", "d"]
logger.reduce()
logger.peek("some_items") # expect an empty list: []
Logging a set of nested scalar values#
If you're logging a nested structure of values, for example {"time_s": 0.1, "lives": 5, "rounds_played": {"player1": 10, "player2": 4}}, and all values share the exact same log settings in terms of the reduce, clear_on_reduce, window, and other arguments, you can use the log_dict() shortcut method instead:
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger
logger = MetricsLogger()
# Log a bunch of scalar values within a nested dict.
stats = {"player1": 100.0, "player2": 105.0}
logger.log_dict(stats, key="mean_scores", reduce="mean", window=10)
# Later, do the same again.
stats = {"player1": 150.0, "player2": 110.0}
logger.log_dict(stats, key="mean_scores")
print(logger.peek(("mean_scores", "player1"))) # <- expect 125.0
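The preceding example logs a flat dict under a parent key. A nested dict, like the one at the start of this section, can be logged the same way. The following is a small sketch of that, assuming log_dict() accepts arbitrarily nested dicts of scalars as described above; the key "game_stats" is purely illustrative.

from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

logger = MetricsLogger()

# Sketch: log a nested dict of scalars in one call (assumes `log_dict()`
# handles arbitrary nesting, as described above).
stats = {
    "time_s": 0.1,
    "lives": 5,
    "rounds_played": {"player1": 10, "player2": 4},
}
logger.log_dict(stats, key="game_stats", reduce="mean", window=10)

# Individual leaf values are then available under their nested key paths:
print(logger.peek(("game_stats", "rounds_played", "player1")))  # expect: 10.0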
Logging non-scalar data#
MetricsLogger isn't limited to scalar values. You can also use it to log images, videos, or any other complex data. Normally, you would choose the previously described reduce=None argument. For example, to log three consecutive image frames from a CartPole environment, do the following:
import gymnasium as gym

# Create the env with `render_mode="rgb_array"` so that `env.render()` returns image frames.
env = gym.make("CartPole-v1", render_mode="rgb_array")

# Log three consecutive render frames from the env.
# Make sure to set ``clear_on_reduce=True`` to avoid memory leaks.
env.reset()
logger.log_value("some_images", value=env.render(), reduce=None, clear_on_reduce=True)
env.step(0)
logger.log_value("some_images", value=env.render())
env.step(1)
logger.log_value("some_images", value=env.render())
Timers#
MetricsLogger is context-manager capable and offers the following simple API to log timer results. Notice that you can time any code block of interest inside your custom code through a single with statement:
import time
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

logger = MetricsLogger()

# First delta measurement:
with logger.log_time("my_block_to_be_timed", reduce="mean", ema_coeff=0.1):
    time.sleep(1.0)

# EMA should be ~1sec.
assert 1.1 > logger.peek("my_block_to_be_timed") > 0.9

# Second delta measurement:
with logger.log_time("my_block_to_be_timed"):
    time.sleep(2.0)

# EMA should be ~1.1sec.
assert 1.15 > logger.peek("my_block_to_be_timed") > 1.05
Note
The default logging behavior for timers is an exponential moving average (EMA), with a default coefficient of 0.01. This default is usually a good choice for averaging timer results over the course of an experiment.
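If you prefer a sliding-window average over an EMA for a timer, the following minimal sketch shows how that could look, under the assumption that log_time() accepts a window argument analogous to log_value():

import time
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger

logger = MetricsLogger()

# Sketch: average the last 10 measured durations instead of using an EMA.
# This assumes `log_time()` accepts a `window` argument analogous to `log_value()`.
for _ in range(3):
    with logger.log_time("my_windowed_timer", reduce="mean", window=10):
        time.sleep(0.1)

print(logger.peek("my_windowed_timer"))  # expect roughly 0.1 (seconds)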
Counters#
In case you want to count things, for example the number of environment steps taken in a sample phase, and add up those counts either over the lifetime or over some particular phase, use the reduce="sum" argument in the call to log_value().
Combine this with clear_on_reduce=True if you want the count to only accumulate until the next "reduce" event. Set clear_on_reduce=False, which is the default, if you want the count to accumulate over the lifetime.
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger
logger = MetricsLogger()
logger.log_value("my_counter", 50, reduce="sum", window=None)
logger.log_value("my_counter", 25)
logger.peek("my_counter") # expect: 75
# Even if your logger gets "reduced" from time to time, the counter keeps increasing
# because we set clear_on_reduce=False (default behavior):
logger.reduce()
logger.peek("my_counter") # still expect: 75
# To clear the sum after each "reduce" event, set `clear_on_reduce=True`:
logger.log_value("my_temp_counter", 50, reduce="sum", window=None, clear_on_reduce=True)
logger.log_value("my_temp_counter", 25)
logger.peek("my_counter") # expect: 75
logger.reduce()
logger.peek("my_counter") # expect: 0 (upon reduction, all values are cleared)
Automatic throughput measurements#
A metric logged with the settings reduce="sum" and clear_on_reduce=False is considered a lifetime counter, accumulating counts over the entire course of the experiment without ever resetting the value back to 0. If you additionally set the with_throughput=True flag, the underlying metric automatically computes the throughput per second on each reduce() operation.
The Algorithm automatically compiles an extra key for each such metric, adding the suffix _throughput to the original key and assigning it the throughput per second as its value.
You can use the peek() method with the call argument throughput=True to access the throughput value. For example:
import time
from ray.rllib.utils.metrics.metrics_logger import MetricsLogger
logger = MetricsLogger()
for _ in range(3):
    logger.log_value("lifetime_count", 5, reduce="sum", with_throughput=True)

    # RLlib triggers a new throughput computation at each `reduce()` call.
    logger.reduce()
    time.sleep(1.0)

    # Expect the first call to return NaN because we don't have a proper start time
    # for the time delta. From the second call on, expect a value of roughly 5/sec.
    print(logger.peek("lifetime_count", throughput=True))
Example 1: How to use MetricsLogger in EnvRunner callbacks#
To demonstrate how to use the MetricsLogger on an EnvRunner, take a look at this end-to-end example here, which makes use of the RLlibCallback API to inject custom code into the RL environment loop. The example computes the average "first-joint angle" of the Acrobot-v1 RL environment and logs the results through the MetricsLogger API.
Note that this example is identical to the one described here, but the focus has shifted to explaining only the MetricsLogger aspects of the code.
import math
import numpy as np

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.callbacks.callbacks import RLlibCallback


# Define a custom RLlibCallback.
class LogAcrobotAngle(RLlibCallback):
    def on_episode_step(self, *, episode, env, **kwargs):
        # Compute the angle at every episode step and store it temporarily in episode:
        state = env.envs[0].unwrapped.state
        deg_theta1 = math.degrees(math.atan2(state[1], state[0]))
        episode.add_temporary_timestep_data("theta1", deg_theta1)

    def on_episode_end(self, *, episode, metrics_logger, **kwargs):
        theta1s = episode.get_temporary_timestep_data("theta1")
        avg_theta1 = np.mean(theta1s)

        # Log the resulting average angle - per episode - to the MetricsLogger.
        # Report with a sliding window of 50.
        metrics_logger.log_value("theta1_mean", avg_theta1, reduce="mean", window=50)


config = (
    PPOConfig()
    .environment("Acrobot-v1")
    .callbacks(
        callbacks_class=LogAcrobotAngle,
    )
)
ppo = config.build()

# Train n times. Expect `theta1_mean` to be found in the results under:
# `env_runners/theta1_mean`
for i in range(10):
    results = ppo.train()
    print(
        f"iter={i} "
        f"theta1_mean={results['env_runners']['theta1_mean']} "
        f"R={results['env_runners']['episode_return_mean']}"
    )
Also take a look at this more complex example on how to generate and log a PacMan heatmap (image) to WandB here.
Example 2: How to use MetricsLogger in a custom loss function#
You can log metrics inside your custom loss functions. Use the Learner's self.metrics attribute for this.
Note
When logging loss values, the RLlib team recommends using window=1 to always report the exact current loss value, rather than a result smoothed over time. This way, you notice strange spikes or unstable behavior in your loss math right away and can pinpoint problems to a particular iteration.
@override(TorchLearner)
def compute_loss_for_module(self, *, module_id, config, batch, fwd_out):
    ...
    loss_xyz = ...

    # Log a specific loss term.
    self.metrics.log_value("special_loss_term", loss_xyz, window=1)

    total_loss = loss_abc + loss_xyz
    return total_loss
Take a look at this running end-to-end example for logging custom values inside a loss function here.
Example 3: How to use MetricsLogger in a custom Algorithm#
You can log metrics inside your custom Algorithm training_step() method. Use the Algorithm's own self.metrics attribute for this.
@override(Algorithm)
def training_step(self) -> None:
    ...
    # Log some value.
    self.metrics.log_value("some_mean_result", 1.5, window=5)
    ...
    with self.metrics.log_time(("timers", "some_code")):
        ...  # time some code
See this running end-to-end example for logging inside training_step().