Logging & Outputs in Tune

Tune by default logs results in TensorBoard, CSV, and JSON formats. If you need to log something lower-level, such as model weights or gradients, see Trainable Logging. You can learn more about logging and customizations here: Loggers (tune.logger).

How to configure logging in Tune?

Tune will log the results of each trial to a sub-folder under a specified local dir, which defaults to ~/ray_results.

# This logs to two different trial folders:
# ~/ray_results/trainable_name/trial_name_1 and ~/ray_results/trainable_name/trial_name_2
# trainable_name and trial_name are autogenerated.
tuner = tune.Tuner(trainable, tune_config=tune.TuneConfig(num_samples=2))
results = tuner.fit()

You can specify the local_dir and the experiment name:

# This logs to 2 different trial folders:
# ./results/test_experiment/trial_name_1 and ./results/test_experiment/trial_name_2
# Only trial_name is autogenerated.
tuner = tune.Tuner(trainable,
    tune_config=tune.TuneConfig(num_samples=2),
    run_config=air.RunConfig(local_dir="./results", name="test_experiment"))
results = tuner.fit()

To learn more about Trials, see its detailed API documentation: Trial.

How to log to TensorBoard?

Tune automatically outputs TensorBoard files during Tuner.fit(). To visualize learning in TensorBoard, install tensorboardX:

$ pip install tensorboardX

Then, after you run an experiment, you can visualize your experiment with TensorBoard by specifying the output directory of your results.

$ tensorboard --logdir=~/ray_results/my_experiment

If you are running Ray on a remote multi-user cluster where you do not have sudo access, you can run the following commands to make sure tensorboard is able to write to the tmp directory:

$ export TMPDIR=/tmp/$USER; mkdir -p $TMPDIR; tensorboard --logdir=~/ray_results
[Image: TensorBoard view of Ray Tune results]

If using TensorFlow 2.x, Tune also automatically generates TensorBoard HParams output, as shown below:

tuner = tune.Tuner(
    ...,
    param_space={
        "lr": tune.grid_search([1e-5, 1e-4]),
        "momentum": tune.grid_search([0, 0.9])
    }
)
results = tuner.fit()
[Image: TensorBoard HParams view of the tuned hyperparameters]

How to control console output?

User-provided fields are output automatically on a best-effort basis, as in the status table below. You can customize this console output with a Reporter object; see the sketch after the table.

== Status ==
Memory usage on this node: 11.4/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/12 CPUs, 0/0 GPUs, 0.0/3.17 GiB heap, 0.0/1.07 GiB objects
Result logdir: /Users/foo/ray_results/myexp
Number of trials: 4 (4 RUNNING)
+----------------------+----------+---------------------+-----------+--------+--------+----------------+-------+
| Trial name           | status   | loc                 |    param1 | param2 |    acc | total time (s) |  iter |
|----------------------+----------+---------------------+-----------+--------+--------+----------------+-------|
| MyTrainable_a826033a | RUNNING  | 10.234.98.164:31115 | 0.303706  | 0.0761 | 0.1289 |        7.54952 |    15 |
| MyTrainable_a8263fc6 | RUNNING  | 10.234.98.164:31117 | 0.929276  | 0.158  | 0.4865 |        7.0501  |    14 |
| MyTrainable_a8267914 | RUNNING  | 10.234.98.164:31111 | 0.068426  | 0.0319 | 0.9585 |        7.0477  |    14 |
| MyTrainable_a826b7bc | RUNNING  | 10.234.98.164:31112 | 0.729127  | 0.0748 | 0.1797 |        7.05715 |    14 |
+----------------------+----------+---------------------+-----------+--------+--------+----------------+-------+
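For example, here is a minimal sketch of customizing the reported columns with CLIReporter. It assumes your Ray version's RunConfig accepts a progress_reporter and that your trainable reports a metric named acc (a placeholder name):

from ray import air, tune
from ray.tune import CLIReporter

# Limit the status table to the listed metrics and parameters.
# "acc" is a placeholder metric name reported by the trainable;
# "training_iteration" is filled in automatically by Tune.
reporter = CLIReporter(
    metric_columns=["acc", "training_iteration"],
    parameter_columns=["param1", "param2"],
)

tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(progress_reporter=reporter),
)
results = tuner.fit()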

How to redirect stdout and stderr to files?

The stdout and stderr streams are usually printed to the console. For remote actors, Ray collects these logs and prints them to the head process.

However, if you would like to collect the stream outputs in files for later analysis or troubleshooting, Tune offers a utility parameter, log_to_file, for this.

By passing log_to_file=True to air.RunConfig, which is taken in by Tuner, stdout and stderr will be logged to trial_logdir/stdout and trial_logdir/stderr, respectively:

tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(log_to_file=True)
)
results = tuner.fit()

If you would like to specify the output files, you can either pass one filename, where the combined output will be stored, or two filenames, for stdout and stderr, respectively:

tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(log_to_file="std_combined.log")
)
tuner.fit()

tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(log_to_file=("my_stdout.log", "my_stderr.log")))
results = tuner.fit()

The file names are relative to the trial’s logdir. You can pass absolute paths, too.

If log_to_file is set, Tune will automatically register a new logging handler for Ray’s base logger and log the output to the specified stderr output file.
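As a minimal sketch of what this looks like in practice (assuming Ray's base logger is the one obtained via logging.getLogger("ray")), messages sent through that logger from inside the trainable end up in the trial's stderr file:

import logging

from ray import air, tune
from ray.air import session

def trainable(config):
    # With log_to_file set, the handler Tune registers on Ray's base logger
    # writes this message into the trial's stderr file.
    logging.getLogger("ray").info("This ends up in the trial's stderr log.")
    session.report({"metric": 1})

tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(log_to_file=("my_stdout.log", "my_stderr.log")),
)
results = tuner.fit()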

How to configure Trainable logging?

By default, Tune only logs the training result dictionaries from your Trainable. However, you may want to visualize the model weights, model graph, or use a custom logging library that requires multi-process logging. For example, you may want to do this if you’re trying to log images to TensorBoard.

You can do this in the trainable, as shown below:

Tip

Make sure that any logging calls or objects stay within scope of the Trainable. You may see pickling or other serialization errors or inconsistent logs otherwise.

Here, library refers to whatever third-party logging library you are using.

import os

from ray.air import session

def trainable(config):
    # The trial ID identifies this run; session.get_trial_id() provides it
    # inside a function trainable.
    trial_id = session.get_trial_id()
    library.init(
        name=trial_id,
        id=trial_id,
        resume=trial_id,
        reinit=True,
        allow_val_change=True)
    library.set_log_path(os.getcwd())

    for step in range(100):
        results = ...  # metrics computed by your training code
        library.log_model(...)
        library.log(results, step=step)
        session.report(results)

The same pattern with the class-based Trainable API:

class CustomLogging(tune.Trainable):
    def setup(self, config):
        trial_id = self.trial_id
        library.init(
            name=trial_id,
            id=trial_id,
            resume=trial_id,
            reinit=True,
            allow_val_change=True)
        library.set_log_path(os.getcwd())

    def step(self):
        library.log_model(...)

    def log_result(self, result):
        res_dict = {
            str(k): v
            for k, v in result.items()
            if (v and "config" not in k and not isinstance(v, str))
        }
        step = result["training_iteration"]
        library.log(res_dict, step=step)

Note: For both function and class trainables, the current working directory is changed to a directory specific to that trial once it is launched on a remote actor.
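As a minimal sketch of what this implies (assuming you want artifacts to land next to Tune's other per-trial output), relative paths written from inside the trainable end up in that trial-specific directory rather than in the directory you launched the script from:

import os

from ray.air import session

def trainable(config):
    # The working directory here is trial-specific, so this relative write
    # lands in the trial's own directory, not where the script was launched.
    with open("artifact.txt", "w") as f:
        f.write("written from inside the trial")
    session.report({"cwd": os.getcwd()})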

In the distributed case, these logs are synced back to the driver under your logger path, letting you visualize and analyze the logs of all distributed training workers on a single machine.

How to build custom loggers?

You can create a custom logger by inheriting from the LoggerCallback interface:

from typing import Dict, List

import json
import os

from ray.tune.logger import LoggerCallback


class CustomLoggerCallback(LoggerCallback):
    """Custom logger interface"""

    def __init__(self, filename: str = "log.txt"):
        self._trial_files = {}
        self._filename = filename

    def log_trial_start(self, trial: "Trial"):
        trial_logfile = os.path.join(trial.logdir, self._filename)
        self._trial_files[trial] = open(trial_logfile, "at")

    def log_trial_result(self, iteration: int, trial: "Trial", result: Dict):
        if trial in self._trial_files:
            self._trial_files[trial].write(json.dumps(result))

    def on_trial_complete(self, iteration: int, trials: List["Trial"],
                          trial: "Trial", **info):
        if trial in self._trial_files:
            self._trial_files[trial].close()
            del self._trial_files[trial]

You can then pass in your own logger as follows:

from ray import air, tune

tuner = tune.Tuner(
    MyTrainableClass,
    run_config=air.RunConfig(name="experiment_name", callbacks=[CustomLoggerCallback("log_test.txt")])
)
results = tuner.fit()

By default, Ray Tune creates JSON, CSV, and TensorBoardX logger callbacks if you don't pass them yourself. You can disable this behavior by setting the TUNE_DISABLE_AUTO_CALLBACK_LOGGERS environment variable to "1".
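For example, here is a minimal sketch of disabling the default logger callbacks from Python (exporting the variable in your shell before launching the script works just as well):

import os

# Set this before running Tune so it skips its default JSON, CSV,
# and TensorBoardX logger callbacks.
os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"

tuner = tune.Tuner(
    MyTrainableClass,
    run_config=air.RunConfig(callbacks=[CustomLoggerCallback("log_test.txt")]),
)
results = tuner.fit()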

An example of creating a custom logger can be found in Logging Example.