ray.train.huggingface.transformers.RayTrainReportCallback#

class ray.train.huggingface.transformers.RayTrainReportCallback(*args: Any, **kwargs: Any)[source]#

Bases: TrainerCallback

A simple callback to report checkpoints and metrics to Ray Tarin.

This callback is a subclass of transformers.TrainerCallback and overrides the TrainerCallback.on_save() method. After a new checkpoint get saved, it fetches the latest metric dictionary from TrainerState.log_history and reports it with the latest checkpoint to Ray Train.

Checkpoints will be saved in the following structure:

checkpoint_00000*/   Ray Train Checkpoint
└─ checkpoint/       Hugging Face Transformers Checkpoint

For customized reporting and checkpointing logic, implement your own transformers.TrainerCallback following this user guide: Saving and Loading Checkpoints.

Note that users should ensure that the logging, evaluation, and saving frequencies are properly configured so that the monitoring metric is always up-to-date when transformers.Trainer saves a checkpoint.

Suppose the monitoring metric is reported from evaluation stage:

Some valid configurations:
  • evaluation_strategy == save_strategy == “epoch”

  • evaluation_strategy == save_strategy == “steps”, save_steps % eval_steps == 0

Some invalid configurations:
  • evaluation_strategy != save_strategy

  • evaluation_strategy == save_strategy == “steps”, save_steps % eval_steps != 0

PublicAPI (beta): This API is in beta and may change before becoming stable.

Methods

on_save

Event called after a checkpoint save.

Attributes

CHECKPOINT_NAME