ray.data.checkpoint.interfaces.CheckpointConfig
- class ray.data.checkpoint.interfaces.CheckpointConfig(id_column: str | None = None, checkpoint_path: str | None = None, *, delete_checkpoint_on_success: bool = True, override_filesystem: pyarrow.fs.FileSystem | None = None, override_backend: CheckpointBackend | None = None, filter_num_threads: int = 3, write_num_threads: int = 3, checkpoint_path_partition_filter: PathPartitionFilter | None = None)
Configuration for checkpointing.
- Parameters:
id_column – Name of the ID column in the input dataset. ID values must be unique across all rows in the dataset, and the column must be preserved by every operator.
checkpoint_path – Path to store the checkpoint data. It can be a path to a cloud object storage (e.g. s3://bucket/path) or a file system path. If the latter, the path must be a network-mounted file system (e.g. /mnt/cluster_storage/) that is accessible to the entire cluster. If not set, defaults to RAY_DATA_CHECKPOINT_PATH_BUCKET/ray_data_checkpoint.
delete_checkpoint_on_success – If true, automatically delete checkpoint data when the dataset execution succeeds. Only supported for batch-based backend currently.
override_filesystem – Override the pyarrow.fs.FileSystem object used to read/write checkpoint data. Use this when you want to use custom credentials.
override_backend – Override the CheckpointBackend object used to access the checkpoint backend storage.
filter_num_threads – Number of threads used to filter checkpointed rows.
write_num_threads – Number of threads used to write checkpoint files for completed rows.
checkpoint_path_partition_filter – Filter for checkpoint files to load during restoration when reading from checkpoint_path.
PublicAPI (beta): This API is in beta and may change before becoming stable.
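
The role of id_column described above can be illustrated with a small, library-free sketch. This is not Ray Data's implementation and it does not call any Ray API. The helper names validate_unique_ids and run_with_checkpoint, and the in-memory set standing in for the checkpoint data, are hypothetical and exist only for illustration. The sketch assumes that a checkpoint records the IDs of completed rows, and that a resumed run filters those rows out before processing.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Row = Dict[str, Any]


def validate_unique_ids(rows: Iterable[Row], id_column: str) -> None:
    """Raise ValueError if a row lacks the ID column or an ID repeats."""
    seen: Set[Any] = set()
    for row in rows:
        if id_column not in row:
            raise ValueError(f"row is missing ID column {id_column!r}: {row}")
        row_id = row[id_column]
        if row_id in seen:
            raise ValueError(f"duplicate ID {row_id!r} in column {id_column!r}")
        seen.add(row_id)


def run_with_checkpoint(
    rows: Iterable[Row],
    id_column: str,
    completed_ids: Set[Any],
    process: Callable[[Row], Row],
) -> List[Row]:
    """Process only rows whose ID is not yet recorded as completed.

    `completed_ids` is a hypothetical in-memory stand-in for checkpoint
    data; it is updated in place as each row finishes.
    """
    outputs: List[Row] = []
    for row in rows:
        row_id = row[id_column]
        if row_id in completed_ids:
            continue  # already checkpointed, skip on resume
        outputs.append(process(row))
        completed_ids.add(row_id)  # record the row as completed
    return outputs


rows = [
    {"id": 1, "text": "a"},
    {"id": 2, "text": "b"},
    {"id": 3, "text": "c"},
]
validate_unique_ids(rows, "id")

# Pretend row 1 finished before an interruption.
completed = {1}
resumed = run_with_checkpoint(
    rows, "id", completed, lambda r: {**r, "text": r["text"].upper()}
)
```

In this sketch, resuming processes only rows 2 and 3. A duplicated or dropped ID would make the skip decision ambiguous, which is one way to read the uniqueness and persistence requirements on id_column.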