ray.air.checkpoint.Checkpoint
class ray.air.checkpoint.Checkpoint(local_path: Optional[Union[str, os.PathLike]] = None, data_dict: Optional[dict] = None, uri: Optional[str] = None)
Bases: object
Ray AIR Checkpoint.
An AIR Checkpoint is a common interface for accessing models across different AIR components and libraries. A Checkpoint can have its data represented in one of three ways:
as a directory on local (on-disk) storage
as a directory on an external storage (e.g., cloud storage)
as an in-memory dictionary
The Checkpoint object also has methods to translate between different checkpoint storage locations. These storage representations provide flexibility in distributed environments, where you may want to recreate an instance of the same model on multiple nodes or across different Ray clusters.
Example:
    from ray.air.checkpoint import Checkpoint

    # Create checkpoint data dict
    checkpoint_data = {"data": 123}

    # Create checkpoint object from data
    checkpoint = Checkpoint.from_dict(checkpoint_data)

    # Save checkpoint to a directory on the file system.
    path = checkpoint.to_directory()

    # This path can then be passed around,
    # e.g. to a different function or a different script.
    # You can also use `checkpoint.to_uri/from_uri` to
    # read from/write to cloud storage

    # In another function or script, recover Checkpoint object from path
    checkpoint = Checkpoint.from_directory(path)

    # Convert into dictionary again
    recovered_data = checkpoint.to_dict()

    # It is guaranteed that the original data has been recovered
    assert recovered_data == checkpoint_data
Checkpoints can be used to instantiate a Predictor, BatchPredictor, or PredictorDeployment class.
The constructor is a private API; use the from_ methods instead to create checkpoint objects (e.g. Checkpoint.from_directory()).
Other implementation notes:
When converting between different checkpoint formats, it is guaranteed that a full round trip of conversions (e.g. directory --> dict --> directory) will recover the original checkpoint data. There are no guarantees made about compatibility of intermediate representations.
New data can be added to a Checkpoint during conversion. Consider the following conversion: directory --> dict (adding dict["foo"] = "bar") --> directory --> dict (expect to see dict["foo"] = "bar"). Note that the second directory will contain pickle files with the serialized additional field data in them.
Similarly with a dict as a source: dict --> directory (add file "foo.txt") --> dict --> directory (will have "foo.txt" in it again). Note that the second dict representation will contain an extra field with the serialized additional files in it.
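The conversion chains above can be illustrated with a toy model that uses only the Python standard library. This is a sketch of the round-trip idea, not Ray's implementation or on-disk format: the helpers `dict_to_directory` and `directory_to_dict`, the file name `_dict.pkl`, and the key `_extra_files` are all hypothetical names chosen for this example.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical names; this is a toy model, not Ray's on-disk format.
DICT_FILE = "_dict.pkl"
EXTRA_KEY = "_extra_files"


def dict_to_directory(data: dict) -> Path:
    """Write a dict to a fresh directory, restoring any carried-along files."""
    path = Path(tempfile.mkdtemp())
    fields = dict(data)  # shallow copy so the caller's dict is not modified
    # Files picked up by an earlier directory -> dict conversion.
    for name, content in fields.pop(EXTRA_KEY, {}).items():
        (path / name).write_bytes(content)
    (path / DICT_FILE).write_bytes(pickle.dumps(fields))
    return path


def directory_to_dict(path: Path) -> dict:
    """Read a directory back into a dict, carrying extra files in one field."""
    fields = pickle.loads((path / DICT_FILE).read_bytes())
    extras = {
        p.name: p.read_bytes()
        for p in path.iterdir()
        if p.is_file() and p.name != DICT_FILE
    }
    if extras:
        fields[EXTRA_KEY] = extras
    return fields


# dict -> directory (add "foo.txt") -> dict (add "foo") -> directory -> dict
first_dir = dict_to_directory({"data": 123})
(first_dir / "foo.txt").write_text("hello")
as_dict = directory_to_dict(first_dir)
as_dict["foo"] = "bar"
second_dir = dict_to_directory(as_dict)
final_dict = directory_to_dict(second_dir)
```

In this toy model, both the added file and the added field survive the later conversions, which mirrors the behavior the notes describe.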
Checkpoints can be pickled and sent to remote processes. Please note that checkpoints pointing to local directories will be pickled as data representations, so the full checkpoint data will be contained in the checkpoint object. If you want to avoid this, consider passing only the checkpoint directory to the remote task and re-construct your checkpoint object in that function. Note that this will only work if the “remote” task is scheduled on the same node or a node that also has access to the local data path (e.g. on a shared file system like NFS).
If you need persistence across clusters, use the to_uri() or to_directory() methods to persist your checkpoints to disk.
PublicAPI (beta): This API is in beta and may change before becoming stable.
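To make the pickling note concrete, here is a toy sketch of how an object that points at a local directory could carry the directory's contents along when it is pickled. `DirHandle` is a hypothetical class invented for this illustration; it uses only the standard library and does not reflect how Ray's Checkpoint is implemented.

```python
import pickle
import tempfile
from pathlib import Path


class DirHandle:
    """Hypothetical handle to a local directory (toy model, not Ray's Checkpoint)."""

    def __init__(self, path):
        self.path = Path(path)

    def __getstate__(self):
        # Pickle the file contents rather than the path, so the receiving
        # process does not need access to the sender's file system.
        return {p.name: p.read_bytes() for p in self.path.iterdir() if p.is_file()}

    def __setstate__(self, files):
        # Recreate the files in a new temporary directory on the receiver.
        self.path = Path(tempfile.mkdtemp())
        for name, content in files.items():
            (self.path / name).write_bytes(content)


src = Path(tempfile.mkdtemp())
(src / "model.bin").write_bytes(b"\x00" * 1024)

payload = pickle.dumps(DirHandle(src))  # carries the full file contents
restored = pickle.loads(payload)        # files recreated under a new path

path_only = pickle.dumps(str(src))      # small, but needs a shared file system
```

The size difference between `payload` and `path_only` is the trade-off the note describes: sending only the path is cheaper, but it only works when the receiving task can read the same local or shared path.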
Methods
__init__([local_path, data_dict, uri]): DeveloperAPI: This API may change across minor Ray releases.
Return checkpoint directory path in a context.
from_bytes(data): Create a checkpoint from the given byte string.
from_checkpoint(other): Create a checkpoint from a generic Checkpoint.
from_dict(data): Create checkpoint object from dictionary.
from_directory(path): Create checkpoint object from directory.
from_uri(uri): Create checkpoint object from location URI.
Return tuple of (type, data) for the internal representation.
Return the saved preprocessor, if one exists.
set_preprocessor(preprocessor): Saves the provided preprocessor to this Checkpoint.
to_bytes(): Return Checkpoint serialized as bytes object.
to_dict(): Return checkpoint data as dictionary.
to_directory([path]): Write checkpoint data to directory.
to_uri(uri): Write checkpoint data to location URI.
Attributes
Return checkpoint URI, if available.