Syncing

SyncConfig

class ray.tune.syncer.SyncConfig(upload_dir: Optional[str] = None, syncer: Optional[Union[str, ray.tune.syncer.Syncer]] = 'auto', sync_on_checkpoint: bool = True, sync_period: int = 300)[source]

Configuration object for syncing.

If an upload_dir is specified, both experiment and trial checkpoints will be stored on remote (cloud) storage. Synchronization then only happens via this remote storage.

Parameters
  • upload_dir – Optional URI to sync training results and checkpoints to (e.g. s3://bucket, gs://bucket or hdfs://path). Specifying this will enable cloud-based checkpointing.

  • syncer – Syncer class to use for synchronizing checkpoints to/from cloud storage. If set to None, no syncing will take place. Defaults to "auto" (auto detect).

  • sync_on_checkpoint – If True, force a sync-down of each trial checkpoint to the driver (only applies when cloud storage is not used). If set to False, checkpoint syncing from worker to driver is asynchronous and best-effort. This does not affect persistent storage syncing. Defaults to True.

  • sync_period – Minimum interval, in seconds, between syncs between nodes. Defaults to 300.

PublicAPI: This API is stable across Ray releases.

Syncer

class ray.tune.syncer.Syncer(sync_period: float = 300.0)[source]

Syncer class for synchronizing data between Ray nodes and external storage.

This class handles data transfer for two cases:

  1. Synchronizing data from the driver to external storage. This affects experiment-level checkpoints and trial-level checkpoints if no cloud storage is used.

  2. Synchronizing data from remote trainables to external storage.

Synchronization tasks are usually asynchronous and can be awaited using wait(). The base class implements a wait_or_retry() API that retries a failed sync command.

The base class also exposes an API to only kick off syncs every sync_period seconds.
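The throttling behaviour described above can be sketched without Ray. The following is a minimal illustration, not the library's implementation: the class name ThrottledSync, its sync_fn and clock parameters, and the sync_if_needed method are hypothetical names introduced here. It assumes a sync is started only when more than sync_period seconds have passed since the last one, and that the first call always syncs.

```python
import time
from typing import Callable, Optional


class ThrottledSync:
    """Hypothetical helper: runs `sync_fn` at most once per `sync_period` seconds."""

    def __init__(
        self,
        sync_fn: Callable[[str, str], bool],
        sync_period: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.sync_fn = sync_fn
        self.sync_period = sync_period
        self.clock = clock  # Injectable so the timing logic can be tested.
        self.last_sync_time: Optional[float] = None  # None means "never synced".

    def sync_if_needed(self, src: str, dst: str) -> bool:
        """Call `sync_fn` only if more than `sync_period` seconds have elapsed."""
        now = self.clock()
        if (
            self.last_sync_time is not None
            and now - self.last_sync_time <= self.sync_period
        ):
            return False  # Too soon since the last sync, so skip this one.
        self.last_sync_time = now
        return self.sync_fn(src, dst)
```

Injecting the clock keeps the sketch deterministic in tests; a real syncer would typically track separate timestamps for sync-up and sync-down.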

DeveloperAPI: This API may change across minor Ray releases.

abstract sync_up(local_dir: str, remote_dir: str, exclude: Optional[List] = None) → bool[source]

Synchronize local directory to remote directory.

This function can spawn an asynchronous process that can be awaited in wait().

Parameters
  • local_dir – Local directory to sync from.

  • remote_dir – Remote directory to sync up to. This is a URI (protocol://remote/path).

  • exclude – Pattern of files to exclude, e.g. ["*/checkpoint_*"] to exclude trial checkpoints.

Returns

True if sync process has been spawned, False otherwise.
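To make the exclude parameter concrete, here is a small sketch of how a glob pattern such as "*/checkpoint_*" could filter relative paths. It uses Python's standard fnmatch module as an assumption about the matching semantics; the matching a given Syncer implementation applies may differ. The helper is_excluded and the sample paths are hypothetical.

```python
from fnmatch import fnmatch
from typing import List, Optional


def is_excluded(relative_path: str, exclude: Optional[List[str]] = None) -> bool:
    """Return True if `relative_path` matches any glob pattern in `exclude`."""
    return any(fnmatch(relative_path, pattern) for pattern in (exclude or []))


# Hypothetical paths, relative to the directory being synced.
paths = [
    "trial_0001/checkpoint_000010/model.pkl",
    "trial_0001/result.json",
    "experiment_state.json",
]

# Keep everything except files under a trial's checkpoint directories.
kept = [p for p in paths if not is_excluded(p, ["*/checkpoint_*"])]
```

With these inputs, only the path under checkpoint_000010 is filtered out, because fnmatch lets "*" match across "/" separators.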

abstract sync_down(remote_dir: str, local_dir: str, exclude: Optional[List] = None) → bool[source]

Synchronize remote directory to local directory.

This function can spawn an asynchronous process that can be awaited in wait().

Parameters
  • remote_dir – Remote directory to sync down from. This is a URI (protocol://remote/path).

  • local_dir – Local directory to sync to.

  • exclude – Pattern of files to exclude, e.g. ["*/checkpoint_*"] to exclude trial checkpoints.

Returns

True if sync process has been spawned, False otherwise.

abstract delete(remote_dir: str) → bool[source]

Delete directory on remote storage.

This function can spawn an asynchronous process that can be awaited in wait().

Parameters
  • remote_dir – Remote directory to delete. This is a URI (protocol://remote/path).

Returns

True if sync process has been spawned, False otherwise.

retry()[source]

Retry the last sync up, sync down, or delete command.

You should implement this method if you spawn asynchronous syncing processes.

wait()[source]

Wait for asynchronous sync command to finish.

You should implement this method if you spawn asynchronous syncing processes.
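The relationship between an asynchronous sync command, wait(), and retry() can be sketched as follows. This is a standalone illustration under stated assumptions, not Ray's implementation: LocalCopySyncer is a hypothetical class that mirrors the documented method names but does not subclass ray.tune.syncer.Syncer, the "remote" side is just another local directory, and the background work runs in a thread.

```python
import os
import shutil
import tempfile
import threading
from typing import Callable, Optional


class LocalCopySyncer:
    """Hypothetical stand-in that mirrors sync_up(), wait(), and retry()."""

    def __init__(self) -> None:
        self._thread: Optional[threading.Thread] = None
        self._last_command: Optional[Callable[[], bool]] = None
        self._error: Optional[BaseException] = None

    def _run(self, fn: Callable[[], object]) -> None:
        try:
            fn()
        except Exception as exc:
            self._error = exc  # Stored here and surfaced later in wait().

    def _spawn(self, fn: Callable[[], object]) -> bool:
        if self._thread is not None and self._thread.is_alive():
            return False  # A command is still running; do not start another.
        self._error = None
        self._last_command = lambda: self._spawn(fn)  # Remembered for retry().
        self._thread = threading.Thread(target=self._run, args=(fn,), daemon=True)
        self._thread.start()
        return True

    def sync_up(self, local_dir: str, remote_dir: str) -> bool:
        """Start copying `local_dir` to `remote_dir` in the background."""
        return self._spawn(
            lambda: shutil.copytree(local_dir, remote_dir, dirs_exist_ok=True)
        )

    def wait(self) -> None:
        """Block until the running command finishes; raise if it failed."""
        if self._thread is not None:
            self._thread.join()
            self._thread = None
        if self._error is not None:
            error, self._error = self._error, None
            raise RuntimeError("sync command failed") from error

    def retry(self) -> bool:
        """Re-run the last command that was spawned."""
        if self._last_command is None:
            raise RuntimeError("no previous command to retry")
        return self._last_command()


# Usage: sync a temporary directory, then wait for the copy to complete.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "src"), os.path.join(tmp, "dst")
    os.makedirs(src)
    with open(os.path.join(src, "result.json"), "w") as f:
        f.write("{}")
    syncer = LocalCopySyncer()
    spawned = syncer.sync_up(src, dst)
    syncer.wait()
    copied = os.path.exists(os.path.join(dst, "result.json"))
```

Storing the last command as a callable keeps retry() independent of which operation failed, so the same pattern would extend to sync_down() and delete().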

sync_up_if_needed(local_dir: str, remote_dir: str, exclude: Optional[List] = None) → bool[source]

Syncs up if time since last sync up is greater than sync_period.

Parameters
  • local_dir – Local directory to sync from.

  • remote_dir – Remote directory to sync up to. This is a URI (protocol://remote/path).

  • exclude – Pattern of files to exclude, e.g. ["*/checkpoint_*"] to exclude trial checkpoints.

sync_down_if_needed(remote_dir: str, local_dir: str, exclude: Optional[List] = None)[source]

Syncs down if time since last sync down is greater than sync_period.

Parameters
  • remote_dir – Remote directory to sync down from. This is a URI (protocol://remote/path).

  • local_dir – Local directory to sync to.

  • exclude – Pattern of files to exclude, e.g. ["*/checkpoint_*"] to exclude trial checkpoints.