Object Spilling

Ray 1.3+ spills objects to external storage once the object store is full. By default, objects are spilled to Ray’s temporary directory in the local filesystem.

Single node

Object spilling is enabled by default. Unless configured otherwise, objects are spilled to [temp_folder]/spill. On Linux and macOS, temp_folder defaults to /tmp.

To configure the directory that objects are spilled to, use:

import json
import ray

ray.init(
    _system_config={
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": "/tmp/spill"}},
        )
    },
)

You can also specify multiple directories for spilling to spread the IO load and disk space usage across multiple physical devices if needed (e.g., SSD devices):

import json
import ray

ray.init(
    _system_config={
        "max_io_workers": 4,  # More IO workers for parallelism.
        "object_spilling_config": json.dumps(
            {
              "type": "filesystem",
              "params": {
                # Multiple directories can be specified to distribute
                # IO across multiple mounted physical devices.
                "directory_path": [
                  "/tmp/spill",
                  "/tmp/spill_1",
                  "/tmp/spill_2",
                ]
              },
            }
        )
    },
)
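The two configurations above differ only in whether directory_path is a single string or a list. As a minimal sketch, a small helper can assemble the _system_config dictionary for either case. The helper name build_spill_system_config is hypothetical and not part of Ray; it only builds the dictionary shape shown above and never calls Ray itself.

```python
import json


def build_spill_system_config(directories, max_io_workers=None):
    """Build a dict shaped like the `_system_config` examples above.

    Hypothetical helper, not part of Ray. `directories` is either one
    path (str) or an iterable of paths.
    """
    if isinstance(directories, str):
        directory_path = directories
    else:
        directory_path = list(directories)
        if not directory_path:
            raise ValueError("at least one spill directory is required")

    system_config = {
        # The spilling config is passed as a JSON string, not a nested dict.
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": directory_path}}
        )
    }
    if max_io_workers is not None:
        system_config["max_io_workers"] = max_io_workers
    return system_config


config = build_spill_system_config(["/tmp/spill", "/tmp/spill_1"], max_io_workers=4)
print(config)
```

The resulting dictionary could then be passed as ray.init(_system_config=config).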

Note

For best performance with memory-intensive workloads, use an SSD rather than an HDD for object spilling.

If you are using an HDD, specify a large buffer size (> 1MB) to reduce the number of IO requests during spilling.

import json
import ray

ray.init(
    _system_config={
        "object_spilling_config": json.dumps(
            {
              "type": "filesystem",
              "params": {
                "directory_path": "/tmp/spill",
                "buffer_size": 1_000_000,
              }
            },
        )
    },
)

To prevent running out of disk space, local object spilling raises OutOfDiskError if disk utilization exceeds a predefined threshold. If multiple physical devices are used, exceeding the threshold on any one of them triggers the OutOfDiskError. The default threshold is 0.95 (95%). You can adjust it by setting local_fs_capacity_threshold, or set it to 1 to disable the protection.

import json
import ray

ray.init(
    _system_config={
        # Allow spilling until the local disk is 99% utilized.
        # This only affects spilling to the local file system.
        "local_fs_capacity_threshold": 0.99,
        "object_spilling_config": json.dumps(
            {
              "type": "filesystem",
              "params": {
                "directory_path": "/tmp/spill",
              }
            },
        )
    },
)
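To see how close a spill directory is to the threshold before starting a workload, a similar ratio can be computed with the standard library. This is a minimal sketch that assumes utilization means used bytes divided by total bytes on the device holding the directory; Ray's internal check may differ in detail. The function names utilization and over_threshold are hypothetical.

```python
import shutil
import tempfile


def utilization(used_bytes, total_bytes):
    """Fraction of the device in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def over_threshold(path, threshold=0.95):
    """Return True if the device holding `path` is above `threshold`.

    Assumption: this mirrors the idea behind local_fs_capacity_threshold,
    not Ray's actual implementation.
    """
    usage = shutil.disk_usage(path)
    return utilization(usage.used, usage.total) > threshold


# Check the device that holds the system temp directory.
print(over_threshold(tempfile.gettempdir()))
```

With a threshold of 1, the comparison can never be true, which matches the idea of disabling the protection.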

To enable object spilling to remote storage (any URI supported by smart_open):

import json
import ray

ray.init(
    _system_config={
        "max_io_workers": 4,  # More IO workers for remote storage.
        "min_spilling_size": 100 * 1024 * 1024,  # Spill at least 100MB at a time.
        "object_spilling_config": json.dumps(
            {
              "type": "smart_open",
              "params": {
                "uri": "s3://bucket/path",
                # Use a 100MB buffer for writes.
                "buffer_size": 100 * 1024 * 1024,
              },
            },
        )
    },
)

It is recommended that you specify a large buffer size (> 1MB) to reduce IO requests during spilling.

Spilling to multiple remote storage locations is also supported:

import json
import ray

ray.init(
    _system_config={
        "max_io_workers": 4,  # More IO workers for remote storage.
        "min_spilling_size": 100 * 1024 * 1024,  # Spill at least 100MB at a time.
        "object_spilling_config": json.dumps(
            {
              "type": "smart_open",
              "params": {
                "uri": ["s3://bucket/path1", "s3://bucket/path2", "s3://bucket/path3"],
                # Use a 100MB buffer for writes.
                "buffer_size": 100 * 1024 * 1024,
              },
            },
        )
    },
)

Remote storage support is still experimental.

Cluster mode

To enable object spilling in multi-node clusters:

# Note that the value of `object_spilling_config` must be a JSON-formatted string.
# You only need to specify the config when starting the head node; all worker nodes get the same config from the head node.
ray start --head --system-config='{"object_spilling_config":"{\"type\":\"filesystem\",\"params\":{\"directory_path\":\"/tmp/spill\"}}"}'
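Hand-escaping the nested JSON in the command above is error-prone. As a sketch, the argument can be generated by applying json.dumps twice (once for the inner spilling config, once for the outer system config) and quoting the result for a POSIX shell with shlex.quote. The helper name head_node_command is hypothetical; only the ray start --head --system-config=... form is taken from the command above.

```python
import json
import shlex


def head_node_command(directory_path):
    """Return a `ray start` command line with a filesystem spilling config.

    Hypothetical helper; it only formats a string and does not run Ray.
    """
    inner = json.dumps(
        {"type": "filesystem", "params": {"directory_path": directory_path}},
        separators=(",", ":"),
    )
    # The inner config becomes a JSON *string* value of the outer object,
    # so the second json.dumps call adds the backslash escapes.
    outer = json.dumps({"object_spilling_config": inner}, separators=(",", ":"))
    return "ray start --head --system-config=" + shlex.quote(outer)


print(head_node_command("/tmp/spill"))
```

Parsing the generated argument back with json.loads (twice) is a quick way to confirm that the escaping is correct before running the command.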

Stats

While objects are being spilled, INFO-level messages like the following are printed to the raylet logs (e.g., /tmp/ray/session_latest/logs/raylet.out):

local_object_manager.cc:166: Spilled 50 MiB, 1 objects, write throughput 230 MiB/s
local_object_manager.cc:334: Restored 50 MiB, 1 objects, read throughput 505 MiB/s

You can also view cluster-wide spill stats by using the ray memory command:

--- Aggregate object store stats across all nodes ---
Plasma memory usage 50 MiB, 1 objects, 50.0% full
Spilled 200 MiB, 4 objects, avg write throughput 570 MiB/s
Restored 150 MiB, 3 objects, avg read throughput 1361 MiB/s

If you only want to display cluster-wide spill stats, use ray memory --stats-only.
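For monitoring, the Spilled and Restored lines shown above can be turned into structured records. The following sketch assumes the lines match the samples above exactly (integer MiB values, with an optional "avg" before the throughput); real output may vary between Ray versions. The names STAT_LINE and parse_spill_stats are hypothetical.

```python
import re

# Matches both raylet log lines and `ray memory` summary lines, e.g.
# "Spilled 50 MiB, 1 objects, write throughput 230 MiB/s".
STAT_LINE = re.compile(
    r"(?P<kind>Spilled|Restored) (?P<mib>\d+) MiB, (?P<objects>\d+) objects, "
    r"(?:avg )?(?:write|read) throughput (?P<throughput>\d+) MiB/s"
)


def parse_spill_stats(text):
    """Return a list of dicts, one per Spilled/Restored line in `text`."""
    stats = []
    for match in STAT_LINE.finditer(text):
        stats.append(
            {
                "kind": match["kind"].lower(),
                "mib": int(match["mib"]),
                "objects": int(match["objects"]),
                "throughput_mib_s": int(match["throughput"]),
            }
        )
    return stats


sample = """\
local_object_manager.cc:166: Spilled 50 MiB, 1 objects, write throughput 230 MiB/s
Restored 150 MiB, 3 objects, avg read throughput 1361 MiB/s
"""
for entry in parse_spill_stats(sample):
    print(entry)
```

Lines that do not report spilling or restoring, such as the Plasma memory usage line, are ignored by the pattern.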