ray.experimental.register_nixl_memory_pool

ray.experimental.register_nixl_memory_pool(size: int, device: torch.device) → None

Pre-allocates a memory pool and registers it with NIXL.

This enables pool-based memory management for NIXL transfers, which can improve performance by avoiding repeated memory registration/deregistration. The pool is registered once with NIXL and individual tensors are copied into it on ray.put.

Within a single ray.put call, tensors sharing the same underlying storage (including views) are automatically deduplicated: only one copy of each unique storage is allocated. Across multiple ray.put calls, if the same storage appears again, the existing pool slot is reused without re-copying the data. As a result, a ref can serve stale data: mutations made to the storage after the first ray.put may not be reflected in outstanding refs, even refs created by later ray.put calls on the same storage. Clone the tensor before ray.put if snapshot semantics are required.
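For illustration, a minimal sketch of the reuse and clone-before-put behavior described above; it assumes it runs inside an actor created with enable_tensor_transport=True and a registered pool, as in the Example below:

import torch
import ray

weight = torch.zeros(4, device="cuda")
ref1 = ray.put(weight, _tensor_transport="nixl")          # first put: weight is copied into the pool
weight.add_(1.0)                                          # mutate the original storage
ref2 = ray.put(weight, _tensor_transport="nixl")          # same storage: pool slot reused, data NOT re-copied
ref3 = ray.put(weight.clone(), _tensor_transport="nixl")  # clone allocates a fresh slot with the current data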

If the pool has insufficient space for an allocation, NixlOutOfMemoryError is raised.
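A hedged sketch of handling this error; the import path for NixlOutOfMemoryError is an assumption (check your Ray version for the exact location), and falling back to the default object-store transport is just one possible recovery:

import ray
from ray.exceptions import NixlOutOfMemoryError  # assumed import path

def put_with_fallback(tensor):
    try:
        return ray.put(tensor, _tensor_transport="nixl")
    except NixlOutOfMemoryError:
        # The pool is too small for this tensor: fall back to the default
        # transport, or register a larger pool at actor startup instead.
        return ray.put(tensor)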

Parameters:
  • size – Size of the memory pool in bytes.

  • device – Device to allocate the pool on (e.g., torch.device("cpu") or torch.device("cuda")).

Example

import torch
import ray
from ray.experimental import register_nixl_memory_pool

@ray.remote(num_gpus=1, enable_tensor_transport=True)
class Trainer:
    def __init__(self):
        # Pre-allocate a 1GB GPU memory pool for NIXL transfers
        register_nixl_memory_pool(1024 * 1024 * 1024, torch.device("cuda"))

    def get_weight_ref(self):
        # Tensors put with the NIXL transport are copied into the registered pool.
        weight = torch.randn(1000, 1000, device="cuda")
        return ray.put(weight, _tensor_transport="nixl")

PublicAPI (alpha): This API is in alpha and may change before becoming stable.