Resources#

Ray allows you to seamlessly scale your applications from a laptop to a cluster without code changes. Ray resources are key to this capability. They abstract away physical machines and let you express your computation in terms of resources, while the system manages scheduling and autoscaling based on resource requests.

A resource in Ray is a key-value pair where the key denotes a resource name and the value is a float quantity. For convenience, Ray has native support for the CPU, GPU, and memory resource types; these are called pre-defined resources. Besides those, Ray also supports custom resources.

Physical Resources and Logical Resources#

Physical resources are resources that a machine physically has, such as physical CPUs and GPUs. Logical resources are virtual resources defined by a system.

Ray resources are logical and don’t need to have a 1-to-1 mapping with physical resources; they are mainly used for admission control during scheduling. For example, you can start a Ray head node with 0 logical CPUs via ray start --head --num-cpus=0 even if it physically has eight. This signals the Ray scheduler not to schedule any tasks or actors that require logical CPU resources on the head node, mainly to reserve the head node for running Ray system processes.

The fact that resources are logical has several implications:

  • Resource requirements of tasks or actors do NOT impose limits on actual physical resource usage. For example, Ray doesn’t prevent a num_cpus=1 task from launching multiple threads and using multiple physical CPUs. It’s your responsibility to make sure tasks or actors use no more resources than specified via resource requirements.

  • Ray doesn’t provide CPU isolation for tasks or actors. For example, Ray won’t reserve a physical CPU exclusively and pin a num_cpus=1 task to it. Ray will let the operating system schedule and run the task instead. If needed, you can use operating system APIs like sched_setaffinity to pin a task to a physical CPU, as sketched after this list.

  • Ray does provide GPU isolation in the form of visible devices by automatically setting the CUDA_VISIBLE_DEVICES environment variable, which most ML frameworks will respect for purposes of GPU assignment.
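
To make these points concrete, here is a minimal sketch of a task that pins its own worker process to a physical CPU and a task that reads the GPU assignment Ray exposes via CUDA_VISIBLE_DEVICES. It assumes a Linux machine (os.sched_setaffinity is Linux-only) and a cluster with at least one GPU; the function names and the core index 0 are just illustrative.

import os

import ray


@ray.remote(num_cpus=1)
def pinned_task():
    # Ray does not pin this task to a core; the OS schedules it freely.
    # If you want pinning, do it yourself (os.sched_setaffinity is Linux-only).
    os.sched_setaffinity(0, {0})  # restrict this worker process to CPU core 0
    return sorted(os.sched_getaffinity(0))


@ray.remote(num_gpus=1)
def gpu_task():
    # Ray sets CUDA_VISIBLE_DEVICES so that ML frameworks only see the assigned GPU.
    return os.environ.get("CUDA_VISIBLE_DEVICES")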

[Figure: Physical resources vs. logical resources]

Custom Resources#

Besides pre-defined resources, you can also specify a Ray node’s custom resources and request them in your tasks or actors. Some use cases for custom resources:

  • Your node has special hardware that you can represent as a custom resource. Your tasks or actors can then request the custom resource via @ray.remote(resources={"special_hardware": 1}) and Ray will schedule them to a node that has the custom resource.

  • You can use custom resources as labels to tag nodes, achieving label-based affinity scheduling. For example, you can do ray.remote(resources={"custom_label": 0.001}) to schedule tasks or actors to nodes with the custom_label custom resource. For this use case, the actual quantity doesn’t matter, and the convention is to specify a tiny number so that the label resource is not the limiting factor for parallelism.
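
As a sketch of both use cases: the snippet below assumes a cluster whose node was started with the special_hardware and custom_label resources used throughout this page, and that ray.init(address="auto") can connect to it; the function names are hypothetical.

import ray

# Connect to a cluster whose node was started with, for example:
#   ray start --head --resources='{"special_hardware": 1, "custom_label": 1}'
ray.init(address="auto")


@ray.remote(resources={"special_hardware": 1})
def uses_special_hardware():
    return "scheduled on the node that has special_hardware"


@ray.remote(resources={"custom_label": 0.001})
def label_affinity_task():
    # The tiny quantity acts as a label: it steers scheduling to the tagged node
    # without limiting how many such tasks can run there in parallel.
    return "scheduled on a node tagged with custom_label"


print(ray.get([uses_special_hardware.remote(), label_affinity_task.remote()]))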

Specifying Node Resources#

By default, Ray nodes start with pre-defined CPU, GPU, and memory resources. The quantities of these logical resources on each node are set to the physical quantities auto-detected by Ray. By default, logical resources are configured by the following rules.

Warning

Ray does not permit dynamic updates of resource capacities after Ray has been started on a node.

  • Number of logical CPUs (num_cpus): Set to the number of CPUs of the machine/container.

  • Number of logical GPUs (num_gpus): Set to the number of GPUs of the machine/container.

  • Memory (memory): Set to 70% of the “available memory” when the Ray runtime starts.

  • Object Store Memory (object_store_memory): Set to 30% of the “available memory” when the Ray runtime starts. Note that object store memory is not a logical resource, and users cannot use it for scheduling.

However, you can always override that by manually specifying the quantities of pre-defined resources and adding custom resources. There are several ways to do that depending on how you start the Ray cluster:

If you are using ray.init() to start a single node Ray cluster, you can do the following to manually specify node resources:

# This will start a Ray node with 3 logical CPUs, 4 logical GPUs,
# 1 special_hardware resource, and 1 custom_label resource.
ray.init(num_cpus=3, num_gpus=4, resources={"special_hardware": 1, "custom_label": 1})
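
To double-check what the node registered, you can inspect the cluster’s logical resources; a quick check, continuing from the ray.init() call above:

# Total logical resources registered with Ray.
print(ray.cluster_resources())
# Expect entries like {'CPU': 3.0, 'GPU': 4.0, 'special_hardware': 1.0, 'custom_label': 1.0, ...}

# Totals minus whatever running tasks and actors currently hold.
print(ray.available_resources())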

If you are using ray start to start a Ray node, you can run:

ray start --head --num-cpus=3 --num-gpus=4 --resources='{"special_hardware": 1, "custom_label": 1}'

If you are using ray up to start a Ray cluster, you can set the resources field in the yaml file:

available_node_types:
  head:
    ...
    resources:
      CPU: 3
      GPU: 4
      special_hardware: 1
      custom_label: 1

If you are using KubeRay to start a Ray cluster, you can set the rayStartParams field in the yaml file:

headGroupSpec:
  rayStartParams:
    num-cpus: "3"
    num-gpus: "4"
    resources: '"{\"special_hardware\": 1, \"custom_label\": 1}"'

Specifying Task or Actor Resource Requirements#

Ray allows specifying a task or actor’s logical resource requirements (e.g., CPU, GPU, and custom resources). The task or actor will only run on a node if there are enough required logical resources available to execute the task or actor.

By default, Ray tasks use 1 logical CPU resource, and Ray actors use 1 logical CPU for scheduling and 0 logical CPUs for running. This means that, by default, actors cannot be scheduled on a zero-CPU node, but an infinite number of them can run on any non-zero-CPU node. The default resource requirements for actors were chosen for historical reasons; it’s recommended to always explicitly set num_cpus for actors to avoid surprises. If resources are specified explicitly, they are required for both scheduling and running.

You can also explicitly specify a task’s or actor’s logical resource requirements (for example, one task may require a GPU) via ray.remote() and task.options()/actor.options(), instead of using the defaults.

Python

# Specify the default resource requirements for this remote function.
@ray.remote(num_cpus=2, num_gpus=2, resources={"special_hardware": 1})
def func():
    return 1


# You can override the default resource requirements.
func.options(num_cpus=3, num_gpus=1, resources={"special_hardware": 0}).remote()


@ray.remote(num_cpus=0, num_gpus=1)
class Actor:
    pass


# You can override the default resource requirements for actors as well.
actor = Actor.options(num_cpus=1, num_gpus=0).remote()

Java

// Specify required resources.
Ray.task(MyRayApp::myFunction).setResource("CPU", 1.0).setResource("GPU", 1.0).setResource("special_hardware", 1.0).remote();

Ray.actor(Counter::new).setResource("CPU", 2.0).setResource("GPU", 1.0).remote();

C++

// Specify required resources.
ray::Task(MyFunction).SetResource("CPU", 1.0).SetResource("GPU", 1.0).SetResource("special_hardware", 1.0).Remote();

ray::Actor(CreateCounter).SetResource("CPU", 2.0).SetResource("GPU", 1.0).Remote();

Task and actor resource requirements have implications for Ray’s scheduling concurrency. In particular, the sum of the logical resource requirements of all concurrently executing tasks and actors on a given node cannot exceed the node’s total logical resources. This property can be used to limit the number of concurrently running tasks or actors and avoid issues like OOM.
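
For example, here is a minimal sketch of using CPU requirements to cap concurrency (the quantities and the function name are illustrative): on a node started with 2 logical CPUs, at most two of these num_cpus=1 tasks execute at a time, and the rest wait in the scheduler’s queue.

import ray

ray.init(num_cpus=2)


@ray.remote(num_cpus=1)
def memory_hungry_task(i):
    # At most two of these run concurrently on this node, bounding peak memory usage.
    return i


print(ray.get([memory_hungry_task.remote(i) for i in range(8)]))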

Fractional Resource Requirements#

Ray supports fractional resource requirements. For example, if your task or actor is IO-bound and has low CPU usage, you can specify a fractional CPU requirement (num_cpus=0.5) or even zero CPUs (num_cpus=0). The precision of the fractional resource requirement is 0.0001, so you should avoid specifying a double beyond that precision.

@ray.remote(num_cpus=0.5)
def io_bound_task():
    import time

    time.sleep(1)
    return 2


io_bound_task.remote()


@ray.remote(num_gpus=0.5)
class IOActor:
    def ping(self):
        import os

        print(f"CUDA_VISIBLE_DEVICES: {os.environ['CUDA_VISIBLE_DEVICES']}")


# Two actors can share the same GPU.
io_actor1 = IOActor.remote()
io_actor2 = IOActor.remote()
ray.get(io_actor1.ping.remote())
ray.get(io_actor2.ping.remote())
# Output:
# (IOActor pid=96328) CUDA_VISIBLE_DEVICES: 1
# (IOActor pid=96329) CUDA_VISIBLE_DEVICES: 1

Note

GPU, TPU, and neuron_cores resource requirements that are greater than 1 need to be whole numbers. For example, num_gpus=1.5 is invalid.

Tip

Besides resource requirements, you can also specify an environment for a task or actor to run in, which can include Python packages, local files, environment variables, and more. See Runtime Environments for details.
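
For instance, a small sketch of combining a resource requirement with a runtime environment; the pip package, URL, and function name here are purely illustrative.

import ray


@ray.remote(num_cpus=1, runtime_env={"pip": ["requests"]})
def fetch_status():
    import requests  # available inside the task via the runtime environment

    return requests.get("https://example.com").status_code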