Use labels to control scheduling#
In Ray version 2.49.0 and above, you can use labels to control scheduling for KubeRay. Labels are a beta feature.
This page provides a conceptual overview and usage instructions for labels. Labels are key-value pairs that provide a human-readable configuration for users to control how Ray schedules tasks, actors, and placement group bundles to specific nodes.
Note
Ray labels share the same syntax and formatting restrictions as Kubernetes labels, but are conceptually distinct. See the Kubernetes docs on labels and selectors.
How do labels work?#
The following is a high-level overview of how you use labels to control scheduling:
Ray sets default labels that describe the underlying compute. See Default node labels.
You define custom labels as key-value pairs. See Define custom labels.
You specify label selectors in your Ray code to define label requirements. You can specify these requirements at the task, actor, or placement group bundle level. See Specify label selectors.
Ray schedules tasks, actors, or placement group bundles based on the specified label selectors.
In Ray 2.50.0 and above, if you’re using a dynamic cluster with autoscaler V2 enabled, the cluster scales up to add new nodes from a designated worker group to fulfill label requirements.
Default node labels#
Note
Ray reserves all labels under ray.io namespace.
During cluster initialization or as autoscaling events add nodes to your cluster, Ray assigns the following default labels to each node:
Label |
Description |
---|---|
|
A unique ID generated for the node. |
|
The accelerator type of the node, for example |
Note
You can override default values using ray start
parameters.
The following are examples of default labels:
"ray.io/accelerator-type": "" # Default label indicating the machine is CPU-only.
Define custom labels#
You can add custom labels to your nodes using the --labels
or --labels-file
parameter when running ray start
.
# Examples 1: Start a head node with cpu-family and test-label labels
ray start --head --labels="cpu-family=amd,test-label=test-value"
# Example 2: Start a head node with labels from a label file
ray start --head --labels-files='./test-labels-file'
# The file content can be the following (should be a valid YAML file):
# "test-label": "test-value"
# "test-label-2": "test-value-2"
Note
You can’t set labels using ray.init()
. Local Ray clusters don’t support labels.
Specify label selectors#
You add label selector logic to your Ray code when defining Ray tasks, actors, or placement group bundles. Label selectors define the label requirements for matching your Ray code to a node in your Ray cluster.
Label selectors specify the following:
The key of the label.
Operator logic for matching.
The value or values to match on.
The following table shows the basic syntax for label selector operator logic:
Operator |
Description |
Example syntax |
---|---|---|
Equals |
Label matches exactly one value. |
|
Not equal |
Label matches anything by one value. |
|
In |
Label matches one of the provided values. |
|
Not in |
Label matches none of the provided values. |
|
You can specify one or more label selectors as a dict. When specifying multiple label selectors, the candidate node must meet all requirements. The following example configuration uses a custom label to require an m5.16xlarge
EC2 instance and a default label to require node ID to be 123:
label_selector={"instance_type": "m5.16xlarge", "ray.io/node-id": "123"}
Specify label requirements for tasks and actors#
Use the following syntax to add label selectors to tasks and actors:
# An example for specifing label_selector in task's @ray.remote annotation
@ray.remote(label_selector={"label_name":"label_value"})
def f():
pass
# An example of specifying label_selector in actor's @ray.remote annotation
@ray.remote(label_selector={"ray.io/accelerator-type": "nvidia-h100"})
class Actor:
pass
# An example of specifying label_selector in task's options
@ray.remote
def test_task_label_in_options():
pass
test_task_label_in_options.options(label_selector={"test-lable-key": "test-label-value"}).remote()
# An example of specifying label_selector in actor's options
@ray.remote
class Actor:
pass
actor_1 = Actor.options(
label_selector={"ray.io/accelerator-type": "nvidia-h100"},
).remote()
Specify label requirements for placement group bundles#
Use the bundle_label_selector
option to add label selector to placement group bundles. See the following examples:
# All bundles require the same labels:
ray.util.placement_group(
bundles=[{"GPU": 1}, {"GPU": 1}],
bundle_label_selector=[{"ray.io/accelerator-type": "H100"} * 2],
)
# Bundles require different labels:
ray.util.placement_group(
bundles=[{"CPU": 1}] + [{"GPU": 1} * 2],
bundle_label_selector=[{"ray.io/market-type": "spot"}] + [{"ray.io/accelerator-type": "H100"} * 2]
)
Using labels with autoscaler#
Autoscaler V2 supports label-based scheduling. To enable autoscaler to scale up nodes to fulfill label requirements, you need to create multiple worker groups for different label requirement combinations and specify all the corresponding labels in the rayStartParams
field in the Ray cluster configuration. For example:
rayStartParams: {
labels: "region=me-central1,ray.io/accelerator-type=nvidia-h100"
}
Monitor nodes using labels#
The Ray dashboard automatically shows the following information:
Labels for each node. See
ray.util.state.common.NodeState.labels
.Label selectors set for each task, actor, or placement group bundle. See
ray.util.state.common.TaskState.label_selector
andray.util.state.common.ActorState.label_selector
.
Within a task, you can programmatically obtain the node label from the RuntimeContextAPI using ray.get_runtime_context().get_node_labels()
. This returns a Python dict. See the following example:
@ray.remote
def test_task_label():
node_labels = ray.get_runtime_context().get_node_labels()
print(f"[test_task_label] node labels: {node_labels}")
"""
Example output:
(test_task_label pid=68487) [test_task_label] node labels: {'test-label-1': 'test-value-1', 'test-label-key': 'test-label-value', 'test-label-2': 'test-value-2'}
"""
You can also access information about node label and label selector information using the state API and state CLI.