Getting Started#

This quick start demonstrates the capabilities of the Ray cluster. Using the Ray cluster, we’ll take a sample application designed to run on a laptop and scale it up in the cloud. Ray will launch clusters and scale Python with just a few commands.

For launching a Ray cluster manually, you can refer to the on-premise cluster setup guide.

Create a (basic) Python application#

We will write a simple Python application that tracks the IP addresses of the machines that its tasks are executed on:

from collections import Counter
import socket
import time

def f():
    time.sleep(0.001)
    # Return IP address.
    return socket.gethostbyname("localhost")

ip_addresses = [f() for _ in range(10000)]
print(Counter(ip_addresses))

Save this application as script.py and execute it by running the command python script.py. The application should take 10 seconds to run and output something similar to Counter({'127.0.0.1': 10000}).

With some small changes, we can make this application run on Ray (for more information on how to do this, refer to the Ray Core Walkthrough):

from collections import Counter
import socket
import time

import ray

ray.init()

@ray.remote
def f():
    time.sleep(0.001)
    # Return IP address.
    return socket.gethostbyname("localhost")

object_ids = [f.remote() for _ in range(10000)]
ip_addresses = ray.get(object_ids)
print(Counter(ip_addresses))

Finally, let’s add some code to make the output more interesting:

from collections import Counter
import socket
import time

import ray

ray.init()

print('''This cluster consists of
    {} nodes in total
    {} CPU resources in total
'''.format(len(ray.nodes()), ray.cluster_resources()['CPU']))

@ray.remote
def f():
    time.sleep(0.001)
    # Return IP address.
    return socket.gethostbyname("localhost")

object_ids = [f.remote() for _ in range(10000)]
ip_addresses = ray.get(object_ids)

print('Tasks executed')
for ip_address, num_tasks in Counter(ip_addresses).items():
    print('    {} tasks on {}'.format(num_tasks, ip_address))

Running python script.py should now output something like:

This cluster consists of
    1 nodes in total
    4.0 CPU resources in total

Tasks executed
    10000 tasks on 127.0.0.1

Launch a cluster on a cloud provider#

To start a Ray Cluster, first we need to define the cluster configuration. The cluster configuration is defined within a YAML file that will be used by the Cluster Launcher to launch the head node, and by the Autoscaler to launch worker nodes.

A minimal sample cluster configuration file looks as follows:

Ray Team Supported

AWS

# An unique identifier for the head node and workers of this cluster.
cluster_name: aws-example-minimal

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-west-2

# The maximum number of workers nodes to launch in addition to the head
# node.
max_workers: 3

# Tell the autoscaler the allowed node types and the resources they provide.
# The key is the name of the node type, which is for debugging purposes.
# The node config specifies the launch config and physical instance type.
available_node_types:
    ray.head.default:
        # The node type's CPU and GPU resources are auto-detected based on AWS instance type.
        # If desired, you can override the autodetected CPU and GPU resources advertised to the autoscaler.
        # You can also set custom resources.
        # For example, to mark a node type as having 1 CPU, 1 GPU, and 5 units of a resource called "custom", set
        # resources: {"CPU": 1, "GPU": 1, "custom": 5}
        resources: {}
        # Provider-specific config for this node type, e.g., instance type. By default
        # Ray auto-configures unspecified fields such as SubnetId and KeyName.
        # For more documentation on available fields, see
        # http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
        node_config:
            InstanceType: m5.large
    ray.worker.default:
        # The minimum number of worker nodes of this type to launch.
        # This number should be >= 0.
        min_workers: 3
        # The maximum number of worker nodes of this type to launch.
        # This parameter takes precedence over min_workers.
        max_workers: 3
        # The node type's CPU and GPU resources are auto-detected based on AWS instance type.
        # If desired, you can override the autodetected CPU and GPU resources advertised to the autoscaler.
        # You can also set custom resources.
        # For example, to mark a node type as having 1 CPU, 1 GPU, and 5 units of a resource called "custom", set
        # resources: {"CPU": 1, "GPU": 1, "custom": 5}
        resources: {}
        # Provider-specific config for this node type, e.g., instance type. By default
        # Ray auto-configures unspecified fields such as SubnetId and KeyName.
        # For more documentation on available fields, see
        # http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
        node_config:
            InstanceType: m5.large

Azure

# An unique identifier for the head node and workers of this cluster.
cluster_name: minimal

# Cloud-provider specific configuration.
provider:
    type: azure
    location: westus2
    resource_group: ray-cluster

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ubuntu
    # you must specify paths to matching private and public key pair files
    # use `ssh-keygen -t rsa -b 4096` to generate a new ssh key pair
    ssh_private_key: ~/.ssh/id_rsa
    # changes to this should match what is specified in file_mounts
    ssh_public_key: ~/.ssh/id_rsa.pub

GCP

# A unique identifier for the head node and workers of this cluster.
cluster_name: minimal

# Cloud-provider specific configuration.
provider:
    type: gcp
    region: us-west1

Community Supported

Aliyun

Please refer to example-full.yaml.

Make sure your account balance is not less than 100 RMB, otherwise you will receive the error InvalidAccountStatus.NotEnoughBalance.

vSphere

# An unique identifier for the head node and workers of this cluster.
cluster_name: default

# The maximum number of workers nodes to launch in addition to the head
# node.
max_workers: 5

# Cloud-provider specific configuration.
provider:
    type: vsphere

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ray
# By default Ray creates a new private keypair, but you can also use your own.
# If you do so, make sure to also set "KeyName" in the head and worker node
# configurations below.
    ssh_private_key: ~/ray-bootstrap-key.pem

# Tell the autoscaler the allowed node types and the resources they provide.
# The key is the name of the node type, which is just for debugging purposes.
# The node config specifies the launch config and physical instance type.
available_node_types:
    ray.head.default:
        # You can override the resources here. Adding GPU to the head node is not recommended.
        # resources: { "CPU": 2, "Memory": 4096}
        resources: {}
    ray.worker.default:
        # The minimum number of nodes of this type to launch.
        # This number should be >= 0.
        min_workers: 1
        max_workers: 3
        # You can override the resources here. For GPU, currently only NVIDIA GPU is supported. If no ESXi host can
        # fulfill the requirement, the Ray node creation will fail. The number of created nodes may not meet the desired
        # minimum number. The vSphere node provider will not distinguish the GPU type. It will just count the quantity:
        # mount the first k random available NVIDIA GPU to the VM, if the user set {"GPU": k}.
        # resources: {"CPU": 2, "Memory": 4096, "GPU": 1}
        resources: {}

# Specify the node type of the head node (as configured above).
head_node_type: ray.head.default

Save this configuration file as config.yaml. You can specify a lot more details in the configuration file: instance types to use, minimum and maximum number of workers to start, autoscaling strategy, files to sync, and more. For a full reference on the available configuration properties, please refer to the cluster YAML configuration options reference.

After defining our configuration, we will use the Ray cluster launcher to start a cluster on the cloud, creating a designated “head node” and worker nodes. To start the Ray cluster, we will use the Ray CLI. Run the following command:

$ ray up -y config.yaml

Running applications on a Ray Cluster#

We are now ready to execute an application on our Ray Cluster. ray.init() will now automatically connect to the newly created cluster.

As a quick example, we execute a Python command on the Ray Cluster that connects to Ray and exits:

$ ray exec config.yaml 'python -c "import ray; ray.init()"'
2022-08-10 11:23:17,093 INFO worker.py:1312 -- Connecting to existing Ray cluster at address: <remote IP address>:6379...
2022-08-10 11:23:17,097 INFO worker.py:1490 -- Connected to Ray cluster.

You can also optionally get a remote shell using ray attach and run commands directly on the cluster. This command will create an SSH connection to the head node of the Ray Cluster.

# From a remote client:
$ ray attach config.yaml

# Now on the head node...
$ python -c "import ray; ray.init()"

For a full reference on the Ray Cluster CLI tools, please refer to the cluster commands reference.

While these tools are useful for ad-hoc execution on the Ray Cluster, the recommended way to execute an application on a Ray Cluster is to use Ray Jobs. Check out the quickstart guide to get started!

Deleting a Ray Cluster#

To shut down your cluster, run the following command:

$ ray down -y config.yaml

Getting Started#

About the demo#

Requirements#

Setup#

Create a (basic) Python application#

Launch a cluster on a cloud provider#

Running applications on a Ray Cluster#

Deleting a Ray Cluster#