Launching Ray Clusters on Azure#
This guide details the steps needed to start a Ray cluster on Azure.
There are two ways to start a Ray cluster on Azure:
Launch through the Ray cluster launcher.
Deploy a cluster using the Azure portal.
Note
The Azure integration is community-maintained. Please reach out to the integration maintainers on GitHub if you run into any problems: gramhagen, eisber, ijrsvt.
Using Ray cluster launcher#
Install Ray cluster launcher#
The Ray cluster launcher is part of the ray CLI. Use the CLI to start, stop, and attach to a running Ray cluster with commands such as ray up, ray down, and ray attach. You can use pip to install the ray CLI with cluster launcher support. Follow the Ray installation documentation for more detailed instructions.
# Install Ray with cluster launcher support.
# The quotes keep shells such as zsh from expanding the brackets.
pip install -U "ray[default]"
Install and Configure Azure CLI#
Next, install the Azure CLI (pip install -U azure-cli azure-identity) and log in using az login.
# Install the packages needed to use the Azure CLI.
pip install -U azure-cli azure-identity
# Log in to Azure. This redirects you to your web browser.
az login
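To confirm that the login succeeded before moving on, one option is to read the output of az account show. The sketch below is illustrative only: the helper names parse_account and current_account are hypothetical, and it assumes the az executable is on your PATH.

```python
import json
import subprocess


def parse_account(az_output: str) -> dict:
    """Extract the subscription name and id from `az account show` JSON output."""
    account = json.loads(az_output)
    return {"name": account["name"], "subscription_id": account["id"]}


def current_account() -> dict:
    """Run `az account show`; raises CalledProcessError if you are not logged in."""
    result = subprocess.run(
        ["az", "account", "show", "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_account(result.stdout)
```

Calling current_account() after az login should return the active subscription, which is the value the cluster launcher needs later as subscription_id.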
Install Azure SDK libraries#
Now, install the Azure SDK libraries that enable the Ray cluster launcher to build Azure infrastructure.
# Install the Azure SDK libraries.
pip install azure-core azure-mgmt-network azure-mgmt-common azure-mgmt-resource azure-mgmt-compute msrestazure
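A quick way to check that the libraries landed in the active environment is to look up their import names. This is a sketch under assumptions: the import names in AZURE_MODULES are my mapping from the pip package names above, azure-mgmt-common is left out because its import name is less obvious, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names assumed to correspond to the pip packages installed above.
AZURE_MODULES = [
    "azure.core",
    "azure.identity",
    "azure.mgmt.network",
    "azure.mgmt.resource",
    "azure.mgmt.compute",
    "msrestazure",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example `azure`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

An empty list from missing_modules(AZURE_MODULES) suggests the environment is ready for the cluster launcher.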
Start Ray with the Ray cluster launcher#
The provided cluster config file creates a small cluster with one on-demand Standard DS2v3 head node and autoscales up to two Standard DS2v3 spot-instance worker nodes.
Note that you need to fill in your Azure resource_group and location in the config file. You also need to set the subscription to use. You can do this from the command line with az account set -s <subscription_id> or by filling in the subscription_id in the cluster config file.
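As a rough illustration of which fields need values, the sketch below checks a config that has already been loaded into a Python dictionary (for example with a YAML parser). The helper missing_provider_fields is hypothetical, and the location and resource group values are placeholders rather than recommendations.

```python
# Fields in the provider section that this guide asks you to fill in,
# plus the provider type.
REQUIRED_PROVIDER_FIELDS = ("type", "location", "resource_group")


def missing_provider_fields(config: dict) -> list:
    """Return the required provider fields that are absent or empty."""
    provider = config.get("provider") or {}
    return [field for field in REQUIRED_PROVIDER_FIELDS if not provider.get(field)]


# Placeholder values for illustration only.
example = {
    "provider": {
        "type": "azure",
        "location": "westus2",
        "resource_group": "ray-cluster",
        # subscription_id is optional here if it was set with `az account set`.
    }
}
```

Running missing_provider_fields on your own loaded config before ray up can catch an unfilled field early.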
Download and configure the example configuration#
Download the reference example locally:
# Download the example-full.yaml
wget https://raw.githubusercontent.com/ray-project/ray/master/python/ray/autoscaler/azure/example-full.yaml
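If wget is not available, the same file can be fetched with the Python standard library. This is a minimal sketch; download is a hypothetical helper name, and the URL is the one used in the wget command above.

```python
import urllib.request
from pathlib import Path

EXAMPLE_URL = (
    "https://raw.githubusercontent.com/ray-project/ray/master/"
    "python/ray/autoscaler/azure/example-full.yaml"
)


def download(url: str, dest: str) -> Path:
    """Fetch `url` and write its contents to `dest`, returning the destination path."""
    target = Path(dest)
    with urllib.request.urlopen(url) as response:
        target.write_bytes(response.read())
    return target
```

For example, download(EXAMPLE_URL, "example-full.yaml") writes the config into the current directory.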
Automatic SSH Key Generation#
To connect to the provisioned head node VM, Ray automatically generates an SSH key pair if none is specified in the config. This is the simplest approach and requires no manual key management.
The default configuration in example-full.yaml uses automatic key generation:
auth:
    ssh_user: ubuntu
    # SSH keys are auto-generated if not specified.
    # Uncomment and specify custom paths if you want to use existing keys:
    # ssh_private_key: /path/to/your/key.pem
    # ssh_public_key: /path/to/your/key.pub
(Optional) Manual SSH Key Configuration#
If you prefer to use your own existing SSH keys, uncomment and specify both key paths in the auth section.
For example, to use an existing ed25519 key pair:
auth:
    ssh_user: ubuntu
    ssh_private_key: ~/.ssh/id_ed25519
    ssh_public_key: ~/.ssh/id_ed25519.pub
Or for RSA keys:
auth:
    ssh_user: ubuntu
    ssh_private_key: ~/.ssh/id_rsa
    ssh_public_key: ~/.ssh/id_rsa.pub
Both methods inject the public key directly into the VM's ~/.ssh/authorized_keys via Azure ARM templates.
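The two key-handling modes can be summarized as a small check on the auth section once it is loaded as a dictionary. This sketch only mirrors the rule stated above (both key paths or neither); check_auth is a hypothetical helper and not part of Ray.

```python
def check_auth(auth: dict) -> str:
    """Classify an `auth` section as 'auto' or 'manual', or raise ValueError."""
    private_key = auth.get("ssh_private_key")
    public_key = auth.get("ssh_public_key")
    if private_key is None and public_key is None:
        # No keys given: the launcher generates a key pair.
        return "auto"
    if private_key is None or public_key is None:
        raise ValueError("set both ssh_private_key and ssh_public_key, or neither")
    return "manual"
```

Treating a single specified key as an error is an assumption drawn from the instruction to specify both paths.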
Launch the Ray cluster on Azure#
# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to SSH into the cluster head node.
ray up example-full.yaml
# Get a remote screen on the head node.
ray attach example-full.yaml
# Try running a Ray program.
# Tear down the cluster.
ray down example-full.yaml
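For the "Try running a Ray program" step, a minimal script that could be run on the head node after ray attach (and before ray down) might look like the sketch below. The function names square and run_on_cluster are hypothetical; the script assumes Ray is installed on the head node and that the cluster started by ray up is still running.

```python
def square(x: int) -> int:
    """Plain Python function that is later wrapped as a Ray task."""
    return x * x


def run_on_cluster(n: int = 4) -> list:
    """Connect to the running cluster and square 0..n-1 as Ray tasks."""
    import ray  # imported lazily so this file loads without Ray installed

    ray.init(address="auto")  # attach to the cluster started by `ray up`
    remote_square = ray.remote(square)
    results = ray.get([remote_square.remote(i) for i in range(n)])
    print("cluster resources:", ray.cluster_resources())
    return results
```

Calling run_on_cluster() from a Python session on the head node submits the tasks and prints the resources Ray currently sees, which is a simple way to confirm that workers have joined.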
Congratulations, you have started a Ray cluster on Azure!
Using Azure portal#
Alternatively, you can deploy a cluster directly from the Azure portal. Note that autoscaling is handled by Azure Virtual Machine Scale Sets, not by the Ray autoscaler. The template deploys Azure Data Science VMs (DSVM) for both the head node and the autoscaling cluster managed by Azure Virtual Machine Scale Sets. The head node exposes both SSH and JupyterLab.
Once the template is successfully deployed, the deployment Outputs page provides the ssh command to connect and the link to the JupyterHub on the head node (username and password as specified in the template input). Use the following code in a Jupyter notebook (using the conda environment specified in the template input, py38_tensorflow by default) to connect to the Ray cluster.
import ray; ray.init()
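After connecting, it can be useful to see which scale set VMs have joined the cluster. The sketch below filters the list returned by ray.nodes(); alive_node_addresses is a hypothetical helper, and it assumes each node entry carries the Alive and NodeManagerAddress keys.

```python
def alive_node_addresses(nodes: list) -> list:
    """Return the addresses of the nodes that Ray reports as alive."""
    return [node["NodeManagerAddress"] for node in nodes if node.get("Alive")]
```

In the notebook, alive_node_addresses(ray.nodes()) then lists the head node and any workers that have registered so far.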
Under the hood, the azure-init.sh script is executed and performs the following actions:
Activates one of the conda environments available on DSVM
Installs Ray and any other user-specified dependencies
Sets up a systemd service (/lib/systemd/system/ray.service) to start Ray in head or worker mode
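If Ray does not appear to be running on a VM, checking the systemd unit is a reasonable first step. The sketch below wraps systemctl is-active; the helper names are hypothetical, and the unit name ray.service is taken from the path above.

```python
import subprocess


def is_active(systemctl_output: str) -> bool:
    """Interpret the text printed by `systemctl is-active`."""
    return systemctl_output.strip() == "active"


def ray_service_active(unit: str = "ray.service") -> bool:
    """Ask systemd whether the given unit is currently active."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
    )
    return is_active(result.stdout)
```

Running ray_service_active() over SSH on the head node or a worker reports whether the service set up by azure-init.sh is up.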