Use Modin with Ray on Kubernetes

This example runs a modified version of the "Using Modin with the NYC Taxi Dataset" example from the official Modin repository as a RayJob on Kubernetes.

Step 1: Install the KubeRay operator

Follow steps 1 and 2 of the RayCluster QuickStart guide to install the KubeRay operator.

Step 2: Run the Modin example with RayJob

Create a RayJob that runs the Modin example using the following command:

kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-job.modin.yaml
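
The job takes some time to start and finish, so it can help to poll its status before reading the logs. The following is a minimal sketch, not part of the original example. It assumes that the sample manifest names the RayJob `rayjob-sample` (consistent with the `job-name` label used in Step 3) and that KubeRay reports the job state in the `.status.jobStatus` field of the RayJob resource. The helper names `rayjob_status` and `wait_for_rayjob` are hypothetical.

```shell
# Assumption: the sample manifest names the RayJob "rayjob-sample".
RAYJOB_NAME="rayjob-sample"

# Print the job status recorded on the RayJob resource
# (assumed to be stored in .status.jobStatus).
rayjob_status() {
  kubectl get rayjob "$1" -o jsonpath='{.status.jobStatus}'
}

# Poll every 5 seconds, for up to 60 attempts, until the job reaches
# a terminal state. Returns 0 on success, 1 on failure, 2 on timeout.
wait_for_rayjob() {
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    status="$(rayjob_status "$1")"
    echo "RayJob $1 status: ${status:-not reported yet}"
    case "$status" in
      SUCCEEDED) return 0 ;;
      FAILED|STOPPED) return 1 ;;
    esac
    attempts=$((attempts + 1))
    sleep 5
  done
  return 2
}
```

After applying the manifest, run `wait_for_rayjob "$RAYJOB_NAME"`. The polling interval and attempt limit are arbitrary choices and may need adjusting for slower clusters.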

Step 3: Check the output

Run the following command to check the output:

kubectl logs -l=job-name=rayjob-sample

# [Example output]
# 2024-07-05 10:01:00,945 INFO worker.py:1446 -- Using address 10.244.0.4:6379 set in the environment variable RAY_ADDRESS
# 2024-07-05 10:01:00,945 INFO worker.py:1586 -- Connecting to existing Ray cluster at address: 10.244.0.4:6379...
# 2024-07-05 10:01:00,948 INFO worker.py:1762 -- Connected to Ray cluster. View the dashboard at 10.244.0.4:8265
# Modin Engine: Ray
# FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
# Time to compute isnull: 0.065887747972738
# Time to compute rounded_trip_distance: 0.34410698304418474
# 2024-07-05 10:01:23,069 SUCC cli.py:60 -- -----------------------------------
# 2024-07-05 10:01:23,069 SUCC cli.py:61 -- Job 'rayjob-sample-zt8wj' succeeded
# 2024-07-05 10:01:23,069 SUCC cli.py:62 -- -----------------------------------
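
To pull out only the timing results and confirm that the job succeeded, the logs can be piped through a small filter. This is a sketch under the assumption that the log lines look like the example output above. The helper name `summarize_job_logs` is hypothetical.

```shell
# Hypothetical helper: reads job logs on stdin, prints the timing lines,
# and returns nonzero unless the logs report that the job succeeded.
summarize_job_logs() {
  logs="$(cat)"
  printf '%s\n' "$logs" | grep 'Time to compute'
  printf '%s\n' "$logs" | grep -q 'succeeded'
}
```

Against a live cluster, run `kubectl logs -l=job-name=rayjob-sample --tail=-1 | summarize_job_logs`. The `--tail=-1` flag requests the full log, because `kubectl logs` shows only the last 10 lines by default when a label selector is used.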