Ray Core Walkthrough¶
This walkthrough covers the core concepts of Ray:
Using remote functions (tasks)
Fetching results (object refs)
Using remote classes (actors)
With Ray, your code will work on a single machine and can be easily scaled to a large cluster.
Java demo code in this documentation can be found here.
To run this walkthrough, install Ray with pip install -U ray. For the latest wheels (a snapshot of master), follow the instructions at Latest Snapshots (Nightlies).
You can start Ray on a single machine by adding this to your code.
Ray will then be able to utilize all cores of your machine. Find out how to configure the number of cores Ray will use at Configuring Ray.
To start a multi-node Ray cluster, see the cluster setup page.
Remote functions (Tasks)¶
Ray enables arbitrary functions to be executed asynchronously. These asynchronous Ray functions are called “remote functions”. Here is an example.
Passing object refs to remote functions¶
Object refs can also be passed into remote functions. When the function actually gets executed, the argument will be retrieved as a regular object. For example, take this function:
Note the following behaviors:
The second task will not be executed until the first task has finished executing because the second task depends on the output of the first task.
If the two tasks are scheduled on different machines, the output of the first task (the value corresponding to
obj_ref1/objRef1) will be sent over the network to the machine where the second task is scheduled.
Specifying required resources¶
Oftentimes, you may want to specify a task's resource requirements (for example, one task may require a GPU). Ray automatically detects the available GPUs and CPUs on the machine, but you can override this default behavior by passing in specific resource requirements (e.g., CPU, GPU, and custom resources). A task will only run on a machine that has enough available resources to execute it.
If you do not specify any resources, the default is 1 CPU resource and no other resources.
If specifying CPUs, Ray does not enforce isolation (i.e., your task is expected to honor its request).
If specifying GPUs, Ray does provide isolation in the form of visible devices (by setting the environment variable
CUDA_VISIBLE_DEVICES), but it is the task’s responsibility to actually use the GPUs (e.g., through a deep learning framework like TensorFlow or PyTorch).
The resource requirements of a task have implications for Ray’s scheduling concurrency. In particular, the sum of the resource requirements of all concurrently executing tasks on a given node cannot exceed the node’s total resources.
Below are more examples of resource specifications:
Objects in Ray¶
In Ray, we can create and compute on objects. We refer to these objects as remote objects, and we use object refs to refer to them. Remote objects are stored in shared-memory object stores, and there is one object store per node in the cluster. In the cluster setting, we may not actually know which machine each object lives on.
An object ref is essentially a unique ID that can be used to refer to a remote object. If you’re familiar with futures, our object refs are conceptually similar.
Object refs can be created in multiple ways.
They are returned by remote function calls.
They are returned by ray.put.
Remote objects are immutable. That is, their values cannot be changed after creation. This allows remote objects to be replicated in multiple object stores without needing to synchronize the copies.
You can use the get method (ray.get) to fetch the result of a remote object from an object ref.
If the current node’s object store does not contain the object, the object is downloaded.
After launching a number of tasks, you may want to know which ones have finished executing. This can be done with ray.wait, which works as follows.
When the object store gets full, objects will be evicted to make room for new objects. This happens in approximate LRU (least recently used) order. To prevent objects from being evicted, you can call get and store their values instead. Numpy array objects cannot be evicted while they are mapped in any Python process. You can also configure memory limits to control object store usage.
Objects created with put are pinned in memory while a Python/Java reference to the object ref returned by the put exists. This only applies to the specific ref returned by put, not to refs in general or to copies of that ref.
Remote Classes (Actors)¶
Actors extend the Ray API from functions (tasks) to classes. An actor is essentially a stateful worker.
Specifying required resources¶
You can specify resource requirements for actors too (see the Actors section for more details).
Calling the actor¶
We can interact with the actor by calling its methods with the .remote operator. We can then call get on the object ref to retrieve the actual value.
Methods called on different actors can execute in parallel, and methods called on the same actor are executed serially in the order that they are called. Methods on the same actor will share state with one another, as shown below.
To learn more about Ray Actors, see the Actors section.