Actors#
Actors extend the Ray API from functions (tasks) to classes. An actor is essentially a stateful worker (or a service). When you instantiate a new actor, Ray creates a new worker and schedules methods of the actor on that specific worker. The methods can access and mutate the state of that worker.
The ray.remote
decorator indicates that instances of the Counter
class are actors. Each actor runs in its own Python process.
import ray
@ray.remote
class Counter:
def __init__(self):
self.value = 0
def increment(self):
self.value += 1
return self.value
def get_counter(self):
return self.value
# Create an actor from this class.
counter = Counter.remote()
Ray.actor
is used to create actors from regular Java classes.
// A regular Java class.
public class Counter {
private int value = 0;
public int increment() {
this.value += 1;
return this.value;
}
}
// Create an actor from this class.
// `Ray.actor` takes a factory method that can produce
// a `Counter` object. Here, we pass `Counter`'s constructor
// as the argument.
ActorHandle<Counter> counter = Ray.actor(Counter::new).remote();
ray::Actor
is used to create actors from regular C++ classes.
// A regular C++ class.
class Counter {
private:
int value = 0;
public:
int Increment() {
value += 1;
return value;
}
};
// Factory function of Counter class.
static Counter *CreateCounter() {
return new Counter();
};
RAY_REMOTE(&Counter::Increment, CreateCounter);
// Create an actor from this class.
// `ray::Actor` takes a factory method that can produce
// a `Counter` object. Here, we pass `Counter`'s factory function
// as the argument.
auto counter = ray::Actor(CreateCounter).Remote();
Use ray list actors
from State API to see actors states:
# This API is only available when you install Ray with `pip install "ray[default]"`.
ray list actors
======== List: 2023-05-25 10:10:50.095099 ========
Stats:
------------------------------
Total: 1
Table:
------------------------------
ACTOR_ID CLASS_NAME STATE JOB_ID NAME NODE_ID PID RAY_NAMESPACE
0 9e783840250840f87328c9f201000000 Counter ALIVE 01000000 13a475571662b784b4522847692893a823c78f1d3fd8fd32a2624923 38906 ef9de910-64fb-4575-8eb5-50573faa3ddf
Specifying required resources#
Specify resource requirements in actors. See Specifying Task or Actor Resource Requirements for more details.
# Specify required resources for an actor.
@ray.remote(num_cpus=2, num_gpus=0.5)
class Actor:
pass
// Specify required resources for an actor.
Ray.actor(Counter::new).setResource("CPU", 2.0).setResource("GPU", 0.5).remote();
// Specify required resources for an actor.
ray::Actor(CreateCounter).SetResource("CPU", 2.0).SetResource("GPU", 0.5).Remote();
Calling the actor#
You can interact with the actor by calling its methods with the remote
operator. You can then call get
on the object ref to retrieve the actual
value.
# Call the actor.
obj_ref = counter.increment.remote()
print(ray.get(obj_ref))
1
// Call the actor.
ObjectRef<Integer> objectRef = counter.task(&Counter::increment).remote();
Assert.assertTrue(objectRef.get() == 1);
// Call the actor.
auto object_ref = counter.Task(&Counter::increment).Remote();
assert(*object_ref.Get() == 1);
Methods called on different actors execute in parallel, and methods called on the same actor execute serially in the order you call them. Methods on the same actor share state with one another, as shown below.
# Create ten Counter actors.
counters = [Counter.remote() for _ in range(10)]
# Increment each Counter once and get the results. These tasks all happen in
# parallel.
results = ray.get([c.increment.remote() for c in counters])
print(results)
# Increment the first Counter five times. These tasks are executed serially
# and share state.
results = ray.get([counters[0].increment.remote() for _ in range(5)])
print(results)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[2, 3, 4, 5, 6]
// Create ten Counter actors.
List<ActorHandle<Counter>> counters = new ArrayList<>();
for (int i = 0; i < 10; i++) {
counters.add(Ray.actor(Counter::new).remote());
}
// Increment each Counter once and get the results. These tasks all happen in
// parallel.
List<ObjectRef<Integer>> objectRefs = new ArrayList<>();
for (ActorHandle<Counter> counterActor : counters) {
objectRefs.add(counterActor.task(Counter::increment).remote());
}
// prints [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
System.out.println(Ray.get(objectRefs));
// Increment the first Counter five times. These tasks are executed serially
// and share state.
objectRefs = new ArrayList<>();
for (int i = 0; i < 5; i++) {
objectRefs.add(counters.get(0).task(Counter::increment).remote());
}
// prints [2, 3, 4, 5, 6]
System.out.println(Ray.get(objectRefs));
// Create ten Counter actors.
std::vector<ray::ActorHandle<Counter>> counters;
for (int i = 0; i < 10; i++) {
counters.emplace_back(ray::Actor(CreateCounter).Remote());
}
// Increment each Counter once and get the results. These tasks all happen in
// parallel.
std::vector<ray::ObjectRef<int>> object_refs;
for (ray::ActorHandle<Counter> counter_actor : counters) {
object_refs.emplace_back(counter_actor.Task(&Counter::Increment).Remote());
}
// prints 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
auto results = ray::Get(object_refs);
for (const auto &result : results) {
std::cout << *result;
}
// Increment the first Counter five times. These tasks are executed serially
// and share state.
object_refs.clear();
for (int i = 0; i < 5; i++) {
object_refs.emplace_back(counters[0].Task(&Counter::Increment).Remote());
}
// prints 2, 3, 4, 5, 6
results = ray::Get(object_refs);
for (const auto &result : results) {
std::cout << *result;
}
Passing around actor handles#
You can pass actor handles into other tasks. You can also define remote functions or actor methods that use actor handles.
import time
@ray.remote
def f(counter):
for _ in range(10):
time.sleep(0.1)
counter.increment.remote()
public static class MyRayApp {
public static void foo(ActorHandle<Counter> counter) throws InterruptedException {
for (int i = 0; i < 1000; i++) {
TimeUnit.MILLISECONDS.sleep(100);
counter.task(Counter::increment).remote();
}
}
}
void Foo(ray::ActorHandle<Counter> counter) {
for (int i = 0; i < 1000; i++) {
std::this_thread::sleep_for(std::chrono::milliseconds(100));
counter.Task(&Counter::Increment).Remote();
}
}
If you instantiate an actor, you can pass the handle around to various tasks.
counter = Counter.remote()
# Start some tasks that use the actor.
[f.remote(counter) for _ in range(3)]
# Print the counter value.
for _ in range(10):
time.sleep(0.1)
print(ray.get(counter.get_counter.remote()))
0
3
8
10
15
18
20
25
30
30
ActorHandle<Counter> counter = Ray.actor(Counter::new).remote();
// Start some tasks that use the actor.
for (int i = 0; i < 3; i++) {
Ray.task(MyRayApp::foo, counter).remote();
}
// Print the counter value.
for (int i = 0; i < 10; i++) {
TimeUnit.SECONDS.sleep(1);
System.out.println(counter.task(Counter::getCounter).remote().get());
}
auto counter = ray::Actor(CreateCounter).Remote();
// Start some tasks that use the actor.
for (int i = 0; i < 3; i++) {
ray::Task(Foo).Remote(counter);
}
// Print the counter value.
for (int i = 0; i < 10; i++) {
std::this_thread::sleep_for(std::chrono::seconds(1));
std::cout << *counter.Task(&Counter::GetCounter).Remote().Get() << std::endl;
}
Generators#
Ray is compatible with Python generator syntax. See Ray Generators for more details.
Cancelling actor tasks#
Cancel Actor Tasks by calling ray.cancel()
on the returned ObjectRef
.
import ray
import asyncio
import time
@ray.remote
class Actor:
async def f(self):
try:
await asyncio.sleep(5)
except asyncio.CancelledError:
print("Actor task canceled.")
actor = Actor.remote()
ref = actor.f.remote()
# Wait until task is scheduled.
time.sleep(1)
ray.cancel(ref)
try:
ray.get(ref)
except ray.exceptions.RayTaskError:
print("Object reference was cancelled.")
In Ray, Task cancellation behavior is contingent on the Task’s current state:
Unscheduled tasks:
If Ray hasn’t scheduled an Actor Task yet, Ray attempts to cancel the scheduling.
When Ray successfully cancels at this stage, it invokes ray.get(actor_task_ref)
which produces a TaskCancelledError
.
Running actor tasks (regular actor, threaded actor): For tasks classified as a single-threaded Actor or a multi-threaded Actor, Ray offers no mechanism for interruption.
Running async actor tasks:
For Tasks classified as async Actors <_async-actors>
, Ray seeks to cancel the associated asyncio.Task
.
This cancellation approach aligns with the standards presented in
asyncio task cancellation.
Note that asyncio.Task
won’t be interrupted in the middle of execution if you don’t await
within the async function.
Cancellation guarantee:
Ray attempts to cancel Tasks on a best-effort basis, meaning cancellation isn’t always guaranteed.
For example, if the cancellation request doesn’t get through to the executor,
the Task might not be cancelled.
You can check if a Task was successfully cancelled using ray.get(actor_task_ref)
.
Recursive cancellation:
Ray tracks all child and Actor Tasks. When the recursive=True
argument is given,
it cancels all child and Actor Tasks.
Scheduling#
For each actor, Ray chooses a node to run it on, and bases the scheduling decision on a few factors like the actor’s resource requirements and the specified scheduling strategy. See Ray scheduling for more details.
Fault Tolerance#
By default, Ray actors won’t be restarted and
actor tasks won’t be retried when actors crash unexpectedly.
You can change this behavior by setting
max_restarts
and max_task_retries
options
in ray.remote()
and .options()
.
See Ray fault tolerance for more details.
FAQ: Actors, Workers and Resources#
What’s the difference between a worker and an actor?
Each “Ray worker” is a python process.
Ray treats a worker differently for tasks and actors. For tasks, Ray uses a “Ray worker” to execute multiple Ray tasks. For actors, Ray starts a “Ray worker” as a dedicated Ray actor.
Tasks: When Ray starts on a machine, a number of Ray workers start automatically (1 per CPU by default). Ray uses them to execute tasks (like a process pool). If you execute 8 tasks with
num_cpus=2
, and total number of CPUs is 16 (ray.cluster_resources()["CPU"] == 16
), you end up with 8 of your 16 workers idling.Actor: A Ray Actor is also a “Ray worker” but you instantiate it at runtime with
actor_cls.remote()
. All of its methods run on the same process, using the same resources Ray designates when you define the Actor. Note that unlike tasks, Ray doesn’t reuse the Python processes that run Ray Actors. Ray terminates them when you delete the Actor.
To maximally utilize your resources, you want to maximize the time that your workers work. You also want to allocate enough cluster resources so Ray can run all of your needed actors and any other tasks you define. This also implies that Ray schedules tasks more flexibly, and that if you don’t need the stateful part of an actor, it’s better to use tasks.
Task Events#
By default, Ray traces the execution of actor tasks, reporting task status events and profiling events that Ray Dashboard and State API use.
You can disable task event reporting for the actor by setting the enable_task_events
option to False
in ray.remote()
and .options()
. This setting reduces the overhead of task execution by reducing the amount of data Ray sends to the Ray Dashboard.
You can also disable task event reporting for some actor methods by setting the enable_task_events
option to False
in ray.remote()
and .options()
on the actor method.
Method settings override the actor setting:
@ray.remote
class FooActor:
# Disable task events reporting for this method.
@ray.method(enable_task_events=False)
def foo(self):
pass
foo_actor = FooActor.remote()
ray.get(foo_actor.foo.remote())