Antipattern: Unnecessary call of ray.get in a task
TLDR: Avoid calling ray.get too frequently or on intermediate results. Work with object references directly, and call ray.get only at the end to fetch the final result.
Whenever ray.get is called, the object must be transferred to the worker or node that calls ray.get. If you don't need to manipulate the object in a task, you probably don't need to call ray.get on it!
Typically, it's best practice to wait as long as possible before calling ray.get, or even to design your program so that it doesn't need to call ray.get too soon.
Notice that in the first example, we call ray.get, which forces us to transfer the large rollout to the driver and then from the driver to reduce. In the fixed version, we pass only the object reference to reduce. Ray resolves that reference automatically before reduce runs (a single implicit ray.get), which means the data is passed directly from generate_rollout to reduce, avoiding the driver.
# Antipattern: fetching an intermediate result to the driver.
import numpy as np
import ray

@ray.remote
def generate_rollout():
    return np.ones((10000, 10000))

@ray.remote
def reduce(rollout):
    return np.sum(rollout)

# `ray.get` downloads the result here.
rollout = ray.get(generate_rollout.remote())
# Now we have to re-upload `rollout`.
reduced = ray.get(reduce.remote(rollout))
# Fixed version (reuses generate_rollout and reduce from the first example).
# Don't need ray.get here.
rollout = generate_rollout.remote()
# The rollout object is passed by reference.
reduced = ray.get(reduce.remote(rollout))
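To put rough numbers on the difference, this back-of-the-envelope sketch computes the size of the array returned by generate_rollout and how much of it passes through the driver in each version. It assumes the driver runs on a different node from the workers, and it ignores serialization overhead and the small scalar returned by reduce.

```python
ROWS, COLS = 10_000, 10_000
FLOAT64_BYTES = 8  # np.ones defaults to float64: 8 bytes per element

rollout_bytes = ROWS * COLS * FLOAT64_BYTES

# Antipattern: ray.get pulls the rollout into the driver, and
# reduce.remote(rollout) then uploads it again.
driver_traffic_antipattern = 2 * rollout_bytes

# Fixed version: only the ObjectRef passes through the driver.
driver_traffic_fixed = 0

print(rollout_bytes / 1e9)               # 0.8 (GB)
print(driver_traffic_antipattern / 1e9)  # 1.6 (GB)
```

Under these assumptions, the antipattern moves about 1.6 GB through the driver for a single rollout, while the fixed version moves essentially none.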