Anti-pattern: Closure capturing large objects harms performance

TLDR: Avoid closure capturing large objects in remote functions or classes, use object store instead.

When you define a ray.remote() function or class, it is easy to accidentally capture large (more than a few MB) objects implicitly in the definition. This can lead to slow performance or even OOM since Ray is not designed to handle serialized functions or classes that are very large.

For such large objects, there are two options to resolve this problem:

  • Use ray.put() to put the large objects in the Ray object store, and then pass object references as arguments to the remote functions or classes (“better approach #1” below)

  • Create the large objects inside the remote functions or classes by passing a lambda method (“better approach #2”). This is also the only option for using unserializable objects.

Code example

Anti-pattern:

import ray
import numpy as np

ray.init()

large_object = np.zeros(10 * 1024 * 1024)


@ray.remote
def f1():
    return len(large_object)  # large_object is serialized along with f1!


ray.get(f1.remote())

Better approach #1:

large_object_ref = ray.put(np.zeros(10 * 1024 * 1024))


@ray.remote
def f2(large_object):
    return len(large_object)


# Large object is passed through object store.
ray.get(f2.remote(large_object_ref))

Better approach #2:

large_object_creator = lambda: np.zeros(10 * 1024 * 1024)  # noqa E731


@ray.remote
def f3():
    large_object = (
        large_object_creator()
    )  # Lambda is small compared with the large object.
    return len(large_object)


ray.get(f3.remote())