ray.serve.batch

ray.serve.batch(_sync_func: Callable[[List[T]], List[R]], /) → Callable[[T], R]
ray.serve.batch(_async_func: Callable[[List[T]], Coroutine[Any, Any, List[R]]], /) → Callable[[T], Coroutine[Any, Any, R]]
ray.serve.batch(_sync_meth: _SyncBatchingMethod[SelfType, T, R], /) → Callable[[SelfType, T], R]
ray.serve.batch(_async_meth: _AsyncBatchingMethod[SelfType, T, R], /) → Callable[[SelfType, T], Coroutine[Any, Any, R]]
ray.serve.batch(_: Literal[None] = None, /, max_batch_size: int = 10, batch_wait_timeout_s: float = 0.01, max_concurrent_batches: int = 1, batch_size_fn: Callable[[List], int] | None = None) → _BatchDecorator

Converts a function to asynchronously handle batches.

The function can be a standalone function or a class method. In both cases, it must be declared async def, take a list of objects as its sole argument, and return a list of the same length as a result.

When invoked, each caller passes a single object. Calls are buffered and executed asynchronously as a batch once max_batch_size objects have accumulated or batch_wait_timeout_s has elapsed, whichever occurs first.

max_batch_size and batch_wait_timeout_s can be updated at runtime using setter methods on the decorated method (set_max_batch_size and set_batch_wait_timeout_s), as shown in the example below.

Example:

from typing import List

from ray import serve
from starlette.requests import Request

@serve.deployment
class BatchedDeployment:
    @serve.batch(max_batch_size=10, batch_wait_timeout_s=0.1)
    async def batch_handler(self, requests: List[Request]) -> List[str]:
        response_batch = []
        for r in requests:
            name = (await r.json())["name"]
            response_batch.append(f"Hello {name}!")

        return response_batch

    def update_batch_params(self, max_batch_size, batch_wait_timeout_s):
        self.batch_handler.set_max_batch_size(max_batch_size)
        self.batch_handler.set_batch_wait_timeout_s(batch_wait_timeout_s)

    async def __call__(self, request: Request):
        return await self.batch_handler(request)

app = BatchedDeployment.bind()
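
The decorator also works on a standalone coroutine function. A minimal sketch; the doubling logic is purely illustrative:

from typing import List

from ray import serve

@serve.batch(max_batch_size=4, batch_wait_timeout_s=0.1)
async def double(values: List[int]) -> List[int]:
    # Receives up to max_batch_size values accumulated from individual calls
    # and must return one result per input, in order.
    return [v * 2 for v in values]

# Each caller passes and awaits a single value from async code:
#     result = await double(21)  # -> 42
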
Parameters:
  • max_batch_size – the maximum batch size that will be executed in one call to the underlying function.

  • batch_wait_timeout_s – the maximum duration to wait for max_batch_size elements before running the current batch.

  • max_concurrent_batches – the maximum number of batches that can be executed concurrently. If the number of concurrent batches exceeds this limit, the batch handler will wait for a batch to complete before sending the next batch to the underlying function. See the first sketch after this list.

  • batch_size_fn – optional function to compute the effective batch size. If provided, this function takes a list of items and returns an integer representing the batch size. This is useful for batching based on custom metrics such as total nodes in graphs, total tokens in sequences, or other domain-specific measures. If None, the batch size is computed as len(batch). See the second sketch after this list.
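
A sketch of max_concurrent_batches, which lets a second batch start while an earlier one is still in flight. The asyncio.sleep and the deployment name are illustrative stand-ins for real asynchronous work such as a model call:

import asyncio
from typing import List

from ray import serve
from starlette.requests import Request

@serve.deployment
class PipelinedDeployment:
    # Up to two batches run concurrently; a third batch waits until one
    # of the in-flight batches finishes.
    @serve.batch(max_batch_size=8, max_concurrent_batches=2)
    async def batch_handler(self, texts: List[str]) -> List[str]:
        await asyncio.sleep(0.05)  # placeholder for real async work
        return [t.upper() for t in texts]

    async def __call__(self, request: Request) -> str:
        return await self.batch_handler(request.query_params["text"])

app = PipelinedDeployment.bind()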
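
And a sketch of batch_size_fn that uses total whitespace-separated tokens as the effective batch size; the token counting is a stand-in for a real tokenizer, and the handler body is illustrative:

from typing import List

from ray import serve

def total_tokens(batch: List[str]) -> int:
    # The batcher compares this value against max_batch_size, so the
    # cap applies to total tokens in the batch rather than item count.
    return sum(len(s.split()) for s in batch)

@serve.deployment
class TokenBudgetedDeployment:
    @serve.batch(max_batch_size=512, batch_size_fn=total_tokens)
    async def batch_handler(self, prompts: List[str]) -> List[str]:
        # Placeholder for batched model inference.
        return [p.upper() for p in prompts]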