ray.data.llm.ProcessorConfig
- class ray.data.llm.ProcessorConfig(*, batch_size: int = 32, resources_per_bundle: Dict[str, float] | None = None, accelerator_type: str | None = None, concurrency: int | Tuple[int, int] = 1, experimental: Dict[str, Any] = <factory>)
The processor configuration.
- Parameters:
batch_size – The batch size for the processor. Larger batch sizes are more likely to saturate compute resources and can achieve higher throughput; smaller batch sizes are more fault-tolerant and can reduce bubbles in the data pipeline. Tune the batch size to balance throughput and fault tolerance for your use case.
resources_per_bundle – The resource bundles for placement groups. You can specify a custom device label, e.g., {'NPU': 1}. The default resource bundle for the LLM stage is always one GPU, i.e., {'GPU': 1}.
accelerator_type – The accelerator type used by the LLM stage in a processor. Defaults to None, meaning that only the CPU is used.
concurrency – The number of workers for data parallelism. Defaults to 1. If concurrency is a tuple (m, n), Ray creates an autoscaling actor pool that scales between m and n workers (1 <= m <= n). If concurrency is an int n, Ray uses either a fixed pool of n workers or an autoscaling pool from 1 to n workers, depending on the processor and stage. See the example below.
PublicAPI (alpha): This API is in alpha and may change before becoming stable.
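The following is a minimal sketch of constructing a ProcessorConfig with a fixed worker pool and with an autoscaling pool. The specific field values, including the "L4" accelerator label, are illustrative assumptions rather than recommendations.

```python
from ray.data.llm import ProcessorConfig

# Fixed pool of 4 workers, 64 rows per batch.
# The accelerator label "L4" is an assumption; use a type available
# in your cluster, or leave accelerator_type as None to run on CPU.
fixed = ProcessorConfig(
    batch_size=64,
    accelerator_type="L4",
    concurrency=4,
)

# Autoscaling pool: Ray scales the actor pool between 2 and 8 workers.
autoscaling = ProcessorConfig(
    batch_size=32,
    concurrency=(2, 8),
)
```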
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
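As a minimal sketch of what these settings imply, assuming standard pydantic v2 semantics: extra='forbid' rejects unknown field names, and validate_assignment=True re-validates attributes when they are assigned.

```python
from pydantic import ValidationError
from ray.data.llm import ProcessorConfig

# extra='forbid': a misspelled field raises instead of being silently ignored.
try:
    ProcessorConfig(batch_sz=64)  # typo for batch_size
except ValidationError as err:
    print(err)

# validate_assignment=True: assignments are type-checked as well.
config = ProcessorConfig()
try:
    config.batch_size = "many"  # not coercible to int
except ValidationError as err:
    print(err)
```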