ray.data.llm.HttpRequestProcessorConfig#

class ray.data.llm.HttpRequestProcessorConfig(*, batch_size: int = 64, resources_per_bundle: Dict[str, float] | None = None, accelerator_type: str | None = None, concurrency: int | Tuple[int, int] = 1, experimental: Dict[str, Any] = None, url: str, headers: Dict[str, Any] | None = None, qps: int | None = None, max_retries: int = 0, base_retry_wait_time_in_s: float = 1, session_factory: Any | None = None)[source]#

The configuration for the HTTP request processor.

Parameters:
  • batch_size – The number of rows per batch sent in each HTTP request.

  • url – The URL to send the HTTP request to.

  • headers – The headers to send with the HTTP request.

  • concurrency – The number of concurrent requests to send. Defaults to 1. If concurrency is a tuple (m, n), an autoscaling actor pool that scales between m and n workers is used (1 <= m <= n).
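For higher throughput, concurrency can be given as a tuple to enable autoscaling, and qps, max_retries, and base_retry_wait_time_in_s bound the request rate and retry behavior. A minimal config sketch; the endpoint URL and bearer token below are placeholders, not real values:

```python
from ray.data.llm import HttpRequestProcessorConfig

# Autoscale between 2 and 8 workers, cap the request rate at
# 10 requests/second, and retry failed requests up to 3 times
# with exponential backoff starting at a 2-second wait.
# The URL and token are placeholders.
config = HttpRequestProcessorConfig(
    url="https://api.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer <token>"},
    concurrency=(2, 8),
    qps=10,
    max_retries=3,
    base_retry_wait_time_in_s=2,
)
```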

Examples

import ray
from ray.data.llm import HttpRequestProcessorConfig, build_llm_processor

# Point the processor at an OpenAI-compatible chat completions endpoint.
config = HttpRequestProcessorConfig(
    url="https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    concurrency=1,
)
processor = build_llm_processor(
    config,
    # Build a JSON request body from each input row.
    preprocess=lambda row: dict(
        payload=dict(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a calculator"},
                {"role": "user", "content": f"{row['id']} ** 3 = ?"},
            ],
            temperature=0.3,
            max_tokens=20,
        ),
    ),
    # Extract the completion text from the HTTP response.
    postprocess=lambda row: dict(
        resp=row["http_response"]["choices"][0]["message"]["content"],
    ),
)

ds = ray.data.range(10)
ds = processor(ds)
for row in ds.take_all():
    print(row)
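Retries are governed by max_retries and base_retry_wait_time_in_s, which the field metadata describes as the base wait time for exponential backoff. A minimal sketch of such a schedule, assuming the common doubling pattern (the processor's exact schedule may add jitter or caps):

```python
def backoff_schedule(max_retries: int, base_wait_s: float) -> list[float]:
    """Wait time before each retry attempt, doubling from the base wait.

    Assumes plain exponential backoff (base * 2**attempt); this is an
    illustrative sketch, not the processor's internal implementation.
    """
    return [base_wait_s * (2 ** attempt) for attempt in range(max_retries)]

# With the default max_retries=0 there are no retries. With
# max_retries=3 and the default 1-second base wait, the waits
# before each retry are 1s, 2s, and 4s.
print(backoff_schedule(3, 1.0))  # [1.0, 2.0, 4.0]
```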

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'accelerator_type': FieldInfo(annotation=Union[str, NoneType], required=False, description='The accelerator type used by the LLM stage in a processor. Default to None, meaning that only the CPU will be used.'), 'base_retry_wait_time_in_s': FieldInfo(annotation=float, required=False, default=1, description='The base wait time for a retry during exponential backoff.'), 'batch_size': FieldInfo(annotation=int, required=False, default=64, description='The batch size.'), 'concurrency': FieldInfo(annotation=Union[int, Tuple[int, int]], required=False, default=1, description='The number of workers for data parallelism. Default to 1. If ``concurrency`` is a ``tuple`` ``(m, n)``, Ray creates an autoscaling actor pool that scales between ``m`` and ``n`` workers (``1 <= m <= n``). If ``concurrency`` is an ``int`` ``n``, Ray uses either a fixed pool of ``n`` workers or an autoscaling pool from ``1`` to ``n`` workers, depending on the processor and stage.'), 'experimental': FieldInfo(annotation=Dict[str, Any], required=False, default_factory=dict, description='[Experimental] Experimental configurations. Supported keys:\n`max_tasks_in_flight_per_actor`: The maximum number of tasks in flight per actor. Default to 4.'), 'headers': FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, description="The query header. Note that we will add 'Content-Type: application/json' to be the header for sure because we only deal with requests body in JSON."), 'max_retries': FieldInfo(annotation=int, required=False, default=0, description='The maximum number of retries per request in the event of failures.'), 'qps': FieldInfo(annotation=Union[int, NoneType], required=False, description='The maximum number of requests per second to avoid rate limit. If None, the request will be sent sequentially.'), 'resources_per_bundle': FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, description='[DEPRECATED] This parameter is deprecated and will be removed in a future version.', json_schema_extra={'deprecated': True}), 'session_factory': FieldInfo(annotation=Union[Any, NoneType], required=False, description='Optional session factory to be used for initializing a client session. Type: Callable[[], ClientSession]', exclude=True), 'url': FieldInfo(annotation=str, required=True, description='The URL to query.')}#

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.