ray.data.llm.Processor

class ray.data.llm.Processor(config: ProcessorConfig, stages: List[StatefulStage], preprocess: Callable[[T], U] | Callable[[T], Iterator[U]] | _CallableClassProtocol | None = None, postprocess: Callable[[T], U] | Callable[[T], Iterator[U]] | _CallableClassProtocol | None = None)

A processor consists of a preprocess stage, one or more processing stages, and a postprocess stage, applied in that order. The preprocess and postprocess stages are optional. Processors are the paradigm this API uses for processing data with LLMs.

Parameters:
  • config – The processor config.

  • stages – The list of processing stages (StatefulStage) that run between preprocess and postprocess.

  • preprocess – An optional callable (for example, a lambda function) that takes a row (dict) as input and returns a preprocessed row (dict). The output row must contain every field required by the processing stages that follow.

  • postprocess – An optional callable (for example, a lambda function) that takes a row (dict) as input and returns a postprocessed row (dict).

PublicAPI (alpha): This API is in alpha and may change before becoming stable.
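
The ordering described above (preprocess, then each processing stage, then postprocess) can be illustrated with a small plain-Python sketch. This is not Ray's implementation and does not use Ray at all: `run_pipeline`, `fake_llm_stage`, and the `prompt`, `generated_text`, and `answer` field names are hypothetical, chosen only to show how a row flows through the steps and why the preprocess output must contain the fields the next stage reads.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
RowFn = Callable[[Row], Row]


def run_pipeline(
    rows: List[Row],
    stages: List[RowFn],
    preprocess: Optional[RowFn] = None,
    postprocess: Optional[RowFn] = None,
) -> List[Row]:
    """Apply preprocess, then each stage in order, then postprocess, to every row."""
    # Build the ordered list of steps; preprocess and postprocess are optional.
    steps: List[RowFn] = []
    if preprocess is not None:
        steps.append(preprocess)
    steps.extend(stages)
    if postprocess is not None:
        steps.append(postprocess)

    results: List[Row] = []
    for row in rows:
        for step in steps:
            row = step(row)
        results.append(row)
    return results


# Hypothetical stage standing in for an LLM call; it requires a "prompt" field.
def fake_llm_stage(row: Row) -> Row:
    return {**row, "generated_text": row["prompt"].upper()}


output = run_pipeline(
    rows=[{"question": "what is ray?"}],
    stages=[fake_llm_stage],
    # preprocess must produce the "prompt" field that the stage expects.
    preprocess=lambda row: {"prompt": f"Q: {row['question']}"},
    # postprocess keeps only the field the caller cares about.
    postprocess=lambda row: {"answer": row["generated_text"]},
)
print(output)
```

If the preprocess callable in this sketch omitted the `prompt` key, `fake_llm_stage` would raise a `KeyError`, which mirrors the requirement that the preprocessed row contain the fields the following stages need.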

Methods

get_stage_by_name

Get a particular stage by its name.

list_stage_names

List the stage names of this processor in order.

Attributes

data_column