Ray Serve LLM API#

Builders#

build_vllm_deployment

Helper to build a single vLLM deployment from the given LLMConfig.

build_openai_app

Helper to build an OpenAI-compatible app, with the LLM deployments set up from the given LLM serving args.
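
A minimal sketch of using the builder: construct an LLMConfig, hand it to build_openai_app, and run the resulting Serve application. The model IDs, engine options, and autoscaling values below are illustrative placeholders, not defaults.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Placeholder model id and source; swap in a model you have access to.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llm",
        model_source="facebook/opt-125m",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    # Extra keyword arguments forwarded to the underlying vLLM engine.
    engine_kwargs=dict(max_model_len=2048),
)

# The dict form mirrors LLMServingArgs: a list of LLM configs to serve.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

build_vllm_deployment takes a single LLMConfig the same way, when you want the bare engine deployment without the OpenAI-compatible router in front of it.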

Configs#

LLMConfig

The configuration for starting an LLM deployment.

LLMServingArgs

The configuration for starting an LLM deployment application.

ModelLoadingConfig

The configuration for loading an LLM model.

S3MirrorConfig

The configuration for mirroring an LLM model from S3.

S3AWSCredentials

The AWS credentials configuration used when loading an LLM model from S3.

GCSMirrorConfig

The configuration for mirroring an LLM model from GCS.

LoraConfig

The configuration for loading an LLM model with LoRA.
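
The configs above compose: a ModelLoadingConfig names the model and where its weights come from, and an optional LoraConfig enables adapter loading. A sketch with hypothetical values (bucket path and field values are assumptions, chosen to match the configs listed above):

```python
from ray.serve.llm import LLMConfig, LoraConfig, ModelLoadingConfig

llm_config = LLMConfig(
    model_loading_config=ModelLoadingConfig(
        model_id="my-llm",                 # the name clients will request
        model_source="facebook/opt-125m",  # e.g. a Hugging Face repo
    ),
    lora_config=LoraConfig(
        # Hypothetical base path that LoRA adapter weights are
        # dynamically loaded from at request time.
        dynamic_lora_loading_path="s3://my-bucket/lora-adapters/",
    ),
)
```

S3MirrorConfig or GCSMirrorConfig can stand in for a plain model source when the weights are mirrored in object storage, with S3AWSCredentials supplying access for private S3 buckets.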

Deployments#

VLLMService

The implementation of the vLLM engine deployment.

LLMRouter

The implementation of the OpenAI-compatible model router.
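
Because LLMRouter speaks the OpenAI protocol, a running app can be queried with the stock OpenAI client. A sketch, where the base URL assumes a local Serve instance on the default port and "my-llm" stands in for whatever model_id the app serves:

```python
from openai import OpenAI

# The api_key is unused unless you configure authentication,
# but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

response = client.chat.completions.create(
    model="my-llm",  # must match a served model_id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```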

OpenAI API Models#

ChatCompletionRequest

ChatCompletionRequest is the request body for the chat completion API.

CompletionRequest

CompletionRequest is the request body for the completion API.

ChatCompletionStreamResponse

ChatCompletionStreamResponse is the response body for the chat completion API.

ChatCompletionResponse

ChatCompletionResponse is the response body for the chat completion API.

CompletionStreamResponse

CompletionStreamResponse is the response body for the completion API.

CompletionResponse

CompletionResponse is the response body for the completion API.

ErrorResponse

The returned response in case of an error.
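
For reference, a minimal ChatCompletionRequest body as it would be POSTed to the router's `/v1/chat/completions` endpoint; the model name and field values are illustrative:

```json
{
  "model": "my-llm",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.7
}
```

With `"stream": true` the router returns a sequence of ChatCompletionStreamResponse chunks instead of a single ChatCompletionResponse; errors in either mode come back as an ErrorResponse.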