Ray Serve LLM API#

Builders#

build_vllm_deployment

Helper to build a single vLLM deployment from the given LLMConfig.

build_openai_app

Helper to build an OpenAI-compatible app, with the LLM deployments set up from the given LLM serving args.
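
A minimal sketch of using the builder: construct an LLMConfig, hand it to build_openai_app, and run the resulting Serve application. The model IDs, engine options, and autoscaling values below are illustrative placeholders, not defaults.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Placeholder model id and source; swap in a model you have access to.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llm",
        model_source="facebook/opt-125m",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    # Extra keyword arguments forwarded to the underlying vLLM engine.
    engine_kwargs=dict(max_model_len=2048),
)

# The dict form mirrors LLMServingArgs: a list of LLM configs to serve.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

build_vllm_deployment takes a single LLMConfig the same way, when you want the bare engine deployment without the OpenAI-compatible router in front of it.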

Configs#

LLMConfig

The configuration for starting an LLM deployment.

LLMServingArgs

The configuration for starting an LLM deployment application.

ModelLoadingConfig

The configuration for loading an LLM model.

S3MirrorConfig

The configuration for mirroring an LLM model from S3.

S3AWSCredentials

The AWS credentials configuration used when loading an LLM model from S3.

GCSMirrorConfig

The configuration for mirroring an LLM model from GCS.

LoraConfig

The configuration for loading an LLM model with LoRA.
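
The configs above compose: a ModelLoadingConfig names the model and where its weights come from, and an optional LoraConfig enables adapter loading. A sketch with hypothetical values (bucket path and field values are assumptions, chosen to match the configs listed above):

```python
from ray.serve.llm import LLMConfig, LoraConfig, ModelLoadingConfig

llm_config = LLMConfig(
    model_loading_config=ModelLoadingConfig(
        model_id="my-llm",                 # the name clients will request
        model_source="facebook/opt-125m",  # e.g. a Hugging Face repo
    ),
    lora_config=LoraConfig(
        # Hypothetical base path that LoRA adapter weights are
        # dynamically loaded from at request time.
        dynamic_lora_loading_path="s3://my-bucket/lora-adapters/",
    ),
)
```

S3MirrorConfig or GCSMirrorConfig can stand in for a plain model source when the weights are mirrored in object storage, with S3AWSCredentials supplying access for private S3 buckets.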

Deployments#

VLLMService

The implementation of the vLLM engine deployment.

LLMRouter

The implementation of the OpenAI-compatible model router.
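
Because LLMRouter speaks the OpenAI protocol, a running app can be queried with the stock OpenAI client. A sketch, where the base URL assumes a local Serve instance on the default port and "my-llm" stands in for whatever model_id the app serves:

```python
from openai import OpenAI

# The api_key is unused unless you configure authentication,
# but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

response = client.chat.completions.create(
    model="my-llm",  # must match a served model_id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```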

OpenAI API Models#

ChatCompletionRequest

ChatCompletionRequest is the request body for the chat completion API.

CompletionRequest

CompletionRequest is the request body for the completion API.

ChatCompletionStreamResponse

ChatCompletionStreamResponse is the response body for the chat completion API.

ChatCompletionResponse

ChatCompletionResponse is the response body for the chat completion API.

CompletionStreamResponse

CompletionStreamResponse is the response body for the completion API.

CompletionResponse

CompletionResponse is the response body for the completion API.

ErrorResponse

The returned response in case of an error.
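
For reference, a minimal ChatCompletionRequest body as it would be POSTed to the router's `/v1/chat/completions` endpoint; the model name and field values are illustrative:

```json
{
  "model": "my-llm",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.7
}
```

With `"stream": true` the router returns a sequence of ChatCompletionStreamResponse chunks instead of a single ChatCompletionResponse; errors in either mode come back as an ErrorResponse.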