Ray Serve LLM API

Builders

- Helper to build a single vLLM deployment from the given LLM config.
- Helper to build an OpenAI-compatible app with the LLM deployment set up from the given LLM serving args.
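A minimal sketch of wiring the two builders together, based on the pattern in the Ray Serve LLM docs. It assumes a Ray cluster with `ray[llm]` installed and GPU capacity available; the model ID and source shown here are illustrative, not required values.

```python
# Sketch only: requires `pip install "ray[llm]"` and a running Ray cluster
# with GPUs, so it is not runnable in a bare environment.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name clients will request
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # HF repo (or cloud URI)
    ),
)

# build_openai_app returns a Serve application exposing OpenAI-compatible
# /v1/* routes for every model in `llm_configs`.
app = build_openai_app(dict(llm_configs=[llm_config]))
serve.run(app)
```

Once running, any OpenAI client pointed at the Serve HTTP address can query the model by its `model_id`.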
Configs

- The configuration for starting an LLM deployment.
- The configuration for starting an LLM deployment application.
- The configuration for loading an LLM model.
- The configuration for mirroring an LLM model from S3.
- The configuration for loading an LLM model from S3.
- The configuration for mirroring an LLM model from GCS.
- The configuration for loading an LLM model with LoRA.
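As a config fragment, here is a fuller `LLMConfig` sketch showing how the loading, deployment, and engine settings compose. The accelerator type, replica counts, and engine kwargs below are illustrative assumptions, not defaults.

```python
from ray.serve.llm import LLMConfig

llm_config = LLMConfig(
    # What to load and what clients will call it.
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    # Standard Ray Serve deployment options, e.g. autoscaling bounds.
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    # Hardware hint for replica scheduling (illustrative).
    accelerator_type="A10G",
    # Passed through to the vLLM engine.
    engine_kwargs=dict(tensor_parallel_size=1),
)
```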
Deployments

- The implementation of the vLLM engine deployment.
- The implementation of the OpenAI-compatible model router.
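Since the router speaks the OpenAI protocol over plain HTTP, it can be queried without any special client. The sketch below builds (but does not send) such a request with the standard library; the base URL and model ID are assumptions for a locally running Serve app.

```python
import json
import urllib.request


def chat_request(base_url: str, model_id: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request (not sent here)."""
    body = json.dumps({"model": model_id, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request(
    "http://localhost:8000",  # default Serve HTTP address (assumption)
    "qwen-0.5b",              # the model_id configured in LLMConfig
    [{"role": "user", "content": "Hello!"}],
)
# urllib.request.urlopen(req) would send it to a running Serve app.
```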
OpenAI API Models

- ChatCompletionRequest is the request body for the chat completion API.
- CompletionRequest is the request body for the completion API.
- ChatCompletionStreamResponse is the streaming response body for the chat completion API.
- ChatCompletionResponse is the response body for the chat completion API.
- CompletionStreamResponse is the streaming response body for the completion API.
- CompletionResponse is the response body for the completion API.
- The response returned in case of an error.
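These models mirror the OpenAI chat/completions schema, so extracting a reply follows the usual `choices[0].message.content` shape. The payload below is illustrative sample data, not output from a real deployment.

```python
import json

# A payload shaped like ChatCompletionResponse per the OpenAI chat API;
# the field values here are made up for illustration.
raw = json.dumps({
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "qwen-0.5b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        },
    ],
})

response = json.loads(raw)
reply = response["choices"][0]["message"]["content"]
print(reply)  # Hello!
```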