ray.serve.llm.configs.LoraConfig

pydantic model ray.serve.llm.configs.LoraConfig

Configuration for loading an LLM with LoRA adapters.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

field download_timeout_s: float | None = 30.0

Time, in seconds, that the download subprocess has to download a single LoRA adapter before it times out. None means no timeout.

field dynamic_lora_loading_path: str | None = None

Cloud storage path where LoRA adapter weights are stored.

field max_download_tries: int = 3

The maximum number of download retries.
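The two download settings can be pictured as a per-attempt timeout wrapped in a bounded retry loop. The sketch below is an illustration under assumptions, not Ray's implementation: the helper `download_with_retries` and the commands it runs are hypothetical, and it treats `max_download_tries` as the total number of attempts.

```python
import subprocess
import sys
from typing import Optional, Sequence


def download_with_retries(
    cmd: Sequence[str],
    download_timeout_s: Optional[float] = 30.0,
    max_download_tries: int = 3,
) -> int:
    """Hypothetical helper: run a download command in a subprocess, retrying on failure.

    Assumption: max_download_tries counts total attempts. A timeout of None
    disables the per-attempt limit, because subprocess.run treats
    timeout=None as "wait until the child exits".
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_download_tries + 1):
        try:
            # On timeout, subprocess.run kills the child, waits for it,
            # then raises TimeoutExpired.
            subprocess.run(cmd, check=True, timeout=download_timeout_s)
            return attempt  # number of attempts that were needed
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as err:
            last_error = err
    raise RuntimeError(
        f"download failed after {max_download_tries} tries"
    ) from last_error


# A command that exits immediately succeeds on the first attempt.
print(download_with_retries([sys.executable, "-c", "pass"]))  # 1
```

Running the download in a child process rather than a thread means a stalled download can actually be killed when the timeout fires.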

field max_num_adapters_per_replica: PositiveInt = 16

The maximum number of LoRA adapters that can be loaded on each replica.

Constraints:

gt = 0
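This page states only the limit and its constraint. One plausible way to enforce such a per-replica limit is a least-recently-used (LRU) cache: when a request needs an adapter that is not loaded and the replica is already full, the adapter used longest ago is unloaded first. The sketch below assumes that policy; `AdapterCache` and `load_fn` are hypothetical names, and this is not Ray's implementation.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class AdapterCache(Generic[T]):
    """Hypothetical per-replica adapter cache with LRU eviction (assumed policy)."""

    def __init__(self, load_fn: Callable[[str], T], max_num_adapters_per_replica: int = 16):
        # Mirror the documented constraint: gt = 0.
        if max_num_adapters_per_replica <= 0:
            raise ValueError("max_num_adapters_per_replica must be greater than 0")
        self._load_fn = load_fn
        self._capacity = max_num_adapters_per_replica
        # Insertion order doubles as recency order: oldest entry first.
        self._adapters: "OrderedDict[str, T]" = OrderedDict()

    def get(self, adapter_id: str) -> T:
        if adapter_id in self._adapters:
            # Cache hit: mark the adapter as most recently used.
            self._adapters.move_to_end(adapter_id)
            return self._adapters[adapter_id]
        if len(self._adapters) >= self._capacity:
            # Replica is full: evict the least recently used adapter.
            self._adapters.popitem(last=False)
        adapter = self._load_fn(adapter_id)
        self._adapters[adapter_id] = adapter
        return adapter

    def loaded(self) -> List[str]:
        return list(self._adapters)


cache = AdapterCache(load_fn=lambda name: f"weights-for-{name}", max_num_adapters_per_replica=2)
for name in ["a", "b", "a", "c"]:
    cache.get(name)
print(cache.loaded())  # ['a', 'c']: "b" was the least recently used when "c" arrived
```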

classmethod parse_yaml(file, **kwargs) → ModelT

validator validate_dynamic_lora_loading_path » dynamic_lora_loading_path
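This page names the validator but does not describe what it checks. As an illustration only, the sketch below assumes it passes `None` through, requires an `s3://` or `gs://` URI, and strips a trailing slash. The function name `check_dynamic_lora_loading_path` and the accepted schemes are assumptions; consult the Ray source for the actual rules.

```python
from typing import Optional

# Assumption: the set of supported cloud storage URI prefixes.
SUPPORTED_SCHEMES = ("s3://", "gs://")


def check_dynamic_lora_loading_path(value: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for a field validator on dynamic_lora_loading_path."""
    if value is None:
        # The field's default: passed through unchanged.
        return None
    if not value.startswith(SUPPORTED_SCHEMES):
        raise ValueError(
            f"dynamic_lora_loading_path must start with one of {SUPPORTED_SCHEMES}, "
            f"got {value!r}"
        )
    # Normalize so that appending "/<adapter>" later never yields a double slash.
    return value.rstrip("/")


print(check_dynamic_lora_loading_path("s3://my-bucket/loras/"))  # s3://my-bucket/loras
```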