ray.serve.llm.configs.LoraConfig

pydantic model ray.serve.llm.configs.LoraConfig

The configuration for loading an LLM model with LoRA.

PublicAPI (alpha): This API is in alpha and may change before becoming stable.

field download_timeout_s: float | None = 30.0

Maximum time, in seconds, that the download subprocess has to download a single LoRA adapter before timing out. None means no timeout.

field dynamic_lora_loading_path: str | None = None

Cloud storage path where LoRA adapter weights are stored.

field max_download_tries: int = 3

The maximum number of download retries.
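
The interaction between download_timeout_s and max_download_tries can be sketched roughly as below. This is an illustrative stand-in, not Ray's implementation: the helper name run_download and the command it runs are hypothetical, and the sketch assumes max_download_tries counts total attempts, which the description above does not confirm.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_download(
    cmd: Sequence[str],
    download_timeout_s: Optional[float] = 30.0,
    max_download_tries: int = 3,
) -> int:
    """Hypothetical helper: run `cmd` until it succeeds or tries run out.

    Returns the 1-based attempt number that succeeded.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_download_tries + 1):
        try:
            # timeout=None makes subprocess.run wait indefinitely,
            # mirroring "None means no timeout".
            subprocess.run(list(cmd), timeout=download_timeout_s, check=True)
            return attempt
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
            # Assumption: both a timeout and a non-zero exit count as a failed try.
            last_error = exc
    raise RuntimeError(
        f"download failed after {max_download_tries} tries"
    ) from last_error


if __name__ == "__main__":
    # A trivial command stands in for the real download subprocess.
    print(run_download([sys.executable, "-c", "pass"]))
```

With these defaults, a single adapter that never finishes downloading would block for at most about 30 seconds per try in this sketch.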

field max_num_adapters_per_replica: PositiveInt = 16

The maximum number of adapters loaded on each replica.

Constraints:
  • gt = 0
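
To make the defaults and the gt = 0 constraint concrete, here is a minimal standard-library sketch that mirrors the four fields listed above. LoraConfigSketch is a hypothetical stand-in written for illustration; it is not the Ray class and does not reproduce pydantic's validation or the dynamic_lora_loading_path validator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoraConfigSketch:
    """Hypothetical mirror of the documented fields and their defaults."""

    dynamic_lora_loading_path: Optional[str] = None
    max_num_adapters_per_replica: int = 16
    download_timeout_s: Optional[float] = 30.0
    max_download_tries: int = 3

    def __post_init__(self) -> None:
        # Mirrors the documented constraint gt = 0 (PositiveInt).
        if self.max_num_adapters_per_replica <= 0:
            raise ValueError("max_num_adapters_per_replica must be greater than 0")


if __name__ == "__main__":
    print(LoraConfigSketch())
    try:
        LoraConfigSketch(max_num_adapters_per_replica=0)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

In the real model, pydantic enforces the constraint at construction time; the sketch only imitates that behaviour with a manual check.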

classmethod parse_yaml(file, **kwargs) → ModelT

validator validate_dynamic_lora_loading_path  »  dynamic_lora_loading_path