ray.serve.llm.LLMServer.transcriptions#
- async LLMServer.transcriptions(request: TranscriptionRequest) AsyncGenerator[List[str | ErrorResponse] | TranscriptionResponse, None][source]#
Runs a transcriptions request against the engine and returns the response.
Returns an AsyncGenerator over the TranscriptionResponse object, so that callers have a consistent interface across the chat, completions, embeddings, and transcriptions methods.
- Parameters:
request – A TranscriptionRequest object.
- Returns:
An AsyncGenerator over the TranscriptionResponse object.
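Because the method yields its result through an async generator rather than returning it directly, callers consume it with `async for` just like the chat, completions, and embeddings methods. The sketch below illustrates that consumption pattern with a minimal stand-in; `transcriptions_stub` and the simplified `TranscriptionResponse` class are hypothetical stand-ins, not the real Ray Serve classes.

```python
import asyncio
from typing import AsyncGenerator


class TranscriptionResponse:
    # Hypothetical, simplified stand-in for the real response type.
    def __init__(self, text: str):
        self.text = text


async def transcriptions_stub(
    request: dict,
) -> AsyncGenerator[TranscriptionResponse, None]:
    # Mimics the documented shape: the response object is yielded
    # through an async generator for interface consistency.
    yield TranscriptionResponse(text="hello world")


async def consume(request: dict) -> str:
    # Consume the generator the same way as chat/completions/embeddings.
    async for response in transcriptions_stub(request):
        return response.text


print(asyncio.run(consume({"audio": b"..."})))
```

The same `async for` loop works unchanged whether the endpoint streams multiple chunks or, as here, yields a single response object.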