End-to-end RAG Deep Dive
This tutorial series covers building end-to-end Retrieval-Augmented Generation (RAG) pipelines with Ray, from data ingestion and LLM deployment to prompt engineering, evaluation, and scaling out every workload in the application. A brief sketch of the overall pipeline shape follows the contents list below.
- Build a Regular RAG Document Ingestion Pipeline (No Ray required)
- Scalable RAG Data Ingestion and Pagination with Ray Data
- Deploy LLM with Ray Serve LLM
- Build Basic RAG App
- Improve RAG with Prompt Engineering
- Evaluate RAG with Online Inference
- Evaluate RAG using Batch Inference with Ray Data LLM
  - How to Decide Between Online vs. Offline Inference for LLM
  - Key Benefits of Using Batch Inference with Ray Data LLM
  - Prerequisites
  - Load the Evaluation Data
  - Generating Embeddings from User Requests
  - Querying the Vector Store and Generating Prompts
  - Configuring and Running LLM Inference
  - Saving the Batch Inference Results
  - Visualize the Results
  - Evaluate Results and Improve RAG Quality
  - Final Notes
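
To give a feel for the two Ray pieces the series leans on, here is a minimal, hedged sketch of the pipeline shape: Ray Data for scalable document embedding, and a query against an LLM served through Ray Serve LLM's OpenAI-compatible endpoint. The embedding model, document path, endpoint URL, and model id are illustrative assumptions rather than values from the tutorials, and the vector-store indexing and retrieval steps in between are omitted.

```python
# Minimal sketch of the pipeline shape, not the tutorials' exact code.
# Assumed: an S3 path of text documents, a sentence-transformers model,
# and a Ray Serve LLM deployment already running at localhost:8000.
import ray
from sentence_transformers import SentenceTransformer
from openai import OpenAI


class Embedder:
    """Stateful Ray Data UDF: loads the embedding model once per worker."""

    def __init__(self):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

    def __call__(self, batch):
        # batch is a dict of NumPy arrays; add an "embedding" column.
        batch["embedding"] = self.model.encode(list(batch["text"]))
        return batch


# 1. Scalable ingestion: read and embed documents in parallel with Ray Data.
docs = ray.data.read_text("s3://my-bucket/docs/")  # hypothetical path
embedded = docs.map_batches(Embedder, batch_size=64, concurrency=4)
embedded.show(1)  # trigger execution and peek at one embedded record

# (Vector-store upsert and retrieval would go here; omitted in this sketch.)

# 2. Online inference: Ray Serve LLM exposes an OpenAI-compatible API,
#    so a retrieved-context prompt can be sent with the standard client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")
response = client.chat.completions.create(
    model="my-llm",  # whatever model id the Serve deployment registers
    messages=[{"role": "user", "content": "Answer using the retrieved context: ..."}],
)
print(response.choices[0].message.content)
```

The tutorials themselves walk through each stage in full, including the vector store and the batch-inference evaluation path elided above.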