End-to-end RAG Deep Dive#

This tutorial covers end-to-end Retrieval-Augmented Generation (RAG) pipelines using Ray, from data ingestion and LLM deployment to prompt engineering, evaluation and scaling out all workloads in the application.

Notebooks#

  1. 01_(Optional)_Regular_Document_Processing_Pipeline.ipynb
    Demonstrates a baseline document processing workflow for extracting, cleaning, and indexing text prior to RAG.

  2. 02_Scalable_RAG_Data_Ingestion_with_Ray_Data.ipynb
    Shows how to build a high-throughput data ingestion pipeline for RAG using Ray Data.

  3. 03_Deploy_LLM_with_Ray_Serve.ipynb
    Guides you through containerizing and serving a large language model at scale with Ray Serve.

  4. 04_Build_Basic_RAG_Chatbot
    Combines your indexed documents and served LLM to create a simple, interactive RAG chatbot.

  5. 05_Improve_RAG_with_Prompt_Engineering
    Explores prompt-engineering techniques to boost relevance and accuracy in RAG responses.

  6. 06_(Optional)_Evaluate_RAG_with_Online_Inference
    Provides methods to assess RAG quality in real time through live queries and metrics tracking.

  7. 07_Evaluate_RAG_with_Ray_Data_LLM_Batch_inference
    Implements large-scale batch evaluation of RAG outputs using Ray Data + LLM batch inference.


Note: Notebooks marked “(Optional)” cover complementary topics and can be skipped if you prefer to focus on the core RAG flow.