Build Basic RAG App#
This tutorial demonstrates a complete Retrieval-Augmented Generation (RAG) system using Ray ecosystem components for scalable AI workflows.
Since we already built the data ingestion pipeline in notebook #2, this tutorial covers only the user query pipeline for RAG.
It showcases embedding generation, vector search, context-aware prompting, and streaming response delivery, all in a single executable pipeline.
Here is the architecture diagram:

Note: This tutorial is optimized for the Anyscale platform. When running on open source Ray, additional configuration is required. For example, you’ll need to manually:
- Configure your Ray Cluster: Set up your multi-node environment (including head and worker nodes) and manage resource allocation (e.g., autoscaling, GPU/CPU assignments) without the Anyscale automation. See the Ray Cluster Setup documentation for details: https://docs.ray.io/en/latest/cluster/getting-started.html.
- Manage Dependencies: Install and manage dependencies on each node since you won’t have Anyscale’s Docker-based dependency management. Refer to the Ray Installation Guide for instructions on installing and updating Ray in your environment: https://docs.ray.io/en/latest/ray-core/handling-dependencies.html.
- Set Up Storage: Configure your own distributed or shared storage system (instead of relying on Anyscale’s integrated cluster storage). Check out the Ray Cluster Configuration guide for suggestions on setting up shared storage solutions: https://docs.ray.io/en/latest/train/user-guides/persistent-storage.html.
Prerequisites#
Before moving on, make sure you have all the required prerequisites in place.
Verify the LLM service#
First, verify that the LLM service is available:
from rag_utils import LLMClient
# Initialize client
model_id = 'Qwen/Qwen2.5-32B-Instruct'  # The model ID must match your deployment.
base_url = "https://llm-service-qwen-32b-jgz99.cld-kvedzwag2qa8i5bj.s.anyscaleuserdata.com/"  # Replace with your own service base URL.
api_key = "a1ndpMKaXi76sTIfr_afmx8HynFA1fg-TGaZ2gUuDG0"  # Replace with your own API key.
client = LLMClient(base_url=base_url, api_key=api_key, model_id=model_id)
prompt = "what is anyscale jobs"
print("Model response:")
for token in client.get_response_streaming(prompt, temperature=0):
print(token, end="")
print() # For newline after the streamed response
Model response:
Anyscale Jobs likely refers to job opportunities or roles within the context of Anyscale, a company that specializes in scalable computing solutions. Anyscale is known for developing Ray, an open-source framework for building distributed applications. The company focuses on making it easier for developers and researchers to scale their applications and machine learning models across multiple machines.
If you're looking for information on specific job openings at Anyscale, you would typically find these on the company's official website under a "Careers" or "Jobs" section, or on popular job listing platforms. Positions might include roles in software engineering, machine learning, data science, and other technical fields, given the company's focus on scalable computing and distributed systems.
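For reference, LLM services deployed this way typically expose an OpenAI-compatible API, so an equivalent streaming call can be made directly with the openai package. The following is a minimal sketch under that assumption; LLMClient's actual internals may differ.
# Hedged sketch: stream tokens with the openai client, assuming the
# service exposes an OpenAI-compatible /v1 endpoint.
from openai import OpenAI
oai_client = OpenAI(base_url=base_url.rstrip("/") + "/v1", api_key=api_key)
stream = oai_client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g., the final one) may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()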
Observation: why do we need RAG?#
The response refers to job openings at Anyscale instead of explaining what “Anyscale jobs” are in the context of the platform: https://docs.anyscale.com/platform/jobs/
This demonstrates why RAG is needed—by integrating domain-specific context, we help the LLM produce more accurate answers.
Build the embedder#
Similar to previous tutorials, we use the SentenceTransformer library to convert text strings into numerical embeddings. The embedder automatically uses a CUDA-enabled GPU if available, making it efficient for both single and batch processing.
from typing import Dict, List, Union
import torch
import numpy as np
from sentence_transformers import SentenceTransformer
class Embedder:
def __init__(self, model_name: str = "intfloat/multilingual-e5-large-instruct"):
self.model_name = model_name
self.model = SentenceTransformer(
self.model_name,
device="cuda" if torch.cuda.is_available() else "cpu"
)
def embed_single(self, text: str) -> np.ndarray:
"""Generate an embedding for a single text string."""
return self.model.encode(text, convert_to_numpy=True)
def embed_batch(self, texts: List[str]) -> np.ndarray:
"""Generate embeddings for a batch (list) of text strings."""
return self.model.encode(texts, convert_to_numpy=True)
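As a quick sanity check, here is a minimal usage sketch. It assumes the sentence-transformers package is installed and the model weights can be downloaded; the first call is slow while the model loads, and the dimension shown is what the e5-large model family typically produces.
# Illustrative usage of the Embedder defined above.
embedder = Embedder()

single = embedder.embed_single("what is anyscale jobs")
print(single.shape)   # e.g., (1024,)

batch = embedder.embed_batch(["submit a job", "monitor a job"])
print(batch.shape)    # e.g., (2, 1024)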
Query the Chroma DB#
Similar to the ChromaWrite class in previous tutorials, we define a ChromaQuerier class that acts as an interface to a Chroma vector store, enabling efficient retrieval of document chunks based on similarity to a provided query embedding.
It processes raw results by reformatting and filtering them according to a defined score threshold, ensuring that only the most relevant information is returned. As part of a Retrieval-Augmented Generation (RAG) workflow, this setup helps integrate precise, contextually significant data into subsequent generation steps.
The score_threshold parameter sets the minimum acceptable similarity score for a result to be considered relevant. Each result's score is calculated as 1 minus its distance, meaning that lower distances (indicating higher similarity) result in higher scores. By filtering out any results with scores below the score_threshold (default 0.8), the code ensures that only the most contextually relevant documents are returned.
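For intuition, here is a tiny standalone illustration of the distance-to-score conversion and threshold filtering (sample numbers only):
# Illustration only: lower distance means higher similarity, so
# score = 1 - distance, and results below the threshold are dropped.
score_threshold = 0.8
distances = [0.10, 0.15, 0.35]
scores = [1 - d for d in distances]        # [0.9, 0.85, 0.65]
kept = [s for s in scores if s >= score_threshold]
print(kept)                                # [0.9, 0.85]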
We also implement two special methods, __getstate__ and __setstate__, which are hooks in Python's pickling protocol:
- __getstate__: Prepares the object for pickling by removing attributes that can't be serialized, ensuring that only the essential state is saved.
- __setstate__: Rebuilds the object after unpickling by restoring its state and reinitializing the unpickleable components so that the object remains fully functional.
These two methods prevent errors such as TypeError: cannot pickle 'weakref.ReferenceType' object when you use map_batches during batch processing with Ray Data (see the sketch after the class below).
from pprint import pprint
import chromadb
class ChromaQuerier:
"""
A class to query a Chroma database collection and return formatted search results.
"""
def __init__(
self,
chroma_path: str,
chroma_collection_name: str,
score_threshold: float = 0.8 # Define a default threshold value if needed.
):
"""
Initialize the ChromaQuerier with the specified Chroma DB settings and score threshold.
"""
self.chroma_path = chroma_path
self.chroma_collection_name = chroma_collection_name
self.score_threshold = score_threshold
# Initialize the persistent client and collection.
self._init_chroma_client()
def _init_chroma_client(self):
"""
Initialize or reinitialize the Chroma client and collection.
"""
self.chroma_client = chromadb.PersistentClient(path=self.chroma_path)
self.collection = self.chroma_client.get_or_create_collection(name=self.chroma_collection_name)
def __getstate__(self):
"""
Customize pickling by excluding the unpickleable Chroma client and collection.
"""
state = self.__dict__.copy()
state.pop("chroma_client", None)
state.pop("collection", None)
return state
def __setstate__(self, state):
"""
Restore the state and reinitialize the Chroma client and collection.
"""
self.__dict__.update(state)
self._init_chroma_client()
def _reformat(self, chroma_results: dict) -> list:
"""
Reformat Chroma DB results into a flat list of dictionaries.
"""
reformatted = []
metadatas = chroma_results.get("metadatas", [])
documents = chroma_results.get("documents", [])
distances = chroma_results.get("distances", [])
chunk_index = 1
for meta_group, doc_group, distance_group in zip(metadatas, documents, distances):
for meta, text, distance in zip(meta_group, doc_group, distance_group):
entry = {
"chunk_index": chunk_index,
"chunk_id": meta.get("chunk_id"),
"doc_id": meta.get("doc_id"),
"page_number": meta.get("page_number"),
"source": meta.get("source"),
"text": text,
"distance": distance,
"score": 1 - distance
}
reformatted.append(entry)
chunk_index += 1
return reformatted
def _reformat_batch(self, chroma_results: dict) -> list:
"""
Reformat batch Chroma DB results into a list where each element corresponds
to a list of dictionaries for each query embedding.
"""
batch_results = []
metadatas = chroma_results.get("metadatas", [])
documents = chroma_results.get("documents", [])
distances = chroma_results.get("distances", [])
for meta_group, doc_group, distance_group in zip(metadatas, documents, distances):
formatted_results = []
chunk_index = 1 # Reset index for each query result.
for meta, text, distance in zip(meta_group, doc_group, distance_group):
entry = {
"chunk_index": chunk_index,
"chunk_id": meta.get("chunk_id"),
"doc_id": meta.get("doc_id"),
"page_number": meta.get("page_number"),
"source": meta.get("source"),
"text": text,
"distance": distance,
"score": 1 - distance
}
formatted_results.append(entry)
chunk_index += 1
batch_results.append(formatted_results)
return batch_results
def _filter_by_score(self, results: list) -> list:
"""
Filter out results with a score lower than the specified threshold.
"""
return [result for result in results if result["score"] >= self.score_threshold]
def query(self, query_embedding, n_results: int = 3) -> list:
"""
Query the Chroma collection for the top similar documents based on the provided embedding.
The results are filtered based on the score threshold.
Parameters:
query_embedding (list or np.ndarray): The input embedding vector.
n_results (int): Number of top similar results to return.
Returns:
list: A list of formatted and filtered search result dictionaries.
"""
# Convert numpy array to list if necessary.
if isinstance(query_embedding, np.ndarray):
query_embedding = query_embedding.tolist()
results = self.collection.query(
query_embeddings=query_embedding,
n_results=n_results,
include=["documents", "metadatas", "distances"]
)
formatted_results = self._reformat(results)
filtered_results = self._filter_by_score(formatted_results)
return filtered_results
def query_batch(self, query_embeddings, n_results: int = 3) -> list:
"""
Query the Chroma collection for the top similar documents for a batch of embeddings.
Each query embedding in the input list returns its own set of results, filtered based on the score threshold.
Parameters:
query_embeddings (list): A list of embeddings (each as a list or np.ndarray).
n_results (int): Number of top similar results to return for each query embedding.
Returns:
list: A list where each element is a list of formatted and filtered search result dictionaries
for the corresponding query embedding.
"""
# Process each embedding: if any is a numpy array, convert it to list.
processed_embeddings = [
emb.tolist() if isinstance(emb, np.ndarray) else emb
for emb in query_embeddings
]
# Query the collection with the batch of embeddings.
results = self.collection.query(
query_embeddings=processed_embeddings,
n_results=n_results,
include=["documents", "metadatas", "distances"]
)
# Reformat the results into batches.
batch_results = self._reformat_batch(results)
# Filter each query's results based on the score threshold.
filtered_batch = [self._filter_by_score(results) for results in batch_results]
return filtered_batch
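To see why these hooks matter in practice, here is a hedged sketch that round-trips the querier through pickle and ships it to a Ray Data map_batches worker via fn_constructor_kwargs. It assumes a running Ray cluster, the Embedder class above, and the vector store created in the ingestion notebook; the Retrieve class is illustrative, not part of the tutorial code.
import pickle

import ray

# Thanks to __getstate__/__setstate__, a plain pickle round-trip works:
# the Chroma client is dropped on serialization and rebuilt on restore.
querier = ChromaQuerier("/mnt/cluster_storage/vector_store",
                        "anyscale_jobs_docs_embeddings")
restored = pickle.loads(pickle.dumps(querier))  # no weakref TypeError

# Illustrative Ray Data usage: the querier is pickled and sent to the
# actor through fn_constructor_kwargs, exercising the same hooks.
class Retrieve:
    def __init__(self, querier: ChromaQuerier):
        self.embedder = Embedder()
        self.querier = querier

    def __call__(self, batch):
        embeddings = self.embedder.embed_batch(list(batch["question"]))
        batch["results"] = self.querier.query_batch(embeddings, n_results=3)
        return batch

ds = ray.data.from_items([{"question": "what is an anyscale job"}])
ds = ds.map_batches(
    Retrieve,
    fn_constructor_kwargs={"querier": querier},
    concurrency=1,
    batch_size=8,
)
print(ds.take(1))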
Render the basic RAG prompt#
Create a prompt that includes the retrieved context and the user’s question. This prompt guides the LLM to produce an answer that is informed by the context.
We are using a basic prompt from LangChain's RAG tutorial: https://python.langchain.com/docs/tutorials/rag/
Note: The quality of the prompt is not optimal; in our next tutorial on prompt engineering, we will demonstrate how to improve it.
def render_basic_rag_prompt(user_request, context):
prompt = f"""Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {user_request}
Helpful Answer:"""
return prompt.strip()
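Note that the pipeline in the next section passes the raw result dictionaries as context, which is why dictionary syntax shows up in the rendered prompt. A simple refinement, sketched here with a hypothetical format_context helper (not part of the tutorial code), is to join only the retrieved text:
# Hypothetical helper: flatten retrieved chunks into plain text before
# rendering, instead of interpolating raw result dictionaries.
def format_context(results: list) -> str:
    return "\n\n".join(r["text"] for r in results)

# Usage sketch:
# prompt = render_basic_rag_prompt(user_request, format_context(formatted_results))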
Connect the RAG components#
Integrate the embedder, Chroma querier, and LLM service into a complete RAG pipeline.
EMBEDDER_MODEL_NAME = "intfloat/multilingual-e5-large-instruct"
CHROMA_PATH = "/mnt/cluster_storage/vector_store"
CHROMA_COLLECTION_NAME = "anyscale_jobs_docs_embeddings"
# Initialize the querier.
querier = ChromaQuerier(CHROMA_PATH, CHROMA_COLLECTION_NAME, score_threshold=0.8)
embedder = Embedder(EMBEDDER_MODEL_NAME)
# Perform the user request.
user_request = "what is a anyscale job"
embedding = embedder.embed_single(user_request)
formatted_results = querier.query(embedding, n_results=10)
# Print the formatted query results.
print("Query Results:")
pprint(formatted_results)
# Render the prompt.
prompt = render_basic_rag_prompt(user_request, context=formatted_results)
print("Rendered Prompt:")
pprint(prompt)
Query Results:
[{'chunk_id': '55b8db89-0aa6-460c-ae3a-284241f5865c',
'chunk_index': 1,
'distance': 0.10053980350494385,
'doc_id': '7b170e3d-4081-4527-a737-1da022a364c4',
'page_number': 1,
'score': 0.8994601964950562,
'source': 'anyscale-rag-application/100-docs/Jobs.txt',
'text': '2/12/25, 9:48 AM Jobs | Anyscale Docs Jobs Run discrete workloads '
'in production such as batch inference, bulk embeddings generation, '
'or model fine-tuning. Anyscale Jobs allow you to submit '
'applications developed on workspaces to a standalone Ray cluster '
'for execution. Built for production and designed to fit into your '
'CI/CD pipeline, jobs ensure scalable and reliable performance. How '
'does it work? # When you’re ready to promote an app to production, '
'submit a job from the workspace using anyscale job submit . '
'Anyscale Jobs have the following features: Scalability: Rapid '
'scaling to thousands of cloud instances, adjusting computing '
'resources to match application demand. Fault tolerance: Retries for '
'failures and automatic rescheduling to an alternative cluster for '
'unexpected failures like running out of memory. Monitoring and '
'observability: Persistent dashboards that allow you to observe '
'tasks in real time and email alerts upon successf ul job '
'completion. Get started 1. Sign in or sign up for an account. 2. '
'Select the Intro to Jobs example. 3. Select Launch. This example '
'runs in a Workspace. See Workspaces for background information. 4. '
'Follow the notebook or view it in the docs. 5. Terminate the '
"Workspace when you're done. Ask AI "
'https://docs.anyscale.com/platform/jobs/ 1/2 2/12/25, 9:48 AM Jobs '
'| Anyscale Docs https://docs.anyscale.com/platform/jobs/ 2/2'},
{'chunk_id': 'fdd75966-04d9-43d2-8e3e-b85e8387d086',
'chunk_index': 2,
'distance': 0.11302393674850464,
'doc_id': '82efb8cd-2181-47cb-9389-ff101cc68674',
'page_number': 2,
'score': 0.8869760632514954,
'source': 'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf',
'text': '2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs Defining a '
'job With the CLI, you can define jobs in a YAML file and submit '
'them by referencing the YAML: anyscale job submit --config-file '
'config.yaml For an example of defining a job in a YAML, see the '
'reference docs. Waiting on a job You can block CLI and SDK commands '
'until a job enters a specified state. By default, '
'JobState.SUCCEEDED is used. See all available states in the '
'reference docs. CLI Python SDK anyscale job wait -n job-wait When '
'you submit a job, you can specify --wait , which waits for the job '
'to succeed or exits if the job fails. anyscale job submit -n '
'job-wait --wait -- sleep 30 For more information on submitting jobs '
'with the CLI, see the reference docs. Terminating a job You can '
'terminate a job from the Job page or using the CLI/SDK: CLI Python '
'SDK https://docs.anyscale.com/platform/jobs/manage-jobs 2/5'},
{'chunk_id': '23bf61db-155a-4a1a-a74f-669409c92303',
'chunk_index': 3,
'distance': 0.11444741487503052,
'doc_id': '91ddd49b-9e23-4e58-9143-bbac61ee2157',
'page_number': 1,
'score': 0.8855525851249695,
'source': 'anyscale-rag-application/100-docs/Job_schedules.html',
'text': 'anyscale.job.terminate(name="my-job") For more information on '
'terminating jobs with the SDK, see the reference docs. Archiving a '
'job\u200b Archiving jobs hide them from the job list page, but you '
'can still access them through the CLI and SDK. The cluster '
'associated with an archived job is archived automatically. To be '
'archived, jobs must be in a terminal state. You must have created '
'the job or be an organization admin to archive the job. You can '
'archive jobs in Anyscale console or through the CLI/SDK: CLI Python '
"SDK anyscale job archive --id 'prodjob_...' For more information on "
'archiving jobs with the CLI, see the reference docs. import '
'anyscale\n'
'\n'
'anyscale.job.archive(name="my-job") For more information on '
'archiving jobs with the SDK, see the reference docs. Managing '
'dependencies\u200b When developing Anyscale jobs, you may need to '
'include additional Python packages or system-level dependencies. '
'There are several ways to manage these dependencies: Using a '
'requirements.txt file\u200b The simplest way to manage Python '
'package dependencies is by using a requirements.txt file. Create a '
'requirements.txt file in your project directory: emoji==2.12.1\n'
'numpy==1.21.0 When submitting your job, include the -r or '
'--requirements flag: CLI Python SDK anyscale job submit '
'--config-file job.yaml -r ./requirements.txt import anyscale\n'
'from anyscale.job.models import JobConfig\n'
'\n'
'config = JobConfig(\n'
' name="my-job",\n'
' entrypoint="python main.py",\n'
' working_dir=".",\n'
' requirements="./requirements.txt"\n'
')\n'
'\n'
'anyscale.job.submit(config) This method works well for '
'straightforward Python package dependencies. Anyscale installs '
"these packages in the job's environment before running your code. "
'Using a custom container\u200b For more complex dependency '
'management, including system-level packages or specific environment '
'configurations, use a custom container: Create a Dockerfile: FROM '
'anyscale/ray:2.10.0-py310\n'
'\n'
'# Install system dependencies if needed\n'
'RUN apt-get update && apt-get install -y <your-system-packages>'},
{'chunk_id': '13683649-bcdc-444d-866f-36cafcb132c9',
'chunk_index': 4,
'distance': 0.11574643850326538,
'doc_id': '91ddd49b-9e23-4e58-9143-bbac61ee2157',
'page_number': 1,
'score': 0.8842535614967346,
'source': 'anyscale-rag-application/100-docs/Job_schedules.html',
'text': 'Create and manage jobs Submitting a job\u200b To submit your job to '
'Anyscale, use the Python SDK or CLI and pass in any additional '
'options or configurations for the job. By default, Anyscale uses '
'your workspace or cloud to provision a cluster to run your job. You '
'can define a custom cluster through a compute config or specify an '
'existing cluster. Once submitted, Anyscale runs the job as '
'specified in the entrypoint command, which is typically a Ray Job. '
"If the run doesn't succeed, the job restarts using the same "
'entrypoint up to the number of max_retries. CLI Python SDK anyscale '
'job submit --name=my-job \\\n'
' --working-dir=. --max-retries=5 \\\n'
' --image-uri="anyscale/image/IMAGE_NAME:VERSION" \\\n'
' --compute-config=COMPUTE_CONFIG_NAME \\\n'
' -- python main.py With the CLI, you can either specify an '
'existing compute config with --compute-config=COMPUTE_CONFIG_NAME '
'or define a new one in a job YAML. For more information on '
'submitting jobs with the CLI, see the reference docs. import '
'anyscale\n'
'from anyscale.job.models import JobConfig\n'
'\n'
'config = JobConfig(\n'
' name="my-job",\n'
' entrypoint="python main.py",\n'
' working_dir=".",\n'
' max_retries=5,\n'
' image_uri="anyscale/image/IMAGE_NAME:VERSION",\n'
' compute_config="COMPUTE_CONFIG_NAME"\n'
')'},
{'chunk_id': '3167ba0b-2aff-4f69-8dbe-7a8cdc725005',
'chunk_index': 5,
'distance': 0.11749774217605591,
'doc_id': '7a7730fa-96a1-4775-896a-11deac94d668',
'page_number': 1,
'score': 0.8825022578239441,
'source': 'anyscale-rag-application/100-docs/Job_queues.pptx',
'text': '2/12/25, 9:48 AM\tJob queues | Anyscale Docs Job queues A job queue '
'enables sophisticated scheduling and execution algorithms for '
'Anyscale Jobs. This feature improves resource utilization and '
'reduces provisioning times by enabling multiple jobs to share a '
'single cluster. Anyscale supports flexible scheduling algorithms, '
'including FIFO (first-in, first-out), LIFO (last-in, first-out), '
'and priority-based scheduling. Job processing Anyscale job queues '
'optimize resource utilization and throughput by using sophisticated '
'scheduling to run multiple jobs on the same cluster. Submission: '
'The typical Anyscale Job submission workflow adds the job to the '
'specified queue. Scheduling: Based on the scheduling policy, '
'Anyscale determines ordering of the jobs in the queue and picks '
'jobs at the top of the queue for scheduling. Anyscale schedules no '
'more than the specified max-concurrency jobs for running on a '
'cluster at the same time. Execution: Jobs run until completion, '
'including retries up to the specified number of max_retries . '
'Anyscale provisions a cluster when you submit the first job in a '
'queue, and continues running until there are no more jobs in the '
'queue and it idles. Create a job queue Creating a job queue is '
'similar to creating a standalone Anyscale Job. In your job.yaml '
'file, specify additional job queue configurations: CLI\tPython SDK '
'Ask AI https://docs.anyscale.com/platform/jobs/job-queues 1/5'},
{'chunk_id': '8c158e24-22e2-4984-a402-2331cf0896fc',
'chunk_index': 6,
'distance': 0.12043756246566772,
'doc_id': '82efb8cd-2181-47cb-9389-ff101cc68674',
'page_number': 1,
'score': 0.8795624375343323,
'source': 'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf',
'text': '2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs Create and '
'manage jobs Submitting a job To submit your job to Anyscale, use '
'the Python SDK or CLI and pass in any additional options or '
'configurations for the job. By default, Anyscale uses your '
'workspace or cloud to provision a cluster to run your job. You can '
'define a custom cluster through a compute config or specify an '
'existing cluster. Once submitted, Anyscale runs the job as '
'specified in the entrypoint command, which is typically a Ray Job. '
"If the run doesn't succeed, the job restarts using the same "
'entrypoint up to the number of max_retries . CLI Python SDK '
'anyscale job submit --name=my-job \\ --working-dir=. '
'--max-retries=5 \\ --image-uri="anyscale/image/IMAGE_NAME:VERSION" '
'\\ --compute-config=COMPUTE_CONFIG_NAME \\ -- python main.py With '
'the CLI, you can either specify an existing compute config with '
'--compute- config=COMPUTE_CONFIG_NAME or define a new one in a job '
'YAML. For more information on submitting jobs with the CLI, see the '
'reference docs. TIP For large-scale, compute-intensive jobs, avoid '
'scheduling Ray tasks onto the head node because it manages '
'cluster-level orchestration. To do that, set the CPU resource on '
'the head node to 0 in your compute config. Ask AI '
'https://docs.anyscale.com/platform/jobs/manage-jobs 1/5'},
{'chunk_id': '2d0623d0-911f-471a-9649-a61c0b2db411',
'chunk_index': 7,
'distance': 0.12314975261688232,
'doc_id': '82efb8cd-2181-47cb-9389-ff101cc68674',
'page_number': 5,
'score': 0.8768502473831177,
'source': 'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf',
'text': '2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs Using '
'pre-built custom images For frequently used environments, you can '
'build and reuse custom images: 1. Build the image: CLI Python SDK '
'anyscale image build -n my-custom-image --containerfile Dockerfile '
'2. Use the built image in your job submission: CLI Python SDK '
'anyscale job submit --config-file job.yaml --image-uri '
'anyscale/image/my- custom-image:1 This approach is efficient for '
'teams working on multiple jobs that share the same dependencies. '
'https://docs.anyscale.com/platform/jobs/manage-jobs 5/5'},
{'chunk_id': '96285d69-2f27-4fb6-81be-31a4ededffba',
'chunk_index': 8,
'distance': 0.1248595118522644,
'doc_id': '82efb8cd-2181-47cb-9389-ff101cc68674',
'page_number': 3,
'score': 0.8751404881477356,
'source': 'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf',
'text': '2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs anyscale '
"job terminate --id 'prodjob_...' For more information on "
'terminating jobs with the CLI, see the reference docs. Archiving a '
'job Archiving jobs hide them from the job list page, but you can '
'still access them through the CLI and SDK. The cluster associated '
'with an archived job is archived automatically. To be archived, '
'jobs must be in a terminal state. You must have created the job or '
'be an organization admin to archive the job. You can archive jobs '
'in Anyscale console or through the CLI/SDK: CLI Python SDK anyscale '
"job archive --id 'prodjob_...' For more information on archiving "
'jobs with the CLI, see the reference docs. Managing dependencies '
'When developing Anyscale jobs, you may need to include additional '
'Python packages or system- level dependencies. There are several '
'ways to manage these dependencies: Using a requirements.txt file '
'The simplest way to manage Python package dependencies is by using '
'a requirements.txt file. 1. Create a requirements.txt file in your '
'project directory: emoji==2.12.1 numpy==1.21.0 '
'https://docs.anyscale.com/platform/jobs/manage-jobs 3/5'},
{'chunk_id': 'd7c57ead-0228-4c49-bb77-37b67aed463d',
'chunk_index': 9,
'distance': 0.12725389003753662,
'doc_id': '91ddd49b-9e23-4e58-9143-bbac61ee2157',
'page_number': 1,
'score': 0.8727461099624634,
'source': 'anyscale-rag-application/100-docs/Job_schedules.html',
'text': 'config = JobConfig(\n'
' name="my-job",\n'
' entrypoint="python main.py",\n'
' working_dir=".",\n'
' max_retries=5,\n'
' image_uri="anyscale/image/IMAGE_NAME:VERSION",\n'
' compute_config="COMPUTE_CONFIG_NAME"\n'
')\n'
'\n'
'anyscale.job.submit(config) With the SDK, you can either specify an '
'existing compute config or define a new one using the compute '
'config API. For more information on submitting jobs with the SDK, '
'see the reference docs. For a complete list of supported options '
'defining a JobConfig, see the reference docs for JobConfig. tip For '
'large-scale, compute-intensive jobs, avoid scheduling Ray tasks '
'onto the head node because it manages cluster-level orchestration. '
'To do that, set the CPU resource on the head node to 0 in your '
'compute config. Defining a job\u200b With the CLI, you can define '
'jobs in a YAML file and submit them by referencing the YAML: '
'anyscale job submit --config-file config.yaml For an example of '
'defining a job in a YAML, see the reference docs. Waiting on a '
'job\u200b You can block CLI and SDK commands until a job enters a '
'specified state. By default, JobState.SUCCEEDED is used. See all '
'available states in the reference docs. CLI Python SDK anyscale job '
'wait -n job-wait When you submit a job, you can specify --wait, '
'which waits for the job to succeed or exits if the job fails. '
'anyscale job submit -n job-wait --wait -- sleep 30 For more '
'information on submitting jobs with the CLI, see the reference '
'docs. import anyscale\n'
'from anyscale.job.models import JobConfig\n'
'\n'
'config = JobConfig(name="job-wait", entrypoint="sleep 30")\n'
'\n'
'anyscale.job.submit(config)\n'
'anyscale.job.wait(name="job-wait") For more information on '
'submitting jobs with the SDK, see the reference docs. Terminating a '
'job\u200b You can terminate a job from the Job page or using the '
"CLI/SDK: CLI Python SDK anyscale job terminate --id 'prodjob_...' "
'For more information on terminating jobs with the CLI, see the '
'reference docs. import anyscale'},
{'chunk_id': 'f1e981ec-d129-49a0-8df6-86fc1da99d7f',
'chunk_index': 10,
'distance': 0.1282671093940735,
'doc_id': 'e747a479-126c-42ed-9f20-f84038229e7b',
'page_number': 1,
'score': 0.8717328906059265,
'source': 'anyscale-rag-application/100-docs/Monitor_a_job.docx',
'text': 'to look back. Anyscale stores up to 30 days of logs for your job. '
"You're able to debug issues even after the job terminates. To "
'filter the logs, use the search bar to search for specific '
'keywords. Enter a request ID in the search bar to filter logs for a '
'specific request. You can also use contain a specific pattern. '
'Alerts to filter logs if your logs Anyscale jobs have a built-in '
'alert for when a job succeeds or fails. The creator of the job '
'receives an email notification when the job completes. To set up '
'additional alerts based on your own criteria, see Custom dashboards '
'and alerting guide. These alerts are useful for tracking the health '
'of your jobs or job queues. Ray Dashboard The Ray Dashboard is '
'scoped to a single Ray cluster. Each job attempt launches a new Ray '
'cluster unless Job queues are used. To access this dashboard, click '
'the "Ray Dashboard" tab in the job detail page. To learn more about '
'how to use the Ray Dashboard, see the Ray documentation. Exporting '
'logs and metrics If you want to push logs to Vector, a tool to ship '
'logs to Amazon CloudWatch, Google Cloud Monitoring, Datadog, or '
'other observability tools, see Exporting logs and metrics with '
'Vector. More info To learn more details about the Ray Dashboard, '
'see the Ray Dashboard documentation To learn more about Grafana and '
'how to use it, see the official Grafana documentation To learn more '
'about the metrics that Ray emits, see the System Metrics '
'documentation'}]
Rendered Prompt:
('Use the following pieces of context to answer the question at the end.\n'
"If you don't know the answer, just say that you don't know, don't try to "
'make up an answer.\n'
'Use three sentences maximum and keep the answer as concise as possible.\n'
'Always say "thanks for asking!" at the end of the answer.\n'
'\n'
"[{'chunk_index': 1, 'chunk_id': '55b8db89-0aa6-460c-ae3a-284241f5865c', "
"'doc_id': '7b170e3d-4081-4527-a737-1da022a364c4', 'page_number': 1, "
"'source': 'anyscale-rag-application/100-docs/Jobs.txt', 'text': "
'"2/12/25, 9:48 AM Jobs | Anyscale Docs Jobs Run discrete workloads in '
'production such as batch inference, bulk embeddings generation, or model '
'fine-tuning. Anyscale Jobs allow you to submit applications developed on '
'workspaces to a standalone Ray cluster for execution. Built for production '
'and designed to fit into your CI/CD pipeline, jobs ensure scalable and '
'reliable performance. How does it work? # When you’re ready to promote an '
'app to production, submit a job from the workspace using anyscale job submit '
'. Anyscale Jobs have the following features: Scalability: Rapid scaling to '
'thousands of cloud instances, adjusting computing resources to match '
'application demand. Fault tolerance: Retries for failures and automatic '
'rescheduling to an alternative cluster for unexpected failures like running '
'out of memory. Monitoring and observability: Persistent dashboards that '
'allow you to observe tasks in real time and email alerts upon successf ul '
'job completion. Get started 1. Sign in or sign up for an account. 2. Select '
'the Intro to Jobs example. 3. Select Launch. This example runs in a '
'Workspace. See Workspaces for background information. 4. Follow the notebook '
"or view it in the docs. 5. Terminate the Workspace when you're done. Ask AI "
'https://docs.anyscale.com/platform/jobs/ 1/2 2/12/25, 9:48 AM Jobs | '
'Anyscale Docs https://docs.anyscale.com/platform/jobs/ 2/2", \'distance\': '
"0.10053980350494385, 'score': 0.8994601964950562}, {'chunk_index': 2, "
"'chunk_id': 'fdd75966-04d9-43d2-8e3e-b85e8387d086', 'doc_id': "
"'82efb8cd-2181-47cb-9389-ff101cc68674', 'page_number': 2, 'source': "
"'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf', 'text': "
"'2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs Defining a job With "
'the CLI, you can define jobs in a YAML file and submit them by referencing '
'the YAML: anyscale job submit --config-file config.yaml For an example of '
'defining a job in a YAML, see the reference docs. Waiting on a job You can '
'block CLI and SDK commands until a job enters a specified state. By default, '
'JobState.SUCCEEDED is used. See all available states in the reference docs. '
'CLI Python SDK anyscale job wait -n job-wait When you submit a job, you can '
'specify --wait , which waits for the job to succeed or exits if the job '
'fails. anyscale job submit -n job-wait --wait -- sleep 30 For more '
'information on submitting jobs with the CLI, see the reference docs. '
'Terminating a job You can terminate a job from the Job page or using the '
'CLI/SDK: CLI Python SDK https://docs.anyscale.com/platform/jobs/manage-jobs '
"2/5', 'distance': 0.11302393674850464, 'score': 0.8869760632514954}, "
"{'chunk_index': 3, 'chunk_id': '23bf61db-155a-4a1a-a74f-669409c92303', "
"'doc_id': '91ddd49b-9e23-4e58-9143-bbac61ee2157', 'page_number': 1, "
"'source': 'anyscale-rag-application/100-docs/Job_schedules.html', 'text': "
'\'anyscale.job.terminate(name="my-job") For more information on terminating '
'jobs with the SDK, see the reference docs. Archiving a job\\u200b Archiving '
'jobs hide them from the job list page, but you can still access them through '
'the CLI and SDK. The cluster associated with an archived job is archived '
'automatically. To be archived, jobs must be in a terminal state. You must '
'have created the job or be an organization admin to archive the job. You can '
'archive jobs in Anyscale console or through the CLI/SDK: CLI Python SDK '
"anyscale job archive --id \\'prodjob_...\\' For more information on "
'archiving jobs with the CLI, see the reference docs. import '
'anyscale\\n\\nanyscale.job.archive(name="my-job") For more information on '
'archiving jobs with the SDK, see the reference docs. Managing '
'dependencies\\u200b When developing Anyscale jobs, you may need to include '
'additional Python packages or system-level dependencies. There are several '
'ways to manage these dependencies: Using a requirements.txt file\\u200b The '
'simplest way to manage Python package dependencies is by using a '
'requirements.txt file. Create a requirements.txt file in your project '
'directory: emoji==2.12.1\\nnumpy==1.21.0 When submitting your job, include '
'the -r or --requirements flag: CLI Python SDK anyscale job submit '
'--config-file job.yaml -r ./requirements.txt import anyscale\\nfrom '
'anyscale.job.models import JobConfig\\n\\nconfig = JobConfig(\\n '
'name="my-job",\\n entrypoint="python main.py",\\n '
'working_dir=".",\\n '
'requirements="./requirements.txt"\\n)\\n\\nanyscale.job.submit(config) This '
'method works well for straightforward Python package dependencies. Anyscale '
"installs these packages in the job\\'s environment before running your code. "
'Using a custom container\\u200b For more complex dependency management, '
'including system-level packages or specific environment configurations, use '
'a custom container: Create a Dockerfile: FROM '
'anyscale/ray:2.10.0-py310\\n\\n# Install system dependencies if needed\\nRUN '
"apt-get update && apt-get install -y <your-system-packages>', 'distance': "
"0.11444741487503052, 'score': 0.8855525851249695}, {'chunk_index': 4, "
"'chunk_id': '13683649-bcdc-444d-866f-36cafcb132c9', 'doc_id': "
"'91ddd49b-9e23-4e58-9143-bbac61ee2157', 'page_number': 1, 'source': "
"'anyscale-rag-application/100-docs/Job_schedules.html', 'text': 'Create and "
'manage jobs Submitting a job\\u200b To submit your job to Anyscale, use the '
'Python SDK or CLI and pass in any additional options or configurations for '
'the job. By default, Anyscale uses your workspace or cloud to provision a '
'cluster to run your job. You can define a custom cluster through a compute '
'config or specify an existing cluster. Once submitted, Anyscale runs the job '
'as specified in the entrypoint command, which is typically a Ray Job. If the '
"run doesn\\'t succeed, the job restarts using the same entrypoint up to the "
'number of max_retries. CLI Python SDK anyscale job submit --name=my-job '
'\\\\\\n --working-dir=. --max-retries=5 \\\\\\n '
'--image-uri="anyscale/image/IMAGE_NAME:VERSION" \\\\\\n '
'--compute-config=COMPUTE_CONFIG_NAME \\\\\\n -- python main.py With the '
'CLI, you can either specify an existing compute config with '
'--compute-config=COMPUTE_CONFIG_NAME or define a new one in a job YAML. For '
'more information on submitting jobs with the CLI, see the reference docs. '
'import anyscale\\nfrom anyscale.job.models import JobConfig\\n\\nconfig = '
'JobConfig(\\n name="my-job",\\n entrypoint="python main.py",\\n '
'working_dir=".",\\n max_retries=5,\\n '
'image_uri="anyscale/image/IMAGE_NAME:VERSION",\\n '
'compute_config="COMPUTE_CONFIG_NAME"\\n)\', \'distance\': '
"0.11574643850326538, 'score': 0.8842535614967346}, {'chunk_index': 5, "
"'chunk_id': '3167ba0b-2aff-4f69-8dbe-7a8cdc725005', 'doc_id': "
"'7a7730fa-96a1-4775-896a-11deac94d668', 'page_number': 1, 'source': "
"'anyscale-rag-application/100-docs/Job_queues.pptx', 'text': '2/12/25, 9:48 "
'AM\\tJob queues | Anyscale Docs Job queues A job queue enables sophisticated '
'scheduling and execution algorithms for Anyscale Jobs. This feature improves '
'resource utilization and reduces provisioning times by enabling multiple '
'jobs to share a single cluster. Anyscale supports flexible scheduling '
'algorithms, including FIFO (first-in, first-out), LIFO (last-in, first-out), '
'and priority-based scheduling. Job processing Anyscale job queues optimize '
'resource utilization and throughput by using sophisticated scheduling to run '
'multiple jobs on the same cluster. Submission: The typical Anyscale Job '
'submission workflow adds the job to the specified queue. Scheduling: Based '
'on the scheduling policy, Anyscale determines ordering of the jobs in the '
'queue and picks jobs at the top of the queue for scheduling. Anyscale '
'schedules no more than the specified max-concurrency jobs for running on a '
'cluster at the same time. Execution: Jobs run until completion, including '
'retries up to the specified number of max_retries . Anyscale provisions a '
'cluster when you submit the first job in a queue, and continues running '
'until there are no more jobs in the queue and it idles. Create a job queue '
'Creating a job queue is similar to creating a standalone Anyscale Job. In '
'your job.yaml file, specify additional job queue configurations: '
'CLI\\tPython SDK Ask AI https://docs.anyscale.com/platform/jobs/job-queues '
"1/5', 'distance': 0.11749774217605591, 'score': 0.8825022578239441}, "
"{'chunk_index': 6, 'chunk_id': '8c158e24-22e2-4984-a402-2331cf0896fc', "
"'doc_id': '82efb8cd-2181-47cb-9389-ff101cc68674', 'page_number': 1, "
"'source': 'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf', "
"'text': '2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs Create and "
'manage jobs Submitting a job To submit your job to Anyscale, use the Python '
'SDK or CLI and pass in any additional options or configurations for the job. '
'By default, Anyscale uses your workspace or cloud to provision a cluster to '
'run your job. You can define a custom cluster through a compute config or '
'specify an existing cluster. Once submitted, Anyscale runs the job as '
'specified in the entrypoint command, which is typically a Ray Job. If the '
"run doesn\\'t succeed, the job restarts using the same entrypoint up to the "
'number of max_retries . CLI Python SDK anyscale job submit --name=my-job '
'\\\\ --working-dir=. --max-retries=5 \\\\ '
'--image-uri="anyscale/image/IMAGE_NAME:VERSION" \\\\ '
'--compute-config=COMPUTE_CONFIG_NAME \\\\ -- python main.py With the CLI, '
'you can either specify an existing compute config with --compute- '
'config=COMPUTE_CONFIG_NAME or define a new one in a job YAML. For more '
'information on submitting jobs with the CLI, see the reference docs. TIP For '
'large-scale, compute-intensive jobs, avoid scheduling Ray tasks onto the '
'head node because it manages cluster-level orchestration. To do that, set '
'the CPU resource on the head node to 0 in your compute config. Ask AI '
"https://docs.anyscale.com/platform/jobs/manage-jobs 1/5', 'distance': "
"0.12043756246566772, 'score': 0.8795624375343323}, {'chunk_index': 7, "
"'chunk_id': '2d0623d0-911f-471a-9649-a61c0b2db411', 'doc_id': "
"'82efb8cd-2181-47cb-9389-ff101cc68674', 'page_number': 5, 'source': "
"'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf', 'text': "
"'2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs Using pre-built "
'custom images For frequently used environments, you can build and reuse '
'custom images: 1. Build the image: CLI Python SDK anyscale image build -n '
'my-custom-image --containerfile Dockerfile 2. Use the built image in your '
'job submission: CLI Python SDK anyscale job submit --config-file job.yaml '
'--image-uri anyscale/image/my- custom-image:1 This approach is efficient for '
'teams working on multiple jobs that share the same dependencies. '
"https://docs.anyscale.com/platform/jobs/manage-jobs 5/5', 'distance': "
"0.12314975261688232, 'score': 0.8768502473831177}, {'chunk_index': 8, "
"'chunk_id': '96285d69-2f27-4fb6-81be-31a4ededffba', 'doc_id': "
"'82efb8cd-2181-47cb-9389-ff101cc68674', 'page_number': 3, 'source': "
"'anyscale-rag-application/100-docs/Create_and_manage_jobs.pdf', 'text': "
'"2/12/25, 9:48 AM Create and manage jobs | Anyscale Docs anyscale job '
"terminate --id 'prodjob_...' For more information on terminating jobs with "
'the CLI, see the reference docs. Archiving a job Archiving jobs hide them '
'from the job list page, but you can still access them through the CLI and '
'SDK. The cluster associated with an archived job is archived automatically. '
'To be archived, jobs must be in a terminal state. You must have created the '
'job or be an organization admin to archive the job. You can archive jobs in '
'Anyscale console or through the CLI/SDK: CLI Python SDK anyscale job archive '
"--id 'prodjob_...' For more information on archiving jobs with the CLI, see "
'the reference docs. Managing dependencies When developing Anyscale jobs, you '
'may need to include additional Python packages or system- level '
'dependencies. There are several ways to manage these dependencies: Using a '
'requirements.txt file The simplest way to manage Python package dependencies '
'is by using a requirements.txt file. 1. Create a requirements.txt file in '
'your project directory: emoji==2.12.1 numpy==1.21.0 '
'https://docs.anyscale.com/platform/jobs/manage-jobs 3/5", \'distance\': '
"0.1248595118522644, 'score': 0.8751404881477356}, {'chunk_index': 9, "
"'chunk_id': 'd7c57ead-0228-4c49-bb77-37b67aed463d', 'doc_id': "
"'91ddd49b-9e23-4e58-9143-bbac61ee2157', 'page_number': 1, 'source': "
"'anyscale-rag-application/100-docs/Job_schedules.html', 'text': 'config = "
'JobConfig(\\n name="my-job",\\n entrypoint="python main.py",\\n '
'working_dir=".",\\n max_retries=5,\\n '
'image_uri="anyscale/image/IMAGE_NAME:VERSION",\\n '
'compute_config="COMPUTE_CONFIG_NAME"\\n)\\n\\nanyscale.job.submit(config) '
'With the SDK, you can either specify an existing compute config or define a '
'new one using the compute config API. For more information on submitting '
'jobs with the SDK, see the reference docs. For a complete list of supported '
'options defining a JobConfig, see the reference docs for JobConfig. tip For '
'large-scale, compute-intensive jobs, avoid scheduling Ray tasks onto the '
'head node because it manages cluster-level orchestration. To do that, set '
'the CPU resource on the head node to 0 in your compute config. Defining a '
'job\\u200b With the CLI, you can define jobs in a YAML file and submit them '
'by referencing the YAML: anyscale job submit --config-file config.yaml For '
'an example of defining a job in a YAML, see the reference docs. Waiting on a '
'job\\u200b You can block CLI and SDK commands until a job enters a specified '
'state. By default, JobState.SUCCEEDED is used. See all available states in '
'the reference docs. CLI Python SDK anyscale job wait -n job-wait When you '
'submit a job, you can specify --wait, which waits for the job to succeed or '
'exits if the job fails. anyscale job submit -n job-wait --wait -- sleep 30 '
'For more information on submitting jobs with the CLI, see the reference '
'docs. import anyscale\\nfrom anyscale.job.models import '
'JobConfig\\n\\nconfig = JobConfig(name="job-wait", entrypoint="sleep '
'30")\\n\\nanyscale.job.submit(config)\\nanyscale.job.wait(name="job-wait") '
'For more information on submitting jobs with the SDK, see the reference '
'docs. Terminating a job\\u200b You can terminate a job from the Job page or '
'using the CLI/SDK: CLI Python SDK anyscale job terminate --id '
"\\'prodjob_...\\' For more information on terminating jobs with the CLI, see "
"the reference docs. import anyscale', 'distance': 0.12725389003753662, "
"'score': 0.8727461099624634}, {'chunk_index': 10, 'chunk_id': "
"'f1e981ec-d129-49a0-8df6-86fc1da99d7f', 'doc_id': "
"'e747a479-126c-42ed-9f20-f84038229e7b', 'page_number': 1, 'source': "
"'anyscale-rag-application/100-docs/Monitor_a_job.docx', 'text': 'to look "
"back. Anyscale stores up to 30 days of logs for your job. You\\'re able to "
'debug issues even after the job terminates. To filter the logs, use the '
'search bar to search for specific keywords. Enter a request ID in the search '
'bar to filter logs for a specific request. You can also use contain a '
'specific pattern. Alerts to filter logs if your logs Anyscale jobs have a '
'built-in alert for when a job succeeds or fails. The creator of the job '
'receives an email notification when the job completes. To set up additional '
'alerts based on your own criteria, see Custom dashboards and alerting guide. '
'These alerts are useful for tracking the health of your jobs or job queues. '
'Ray Dashboard The Ray Dashboard is scoped to a single Ray cluster. Each job '
'attempt launches a new Ray cluster unless Job queues are used. To access '
'this dashboard, click the "Ray Dashboard" tab in the job detail page. To '
'learn more about how to use the Ray Dashboard, see the Ray documentation. '
'Exporting logs and metrics If you want to push logs to Vector, a tool to '
'ship logs to Amazon CloudWatch, Google Cloud Monitoring, Datadog, or other '
'observability tools, see Exporting logs and metrics with Vector. More info '
'To learn more details about the Ray Dashboard, see the Ray Dashboard '
'documentation To learn more about Grafana and how to use it, see the '
'official Grafana documentation To learn more about the metrics that Ray '
"emits, see the System Metrics documentation', 'distance': "
"0.1282671093940735, 'score': 0.8717328906059265}]\n"
'\n'
'Question: what is a anyscale job\n'
'\n'
'Helpful Answer:')
Get the Final Response#
Stream the response from the LLM using the generated prompt.
for token in client.get_response_streaming(prompt, temperature=0.5):
print(token, end="")
Anyscale Jobs are used to run discrete workloads in production, such as batch inference or model fine-tuning, on a standalone Ray cluster for scalable and reliable performance. Thanks for asking!
Observations and Next Steps#
As you can see, the LLM now understands that “Anyscale jobs” refers to the platform feature rather than job openings at Anyscale. This demonstrates the power of RAG. However, the response from this basic RAG implementation is not optimal: it’s too short and lacks citations, leaving us unclear about the sources of the information. In future tutorials, we will examine these issues and propose several methods to address them through prompt engineering.