RAG (Retrieval-Augmented Generation) is an architecture that grounds an LLM's responses in your own data rather than relying solely on knowledge learned during training. Instead of fine-tuning the model, RAG keeps the model weights frozen, retrieves relevant passages from your data at inference time through a structured retrieval pipeline, and injects them into the prompt as context.
The RAG pipeline has two distinct phases:
Indexing phase (run once, or incrementally updated):
- Load source documents from a storage location (Azure Blob Storage, local upload, etc.)
- Chunk the documents into smaller text segments that fit within embedding model token limits
- Call an embedding model (e.g., text-embedding-ada-002 via Azure OpenAI) on each chunk to produce an embedding: a fixed-length numerical vector that encodes the semantic meaning of the chunk. Semantically similar texts produce vectors that are close together in high-dimensional space, as measured by cosine similarity.
- Store chunks, vectors, and metadata in a vector store (an Azure AI Search index)
Querying phase (at runtime for every user turn):
- Embed the user's query using the same embedding model
- Run a similarity search against the vector index to find the top-k most relevant chunks
- Augment the prompt: prepend the retrieved chunks as grounding context
- Send the augmented prompt to the chat completion model to generate an answer
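The two phases can be sketched in a few lines of plain Python. This is a minimal illustration, not Azure code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for an Azure AI Search index, and all function names and sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Indexing phase: embed every chunk once and store (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Querying phase, steps 1-2: embed the query with the SAME embed
    function and return the top-k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def augment(query, retrieved):
    """Querying phase, step 3: prepend retrieved chunks as grounding context."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample corpus, already chunked.
chunks = [
    "Employees accrue 20 vacation days per year.",
    "Expense reports are due within 30 days of travel.",
    "The office is closed on public holidays.",
]
question = "How many vacation days do employees get?"

index = build_index(chunks)
top = retrieve(index, question, k=1)
prompt = augment(question, top)  # step 4 would send this to the chat model
```

In a real deployment the final step sends `prompt` to the chat completion model; that call is omitted here so the sketch runs without any service.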
Source documents are registered as a Data Component (Data Asset) before indexing. A data asset is a versioned, reusable reference to a storage location within an Azure AI Foundry project; its type is uri_file (single file), uri_folder (folder), or uri_table (tabular/Parquet), and versions are immutable once created. For retrieval, Hybrid Search runs keyword (BM25) and vector (cosine similarity) searches in parallel and merges the results using Reciprocal Rank Fusion (RRF). It consistently produces the best RAG retrieval quality in benchmarks.
Data Asset Types
Azure AI Foundry supports three data asset types:
| Type | Description | Use for RAG |
|---|---|---|
| uri_file | Points to a single file (PDF, DOCX, CSV, etc.) | Small corpora with a single-file source |
| uri_folder | Points to a folder; all files within are accessible | Multi-document corpora (most common for RAG) |
| uri_table | Points to tabular data (Parquet, CSV) | Structured Q&A datasets |
Data assets are immutable once created. A given name + version combination is permanent. To correct a mistake, archive the incorrect version and create a new version with the correct path.
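The archive-and-recreate rule can be illustrated with a toy in-memory model. This is not the Azure SDK: `ToyAssetRegistry`, `DataAssetVersion`, and the example paths are hypothetical names invented for this sketch, which only mimics the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be changed after creation
class DataAssetVersion:
    name: str
    version: str
    asset_type: str  # "uri_file", "uri_folder", or "uri_table"
    path: str


class ToyAssetRegistry:
    """Hypothetical registry that mimics immutable name + version pairs."""

    def __init__(self):
        self._versions = {}
        self._archived = set()

    def create(self, asset: DataAssetVersion) -> DataAssetVersion:
        key = (asset.name, asset.version)
        if key in self._versions:
            # A name + version combination is permanent, even if archived.
            raise ValueError(f"{asset.name}:{asset.version} already exists")
        self._versions[key] = asset
        return asset

    def archive(self, name: str, version: str) -> None:
        if (name, version) not in self._versions:
            raise KeyError(f"{name}:{version} not found")
        self._archived.add((name, version))

    def active_versions(self, name: str):
        return sorted(
            v for (n, v) in self._versions
            if n == name and (n, v) not in self._archived
        )


registry = ToyAssetRegistry()
# Version 1 was registered with the wrong path (placeholder URLs).
registry.create(DataAssetVersion("policy-docs", "1", "uri_folder",
                                 "https://example.invalid/wrong-folder/"))
# Remediation: archive the bad version, then create a new one.
registry.archive("policy-docs", "1")
registry.create(DataAssetVersion("policy-docs", "2", "uri_folder",
                                 "https://example.invalid/policies/"))
```

Note that the archived version still occupies its name + version slot, so trying to reuse version 1 with a corrected path fails.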
Chunking Strategies
| Strategy | Description | Best for |
|---|---|---|
| Fixed-size | Split at a fixed token or character count (e.g., 512 tokens) with overlap (e.g., 10–25%) | General-purpose; most commonly used |
| Variable-size (sentence/paragraph) | Split on sentence boundaries or paragraph markers | Prose documents; preserves natural reading units |
| Semantic / document-layout | Break on heading structure or AI-detected section boundaries | Structured documents such as policies and manuals |
| Document parsing | Parse source formats (PDF → pages, JSON → records) | Already-structured sources |
Microsoft's recommended starting point for the Azure AI Search Text Split skill: 512 tokens with 128 tokens of overlap (25%), balancing context preservation against precision.
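To see what 512 tokens with 128 tokens of overlap means in practice, here is a minimal fixed-size chunker. It is a sketch under one loud assumption: the input is already a list of tokens. The sample below uses placeholder strings, whereas a real pipeline would count tokens with the embedding model's tokenizer. The function name `chunk_tokens` is hypothetical and is not the Text Split skill itself.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Placeholder tokens standing in for a 1,000-token document.
tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
sizes = [len(c) for c in chunks]
```

With these defaults each window advances by 384 tokens, so the last 128 tokens of one chunk reappear as the first 128 tokens of the next. That repetition is what preserves context across chunk boundaries.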
Azure AI Search Retrieval Modes
| Search Type | Mechanism | Score metric | Best for |
|---|---|---|---|
| Keyword (BM25) | Inverted index on text tokens | BM25 relevance score | Exact terms, product codes, proper names |
| Vector | Nearest-neighbor search on embeddings | Cosine similarity | Semantically similar queries, paraphrase |
| Semantic | Re-ranks results with a language-model comprehension layer | Semantic re-ranker score | Boosting precision of top results |
| Hybrid | Keyword + vector in parallel; merges via RRF | RRF score | Best overall accuracy — recommended for production |
Hybrid search with semantic re-ranking consistently produces the best retrieval quality.
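The RRF merge step used by hybrid search can be sketched as follows. Each document's fused score is the sum of 1 / (k + rank) over every result list it appears in. The constant k = 60 is a commonly used default and is an assumption here, as are the function name `rrf_merge` and the sample document IDs; this is an illustration of the formula, not the Azure AI Search implementation.

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical results from the two parallel searches.
keyword_results = ["doc_a", "doc_b", "doc_c"]  # BM25 order
vector_results = ["doc_b", "doc_d", "doc_a"]   # cosine-similarity order

fused = rrf_merge([keyword_results, vector_results])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents found by only one method, which is why hybrid retrieval tolerates the blind spots of each individual search type.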
Vector Index Structure
Each chunk stored in the Azure AI Search index includes:
- page_content: the raw text of the chunk
- content_vector_open_ai: the embedding vector
- Metadata fields: source filename, page number, URL, character and token statistics
The same embedding model must be used at both indexing time and query time. Using different models produces vectors in incompatible spaces, making similarity scores meaningless.
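A small sketch makes the failure modes concrete. Cosine similarity is only defined for vectors of equal length, so a model with a different output dimension fails outright; a different model with the same dimension would compute a number, but one with no meaning. The helper names and the guard below are hypothetical, intended as one way an application could protect itself rather than as a built-in Azure feature.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_query_model(index_model: str, query_model: str) -> None:
    """Hypothetical guard: refuse to query with a different embedding model,
    even when the dimensions happen to match."""
    if index_model != query_model:
        raise ValueError(
            f"index was built with {index_model!r}, query uses {query_model!r}"
        )


# Tiny hand-written vectors, not real embeddings.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Recording the embedding model name alongside the index, and checking it before every query, catches the silent case where the dimensions match but the vector spaces do not.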
Create a Data Component from Azure Storage
- Sign in to Azure AI Foundry portal and open your hub-based project.
- In the left sidebar, expand My assets and select Data + indexes.
- Select New data to open the data creation wizard.
- For Data source, select Get data with storage URL (or Upload files/folders to push local files directly to workspaceblobstore).
- Choose the Type: select Folder for a multi-document corpus or File for a single document. Provide the Azure Blob Storage URL.
- Select Next, enter a friendly Name for the data asset, and select Create.
- The new data asset appears in Data + indexes with version 1 and status Ready.
Build a Vector Index from the Chat Playground
- From the left sidebar select Playgrounds, then Try the Chat playground.
- Ensure a chat completion model deployment is selected (e.g., gpt-4o). If none exists, deploy one via Create new deployment.
- Scroll to the bottom of the Setup panel. Select + Add a new data source.
- On the Source data tab, choose your data asset (or upload files directly). Select Next.
- On the Index configuration tab, select your Azure AI Search resource from the dropdown. Select Next.
- Select the Azure OpenAI connection that has an embedding model deployed (e.g., text-embedding-ada-002). Select Next.
- Review the configuration summary and select Create vector index.
- Wait for status Ready — this confirms chunking, embedding, and Azure AI Search index population are complete.
AI-3016 Assessment Focus
RAG questions test whether you understand the two-phase pipeline, the requirement to use the same embedding model at both phases, chunking parameters, and the superiority of hybrid search over keyword or vector alone.
Exam Trap
"You must fine-tune the model to use your own data." RAG does not modify model weights at all. Your data is retrieved at inference time and injected into the prompt. Fine-tuning changes internal model parameters; RAG does not touch them.
Exam Trap
"Any embedding model can be used to query an index built with a different embedding model." You must use the same embedding model at both index-build time and query time. Mixing models produces dimension mismatches or nonsensical similarity scores.
Exam Trap
"Larger chunks always produce better RAG results." Oversized chunks can exceed embedding model token limits and reduce retrieval precision. The recommended starting point is 512 tokens with 25% overlap.
Exam Trap
"Data assets in Azure AI Foundry can be deleted to correct mistakes." Data asset versions are immutable. Archive the incorrect version and create a new version with the correct path.
Exam Trap
"An Azure AI Search resource is optional if you already have Azure Blob Storage." Azure Blob Storage holds raw documents but does not support vector similarity search. Azure AI Search is required to index, embed, and retrieve chunks.
Exam Tip
Keyword search misses paraphrased queries; vector search misses exact-match terms. Hybrid search combines both and consistently outperforms either alone — always the recommended production answer.
Must Memorize
Recommended chunk size: 512 tokens with 128 tokens (25%) overlap. Same embedding model at index time AND query time — no exceptions.
Q: Which Azure service stores vector embeddings in an Azure AI Foundry RAG solution?
Q: A developer built a vector index with text-embedding-ada-002 and switches to text-embedding-3-large at query time. What is the most likely outcome?
Q: What is the Microsoft-recommended starting chunk size and overlap for the Azure AI Search Text Split skill?
Q: Which search configuration in Azure AI Search consistently delivers the best RAG retrieval quality?
Q: A developer created data asset 'policy-docs' version 1.0 with an incorrect storage path. What is the correct remediation?
Q: What are the two phases of the RAG pipeline?