AI-3016 Learning Portal

Concept — What & Why

RAG (Retrieval-Augmented Generation) is an architecture that grounds an LLM's responses in your own data rather than relying solely on knowledge learned during training. Instead of fine-tuning the model, RAG keeps the model weights frozen and, at inference time, retrieves relevant passages from your data through a structured retrieval pipeline and injects them into the prompt as context.

The RAG pipeline has two distinct phases:

Indexing phase (run once, or incrementally updated):

Load source documents from a storage location (Azure Blob Storage, local upload, etc.)
Chunk the documents into smaller text segments that fit within embedding model token limits
Call an embedding model (e.g., text-embedding-ada-002 via Azure OpenAI) on each chunk to produce an embedding: a fixed-length numerical vector that encodes the chunk's semantic meaning. Semantically similar texts produce vectors that lie close together in high-dimensional space, as measured by cosine similarity.
Store chunks, vectors, and metadata in a vector store (an Azure AI Search index)

Querying phase (at runtime for every user turn):

Embed the user's query using the same embedding model
Run a similarity search against the vector index to find the top-k most relevant chunks
Augment the prompt: prepend the retrieved chunks as grounding context
Send the augmented prompt to the chat completion model to generate an answer
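The two phases can be sketched end to end in plain Python under heavy simplifying assumptions. `toy_embed` is a hypothetical stand-in for a real embedding model: it hashes words into a normalized bag-of-words vector. An in-memory list plays the role of the Azure AI Search index. All function names here are illustrative and are not part of any Azure SDK.

```python
import hashlib
import math

DIM = 1024  # toy vector size; real models differ (e.g., 1536 for text-embedding-ada-002)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,?!:;")
        if not word:
            continue
        # Python's built-in hash() is salted per process, so use a stable hash instead.
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so their dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# Indexing phase: embed each chunk and store text + vector together.
def build_index(chunks: list[str]) -> list[dict]:
    return [{"page_content": c, "vector": toy_embed(c)} for c in chunks]


# Querying phase: embed the query with the SAME function, rank by similarity, keep top-k.
def retrieve(index: list[dict], query: str, k: int = 2) -> list[str]:
    q = toy_embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, doc["vector"]), reverse=True)
    return [doc["page_content"] for doc in ranked[:k]]


# Augmentation: prepend the retrieved chunks as grounding context.
def build_prompt(query: str, passages: list[str]) -> str:
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    index = build_index([
        "Employees accrue 20 vacation days per year.",
        "The office is closed on public holidays.",
        "Expense reports are due within 30 days.",
    ])
    question = "How many vacation days do employees get?"
    prompt = build_prompt(question, retrieve(index, question, k=1))
    print(prompt)  # this string is what would be sent to the chat completion model
```

`retrieve` deliberately reuses the same `toy_embed` that `build_index` used. Swapping in a different function at query time would break the comparison, and the same constraint applies to real embedding models.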

Before indexing, source documents are registered as a Data Component (Data Asset), which is a versioned, reusable reference to a storage location within an Azure AI Foundry project. For retrieval, Hybrid Search consistently produces the best retrieval quality in benchmarks. It runs keyword (BM25) and vector (cosine similarity) searches in parallel and merges the results using Reciprocal Rank Fusion (RRF).

Deep Dive — How It Works

Data Asset Types

Azure AI Foundry supports three data asset types:

Type	Description	Use for RAG
`uri_file`	Points to a single file (PDF, DOCX, CSV, etc.)	Small corpora with a single-file source
`uri_folder`	Points to a folder — all files within are accessible	Multi-document corpora (most common for RAG)
`uri_table`	Points to tabular data (Parquet, CSV)	Structured Q&A datasets

Data assets are immutable once created. A given name + version combination is permanent. To correct a mistake, archive the incorrect version and create a new version with the correct path.
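The versioning rule can be modeled with a small in-memory registry. This is a hypothetical illustration of the semantics only, not the Azure SDK. `ToyDataRegistry`, `DataAssetVersion`, and their methods are invented for this example. The model captures three behaviors: re-creating an existing name + version fails, archiving hides a version without altering it, and the fix is to create a new version.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: a version's fields cannot be reassigned after creation
class DataAssetVersion:
    name: str
    version: str
    path: str
    archived: bool = False


class ToyDataRegistry:
    """Hypothetical in-memory model of data asset versioning (not the Azure SDK)."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, str], DataAssetVersion] = {}

    def create(self, name: str, version: str, path: str) -> DataAssetVersion:
        key = (name, version)
        if key in self._versions:
            # A name + version pair is permanent: it can never be overwritten.
            raise ValueError(f"{name}:{version} already exists; create a new version instead")
        self._versions[key] = DataAssetVersion(name, version, path)
        return self._versions[key]

    def archive(self, name: str, version: str) -> None:
        # Archiving only flags the version as hidden; its name, version, and path are unchanged.
        key = (name, version)
        self._versions[key] = replace(self._versions[key], archived=True)

    def active_versions(self, name: str) -> list[str]:
        return [v.version for v in self._versions.values() if v.name == name and not v.archived]


if __name__ == "__main__":
    registry = ToyDataRegistry()
    registry.create("policy-docs", "1", "wrong-container/policies/")  # hypothetical paths
    registry.archive("policy-docs", "1")
    registry.create("policy-docs", "2", "right-container/policies/")
    print(registry.active_versions("policy-docs"))  # ['2']
```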

Chunking Strategies

Strategy	Description	Best for
Fixed-size	Split at a fixed token or character count (e.g., 512 tokens) with overlap (e.g., 10–25%)	General-purpose; most commonly used
Variable-size (sentence/paragraph)	Split on sentence boundaries or paragraph markers	Prose documents; preserves natural reading units
Semantic / document-layout	Break on heading structure or AI-detected section boundaries	Structured documents such as policies and manuals
Document parsing	Parse source formats (PDF → pages, JSON → records)	Already-structured sources

Microsoft's recommended starting point for the Azure AI Search Text Split skill is 512 tokens with 128 tokens (25%) of overlap, which balances context preservation against retrieval precision.
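Fixed-size chunking with overlap is a sliding window. Its stride is the chunk size minus the overlap, so 512 - 128 = 384 tokens. The sketch below assumes that whitespace-separated words approximate tokens. A real pipeline counts model tokens, for example with the embedding model's tokenizer, so its boundaries will differ. `chunk_tokens` and `chunk_text` are illustrative names.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 128) -> list[list[str]]:
    """Fixed-size chunking: each window starts `size - overlap` tokens after the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    stride = size - overlap  # 384 with the 512/128 defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def chunk_text(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    # Whitespace split is a rough stand-in for a real tokenizer.
    return [" ".join(c) for c in chunk_tokens(text.split(), size, overlap)]
```

With the defaults, a 1,000-token document produces three chunks of 512, 512, and 232 tokens. Each adjacent pair shares 128 tokens, so a sentence cut at one boundary still appears whole in the neighboring chunk.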

Azure AI Search Retrieval Modes

Search Type	Mechanism	Score metric	Best for
Keyword (BM25)	Inverted index on text tokens	BM25 relevance score	Exact terms, product codes, proper names
Vector	Nearest-neighbor search on embeddings	Cosine similarity	Semantically similar queries, paraphrase
Semantic	Re-ranks results with a language-model comprehension layer	Semantic re-ranker score	Boosting precision of top results
Hybrid	Keyword + vector in parallel; merges via RRF	RRF score	Best overall accuracy — recommended for production

Hybrid search with semantic re-ranking consistently produces the best retrieval quality.
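The RRF merge step in hybrid search can be written in a few lines. Each document's fused score is the sum, over every ranked list it appears in, of 1 / (k + rank). This sketch uses k = 60, the constant from the original RRF formulation, and assumes simple unweighted lists of document IDs. Azure AI Search's internal implementation may differ in details such as weighting or how many results each leg contributes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    keyword_hits = ["doc-A", "doc-B", "doc-C"]  # e.g., BM25 order
    vector_hits = ["doc-C", "doc-A", "doc-D"]   # e.g., cosine-similarity order
    for doc_id, score in reciprocal_rank_fusion([keyword_hits, vector_hits]):
        print(doc_id, round(score, 5))
```

Here `doc-A` ranks first because both searches place it near the top, even though neither ranks it uniquely best. RRF rewards this kind of agreement between keyword and vector retrieval, and it works on ranks alone, so BM25 and cosine scores never need to be put on a common scale.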

Vector Index Structure

Each chunk stored in the Azure AI Search index includes:

page_content — the raw text of the chunk
content_vector_open_ai — the embedding vector
Metadata fields — source filename, page number, URL, character and token statistics
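A chunk's search document can be sketched as a dataclass holding the three groups of fields listed above. The field names `page_content` and `content_vector_open_ai` come from that list. The metadata names (`source`, `page`, `url`, `char_count`, `token_count`) and the `make_chunk` helper are hypothetical, and the token count is approximated with a whitespace split.

```python
from dataclasses import asdict, dataclass


@dataclass
class IndexedChunk:
    """One search document per chunk; metadata field names are illustrative assumptions."""
    page_content: str                    # raw text of the chunk
    content_vector_open_ai: list[float]  # embedding of page_content
    source: str                          # source filename
    page: int                            # page number within the source
    url: str                             # link back to the original document
    char_count: int                      # character statistics
    token_count: int                     # approximated here by a whitespace split


def make_chunk(text: str, vector: list[float], source: str, page: int, url: str) -> IndexedChunk:
    return IndexedChunk(
        page_content=text,
        content_vector_open_ai=vector,
        source=source,
        page=page,
        url=url,
        char_count=len(text),
        token_count=len(text.split()),
    )


if __name__ == "__main__":
    chunk = make_chunk("Employees accrue 20 vacation days.", [0.1, 0.2],
                       "handbook.pdf", 3, "https://example.com/handbook.pdf")
    print(asdict(chunk))  # a flat dict, the shape a search document upload expects
```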

The same embedding model must be used at both indexing time and query time. Using different models produces vectors in incompatible spaces, making similarity scores meaningless.
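The dimension-mismatch failure is easy to see in code. text-embedding-ada-002 returns 1,536-dimensional vectors, while text-embedding-3-large returns 3,072 by default, so a query vector from one model cannot be compared with index vectors from the other. The sketch below guards against this explicitly, because Python's `zip` would otherwise silently truncate the longer vector. No length check can catch two different models that happen to share a dimension: their scores would compute but carry no meaning.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity that refuses vectors of different lengths instead of truncating."""
    if len(a) != len(b):
        raise ValueError(
            f"dimension mismatch: {len(a)} vs {len(b)}; were both vectors made by the same model?"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    index_vector = [0.1] * 1536  # shaped like a text-embedding-ada-002 output
    query_vector = [0.1] * 3072  # shaped like a default text-embedding-3-large output
    try:
        cosine_similarity(query_vector, index_vector)
    except ValueError as err:
        print(err)
```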

Hands-On Lab

Create a Data Component from Azure Storage

Sign in to Azure AI Foundry portal and open your hub-based project.
In the left sidebar, expand My assets and select Data + indexes.
Select New data to open the data creation wizard.
For Data source, select Get data with storage URL (or Upload files/folders to push local files directly to workspaceblobstore).
Choose the Type: select Folder for a multi-document corpus or File for a single document. Provide the Azure Blob Storage URL.
Select Next, enter a friendly Name for the data asset, and select Create.
The new data asset appears in Data + indexes with version 1 and status Ready.
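The same registration can be scripted. The sketch below makes four assumptions: the `azure-ai-ml` and `azure-identity` packages are installed, the hub-based project can be addressed through `MLClient` by passing the project name as `workspace_name`, the caller's identity can read the storage URL, and `register_folder_asset` is a hypothetical helper name. The SDK imports sit inside the function so the module still loads where those packages are missing.

```python
from typing import Any


def register_folder_asset(subscription_id: str, resource_group: str, project_name: str,
                          name: str, storage_url: str, version: str = "1") -> Any:
    """Register a uri_folder data asset, mirroring the portal's New data wizard (sketch)."""
    # Lazy imports: requires azure-ai-ml and azure-identity only when actually called.
    from azure.ai.ml import MLClient
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=project_name,  # assumption: hub-based project addressed like a workspace
    )
    data_asset = Data(
        name=name,
        version=version,
        type=AssetTypes.URI_FOLDER,  # AssetTypes.URI_FILE for a single document
        path=storage_url,            # e.g., an https://<account>.blob.core.windows.net/... URL
        description="Source documents for RAG indexing",
    )
    # create_or_update fails if this name + version already exists with different content.
    return ml_client.data.create_or_update(data_asset)
```

To correct a wrong path, the matching SDK calls are `ml_client.data.archive(name="policy-docs", version="1")` followed by registering the correct path under version `"2"`.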

Build a Vector Index from the Chat Playground

From the left sidebar select Playgrounds, then Try the Chat playground.
Ensure a chat completion model deployment is selected (e.g., gpt-4o). If none exists, deploy one via Create new deployment.
Scroll to the bottom of the Setup panel. Select + Add a new data source.
On the Source data tab, choose your data asset (or upload files directly). Select Next.
On the Index configuration tab, select your Azure AI Search resource from the dropdown. Select Next.
Select the Azure OpenAI connection that has an embedding model deployed (e.g., text-embedding-ada-002). Select Next.
Review the configuration summary and select Create vector index.
Wait for status Ready — this confirms chunking, embedding, and Azure AI Search index population are complete.
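Once the index is Ready, it can also be queried outside the playground. The following is a minimal sketch of a hybrid query with several assumptions. It requires the `azure-search-documents` package, version 11.4 or later, which provides `VectorizedQuery`. It also requires a search API key and assumes a vector field named `content_vector_open_ai`, as listed under Vector Index Structure. Your index's actual field names may differ, so check them in the Azure portal. `hybrid_search` is a hypothetical helper, and `query_vector` must come from the same embedding deployment that built the index.

```python
from typing import Optional


def hybrid_search(endpoint: str, index_name: str, api_key: str, query_text: str,
                  query_vector: list[float], top: int = 5,
                  semantic_config: Optional[str] = None) -> list[dict]:
    """Keyword + vector (hybrid) query, optionally with semantic re-ranking (sketch)."""
    # Lazy imports: requires azure-search-documents only when actually called.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient
    from azure.search.documents.models import VectorizedQuery

    client = SearchClient(endpoint=endpoint, index_name=index_name,
                          credential=AzureKeyCredential(api_key))
    vector_query = VectorizedQuery(
        vector=query_vector,             # same embedding model as at indexing time
        k_nearest_neighbors=top,
        fields="content_vector_open_ai",  # assumed vector field name
    )
    extra = {}
    if semantic_config:
        # Semantic re-ranking on top of the hybrid results.
        extra = {"query_type": "semantic", "semantic_configuration_name": semantic_config}
    # Sending both search_text and vector_queries makes the service run keyword and
    # vector retrieval in one request and merge the two result sets with RRF.
    results = client.search(search_text=query_text, vector_queries=[vector_query],
                            top=top, **extra)
    return [dict(r) for r in results]
```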

Exam Angle — What AI-3016 Tests

AI-3016 Assessment Focus

RAG questions test whether you understand the two-phase pipeline, the requirement to use the same embedding model at both phases, chunking parameters, and the superiority of hybrid search over keyword or vector alone.

Exam Trap

"You must fine-tune the model to use your own data." RAG does not modify model weights at all. Your data is retrieved at inference time and injected into the prompt. Fine-tuning changes internal model parameters; RAG does not touch them.

Exam Trap

"Any embedding model can be used to query an index built with a different embedding model." You must use the same embedding model at both index-build time and query time. Mixing models produces dimension mismatches or nonsensical similarity scores.

Exam Trap

"Larger chunks always produce better RAG results." Oversized chunks can exceed embedding model token limits and reduce retrieval precision. The recommended starting point is 512 tokens with 25% overlap.

Exam Trap

"Data assets in Azure AI Foundry can be deleted to correct mistakes." Data asset versions are immutable. Archive the incorrect version and create a new version with the correct path.

Exam Trap

"An Azure AI Search resource is optional if you already have Azure Blob Storage." Azure Blob Storage holds the raw documents but does not support vector similarity search. Azure AI Search is required to store the chunks and their embedding vectors in a searchable index and to retrieve them at query time.

Exam Tip

Keyword search misses paraphrased queries; vector search misses exact-match terms. Hybrid search combines both and consistently outperforms either alone — always the recommended production answer.

Must Memorize

Recommended chunk size: 512 tokens with 128 tokens (25%) overlap. Same embedding model at index time AND query time — no exceptions.

Q: Which Azure service stores vector embeddings in an Azure AI Foundry RAG solution?
A: Azure AI Search, which holds the chunks, their vectors, and their metadata in a search index.

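Q: A developer built a vector index with text-embedding-ada-002 and switches to text-embedding-3-large at query time. What is the most likely outcome?
A: Retrieval breaks. The query vectors come from a different model (3,072 dimensions by default versus 1,536 for text-embedding-ada-002), so the query either fails with a dimension mismatch or returns meaningless similarity scores.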

Q: What is the Microsoft-recommended starting chunk size and overlap for the Azure AI Search Text Split skill?
A: 512 tokens with 128 tokens (25%) of overlap.

Q: Which search configuration in Azure AI Search consistently delivers the best RAG retrieval quality?
A: Hybrid search (keyword and vector results merged with RRF) combined with semantic re-ranking.

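Q: A developer created data asset 'policy-docs' version 1.0 with an incorrect storage path. What is the correct remediation?
A: Archive version 1.0 and create a new version of 'policy-docs' that points to the correct path. Data asset versions are immutable, so the existing version cannot be edited.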

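Q: What are the two phases of the RAG pipeline?
A: The indexing phase (load, chunk, embed, and store documents in a vector index) and the querying phase (embed the query, retrieve the top-k chunks, augment the prompt, and generate an answer).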

4.1 — Create Data Components and Vector Indexes