AI-3016 Learning Portal
Objective 4.1 | 30 min | High priority | Tags: rag, vector-index, chunking, embeddings, azure-ai-search, data-asset

4.1 — Create Data Components and Vector Indexes

Create versioned data assets in Azure AI Foundry, understand RAG indexing and querying phases, configure chunking strategy, and build a vector index using Azure AI Search.

Concept — What & Why

RAG (Retrieval-Augmented Generation) is an architecture that grounds an LLM's responses in your own data rather than relying solely on knowledge learned during training. Instead of fine-tuning the model, RAG keeps the model weights frozen, retrieves relevant passages at inference time through a structured retrieval pipeline, and injects them into the prompt as context.

The RAG pipeline has two distinct phases:

Indexing phase (run once, or incrementally updated):

  1. Load source documents from a storage location (Azure Blob Storage, local upload, etc.)
  2. Chunk the documents into smaller text segments that fit within embedding model token limits
  3. Call an embedding model (e.g., text-embedding-ada-002 via Azure OpenAI) on each chunk to produce an embedding: a fixed-length numerical vector that encodes the semantic meaning of the chunk. Semantically similar texts produce vectors that are close together in high-dimensional space, as measured by cosine similarity.
  4. Store chunks, vectors, and metadata in a vector store (an Azure AI Search index)

Querying phase (at runtime for every user turn):

  1. Embed the user's query using the same embedding model
  2. Run a similarity search against the vector index to find the top-k most relevant chunks
  3. Augment the prompt: prepend the retrieved chunks as grounding context
  4. Send the augmented prompt to the chat completion model to generate an answer
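The two phases above can be sketched end to end in a few lines. This is a minimal illustration, not the Azure AI Foundry implementation: `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts words), the index is a plain Python list rather than an Azure AI Search index, and the final call to a chat completion model is left out.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(chunks):
    # Indexing phase: embed every chunk once and store text and vector together.
    return [{"page_content": c, "vector": toy_embed(c)} for c in chunks]

def retrieve(index, query, k=2):
    # Querying phase, steps 1-2: embed the query with the SAME function, then rank by similarity.
    q = toy_embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, d["vector"]), reverse=True)
    return [d["page_content"] for d in ranked[:k]]

def augment(query, passages):
    # Querying phase, step 3: prepend the retrieved chunks as grounding context.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Employees accrue 20 days of paid leave per year.",
    "Expense reports must be filed within 30 days.",
    "The office is closed on public holidays.",
]
index = build_index(chunks)
question = "How many days of paid leave do employees get?"
top = retrieve(index, question, k=1)
prompt = augment(question, top)  # step 4 would send this prompt to the chat model
```

Note that `retrieve` reuses `toy_embed` for the query. Swapping in a different embedding function at query time would make the similarity scores incomparable with the stored vectors.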

Source documents are registered as a Data Component (Data Asset) before indexing. A data asset is a versioned, reusable reference to a storage location within an Azure AI Foundry project. Its type is uri_file (single file), uri_folder (folder), or uri_table (tabular/Parquet), and versions are immutable once created.

For retrieval, Hybrid Search runs keyword (BM25) and vector (cosine similarity) searches in parallel and merges the results using Reciprocal Rank Fusion (RRF). It consistently produces the best RAG retrieval quality in benchmarks.

Deep Dive — How It Works

Data Asset Types

Azure AI Foundry supports three data asset types:

| Type | Description | Use for RAG |
| --- | --- | --- |
| uri_file | Points to a single file (PDF, DOCX, CSV, etc.) | Small corpora with a single-file source |
| uri_folder | Points to a folder; all files within are accessible | Multi-document corpora (most common for RAG) |
| uri_table | Points to tabular data (Parquet, CSV) | Structured Q&A datasets |

Data assets are immutable once created. A given name + version combination is permanent. To correct a mistake, archive the incorrect version and create a new version with the correct path.
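The archive-and-recreate rule can be modelled with a small in-memory registry. This is a toy sketch of the behaviour described above, not the Azure AI Foundry API: the `DataAssetRegistry` class, its method names, and the `example.invalid` URLs are all hypothetical.

```python
class DataAssetRegistry:
    """Toy in-memory model of versioned, immutable data assets (not the Foundry API)."""

    def __init__(self):
        # Maps (name, version) to a record describing that version.
        self._assets = {}

    def create(self, name, version, asset_type, path):
        key = (name, version)
        if key in self._assets:
            # A name + version combination is permanent: no overwrite allowed.
            raise ValueError(f"{name}:{version} already exists and cannot be changed")
        self._assets[key] = {"type": asset_type, "path": path, "archived": False}

    def archive(self, name, version):
        # Archiving hides a version without modifying or deleting it.
        self._assets[(name, version)]["archived"] = True

    def active_versions(self, name):
        return sorted(v for (n, v), a in self._assets.items()
                      if n == name and not a["archived"])

registry = DataAssetRegistry()
registry.create("policy-docs", "1", "uri_folder", "https://example.invalid/wrong-path/")

# Trying to "fix" version 1 in place is rejected.
try:
    registry.create("policy-docs", "1", "uri_folder", "https://example.invalid/policies/")
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True

# The documented remediation: archive the bad version, then create a new one.
registry.archive("policy-docs", "1")
registry.create("policy-docs", "2", "uri_folder", "https://example.invalid/policies/")
```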

Chunking Strategies

| Strategy | Description | Best for |
| --- | --- | --- |
| Fixed-size | Split at a fixed token or character count (e.g., 512 tokens) with overlap (e.g., 10–25%) | General-purpose; most commonly used |
| Variable-size (sentence/paragraph) | Split on sentence boundaries or paragraph markers | Prose documents; preserves natural reading units |
| Semantic / document-layout | Break on heading structure or AI-detected section boundaries | Structured documents such as policies and manuals |
| Document parsing | Parse source formats (PDF → pages, JSON → records) | Already-structured sources |

Microsoft's recommended starting point for the Azure AI Search Text Split skill: 512 tokens with 128 tokens of overlap (25%), balancing context preservation against precision.
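To make the 512/128 numbers concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the Text Split skill's implementation: `chunk_tokens` is a hypothetical helper, and the input is assumed to be a list of tokens already produced by whatever tokenizer the embedding model uses.

```python
def chunk_tokens(tokens, size=512, overlap=128):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each new chunk starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the current window already reached the end of the document
    return chunks

# A made-up 1,200-token document.
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
lengths = [len(c) for c in chunks]
```

With these defaults each chunk starts 384 tokens after the previous one, so the last 128 tokens of one chunk reappear as the first 128 of the next. A sentence that straddles a boundary therefore appears whole in at least one chunk, as long as it is shorter than the overlap.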

Azure AI Search Retrieval Modes

| Search type | Mechanism | Score metric | Best for |
| --- | --- | --- | --- |
| Keyword (BM25) | Inverted index on text tokens | BM25 relevance score | Exact terms, product codes, proper names |
| Vector | Nearest-neighbor search on embeddings | Cosine similarity | Semantically similar queries, paraphrase |
| Semantic | Re-ranks results with a language-model comprehension layer | Semantic re-ranker score | Boosting precision of top results |
| Hybrid | Keyword + vector in parallel; merges via RRF | RRF score | Best overall accuracy; recommended for production |

Hybrid search with semantic re-ranking consistently produces the best retrieval quality.
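The RRF merge step used by hybrid search can be illustrated with a short sketch. This shows the general Reciprocal Rank Fusion formula, where each document scores the sum of 1 / (k + rank) over the result lists it appears in. The constant k = 60 is a commonly used value and is an assumption here; `rrf_fuse` and the document IDs are hypothetical, and the service's internal implementation may differ.

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (smaller rank) contribute a larger share.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # made-up BM25 result order
vector_ranking = ["doc_b", "doc_d", "doc_a"]   # made-up vector result order
fused = rrf_fuse([keyword_ranking, vector_ranking])
```

Here `doc_b` ends up first because it ranks near the top of both lists, while `doc_c` and `doc_d` each appear in only one list and fall to the bottom. RRF needs only ranks, not raw scores, which is why it can combine BM25 scores and cosine similarities that sit on different scales.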

Vector Index Structure

Each chunk stored in the Azure AI Search index includes:

  • page_content — the raw text of the chunk
  • content_vector_open_ai — the embedding vector
  • Metadata fields — source filename, page number, URL, character and token statistics

The same embedding model must be used at both indexing time and query time. Using different models produces vectors in incompatible spaces, making similarity scores meaningless.
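A small sketch shows the most visible failure mode when models are mixed. The vectors are made up and far shorter than real embeddings, the `source` metadata field is a hypothetical example, and `cosine_similarity` is an illustrative helper rather than a service API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# One indexed chunk, using the field names listed above.
chunk = {
    "page_content": "Employees accrue 20 days of paid leave per year.",
    "content_vector_open_ai": [0.6, 0.8, 0.0],  # made-up 3-dimensional vector
    "source": "hr-policy.pdf",                   # hypothetical metadata field
}

same_model_query = [0.6, 0.8, 0.0]        # produced by the model that built the index
other_model_query = [0.1, 0.2, 0.3, 0.4]  # a different model with a different dimension

score = cosine_similarity(same_model_query, chunk["content_vector_open_ai"])

try:
    cosine_similarity(other_model_query, chunk["content_vector_open_ai"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

A dimension mismatch at least fails loudly. Two different models that happen to share a dimension would not raise an error, but their scores would still be meaningless because the vectors live in unrelated spaces.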

Hands-On Lab

Create a Data Component from Azure Storage

  1. Sign in to the Azure AI Foundry portal and open your hub-based project.
  2. In the left sidebar, expand My assets and select Data + indexes.
  3. Select New data to open the data creation wizard.
  4. For Data source, select Get data with storage URL (or Upload files/folders to push local files directly to workspaceblobstore).
  5. Choose the Type: select Folder for a multi-document corpus or File for a single document. Provide the Azure Blob Storage URL.
  6. Select Next, enter a friendly Name for the data asset, and select Create.
  7. The new data asset appears in Data + indexes with version 1 and status Ready.

Build a Vector Index from the Chat Playground

  1. From the left sidebar select Playgrounds, then Try the Chat playground.
  2. Ensure a chat completion model deployment is selected (e.g., gpt-4o). If none exists, deploy one via Create new deployment.
  3. Scroll to the bottom of the Setup panel. Select + Add a new data source.
  4. On the Source data tab, choose your data asset (or upload files directly). Select Next.
  5. On the Index configuration tab, select your Azure AI Search resource from the dropdown. Select Next.
  6. Select the Azure OpenAI connection that has an embedding model deployed (e.g., text-embedding-ada-002). Select Next.
  7. Review the configuration summary and select Create vector index.
  8. Wait for status Ready — this confirms chunking, embedding, and Azure AI Search index population are complete.

Exam Angle — What AI-3016 Tests

AI-3016 Assessment Focus

RAG questions test whether you understand the two-phase pipeline, the requirement to use the same embedding model at both phases, chunking parameters, and the superiority of hybrid search over keyword or vector alone.

Exam Trap

"You must fine-tune the model to use your own data." RAG does not modify model weights at all. Your data is retrieved at inference time and injected into the prompt. Fine-tuning changes internal model parameters; RAG does not touch them.

Exam Trap

"Any embedding model can be used to query an index built with a different embedding model." You must use the same embedding model at both index-build time and query time. Mixing models produces dimension mismatches or nonsensical similarity scores.

Exam Trap

"Larger chunks always produce better RAG results." Oversized chunks can exceed embedding model token limits and reduce retrieval precision. The recommended starting point is 512 tokens with 25% overlap.

Exam Trap

"Data assets in Azure AI Foundry can be deleted to correct mistakes." Data asset versions are immutable. Archive the incorrect version and create a new version with the correct path.

Exam Trap

"An Azure AI Search resource is optional if you already have Azure Blob Storage." Azure Blob Storage holds raw documents but does not support vector similarity search. Azure AI Search is required to index, embed, and retrieve chunks.

Exam Tip

Keyword search misses paraphrased queries; vector search misses exact-match terms. Hybrid search combines both and consistently outperforms either alone — always the recommended production answer.

Must Memorize

Recommended chunk size: 512 tokens with 128 tokens (25%) overlap. Same embedding model at index time AND query time — no exceptions.

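Q: Which Azure service stores vector embeddings in an Azure AI Foundry RAG solution?
A: Azure AI Search. Chunks, vectors, and metadata are stored in an Azure AI Search index.

Q: A developer built a vector index with text-embedding-ada-002 and switches to text-embedding-3-large at query time. What is the most likely outcome?
A: A dimension mismatch or meaningless similarity scores, because the two models produce vectors in incompatible spaces. The same embedding model must be used at index time and query time.

Q: What is the Microsoft-recommended starting chunk size and overlap for the Azure AI Search Text Split skill?
A: 512 tokens with 128 tokens (25%) of overlap.

Q: Which search configuration in Azure AI Search consistently delivers the best RAG retrieval quality?
A: Hybrid search (keyword plus vector, merged with RRF) with semantic re-ranking.

Q: A developer created data asset 'policy-docs' version 1.0 with an incorrect storage path. What is the correct remediation?
A: Archive the incorrect version and create a new version with the correct path. Existing versions are immutable.

Q: What are the two phases of the RAG pipeline?
A: The indexing phase (load, chunk, embed, store) and the querying phase (embed the query, run a similarity search, augment the prompt, generate the answer).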

Sources & Further Reading