AI-3016 Learning Portal

Concept — What & Why

Building the vector index (covered in 4.1) is only the first half of a RAG solution. The second half is wiring that index into a chat application so that every user question triggers retrieval before generation. Azure AI Foundry provides two paths:

Chat playground with "Add your data" — no-code, for quick testing and demos
Prompt Flow — visual pipeline with explicit nodes for retrieval, prompt assembly, and LLM generation — the exam-relevant production path

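A typical Prompt Flow RAG chat flow runs these nodes in sequence: the chat input (the user's question), an Index Lookup node that retrieves matching chunks, a Python node that formats the chunks into a context string, and an LLM node that generates the grounded answer.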

The Index Lookup tool is the key integration node: a built-in Prompt Flow tool that connects to an Azure AI Search vector index, performs configurable retrieval, and returns the top-scored chunks. It accepts mlindex_content, queries, query_type, and top_k. Its required connection parameter is mlindex_content, a YAML blob containing the Azure AI Search endpoint, index name, embedding deployment identifier, and field mappings. top_k is the number of top-scored chunks returned; a value of 3–5 is typically recommended, because too low risks missing the answer and too high adds noise and token cost. The retrieved chunks are assembled into a grounding context: a block of retrieved chunk text prepended to the LLM prompt as source material, which reduces hallucination by instructing the model to answer only from the provided context.

Deep Dive — How It Works

Index Lookup Tool — Key Inputs

Parameter	Type	Description	Required
`mlindex_content`	string (YAML blob)	Connection info: Azure AI Search endpoint, index name, embedding deployment, field mappings	Yes
`queries`	string or list	The text to search against the index (usually the user's question)	Yes
`query_type`	string	`Keyword`, `Vector`, `Hybrid`, `Semantic`, or `HybridWithSemanticReranking`	Yes
`top_k`	integer	Number of top-scored chunks to return (default: 3)	No
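
To make the table concrete, here is a minimal sketch in plain Python of how the four inputs fit together. The IndexLookupInputs class and its validation rules are hypothetical helpers for illustration only, not part of Prompt Flow; in a real flow you set these values on the node in the authoring UI. The mlindex_content value is a placeholder, since the actual YAML is generated when you select your index.

```python
from dataclasses import dataclass
from typing import List, Union

# Query types listed in the inputs table.
VALID_QUERY_TYPES = {
    "Keyword", "Vector", "Hybrid", "Semantic", "HybridWithSemanticReranking",
}


@dataclass
class IndexLookupInputs:
    """Hypothetical container mirroring the Index Lookup tool's inputs."""
    mlindex_content: str            # YAML blob (placeholder in this sketch)
    queries: Union[str, List[str]]  # usually the user's question
    query_type: str
    top_k: int = 3                  # default taken from the inputs table

    def __post_init__(self) -> None:
        # Reject the mistakes that are easiest to make when configuring the node.
        if not self.mlindex_content.strip():
            raise ValueError("mlindex_content is required")
        if not self.queries:
            raise ValueError("queries is required")
        if self.query_type not in VALID_QUERY_TYPES:
            raise ValueError(f"unknown query_type: {self.query_type!r}")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


inputs = IndexLookupInputs(
    mlindex_content="<YAML generated when you select the index>",
    queries="What is the parental leave policy?",
    query_type="Hybrid",
)
print(inputs.top_k)
```

Because query_type values are case-sensitive strings, a check like this catches typos such as "hybrid" before a run fails.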

Index Lookup Tool — Output Format

[
  {
    "page_content": "The text of the retrieved chunk...",
    "score": 0.921,
    "metadata": {
      "source": { "filename": "policy.pdf", "url": "...", "page_number": 4 },
      "stats": { "chars": 1240, "tiktokens": 312 }
    }
  }
]

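For Vector queries, the score field reflects embedding similarity (cosine similarity when the index uses the cosine metric). Hybrid queries merge keyword and vector rankings with RRF, so their scores sit on a different scale and should not be compared across query types. Within a single result set, higher values indicate greater relevance.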
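
As an illustration of working with this output shape, the sketch below pulls the source filename, page number, and score out of each chunk. The sample values and the summarize_chunks helper are made up for this example; only the field names come from the output format shown above.

```python
from typing import Dict, List

# Sample data in the documented output shape; the values are invented.
results = [
    {
        "page_content": "Carry-over is capped at five days.",
        "score": 0.874,
        "metadata": {
            "source": {"filename": "handbook.pdf", "url": "...", "page_number": 12},
            "stats": {"chars": 980, "tiktokens": 240},
        },
    },
    {
        "page_content": "Employees accrue leave monthly.",
        "score": 0.921,
        "metadata": {
            "source": {"filename": "policy.pdf", "url": "...", "page_number": 4},
            "stats": {"chars": 1240, "tiktokens": 312},
        },
    },
]


def summarize_chunks(chunks: List[Dict]) -> List[Dict]:
    """Return one small record per chunk, highest score first."""
    rows = []
    for chunk in chunks:
        # Use .get so a chunk with missing metadata does not raise.
        source = chunk.get("metadata", {}).get("source", {})
        rows.append({
            "filename": source.get("filename", "unknown"),
            "page": source.get("page_number"),
            "score": chunk.get("score", 0.0),
        })
    return sorted(rows, key=lambda row: row["score"], reverse=True)


for row in summarize_chunks(results):
    print(row)
```

A summary like this is a convenient basis for citations, since it keeps the source file and page next to each score.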

Query Types for the Index Lookup Tool

Query Type	What it does	When to choose
`Keyword`	Full-text BM25 search only	Legacy search, exact-match terms
`Vector`	Embedding similarity search only	Fully semantic queries with no exact-match requirements
`Hybrid`	Keyword + vector in parallel, merged with RRF	Recommended for general RAG — best recall
`Semantic`	BM25 + semantic re-ranker	When precision over recall is the priority
`HybridWithSemanticReranking`	Hybrid + semantic re-ranker	Best overall quality; slight latency increase

Microsoft recommends Hybrid or HybridWithSemanticReranking for production chat applications.
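
The Hybrid row mentions merging with RRF (Reciprocal Rank Fusion). The following is a small, generic sketch of that idea, not Azure AI Search's actual implementation: each document earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the final order. The constant k = 60 and the document IDs are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs using Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document ranked near the top of any list earns a larger share.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical vector ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (doc_a and doc_c here) rise above documents found by only one method, which is why hybrid retrieval tends to improve recall.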

Grounded Prompt Assembly

After retrieval, format chunks into a grounding context block prepended to the system or user message:

System: You are a helpful assistant. Answer questions using ONLY the information in the context below.
If the answer is not in the context, say "I don't know."

Context:
---
{{ context }}
---

User: {{ question }}

The {{ context }} placeholder is replaced with concatenated page_content values from the top-k retrieved chunks. This technique:

Reduces the risk of the model hallucinating facts not in the source data
Enables accurate citations (metadata contains source filename and URL)
Keeps token usage predictable
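
To show the substitution step in isolation, here is a minimal sketch that fills the two placeholders with a regular expression. Prompt Flow renders its own templates for you; the render function below is only a hypothetical stand-in so the example runs without extra libraries, and the sample chunks are invented.

```python
import re
from typing import Dict

TEMPLATE = """System: You are a helpful assistant. Answer questions using ONLY the information in the context below.
If the answer is not in the context, say "I don't know."

Context:
---
{{ context }}
---

User: {{ question }}"""


def render(template: str, values: Dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with the matching value."""
    def substitute(match) -> str:
        # A missing key raises KeyError, which surfaces wiring mistakes early.
        return values[match.group(1)]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


chunks = [
    {"page_content": "Leave accrues monthly."},
    {"page_content": "Carry-over is capped at five days."},
]
context = "\n\n".join(chunk["page_content"] for chunk in chunks)
prompt = render(TEMPLATE, {"context": context, "question": "How does leave accrue?"})
print(prompt)
```

Keeping the context between clear delimiters makes it easier for the model to tell source material from instructions.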

Deployment Options After Testing

Once your Prompt Flow is tested, you can:

Deploy as an online endpoint — creates a REST API your application calls
Test in the Chat playground — the playground can consume the deployed flow endpoint directly
Integrate via the Azure AI Projects SDK — call the flow endpoint programmatically

The Chat playground's "Add your data" feature is a simplified version of the same RAG pattern: it handles retrieval and grounding for you, without exposing the retrieval step as a node you configure yourself.
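
For the online endpoint option, the sketch below builds the kind of REST request an application might send, using only the standard library. The URL and key are hypothetical placeholders, and the input names (question, chat_history) are assumptions that must match your flow's actual inputs; copy the real scoring URL, key, and sample request from your deployed endpoint's details page. The request is constructed but not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values from your own endpoint.
ENDPOINT_URL = "https://my-endpoint.example.invalid/score"
API_KEY = "<your-endpoint-key>"


def build_request(question: str, chat_history: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployed chat flow."""
    # Assumption: the flow exposes inputs named 'question' and 'chat_history'.
    body = json.dumps({"question": question, "chat_history": chat_history})
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


request = build_request("How many vacation days do I get?", [])
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then JSON-decode the response body.
```

In production code the key would come from a secret store or environment variable rather than a constant in source.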

Hands-On Lab

Add an Index Lookup Tool to a Prompt Flow

Sign in to Azure AI Foundry portal and open your project.
In the left sidebar, under Build and customize, select Prompt flow.
Select + Create, then select Chat flow and select Create again to scaffold a starter chat flow.
Select Start compute session and wait for the session to become active (2–3 minutes).
In the flow canvas, select + More tools and then select Index Lookup. Provide a node name (e.g., lookup) and select Add.
In the lookup node, select the mlindex_content value box. In the dropdown, select Index and choose the vector index you created in 4.1.
Set query_type to HybridWithSemanticReranking for best results. Set top_k to 3.
For queries, connect the input to ${inputs.question}.

Wire Retrieval Results into the LLM Node

Locate the existing LLM node in the flow (named chat or answer_the_question_with_context).

Edit the prompt template to include a context placeholder:

system:
You are a helpful assistant. Answer only from the provided context.
Context: {{context}}
user:
{{question}}

Add a Python node between the lookup node and the LLM node to format the retrieved chunks. Name it generate_context. In the Python code:

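from promptflow import tool


@tool
def generate_context(chunks: list) -> str:
    # Join each retrieved chunk's page_content into one context string,
    # separated by blank lines so chunk boundaries stay visible to the model.
    return "\n\n".join(c["page_content"] for c in chunks)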

Set the generate_context node's chunks input to ${lookup.output}, then connect ${generate_context.output} to the context variable in the LLM node's prompt.
Connect ${inputs.question} to the question variable in the LLM node.
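
If you want the answer to cite its sources, one possible variation of the formatting step labels each chunk with its filename. This is a sketch under assumptions, not the lab's required code: the function name and the "[n] Source:" label format are invented here, and it is written as a plain function so it runs outside a flow (inside Prompt Flow you would add the @tool decorator as in the lab code).

```python
from typing import Dict, List


def generate_context_with_sources(chunks: List[Dict]) -> str:
    """Format chunks as numbered, labelled blocks so filenames can be cited."""
    if not chunks:
        # An empty context lets the prompt's "answer only from context" rule apply.
        return ""
    blocks = []
    for number, chunk in enumerate(chunks, start=1):
        source = chunk.get("metadata", {}).get("source", {})
        filename = source.get("filename", "unknown")
        blocks.append(f"[{number}] Source: {filename}\n{chunk['page_content']}")
    return "\n\n".join(blocks)


sample = [
    {"page_content": "Leave accrues monthly.",
     "metadata": {"source": {"filename": "policy.pdf"}}},
    {"page_content": "Carry-over is capped."},  # metadata missing on purpose
]
print(generate_context_with_sources(sample))
```

The numbered labels give the model something short to reference, for example "according to [1]".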

Test the Grounded Retrieval Pipeline

Select Run in the top-right of the flow canvas.
Enter a question that relates to content in your indexed documents.
After the run completes, expand the lookup node output to inspect retrieved chunks and their scores.
Expand the generate_context output to verify the context string was assembled correctly.
Review the final output answer and confirm it references information from the retrieved chunks.
If retrieval quality is poor (low scores, irrelevant chunks), adjust chunk size, overlap, or switch query_type.
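
When inspecting the lookup output across several test questions, a small report can make weak retrieval easier to spot. The sketch below is a hypothetical helper; the 0.5 threshold is an arbitrary assumption, and because score scales differ between query types, any threshold should be calibrated against your own index.

```python
from statistics import mean
from typing import Dict, List


def retrieval_report(chunks: List[Dict], min_top_score: float = 0.5) -> Dict:
    """Summarize the scores of one lookup run and flag weak retrieval."""
    scores = [chunk["score"] for chunk in chunks]
    if not scores:
        # No chunks at all is the clearest sign that retrieval failed.
        return {"count": 0, "top": None, "mean": None, "weak": True}
    return {
        "count": len(scores),
        "top": max(scores),
        "mean": round(mean(scores), 3),
        "weak": max(scores) < min_top_score,
    }


good_run = [{"score": 0.9}, {"score": 0.7}]
poor_run = [{"score": 0.31}, {"score": 0.22}]
print(retrieval_report(good_run))
print(retrieval_report(poor_run))
```

A run flagged as weak is a prompt to revisit chunk size, overlap, or query_type, as described in the last step above.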

Exam Angle — What AI-3016 Tests

AI-3016 Assessment Focus

Index Lookup tool configuration (query_type selection, top_k tradeoffs, and mlindex_content) and the distinction between pre-deployment and post-deployment testing paths are key exam topics.

Exam Trap

"The Index Lookup tool automatically embeds the user question before searching." Only partially true. For Vector, Hybrid, and HybridWithSemanticReranking queries, the tool calls the embedding model configured in mlindex_content. For the Keyword query type, no embedding is generated and the raw text is used.

Exam Trap

"Setting top_k to 1 always produces better answers because there is less context noise." A top_k of 1 retrieves only one chunk; if that chunk doesn't fully answer the question, the model may hallucinate or give an "I don't know" even when the answer exists in the index. Values of 3–5 are typically better.
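
The token-cost side of this tradeoff can be estimated from the stats.tiktokens field that each chunk carries in the Index Lookup output. The sketch below uses invented token counts; only the field path comes from the output format described earlier in this lesson.

```python
from typing import Dict, List


def context_tokens(ranked_chunks: List[Dict], top_k: int) -> int:
    """Sum the token counts of the first top_k chunks."""
    return sum(
        chunk["metadata"]["stats"]["tiktokens"] for chunk in ranked_chunks[:top_k]
    )


# Hypothetical token counts for five chunks, already ordered by score.
ranked = [
    {"metadata": {"stats": {"tiktokens": tokens}}}
    for tokens in (312, 280, 305, 298, 330)
]

for k in (1, 3, 5):
    print(f"top_k={k}: {context_tokens(ranked, k)} context tokens")
```

With these sample numbers, moving from top_k = 1 to top_k = 5 roughly quintuples the context size, which is the cost to weigh against the better chance of retrieving the answer.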

Exam Trap

"Deploying a Prompt Flow endpoint is required before testing in the portal." You can test a flow interactively in the Prompt Flow authoring interface before any deployment, using the Run button to execute with a test input.

Exam Trap

"RAG eliminates hallucination entirely." RAG significantly reduces hallucination by grounding responses, but if the relevant information is not in the index (or chunking failed to capture it), the model may still fabricate an answer.

Exam Trap

"The semantic_configuration_name field only matters for Keyword queries." The semantic_configuration_name (e.g., azureml-default) is used for Semantic and HybridWithSemanticReranking query types. It has no effect on Keyword or Vector queries.
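
The embedding and semantic-configuration traps can be condensed into one lookup table. This is a study aid sketched from the statements in this lesson, not an API: the trait names are invented, and the entries for Semantic (no query embedding) and Hybrid (no semantic configuration) are inferred from the query-type descriptions rather than stated outright.

```python
from typing import Dict

# Which query types embed the question, and which use semantic_configuration_name.
QUERY_TYPE_TRAITS: Dict[str, Dict[str, bool]] = {
    "Keyword": {"embeds_query": False, "uses_semantic_config": False},
    "Vector": {"embeds_query": True, "uses_semantic_config": False},
    "Hybrid": {"embeds_query": True, "uses_semantic_config": False},
    "Semantic": {"embeds_query": False, "uses_semantic_config": True},
    "HybridWithSemanticReranking": {"embeds_query": True, "uses_semantic_config": True},
}


def describe(query_type: str) -> str:
    """Return a one-line summary of what a query type needs."""
    traits = QUERY_TYPE_TRAITS[query_type]
    embedding = "embeds the question" if traits["embeds_query"] else "uses raw text"
    semantic = (
        "needs a semantic configuration"
        if traits["uses_semantic_config"]
        else "ignores semantic_configuration_name"
    )
    return f"{query_type}: {embedding}, {semantic}"


for name in QUERY_TYPE_TRAITS:
    print(describe(name))
```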

Exam Tip

For best overall quality: HybridWithSemanticReranking. For best recall with lower latency: Hybrid. Keyword-only is the weakest option for natural language queries.

Must Memorize

The Python node between Index Lookup and the LLM node is responsible for formatting the raw chunk array into a single context string. Without this formatting step, the LLM receives a raw JSON array, not readable prose context.

Review Questions

Q: In a Prompt Flow RAG pipeline, which tool performs the vector similarity search against an Azure AI Search index?

Q: A Prompt Flow Index Lookup node uses query_type 'Keyword'. A natural-language question returns zero results. What is the most likely cause?

Q: What does the 'score' field represent in the Index Lookup Tool output when query_type is 'Vector'?

Q: Which query_type provides the best overall retrieval quality with the highest latency cost?

Q: Can you test a Prompt Flow RAG pipeline before deploying an online endpoint?

Q: What is the purpose of the Python node between the Index Lookup node and the LLM node in a RAG flow?

4.2 — Integrate RAG into Prompt Flow and Chat Applications
