Building the vector index (covered in 4.1) is only the first half of a RAG solution. The second half is wiring that index into a chat application so that every user question triggers retrieval before generation. Azure AI Foundry provides two paths:
- Chat playground with "Add your data": no-code, for quick testing and demos
- Prompt Flow: a visual pipeline with explicit nodes for retrieval, prompt assembly, and LLM generation. This is the exam-relevant production path.
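A typical Prompt Flow RAG chat flow, like the one built later in this section, runs these nodes in sequence: the chat input (the user's question), an Index Lookup node, a Python node that formats the retrieved chunks, and the LLM node that generates the answer.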
The Index Lookup tool is the key integration node between your built index and the chat generation step. It is a built-in Prompt Flow tool that connects to an Azure AI Search vector index, performs configurable retrieval, and returns the top-scored chunks. It accepts mlindex_content, queries, query_type, and top_k. Its required connection parameter is mlindex_content, a YAML blob containing the Azure AI Search endpoint, index name, embedding deployment identifier, and field mappings. top_k is the number of top-scored chunks returned; a value of 3–5 is typically recommended, because too low risks missing the answer and too high adds noise and token cost. The retrieved chunks are assembled into a grounding context block: retrieved chunk text prepended to the LLM prompt as source material, which reduces hallucination by instructing the model to answer only from the provided context.
Index Lookup Tool — Key Inputs
| Parameter | Type | Description | Required |
|---|---|---|---|
| mlindex_content | string (YAML blob) | Connection info: Azure AI Search endpoint, index name, embedding deployment, field mappings | Yes |
| queries | string or list | The text to search against the index (usually the user's question) | Yes |
| query_type | string | Keyword, Vector, Hybrid, Semantic, or HybridWithSemanticReranking | Yes |
| top_k | integer | Number of top-scored chunks to return (default: 3) | No |
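The inputs in the table above can be sanity-checked before a run. The following is a minimal sketch, assuming you assemble the inputs in your own Python code; `validate_lookup_inputs` is a hypothetical helper written for illustration and is not part of Prompt Flow.

```python
from typing import List, Union

# Query types listed in this section for the Index Lookup tool.
ALLOWED_QUERY_TYPES = {
    "Keyword", "Vector", "Hybrid", "Semantic", "HybridWithSemanticReranking",
}


def validate_lookup_inputs(
    mlindex_content: str,
    queries: Union[str, List[str]],
    query_type: str,
    top_k: int = 3,
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look sane.

    Hypothetical helper for illustration only; not a Prompt Flow API.
    """
    problems = []
    if not mlindex_content.strip():
        problems.append("mlindex_content is required")
    if isinstance(queries, str):
        queries = [queries]  # the tool accepts a single string or a list
    if not queries or any(not q.strip() for q in queries):
        problems.append("queries must contain at least one non-empty string")
    if query_type not in ALLOWED_QUERY_TYPES:
        problems.append(f"unknown query_type: {query_type!r}")
    if top_k < 1:
        problems.append("top_k must be a positive integer")
    return problems


print(validate_lookup_inputs("index: {}", "What is the refund policy?", "Hybrid"))
```

An empty list means all three required inputs are present and top_k is usable; the check says nothing about whether the YAML in mlindex_content is itself valid.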
Index Lookup Tool — Output Format
[
  {
    "page_content": "The text of the retrieved chunk...",
    "score": 0.921,
    "metadata": {
      "source": { "filename": "policy.pdf", "url": "...", "page_number": 4 },
      "stats": { "chars": 1240, "tiktokens": 312 }
    }
  }
]
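For Vector queries, the score field reflects cosine similarity. Hybrid results are merged with RRF, so treat a hybrid score as a relative ranking signal rather than a raw cosine value. In both cases, higher values indicate greater relevance.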
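When inspecting a run, it can help to condense the output array into one line per chunk. The sketch below assumes the output shape shown above; `summarize_chunks` is a hypothetical helper, and the sample chunks are made-up data for illustration.

```python
from typing import Any, Dict, List


def summarize_chunks(chunks: List[Dict[str, Any]]) -> List[str]:
    """Return one 'score  filename (page N)' line per chunk, best score first."""
    ordered = sorted(chunks, key=lambda c: c.get("score", 0.0), reverse=True)
    lines = []
    for chunk in ordered:
        source = chunk.get("metadata", {}).get("source", {})
        filename = source.get("filename", "unknown source")
        page = source.get("page_number")
        # Only mention the page when the metadata actually carries one.
        location = f"{filename} (page {page})" if page is not None else filename
        lines.append(f"{chunk.get('score', 0.0):.3f}  {location}")
    return lines


# Made-up sample data shaped like the Index Lookup output format.
sample = [
    {"page_content": "Refunds take 5 days.", "score": 0.812,
     "metadata": {"source": {"filename": "faq.pdf"}}},
    {"page_content": "The text of the retrieved chunk...", "score": 0.921,
     "metadata": {"source": {"filename": "policy.pdf", "page_number": 4}}},
]
for line in summarize_chunks(sample):
    print(line)
```

The same source metadata is what a citation feature would surface to the user.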
Query Types for the Index Lookup Tool
| Query Type | What it does | When to choose |
|---|---|---|
| Keyword | Full-text BM25 search only | Legacy search, exact-match terms |
| Vector | Embedding similarity search only | Fully semantic queries with no exact-match requirements |
| Hybrid | Keyword + vector in parallel, merged with RRF | Recommended for general RAG; best recall |
| Semantic | BM25 + semantic re-ranker | When precision over recall is the priority |
| HybridWithSemanticReranking | Hybrid + semantic re-ranker | Best overall quality; slight latency increase |
Microsoft recommends Hybrid or HybridWithSemanticReranking for production chat applications.
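To make the "merged with RRF" step in the Hybrid row concrete, here is a minimal sketch of Reciprocal Rank Fusion. It illustrates the general technique rather than the service's internal implementation; the constant `k = 60` is a commonly used value and an assumption here, and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k = 60 is an assumed, commonly used constant.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]   # hypothetical vector ranking
print(rrf_merge([keyword_hits, vector_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists (doc_a, doc_b) outrank those found by only one retriever, which is why Hybrid tends to improve recall.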
Grounded Prompt Assembly
After retrieval, format chunks into a grounding context block prepended to the system or user message:
System: You are a helpful assistant. Answer questions using ONLY the information in the context below.
If the answer is not in the context, say "I don't know."
Context:
---
{{ context }}
---
User: {{ question }}
The {{ context }} placeholder is replaced with concatenated page_content values from the top-k retrieved chunks. This technique:
- Reduces the risk of the model hallucinating facts not in the source data
- Enables accurate citations (metadata contains source filename and URL)
- Keeps token usage predictable
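The placeholder substitution described above can be sketched in plain Python. This is an illustration only: it uses simple string replacement as a stand-in for the template rendering that Prompt Flow performs, and `render_prompt` is a hypothetical name.

```python
from typing import Dict, List

# The grounded prompt template from this section, as a Python string.
PROMPT_TEMPLATE = (
    "System: You are a helpful assistant. Answer questions using ONLY the "
    "information in the context below.\n"
    'If the answer is not in the context, say "I don\'t know."\n'
    "Context:\n---\n{{ context }}\n---\n"
    "User: {{ question }}"
)


def render_prompt(chunks: List[Dict[str, str]], question: str) -> str:
    """Fill the template with concatenated page_content values and the question."""
    context = "\n\n".join(c["page_content"] for c in chunks)
    return (
        PROMPT_TEMPLATE
        .replace("{{ context }}", context)
        .replace("{{ question }}", question)
    )


# Hypothetical chunks; only page_content is used here.
prompt = render_prompt(
    [{"page_content": "Chunk one text."}, {"page_content": "Chunk two text."}],
    "What does the policy say?",
)
print(prompt)
```

Because the context is bounded by top_k chunks, the rendered prompt length stays within a predictable range.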
Deployment Options After Testing
Once your Prompt Flow is tested, you can:
- Deploy as an online endpoint — creates a REST API your application calls
- Test in the Chat playground — the playground can consume the deployed flow endpoint directly
- Integrate via the Azure AI Projects SDK — call the flow endpoint programmatically
The Chat playground's "Add your data" feature is a simplified version of the same RAG pipeline — it automatically creates an Index Lookup node under the hood.
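Calling a deployed flow as a REST API might look like the sketch below. Several details are assumptions rather than facts from this section: the URL is a placeholder, key-based `Bearer` authentication is assumed, and the payload field `chat_history` is an assumed chat-flow input alongside `question`. The code only builds the request; sending it is left as a comment.

```python
import json
import urllib.request


def build_flow_request(endpoint_url: str, api_key: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployed flow endpoint.

    Assumptions: key-based Bearer auth, and flow inputs named
    'question' and 'chat_history'. Adjust to match your own flow.
    """
    payload = {"question": question, "chat_history": []}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_flow_request(
    "https://example.invalid/score",  # placeholder URL, not a real endpoint
    "<your-endpoint-key>",            # placeholder key
    "What is the refund policy?",
)
# To send it against a real endpoint:
# response_body = urllib.request.urlopen(request).read()
print(request.get_method(), request.full_url)
```

The payload keys must match the input names defined in your flow, so check the flow's inputs before reusing this shape.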
Add an Index Lookup Tool to a Prompt Flow
- Sign in to Azure AI Foundry portal and open your project.
- In the left sidebar, under Build and customize, select Prompt flow.
- Select + Create, then select Chat flow and select Create again to scaffold a starter chat flow.
- Select Start compute session and wait for the session to become active (2–3 minutes).
- In the flow canvas, select + More tools and then select Index Lookup. Provide a node name (e.g., `lookup`) and select Add.
- In the `lookup` node, select the mlindex_content value box. In the dropdown, select Index and choose the vector index you created in 4.1.
- Set query_type to `HybridWithSemanticReranking` for best results. Set top_k to `3`.
- For queries, connect the input to `${inputs.question}`.
Wire Retrieval Results into the LLM Node
- Locate the existing LLM node in the flow (named `chat` or `answer_the_question_with_context`).
- Edit the prompt template to include a context placeholder:

      system:
      You are a helpful assistant. Answer only from the provided context.
      Context: {{context}}

      user:
      {{question}}

- Add a Python node between the `lookup` node and the LLM node to format the retrieved chunks. Name it `generate_context` and set its chunks input to the output of the `lookup` node. In the Python code:

      from promptflow import tool

      @tool
      def generate_context(chunks: list) -> str:
          # Join the page_content of each retrieved chunk into one context string.
          return "\n\n".join([c["page_content"] for c in chunks])

- Connect `${generate_context.output}` to the `context` variable in the LLM node's prompt.
- Connect `${inputs.question}` to the `question` variable in the LLM node.
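A slightly more defensive variant of the formatting node is sketched below. It is an illustration rather than the flow's required code: it skips empty chunks and labels each chunk with its source filename so the model can cite it. The function name and the `[source: ...]` label format are assumptions, and in a real flow you would add the `@tool` decorator as in the step above.

```python
from typing import Any, Dict, List


def generate_context_with_sources(chunks: List[Dict[str, Any]]) -> str:
    """Join chunk text into one context string, tagging each chunk with its source file."""
    blocks = []
    for chunk in chunks:
        text = (chunk.get("page_content") or "").strip()
        if not text:
            continue  # skip empty chunks rather than emitting blank context
        filename = (
            chunk.get("metadata", {}).get("source", {}).get("filename", "unknown")
        )
        blocks.append(f"[source: {filename}]\n{text}")
    return "\n\n".join(blocks)


# Hypothetical chunks shaped like the Index Lookup output.
print(generate_context_with_sources([
    {"page_content": "Alpha", "metadata": {"source": {"filename": "a.pdf"}}},
    {"page_content": "Beta"},
]))
```

Using `.get` with defaults keeps the node from failing when a chunk lacks metadata, at the cost of an "unknown" source label.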
Test the Grounded Retrieval Pipeline
- Select Run in the top-right of the flow canvas.
- Enter a question that relates to content in your indexed documents.
- After the run completes, expand the lookup node output to inspect retrieved chunks and their scores.
- Expand the generate_context output to verify the context string was assembled correctly.
- Review the final output answer and confirm it references information from the retrieved chunks.
- If retrieval quality is poor (low scores, irrelevant chunks), adjust chunk size, overlap, or switch query_type.
AI-3016 Assessment Focus
Index Lookup tool configuration (query_type selection, top_k tradeoffs, and mlindex_content) and the distinction between pre-deployment and post-deployment testing paths are key exam topics.
Exam Trap
"The Index Lookup tool automatically embeds the user question before searching." Only partially true. For Vector, Hybrid, or HybridWithSemanticReranking queries, the tool calls the embedding model configured in mlindex_content. For the Keyword query type, no embedding is generated and the raw text is used.
Exam Trap
"Setting top_k to 1 always produces better answers because there is less context noise." A top_k of 1 retrieves only one chunk; if that chunk doesn't fully answer the question, the model may hallucinate or give an "I don't know" even when the answer exists in the index. Values of 3–5 are typically better.
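The token-cost side of the top_k tradeoff can be estimated from the `stats.tiktokens` field in the Index Lookup output. The sketch below uses hypothetical retrieval results; `context_tokens` is an illustrative helper, and the numbers are made up.

```python
from typing import Any, Dict, List


def context_tokens(chunks: List[Dict[str, Any]], top_k: int) -> int:
    """Sum the tiktokens stat over the top_k highest-scored chunks."""
    ranked = sorted(chunks, key=lambda c: c["score"], reverse=True)
    return sum(c["metadata"]["stats"]["tiktokens"] for c in ranked[:top_k])


# Hypothetical results; only the score and token counts matter here.
chunks = [
    {"score": 0.92, "metadata": {"stats": {"tiktokens": 312}}},
    {"score": 0.88, "metadata": {"stats": {"tiktokens": 290}}},
    {"score": 0.71, "metadata": {"stats": {"tiktokens": 305}}},
    {"score": 0.55, "metadata": {"stats": {"tiktokens": 330}}},
    {"score": 0.41, "metadata": {"stats": {"tiktokens": 298}}},
]
for k in (1, 3, 5):
    print(k, context_tokens(chunks, k))
# 1 312
# 3 907
# 5 1535
```

With these sample numbers, going from top_k 1 to 5 roughly quintuples the context size, while the later chunks also carry lower scores.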
Exam Trap
"Deploying a Prompt Flow endpoint is required before testing in the portal." You can test a flow interactively in the Prompt Flow authoring interface before any deployment, using the Run button to execute with a test input.
Exam Trap
"RAG eliminates hallucination entirely." RAG significantly reduces hallucination by grounding responses, but if the relevant information is not in the index (or chunking failed to capture it), the model may still fabricate an answer.
Exam Trap
"The semantic_configuration_name field only matters for Keyword queries." The semantic_configuration_name (e.g., azureml-default) is used for Semantic and HybridWithSemanticReranking query types. It has no effect on Keyword or Vector queries.
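The traps about query embedding and semantic_configuration_name can be summarized as a lookup table. This is a study aid derived from the query-type table in this section, not an API; the dictionary and the `describe` function are hypothetical names.

```python
# Which query types embed the question and which use the semantic configuration,
# as described in this section's query-type table and exam traps.
QUERY_TYPE_FEATURES = {
    "Keyword": {"embeds_query": False, "uses_semantic_config": False},
    "Vector": {"embeds_query": True, "uses_semantic_config": False},
    "Hybrid": {"embeds_query": True, "uses_semantic_config": False},
    "Semantic": {"embeds_query": False, "uses_semantic_config": True},
    "HybridWithSemanticReranking": {"embeds_query": True, "uses_semantic_config": True},
}


def describe(query_type: str) -> str:
    """Return a one-line summary of what a query type needs."""
    features = QUERY_TYPE_FEATURES[query_type]  # raises KeyError for unknown types
    embed = "embeds the question" if features["embeds_query"] else "uses raw text"
    semantic = (
        "uses semantic_configuration_name"
        if features["uses_semantic_config"]
        else "ignores semantic_configuration_name"
    )
    return f"{query_type}: {embed}; {semantic}"


for name in QUERY_TYPE_FEATURES:
    print(describe(name))
```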
Exam Tip
For best overall quality: HybridWithSemanticReranking. For best recall with lower latency: Hybrid. Keyword-only is the weakest option for natural language queries.
Must Memorize
The Python node between Index Lookup and the LLM node is responsible for formatting the raw chunk array into a single context string. Without this formatting step, the LLM receives a raw JSON array, not readable prose context.
Review Questions
Q: In a Prompt Flow RAG pipeline, which tool performs the vector similarity search against an Azure AI Search index?
Q: A Prompt Flow Index Lookup node uses query_type 'Keyword'. A natural-language question returns zero results. What is the most likely cause?
Q: What does the 'score' field represent in the Index Lookup Tool output when query_type is 'Vector'?
Q: Which query_type provides the best overall retrieval quality with the highest latency cost?
Q: Can you test a Prompt Flow RAG pipeline before deploying an online endpoint?
Q: What is the purpose of the Python node between the Index Lookup node and the LLM node in a RAG flow?