AI-3018 Learning Portal
Objective 2.4 | 30 min | Medium priority | Tags: testing, playground, tool-call-log, citations, debugging, retrieval

2.4 — Test Knowledge Sources and Tools

Validate agent knowledge retrieval and tool execution by submitting structured test queries in the Foundry portal playground and inspecting tool call logs.

Prerequisites: 2.1, 2.2, 2.3
Concept — What & Why

Why Structured Testing Matters

After configuring knowledge sources and tools, you must verify that retrieval is accurate, tools are invoked correctly, and the agent's responses are grounded. The Azure AI Foundry portal playground is the primary interface for interactive testing without writing application code. Structured testing catches configuration errors before they reach production.

The Foundry Portal Playground Features

Playground Feature | Purpose
Chat window | Submit test queries and see the agent's responses
Tool Call Log | Expandable panel showing which tools were invoked during a conversation turn, with the input sent to each tool and the raw output returned. This is the primary debugging surface for tool configuration issues.
Citations panel | Lists source documents and chunk references when File Search is used
Clear conversation | Resets the thread so previous context does not influence the next test
System prompt editor | Modify instructions in real time and retest without saving

Each response may include a citation: an inline reference (e.g., [1]) that links to the source file and the specific chunk retrieved by File Search. Citations appear only when File Search was called and the model chose to cite the source. To isolate indexing problems, begin with an exact phrase query, which is a sentence pasted verbatim from an uploaded document, before testing with paraphrased or out-of-scope queries. An exact phrase query should always retrieve the correct chunk, which makes it the first step in systematic retrieval debugging.

Deep Dive — How It Works

Effective test queries for retrieval validation:

  1. Exact phrase query — paste a sentence verbatim from the uploaded document. This should always retrieve the correct chunk.
  2. Paraphrase query — rephrase the concept in different words. Tests semantic matching quality.
  3. Out-of-scope query — ask about something not in the documents. Confirms the agent says "I don't know" rather than hallucinating.
  4. Cross-document query — ask a question that requires combining information from two different files.

After each query, expand the Tool calls section to see:

  • Whether File Search was invoked.
  • Which file and chunk were retrieved.
  • The similarity score (if shown).
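
The four query types and the tool call observations above can be captured as a small test plan. The following is a minimal sketch, not a Foundry API: the class names, the `passed` helper, the sample queries, and the file name `pricing.pdf` are all hypothetical, and the observations are assumed to be recorded by hand from the playground.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalTest:
    kind: str    # "exact", "paraphrase", "out_of_scope", or "cross_document"
    query: str
    expected_files: set = field(default_factory=set)  # files that should be retrieved


@dataclass
class Observation:
    file_search_called: bool        # read from the Tool calls section
    retrieved_files: set            # file names shown in the tool output
    answered_unknown: bool = False  # agent said it does not know


def passed(test: RetrievalTest, obs: Observation) -> bool:
    """Return True when the observation matches what the query type should produce."""
    if test.kind == "out_of_scope":
        # No document covers the topic, so the agent should decline to answer.
        return obs.answered_unknown
    # Every other kind needs File Search to run and to surface each expected file.
    return obs.file_search_called and test.expected_files <= obs.retrieved_files


# Hypothetical plan: one test per query type.
plan = [
    RetrievalTest("exact", "The single-file limit is 512 MB", {"overview.pdf"}),
    RetrievalTest("paraphrase", "How large can one uploaded file be?", {"overview.pdf"}),
    RetrievalTest("out_of_scope", "What is the capital of France?"),
    RetrievalTest("cross_document", "Compare upload limits and pricing",
                  {"overview.pdf", "pricing.pdf"}),
]
```

Recording the expected files per query up front makes a cross-document failure easy to spot: the test fails as soon as one of the two files is missing from the tool output.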

Reading the Tool Call Log

The tool call log exposes the raw mechanics of each agent turn:

Tool: file_search
Input:  { "query": "maximum upload size for vector store" }
Output: [
  { "file_id": "file-abc123", "filename": "overview.pdf", "score": 0.87,
    "text": "The single-file limit is 512 MB..." }
]

Citations appear inline in the response as numbered references (e.g., [1]). Each citation links to the source file and the specific chunk. If citations are absent when you expect them, first confirm in the tool call log that File Search returned results. If it did, the model most likely chose not to cite them; check whether the system prompt includes citation instructions.
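
The citation check can be expressed as a small decision helper. This is an illustrative sketch only: `diagnose_citations` is a hypothetical function, and it assumes you have copied the response text and the File Search output (in the shape shown in the log above) out of the playground by hand.

```python
import re
from typing import Optional

CITATION = re.compile(r"\[\d+\]")  # inline markers such as [1]


def diagnose_citations(response_text: str,
                       file_search_results: Optional[list]) -> str:
    """Classify one turn from the response text and the File Search output.

    Pass None for file_search_results when the tool was not called at all.
    """
    if file_search_results is None:
        return "file_search not called: check that the tool is enabled"
    if not file_search_results:
        return "file_search returned nothing: check indexing status"
    if not CITATION.search(response_text):
        return ("results returned but not cited: "
                "check citation instructions in the system prompt")
    return "results returned and cited"


results = [{"file_id": "file-abc123", "filename": "overview.pdf",
            "score": 0.87, "text": "The single-file limit is 512 MB..."}]
print(diagnose_citations("The limit is 512 MB.", results))
```

The order of the checks mirrors the debugging order: rule out a missing tool call and empty results before concluding that the model simply chose not to cite.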

Common Issues and Fixes

Symptom | Most Likely Cause | Fix
Tool not called at all | Tool not enabled, or tool_choice = none | Enable the tool in the Tools tab; check tool_choice setting
Wrong file retrieved | Weak query-chunk alignment or low ranking_threshold | Rephrase query; raise ranking_threshold; verify file content has text
Empty results from File Search | File not yet indexed, or scanned PDF | Wait for Indexed status; re-upload a text-layer PDF
Code Interpreter ignores the uploaded file | File uploaded to vector store, not to Code Interpreter | Upload the file specifically under Code Interpreter in the Tools tab
Hallucinated answer despite file being uploaded | max_num_results too low or chunk mismatch | Increase max_num_results; review chunk sizes
Citation appears but content is wrong | Chunk boundary cuts off the relevant sentence | Rephrase query; consider reducing chunk size

Systematic Retrieval Debugging Steps

  1. Check the Files tab — confirm status is Indexed (not Processing or Failed).
  2. Submit an exact phrase from the document. If this fails, the file may not have extractable text.
  3. Submit a rephrased version. If only exact phrases work, the embedding quality may be low.
  4. Lower ranking_threshold to 0 and re-test — this reveals whether results exist but are being filtered out.
  5. Increase max_num_results to 40 and check whether the correct chunk appears further down the list.
  6. Review the chunk list in the Files tab to see how the document was split.
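
Steps 4 and 5 work because two separate filters can hide a correct chunk: a score threshold and a cap on the number of results. The sketch below simulates both on made-up data. It does not reproduce the service's actual ranking; `visible_chunks`, the chunk names, and the scores are hypothetical and chosen only to show the effect of each setting.

```python
def visible_chunks(scored_chunks, ranking_threshold, max_num_results):
    """Return the chunks that survive the score threshold and the result cap.

    scored_chunks: list of (chunk_id, score) pairs, in any order.
    """
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    kept = [pair for pair in ranked if pair[1] >= ranking_threshold]
    return kept[:max_num_results]


def contains_target(results):
    return any(chunk_id == "target" for chunk_id, _ in results)


# Hypothetical scores for a paraphrased query; "target" is the chunk we want.
chunks = [("intro", 0.62), ("faq", 0.55), ("target", 0.41), ("appendix", 0.30)]

# Threshold hides the target; lowering it to 0 (step 4) reveals it.
filtered = visible_chunks(chunks, ranking_threshold=0.5, max_num_results=20)
unfiltered = visible_chunks(chunks, ranking_threshold=0.0, max_num_results=20)

# A small cap hides the target; raising it (step 5) reveals it.
capped = visible_chunks(chunks, ranking_threshold=0.0, max_num_results=2)
uncapped = visible_chunks(chunks, ranking_threshold=0.0, max_num_results=40)

print(contains_target(filtered), contains_target(unfiltered))  # False True
print(contains_target(capped), contains_target(uncapped))      # False True
```

Changing one setting at a time, as the steps do, tells you which of the two filters was responsible.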

Validating Code Interpreter Results

After Code Interpreter runs, verify:

  1. Correctness — manually compute a known value from the data and compare to the agent's output.
  2. Code visibility — expand the tool call log to read the generated Python. Check for assumptions (e.g., column names, data types).
  3. Error handling — submit malformed data or an ambiguous query. The model should report an error rather than hallucinate a result.
  4. Chart output — if a chart was generated, download it and confirm axes, labels, and data ranges are accurate.
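
For the correctness check, the known value can be computed independently with the Python standard library. This is a minimal sketch under stated assumptions: `reference_stats` and `matches_agent` are hypothetical helpers, the sample data and column names are invented, and the chosen column is assumed to contain only numeric values (a non-numeric cell raises `ValueError`).

```python
import csv
import io
import math
import statistics


def reference_stats(csv_text: str, column: str) -> tuple:
    """Return (row_count, mean of column), computed independently of the agent."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[column]) for row in rows]
    return len(rows), statistics.mean(values)


def matches_agent(csv_text, column, agent_rows, agent_mean, rel_tol=1e-6):
    """Compare the agent's reported figures against the reference values."""
    rows, mean = reference_stats(csv_text, column)
    return rows == agent_rows and math.isclose(mean, agent_mean, rel_tol=rel_tol)


# Invented sample data standing in for the uploaded CSV.
sample = "region,sales\nnorth,10\nsouth,20\neast,30\n"
print(reference_stats(sample, "sales"))  # (3, 20.0)
```

Comparing with a tolerance rather than exact equality avoids false failures when the agent rounds the average in its reply; loosen `rel_tol` to match the number of decimals the agent reports.
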
Hands-On Lab

Hands-On: Systematic Testing of File Search and Code Interpreter

Goal: Systematically test File Search retrieval and Code Interpreter using the playground.

Part A — File Search

  1. Open your agent in Azure AI Foundry → Agents → click the Playground tab.
  2. Confirm File Search is enabled and at least one file shows Indexed status.
  3. Submit an exact sentence from the uploaded document. Observe: does a citation appear? Is the answer correct?
  4. Click Tool calls to expand the log. Note the retrieved file name and similarity score.
  5. Click Clear conversation, then rephrase the same query in different words. Compare results.
  6. Submit an out-of-scope question (topic not in any uploaded file). Confirm the agent acknowledges the limitation.

Part B — Code Interpreter

  1. Ensure Code Interpreter is enabled and a CSV file is uploaded under it.
  2. Ask: "How many rows are in the uploaded file and what is the average of the [column name] column?"
  3. Expand Tool calls to see the Python code generated.
  4. Manually verify the row count and average against the raw CSV.
  5. Ask a follow-up: "Create a bar chart of [column] by [category]." Confirm an image attachment appears in the response.
  6. Click the image to verify axes and data match the source file.
Exam Angle — What AI-3018 Tests

AI-3018 Assessment Focus

Testing is frequently presented as scenario-based troubleshooting. Know what each symptom implies and which parameter or configuration fixes it.

Exam Trap

"If the tool call log shows a tool was called, the result was used in the response" — Not necessarily. The model may receive the tool result and choose to override it with its own knowledge, especially if the system prompt does not explicitly require grounding.

Exam Trap

"Raising max_num_results always improves response quality" — Not always. Too many chunks can introduce noise and push the relevant chunk below the model's effective context attention window, actually degrading quality.

Exam Trap

"Citations in the response guarantee the answer is factually correct" — False. The model may misquote or misinterpret a cited chunk. Citations show the source; they do not validate the model's reasoning.

Exam Trap

"Clearing the conversation thread also resets the tool configuration" — False. Clearing only removes the message history. Tool configuration and uploaded files remain unchanged.

Exam Trap

"A Failed file status can be fixed by re-enabling File Search" — False. A failed file must be deleted and re-uploaded. Toggling the tool on/off does not retry indexing.

Exam Tip

When exact phrase queries succeed but paraphrased queries fail, increase max_num_results first — the correct chunk likely exists but ranks lower for paraphrased input.

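Review Questions

Q: If the tool call log shows File Search returned results but no citation appears, what is the most likely cause?
A: The model chose not to cite the retrieved chunks. Check whether the system prompt includes citation instructions.

Q: What does 'Clear conversation' in the playground reset?
A: Only the message history. Tool configuration and uploaded files remain unchanged.

Q: A file shows 'Failed' status in the Files tab. How do you fix it?
A: Delete the file and re-upload it. Toggling File Search on or off does not retry indexing.

Q: Exact phrase queries work but paraphrased queries return wrong answers. What is the first debugging step?
A: Increase max_num_results. The correct chunk likely exists but ranks lower for paraphrased input.

Q: Do citations in an agent response guarantee the answer is factually correct?
A: No. Citations show the source; the model may still misquote or misinterpret the cited chunk.

Q: A CSV uploaded to the File Search vector store is not accessible to Code Interpreter. Why?
A: The file was uploaded to the vector store, not to Code Interpreter. Upload it under Code Interpreter in the Tools tab.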

Sources & Further Reading