Agent Settings Overview
Once you have created an agent in Azure AI Foundry, the real work is configuration. Agent settings determine who the agent is, how it thinks, and how it responds. Getting these settings right is the difference between an agent that reliably does its job and one that drifts, hallucinates, or confuses users.
Agent Name and Description
The agent name is a human-readable label displayed in the Foundry portal and returned in the Agents API response. It is not sent to the model — the model does not "know" its own name unless you mention it in the system instructions.
The description is an optional free-text field for internal documentation. Like the name, the description is metadata only — it is not injected into the model's context.
| Field | Visible to model? | Purpose |
|---|---|---|
| Name | No (unless in instructions) | Portal display, API response metadata |
| Description | No | Internal documentation for developers |
| System Instructions | Yes — always | Defines persona, scope, and behavior |
The System InstructionsThe system prompt field that is injected at the top of every conversation before any user input. The model treats this as its operating manual. It is the single most important configuration field for agent behavior. field is injected at the top of every conversation before any user input. Pair it with a TemperatureA sampling parameter (0.0–2.0) controlling how deterministic or creative the model's token selection is. Low values produce consistent, focused answers; high values produce creative, varied responses. setting appropriate to the agent's use case, a Max TokensThe maximum number of tokens the model can generate in a single completion (output only). Does not affect prompt/input tokens or the model's context window size. limit to control output length, and the correct Response FormatControls whether the agent returns free-form text (default) or a valid JSON object. JSON mode does not validate against a schema — it only guarantees syntactic validity. for your downstream consumers.
System Instructions Deep Dive
The system instructions field is the most powerful configuration in an agent. It is injected at the top of every conversation as the system message, before any user input.
Effective system instructions address five elements:
| Element | What to define | Example |
|---|---|---|
| Persona | Who the agent is and its role | "You are an expert HR assistant for Contoso Corporation." |
| Scope | What the agent will and will not help with | "Only answer questions about Contoso's leave and benefits policies." |
| Tone | Communication style | "Use a friendly, professional tone. Avoid jargon." |
| Limitations | What to do when out of scope or uncertain | "If you don't know the answer, say so and direct the user to HR@contoso.com." |
| Format | How responses should be structured | "Provide concise answers of 3 sentences or fewer unless detail is specifically requested." |
Sample system instruction:
You are an expert HR assistant for Contoso Corporation. Your role is to help employees understand leave policies, benefits, and HR procedures based on the official Contoso HR Handbook.
Only answer questions that are clearly within the scope of Contoso HR policy. If a question is outside your scope, politely decline and direct the user to HR@contoso.com or their local HR business partner.
Always use a professional, empathetic tone. Avoid legal advice — if a question has legal implications, recommend the employee speak with the Legal team.
Format your responses clearly. Use bullet points for multi-step processes. Keep answers under 200 words unless the user explicitly asks for more detail.
Temperature and Top-P
Temperature and top_p control how deterministic or creative the model's responses are. These are sampling parameters that affect the probability distribution the model samples from when generating each token.
| Parameter | Range | Low value | High value |
|---|---|---|---|
| Temperature | 0.0 – 2.0 | More deterministic, focused, repetitive | More creative, varied, potentially inconsistent |
| Top-P | 0.0 – 1.0 | Samples from only the most probable tokens | Considers a wider vocabulary of options |
How to choose:
- Factual / policy agents: Temperature 0.0–0.3 — consistent, predictable answers.
- Creative / writing agents: Temperature 0.7–1.0 — variety and creativity are desirable.
- General-purpose assistants: Temperature ~0.5–0.7 balances coherence with some response variety.
Microsoft's guidance: Alter one parameter at a time. Prefer adjusting temperature and leaving top_p at 1.0 for most agent scenarios.
Max Tokens and Response Format
Max tokens sets the maximum number of tokens the model can generate in a single completion (the agent's response). This controls output length only — it does not affect the prompt/input.
| Setting | Effect |
|---|---|
| Low max tokens (e.g., 256) | Forces concise responses; may truncate complex answers mid-sentence |
| High max tokens (e.g., 4096) | Allows detailed answers; consumes more TPM quota per turn |
| Not set / model default | Uses the model's built-in maximum (varies by model) |
Important: Max tokens is not the same as the model's context window. The context window is the total tokens (prompt + completion) the model can process. If your system instructions + thread history + user message already consume most of the context window, the model may have very few tokens left for its response regardless of the max tokens setting.
The response format setting controls whether the agent returns free-form text or structured JSON:
| Format | Use case | Behavior |
|---|---|---|
| Text (default) | Conversational agents, human-readable responses | Model responds in natural language |
| JSON object | Downstream systems, structured data extraction | Model is forced to return valid JSON; requires JSON instructions in system prompt |
When using JSON mode, you must instruct the model in the system prompt to return JSON and describe the expected schema. JSON mode does not validate against a schema — it only ensures the output is parseable JSON. For schema validation, use Structured Outputs (supported on GPT-4o 2024-08-06 and later).
Hands-On: Configure a Production-Ready Agent
Goal: Configure a production-ready agent with proper system instructions, temperature, and response format settings.
- Open your Foundry Project → Agents → select your agent (or create one).
- In the Name field, enter
Policy Assistant. In the Description field, enterAnswers questions about internal policies using structured, concise responses. - In the Instructions box, replace any existing content with:
You are a policy assistant for Contoso. You help employees find answers about leave, benefits, and workplace policies. Only answer questions related to Contoso policies. If a question is outside scope, say: "I can only help with Contoso policy questions. For other topics, contact support@contoso.com." Always respond concisely in 3 sentences or fewer. - Scroll down to the Model configuration section. Set Temperature to
0.2(low — policy answers should be consistent). - Leave Top-P at its default (
1.0). - Set Max tokens to
512(sufficient for a 3-sentence policy answer with room to spare). - Set Response format to Text (conversational responses, not structured JSON).
- Click Save.
- In the playground, test with:
What is the Contoso parental leave policy?— verify the response is concise and on-topic. - Test with:
What is the capital of France?— verify the agent declines and redirects to support@contoso.com.
AI-3018 Assessment Focus
System instructions, temperature semantics, and JSON mode constraints are heavily tested. Know exactly what each field does and does not do.
Exam Trap
"The agent name is injected into the model's context so the model knows its own name" — The agent name is metadata for the portal and API responses only. The model does not receive the name field. If you want the agent to refer to itself by name, you must include it in the system instructions.
Exam Trap
"Setting temperature to 0 makes responses completely identical every time" — Temperature 0 makes responses highly deterministic but not perfectly reproducible. Slight variations can still occur due to hardware floating-point differences and model infrastructure.
Exam Trap
"Max tokens controls total context length (prompt + response)" — Max tokens only limits the completion (output) tokens. The prompt tokens (system instructions + history + user message) are not constrained by max tokens.
Exam Trap
"JSON mode validates output against a schema" — JSON mode only ensures the output is syntactically valid JSON — it does not validate against any schema. Use Structured Outputs (GPT-4o 2024-08-06+) for schema-enforced output.
Exam Trap
"The description field helps the model stay on topic" — The description is for developers and portal display only; it is never sent to the model. Only the system instructions field influences model behavior.
Exam Tip
When writing system instructions, be explicit about what to do when out of scope. Vague scope statements ("focus on X") are consistently less reliable than explicit rules ("Only answer questions about X. If off-topic, say Y.").
Question — click to flip
Q: Is the agent Name field visible to the model during a conversation?
Question — click to flip
Q: What does JSON response format mode guarantee?
Question — click to flip
Q: For a compliance policy agent requiring consistent answers, what temperature is appropriate?
Question — click to flip
Q: What does max tokens control?
Question — click to flip
Q: When using JSON mode, what must you include in the system instructions?
Question — click to flip
Q: What are the five elements of effective system instructions?