AI-3016 Learning Portal

Concept — What & Why

Once you identify a model in the catalog, you deploy it to make it available for inference. Azure AI Foundry offers two high-level deployment paths:

Standard deployment in Foundry resources — preferred; supports ADMs and select partner models
Managed compute deployment — for open-weight and custom models

The Foundry portal automatically routes you to the correct option based on the model you choose.

Key deployment settings you configure at creation time:

Setting	Description
Deployment name	Used as the `model` parameter in API calls — immutable after creation
Deployment type	Standard, Global-Standard, Global-Batch, Provisioned-Managed, etc.
Tokens per Minute (TPM)A rate limit on throughput allocated from your subscription's per-region, per-model quota pool. TPM is not the model's context window — it controls how many tokens your deployment can process per minute across all callers.	Rate limit from your subscription quota; adjustable post-deployment
Content filter policy	The filter configuration to attach
Version Upgrade PolicyA deployment setting that controls what happens when a new model version is released. Options: OnceNewDefaultVersionAvailable (auto-upgrade to new default), OnceCurrentVersionExpired (upgrade only at retirement), NoAutoUpgrade (manual only — stops at retirement).	Controls auto-upgrade behavior

You can test deployed models interactively in the Foundry PlaygroundThe interactive no-code web interface in Azure AI Foundry for testing deployed models by sending prompts and reviewing responses, with controls for system message, temperature, max response tokens, and top P. without writing any code.

Deep Dive — How It Works

Deployment Types Compared

Deployment Type	Billing	Best For
Standard	Per token (TPM quota)	Development, variable workloads
Global-Standard	Per token, global routing	Broadest regional availability
Provisioned-Managed (PTU)	Reserved capacity (per hour)	High-volume, latency-sensitive production
Global-Batch	Per token, async	Cost-optimized batch processing; no playground
Managed compute	VM core-hours	Open-weight / custom models

Version Upgrade Policies

Policy	Behavior	Best for
`OnceNewDefaultVersionAvailable`	Auto-upgrades when a new default version is set	Development environments; keep current
`OnceCurrentVersionExpired`	Upgrades only when the current version is retired	Production; safest middle ground
`NoAutoUpgrade`	Never auto-upgrades; deployment stops working when the pinned version is retired	Strict version locking (requires active monitoring)

Editable vs. Fixed Post-Deployment Settings

Setting	Can edit after deployment?
Deployment name	No — immutable; delete and redeploy
Tokens per Minute (TPM)	Yes — from the deployment details page
Content filter policy	Yes — replace policy from deployment page
Model version	Yes — triggers `Updating` provisioning state
Deployment type	No — fixed at creation
Azure region	No — fixed at creation

Playground Parameters

Parameter	Range	Effect
Temperature	0–2	Controls randomness; 0 = deterministic, 2 = very random
Max response (tokens)	1–model max	Caps generated response length
Top P	0–1	Nucleus sampling; adjust Temperature OR Top P, not both
Stop sequences	String list	Tokens that halt generation

Hands-On Lab

Deploy a model from the catalog:

Navigate to Azure AI Foundry portal → Discover → Models → select a model (e.g., gpt-4o-mini) → Deploy → Custom settings.

In the deployment wizard: set Deployment name → choose Deployment type (e.g., Global-Standard) → adjust Tokens per Minute slider → assign a Content filter policy → set Version upgrade policy → select Deploy.

Wait for Provisioning state to show Succeeded on the Models + endpoints page.

Test in the playground:

From the deployment list, click the deployment name → Open in playground (or navigate to Playgrounds → Chat).

In the System message box, enter instructions (e.g., "You are a concise technical assistant.") → select Apply changes.

Type a user prompt in the chat box → press Enter to send → review the response.

Adjust Temperature (0–2 scale) or Max response tokens in the Parameters panel → resend the same prompt to observe differences.

Select View code / </> Code tab → copy the pre-populated Python snippet to validate API connectivity.

Edit an existing deployment:

Navigate to Models + endpoints → select the deployment name → Edit (pencil icon).

Increase or decrease the Tokens per Minute allocation → select Save.

To update the model version: in the Properties pane select Edit → change Model version in the dropdown → confirm. The deployment enters Updating state for a few minutes.

Exam Angle — What AI-3016 Tests

AI-3016 Assessment Focus

Deployment-type selection and TPM vs. context-window confusion are high-frequency exam topics. Know which settings are editable post-deployment and what happens when a pinned version is retired with NoAutoUpgrade.

Exam Trap

"Batch deployment supports the Foundry playground for testing." Global-Batch does not support playground testing. Use Standard or Global-Standard deployments for interactive validation.

Exam Trap

"You can change the deployment name after a model is deployed." The deployment name is immutable after creation. If you need a different name, delete and redeploy.

Exam Trap

"TPM quota is the same as the model's max input token limit." TPM is a throughput rate limit allocated from your subscription quota. The model's max input token limit (context window) is a fixed model property unaffected by TPM.

Exam Trap

"The NoAutoUpgrade policy keeps a deployment running indefinitely." When the pinned model version reaches its retirement date, deployments with NoAutoUpgrade will stop serving requests. Manual version update before retirement is required.

Exam Trap

"Temperature and Top P should be adjusted together for best results." Microsoft explicitly recommends adjusting either Temperature or Top P — not both simultaneously — as combining them produces unpredictable behavior.

Exam Tip

For overnight batch cost optimization: Global-Batch. For latency-sensitive high-volume production: Provisioned-Managed (PTU). For development with variable load: Standard or Global-Standard.

Must Memorize

After deployment, only TPM, content filter, and model version are editable. Deployment name, type, provider, and region are fixed.

Question — click to flip

Q: A company processes large document volumes overnight at minimum cost with no playground requirement. Which deployment type is most appropriate?

Question — click to flip

Q: With NoAutoUpgrade policy, what happens when the pinned model version reaches its retirement date?

Question — click to flip

Q: Which deployment settings can be modified after a model is deployed in Azure AI Foundry?

Question — click to flip

Q: What is the primary difference between Standard and Provisioned-Managed (PTU) deployment types?

Question — click to flip

Q: A developer wants to test a deployed model without writing code. Which Foundry feature enables this?

Question — click to flip

Q: What does the Tokens per Minute (TPM) setting control in an Azure AI Foundry deployment?

2.2 — Configure, Deploy, Edit, and Test Models

Deployment Types Compared

Version Upgrade Policies

Editable vs. Fixed Post-Deployment Settings

Playground Parameters

AI-3016 Assessment Focus