High Availability
Azure backs high availability (the ability of cloud resources and applications to remain accessible with minimal downtime, even during disruptions) with Service Level Agreements (SLAs) that specify guaranteed uptime percentages for each service.
| SLA Uptime | Max downtime per month | Max downtime per year |
|---|---|---|
| 99% | ~7.3 hours | ~3.65 days |
| 99.9% | ~43.8 minutes | ~8.76 hours |
| 99.95% | ~21.9 minutes | ~4.38 hours |
| 99.99% | ~4.4 minutes | ~52.6 minutes |
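Each value in the table comes from multiplying the permitted unavailability (100% minus the SLA) by the length of the period. The sketch below is a minimal Python version of that calculation. It assumes a 730-hour month (8,760 hours / 12), which matches the table rows. The function name `max_downtime_minutes` is illustrative only and is not part of any Azure SDK.

```python
# Assumption: a 730-hour month (8,760 hours / 12), the convention behind the table.
HOURS_PER_MONTH = 730
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def max_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Minutes of downtime an SLA still permits over a period of period_hours."""
    unavailable_fraction = 1 - sla_percent / 100
    return unavailable_fraction * period_hours * 60


for sla in (99.0, 99.9, 99.95, 99.99):
    per_month = max_downtime_minutes(sla, HOURS_PER_MONTH)
    per_year_hours = max_downtime_minutes(sla, HOURS_PER_YEAR) / 60
    print(f"{sla}% -> {per_month:.1f} min/month, {per_year_hours:.2f} h/year")
```

A 30-day month (720 hours) gives slightly smaller monthly figures, for example 7.2 hours at 99%. State which convention you use when you compare numbers.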
Key point: SLAs are per-service guarantees. When an application depends on several services in series, its composite SLA is the product of the individual SLAs, so it is lower than any one of them. Redundancy (e.g., availability zones or a second region) can raise it.
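Composite SLAs can be worked out with simple probability.

- **Services in a chain:** if the app fails when any one service fails, multiply the availabilities.
- **Redundant copies:** if any one copy can serve traffic, the app is down only when all copies fail at once.

The following is a minimal Python sketch. It assumes independent failures (real outages are often correlated), and the SLA figures and function names are hypothetical.

```python
from math import prod


def serial_sla(*slas: float) -> float:
    """Composite SLA (%) when every service must be up: multiply availabilities."""
    return prod(s / 100 for s in slas) * 100


def redundant_sla(sla: float, copies: int) -> float:
    """Composite SLA (%) for independent copies where any single copy suffices."""
    failure = 1 - sla / 100
    return (1 - failure ** copies) * 100


# Hypothetical web tier (99.95%) that depends on a database (99.99%)
print(f"{serial_sla(99.95, 99.99):.4f}%")
# The same 99.95% tier deployed twice, assuming failures are independent
print(f"{redundant_sla(99.95, 2):.6f}%")
```

The chained case comes out at roughly 99.94%, below either service on its own. The redundant case comes out above the single-copy 99.95%.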
Scalability
Scalability (the ability to adjust resource capacity to match demand, up when traffic spikes and down when it subsides, paying only for what you use) comes in two forms in the cloud: vertical (resizing a single resource) and horizontal (changing the number of instances):
| Type | What changes | Example |
|---|---|---|
| Vertical scaling (scale up/down) | Capabilities of a single resource | Add more CPU or RAM to a VM |
| Horizontal scaling (scale out/in) | Number of resource instances | Add more VMs or container replicas |
- Scale out = add more instances (handle more concurrent users)
- Scale in = remove instances (reduce cost when traffic drops)
- Scale up = increase resource size (handle heavier workloads that cannot be spread across instances)
- Scale down = reduce resource size (save money when over-provisioned)
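The four scaling directions can be modeled as two independent knobs: the size of each instance and the number of instances. The sketch below is a minimal Python version. The `SIZE_LADDER` of vCPU counts and the `Deployment` type are hypothetical simplifications; real Azure VM sizes differ by series and region.

```python
from dataclasses import dataclass, replace

# Hypothetical size ladder (vCPUs per VM); real Azure VM sizes vary by series and region.
SIZE_LADDER = [2, 4, 8, 16]


@dataclass(frozen=True)
class Deployment:
    vcpus_per_instance: int
    instance_count: int

    @property
    def total_vcpus(self) -> int:
        return self.vcpus_per_instance * self.instance_count


def scale_up(d: Deployment) -> Deployment:
    """Vertical: move to the next larger size, capped at the ceiling."""
    i = SIZE_LADDER.index(d.vcpus_per_instance)
    return replace(d, vcpus_per_instance=SIZE_LADDER[min(i + 1, len(SIZE_LADDER) - 1)])


def scale_down(d: Deployment) -> Deployment:
    """Vertical: move to the next smaller size, floored at the smallest."""
    i = SIZE_LADDER.index(d.vcpus_per_instance)
    return replace(d, vcpus_per_instance=SIZE_LADDER[max(i - 1, 0)])


def scale_out(d: Deployment, n: int = 1) -> Deployment:
    """Horizontal: add n instances of the same size."""
    return replace(d, instance_count=d.instance_count + n)


def scale_in(d: Deployment, n: int = 1) -> Deployment:
    """Horizontal: remove n instances, keeping at least one."""
    return replace(d, instance_count=max(1, d.instance_count - n))


web_tier = Deployment(vcpus_per_instance=2, instance_count=3)
print(scale_up(web_tier).total_vcpus)
print(scale_out(web_tier, 2).total_vcpus)
```

The size ladder has a hard top, while the instance count is limited only by whatever cap the platform imposes. This is why horizontal scaling usually stretches further.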
Reliability
The cloud's decentralized, global design underpins reliability (the ability of a system to recover from failures and continue to function; a pillar of the Microsoft Azure Well-Architected Framework), enabling applications to keep running even when parts of the infrastructure fail:
- Resources can be deployed across multiple regions worldwide.
- If one region experiences a catastrophic event, other regions continue operating.
- Applications can be architected to automatically fail over to a healthy region.
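Automatic failover boils down to routing traffic to the highest-priority region that is currently healthy. The sketch below shows that decision in Python and is similar in spirit to the priority routing method offered by Azure Traffic Manager. Its assumptions:

- The region names and health data are hypothetical.
- A real deployment also needs data replicated to the failover region.

```python
from typing import Optional


def pick_region(priority_order: list[str], healthy: set[str]) -> Optional[str]:
    """Return the highest-priority region that is currently healthy, if any."""
    for region in priority_order:
        if region in healthy:
            return region
    return None  # every region is down


regions = ["eastus", "westus"]  # primary first, then the failover target
print(pick_region(regions, healthy={"eastus", "westus"}))
print(pick_region(regions, healthy={"westus"}))  # East US has failed
```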
Predictability
Predictability comes in two forms:
Performance predictability — confidence that your application will have the resources it needs:
- Autoscaling adds resources when demand rises and removes them when it drops.
- Load balancing distributes traffic across healthy instances.
- High availability design patterns maintain consistent response times.
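Load balancing keeps response times steady by spreading requests across instances and skipping the ones that fail health checks. The following is a minimal Python sketch using round robin, chosen only because it is the simplest policy to read. Azure Load Balancer itself distributes flows using a hash of connection properties. The instance names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips instances marked unhealthy."""

    def __init__(self, instances: list[str]):
        self._instances = list(instances)
        self._unhealthy: set[str] = set()
        self._ring = cycle(self._instances)

    def mark_unhealthy(self, instance: str) -> None:
        self._unhealthy.add(instance)

    def next_instance(self) -> str:
        # Try each instance at most once per call so the loop always terminates.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate not in self._unhealthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = RoundRobinBalancer(["vm-a", "vm-b", "vm-c"])
lb.mark_unhealthy("vm-b")  # a health probe failed
print([lb.next_instance() for _ in range(4)])
```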
Cost predictability — confidence in your cloud spend:
- Track resource usage in real time with Azure Cost Management.
- Use the Azure Pricing Calculator to estimate costs before deploying.
- Set budgets and alerts to prevent surprise bills.
Both types of predictability are reinforced by following the Azure Well-Architected Framework.
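Budget alerts compare spend to date against percentage thresholds of a fixed budget. Azure Cost Management performs this evaluation for you; the minimal Python sketch below only illustrates the arithmetic. The 50/80/100% thresholds and dollar figures are hypothetical.

```python
def triggered_thresholds(spend: float, budget: float,
                         thresholds: tuple[int, ...] = (50, 80, 100)) -> list[int]:
    """Return each alert threshold (percent of budget) that current spend has reached."""
    used_percent = spend * 100 / budget
    return [t for t in thresholds if used_percent >= t]


# Hypothetical month: $850 spent against a $1,000 budget
print(triggered_thresholds(spend=850, budget=1000))
```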
Security
The cloud offers a range of security controls. The right choice depends on how much control you need:
| Need | Best Service Type | Why |
|---|---|---|
| Maximum control (manage your own OS, patches, firewall) | IaaS | You manage the OS and everything above it; the provider manages the physical hardware, network, and virtualization layer |
| Automatic patching and maintenance | PaaS or SaaS | Provider handles OS and middleware updates |
| Protection against large-scale DDoS attacks | Any cloud service | Cloud providers operate at global scale |
Cloud providers are well-positioned to handle DDoS attacks due to massive network capacity and built-in mitigation services like Azure DDoS Protection.
Governance
Governance (the set of policies, controls, and auditing mechanisms that ensure deployed resources meet corporate standards and regulatory requirements) is enforced in the cloud through a combination of templates, policy, and auditing:
- Templates enforce that new deployments conform to approved configurations.
- Azure Policy flags resources that drift out of compliance and suggests remediation.
- Cloud-based auditing provides a continuous compliance baseline.
- Automatic patches (in PaaS/SaaS) help maintain governance standards without manual effort.
Establishing a governance footprint early keeps your cloud environment secure, compliant, and manageable at scale.
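An audit pass amounts to checking every deployed resource against a rule and flagging the ones that fail. The Python sketch below is a toy version of that idea, loosely modeled on a "require a tag" policy. In Azure Policy, the rule is declarative JSON evaluated by the service, not code you write. The `Resource` model, resource names, and `costCenter` tag are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def non_compliant(resources: list[Resource], required_tag: str) -> list[str]:
    """Names of resources missing the required tag, i.e. what an audit would flag."""
    return [r.name for r in resources if required_tag not in r.tags]


inventory = [
    Resource("vm-web", {"costCenter": "1234"}),
    Resource("vm-test"),  # deployed without any tags
]
print(non_compliant(inventory, "costCenter"))
```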
Manageability
There are two dimensions of manageability:
Management OF the cloud (what you can manage automatically):
- Autoscale resources based on demand
- Deploy from preconfigured templates (no manual configuration)
- Monitor resource health and automatically replace failing resources
- Receive real-time alerts when metrics breach thresholds
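Automatically replacing failing resources is a reconciliation loop: compare the actual state with the desired state and close the gap. In Azure, features such as automatic instance repairs for Virtual Machine Scale Sets do this for you. The minimal Python sketch below uses hypothetical instance names, health data, and replacement naming.

```python
def reconcile(health: dict[str, bool], desired_count: int) -> tuple[list[str], list[str]]:
    """Keep healthy instances and name replacements until desired_count is met.

    Returns (kept, replacements). Unhealthy instances are simply dropped.
    """
    kept = [name for name, ok in health.items() if ok]
    replacements: list[str] = []
    n = 0
    while len(kept) + len(replacements) < desired_count:
        n += 1
        candidate = f"replacement-{n}"
        if candidate not in health:  # avoid reusing an existing instance name
            replacements.append(candidate)
    return kept, replacements


# Hypothetical health-probe results for a three-instance pool
kept, new = reconcile({"vm-1": True, "vm-2": False, "vm-3": True}, desired_count=3)
print(kept, new)
```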
Management IN the cloud (how you interact with resources):
| Interface | When to use |
|---|---|
| Azure portal (web UI) | Visual exploration, one-off tasks, learning |
| Azure CLI | Scripting, automation, cross-platform |
| Azure PowerShell | Scripting in Windows/PowerShell-heavy environments |
| REST APIs | Programmatic integration from applications |
| ARM templates / Bicep | Repeatable, version-controlled infrastructure deployments |
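The portal, CLI, PowerShell, and templates all send their requests through Azure Resource Manager, which is also exposed directly as a REST API. The minimal Python sketch below builds the documented URL for listing resource groups. The subscription ID is a placeholder, and the `api-version` value is one published version that may not be the latest. The sketch only constructs the URL. Actually calling it requires an HTTP GET with a bearer token from Microsoft Entra ID.

```python
from urllib.parse import quote, urlencode

ARM_ENDPOINT = "https://management.azure.com"


def list_resource_groups_url(subscription_id: str, api_version: str = "2021-04-01") -> str:
    """Build the Azure Resource Manager REST URL that lists resource groups."""
    path = f"/subscriptions/{quote(subscription_id)}/resourcegroups"
    return f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': api_version})}"


# Placeholder subscription ID; a real call sends an HTTP GET to this URL with an
# "Authorization: Bearer <token>" header obtained from Microsoft Entra ID.
print(list_resource_groups_url("00000000-0000-0000-0000-000000000000"))
```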
High Availability vs. Reliability — Side-by-Side
These two benefits are tested separately on AZ-900. Many candidates confuse them.
| Attribute | High Availability | Reliability |
|---|---|---|
| Focus | Maximizing uptime | Recovering from failures |
| Measured by | SLA uptime % | Resilience and redundancy design |
| Key Azure feature | SLAs, availability zones | Multi-region deployment, auto-failover |
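| Framework pillar | No dedicated pillar (addressed within Reliability) | Reliability pillar of the Azure Well-Architected Framework |
| Example scenario | 99.99% SLA for Azure VMs deployed across two or more availability zones | App automatically reroutes to West US when East US fails |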
Scalability — Vertical vs. Horizontal
| Dimension | Vertical (Scale Up/Down) | Horizontal (Scale Out/In) |
|---|---|---|
| What changes | Size of one resource | Count of resources |
| Limit | Largest available VM size | Subscription quotas and per-service instance limits (far higher in practice) |
| Downtime risk | Likely (resizing a running VM restarts it) | Usually none |
| Best for | Stateful, single-instance workloads | Stateless, web-tier workloads |
| Azure example | Resize VM from D2s to D8s | VM Scale Set adds 3 VMs during peak |
Autoscaling is a mechanism for scalability — but scalability includes manual scaling too. They are not synonymous terms.
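With manual scaling, someone sets the instance count directly. With autoscaling, a rule changes the count when a metric crosses a threshold, within minimum and maximum limits and with a cooldown between actions. The minimal Python sketch below shows one such rule. The CPU thresholds, step size, limits, and cooldown are hypothetical. Azure Autoscale also averages metrics over a time window, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class AutoscaleRule:
    scale_out_above: float = 70.0  # average CPU % that triggers scale out
    scale_in_below: float = 30.0   # average CPU % that triggers scale in
    step: int = 1                  # instances added or removed per action
    min_instances: int = 2
    max_instances: int = 10
    cooldown_minutes: int = 5      # wait this long after an action before acting again


def next_count(rule: AutoscaleRule, current: int, avg_cpu: float,
               minutes_since_last_action: float) -> int:
    """Return the instance count the rule would set, clamped to its limits."""
    if minutes_since_last_action < rule.cooldown_minutes:
        return current  # still cooling down; ignore the metric for now
    if avg_cpu > rule.scale_out_above:
        target = current + rule.step
    elif avg_cpu < rule.scale_in_below:
        target = current - rule.step
    else:
        target = current
    return max(rule.min_instances, min(target, rule.max_instances))


rule = AutoscaleRule()
print(next_count(rule, current=3, avg_cpu=85.0, minutes_since_last_action=10))
```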
The Two Faces of Predictability
| Type | Tools | Example |
|---|---|---|
| Performance predictability | Autoscaling, load balancing, HA design | App maintains sub-200ms response during Black Friday traffic surge |
| Cost predictability | Cost Management, Pricing Calculator, budgets | CFO receives monthly spend forecast within ±5% of actual bill |
Both types are grounded in the Azure Well-Architected Framework.
Management Interfaces — Feature Comparison
| Tool | Installation needed? | Best audience | Automation-friendly? |
|---|---|---|---|
| Azure portal | None (browser) | Visual learners, one-off tasks | Low |
| Azure CLI | Yes (or Cloud Shell) | Bash / DevOps engineers | High |
| Azure PowerShell | Yes (or Cloud Shell) | Windows admins | High |
| REST APIs | None (HTTP client) | Developers, integrations | Very high |
| ARM / Bicep | Text editor + CLI/PS | Infrastructure engineers | Very high |
Explore Scalability, Governance, and Alerts
Step 1 — View Autoscale settings on an App Service Plan
- Sign in to portal.azure.com.
- Navigate to App Services → select an existing App Service (or create one on the Standard tier or higher; the Free and Basic tiers do not support autoscale).
- Under Settings, select Scale out (App Service plan).
- Toggle to Custom autoscale and observe the rule builder — this is horizontal scaling (scale out/in).
Step 2 — Set a Cost Budget Alert
- Navigate to Cost Management + Billing → Cost Management → Budgets.
- Select + Add, define a monthly budget amount, and configure an alert at 80% threshold.
- Under Alert recipients, add an email address; this is cost predictability in action.
Step 3 — Explore Azure Policy (Governance)
- Search for Policy in the portal.
- Open Definitions and filter by category = "Tags".
- Select Require a tag on resources — read the policy rule JSON to understand how governance is enforced on new resource deployments.
Step 4 — Check SLA Reference
- Navigate to azure.microsoft.com/support/legal/sla/summary/.
- Find the SLA for Azure Virtual Machines — note the uptime % and calculate monthly downtime allowance.
AZ-900 Exam Focus
Exam Trap
"High availability and reliability are the same thing" — They are distinct. High availability = maximizing uptime (SLA-backed). Reliability = ability to recover from failures and keep functioning (includes resilience and redundancy design). Expect at least one question testing this distinction.
Exam Trap
"Scalability means automatically scaling" — Not necessarily. Scaling can be manual or automatic. Autoscaling is one mechanism for scalability, but they are not synonymous. The exam may describe manual scaling as an example of scalability.
Exam Trap
"Predictability only refers to cost" — Predictability has two forms: performance predictability and cost predictability. The exam tests both. Know each form and the Azure tools that enable it.
Exam Trap
"Management in the cloud only means the web portal" — The exam tests all management interfaces: portal, CLI, PowerShell, REST APIs, and ARM templates. Management IN the cloud = the interface you use. Management OF the cloud = automated operations (autoscale, alerts, template deployment).
Exam Tip
Governance is broader than security — Governance covers compliance, standards enforcement, auditing, cost management, and policy. When a question mentions "organizational standards" or "regulatory compliance," the answer is likely governance, not security.
Must Memorize
The 7 cloud benefits: High Availability · Scalability · Reliability · Predictability · Security · Governance · Manageability
Q: What is the difference between high availability and reliability in Azure?
Q: What is the difference between vertical and horizontal scaling?
Q: What are the two types of predictability in cloud computing?
Q: What is the difference between 'management OF the cloud' and 'management IN the cloud'?
Q: A 99.99% SLA allows how much downtime per month?
Q: Which cloud benefit ensures resources remain accessible with minimal downtime and is measured by SLA uptime percentages?