High Availability Architecture
Availability Zones
Physically separate datacenters within an Azure region, each with independent power, cooling, and networking. Zone-enabled regions have a minimum of three zones. Synchronous replication across zones provides zero data loss (RPO = 0) for zone-redundant services. VMs, SQL Database, and many other services support zone-redundant deployment.
Availability Sets
A logical grouping of VMs within a single datacenter that spreads them across fault domains (separate physical hardware) and update domains. Protects against planned maintenance and hardware failures WITHIN that datacenter. Does NOT protect against zone-level outages.
Azure Load Balancer
Layer 4 (TCP/UDP) load balancing within a region. Uses health probes to detect unhealthy backends and automatically removes them from rotation. Removal time = unhealthy threshold × probe interval. Standard tier required for zone-redundant deployment.
Azure Front Door
Microsoft's edge-based global load balancing and CDN service. Routes traffic to the nearest healthy origin via Microsoft's global PoPs (Points of Presence). Provides WAF at the edge, failure detection within seconds, and automatic origin failover, which is significantly faster than Traffic Manager for global HA.
Azure Traffic Manager
DNS-based global traffic routing across regions/endpoints. Multiple routing methods: Priority (primary/secondary), Performance (nearest), Weighted, Geographic. Failover depends on health check detection plus DNS TTL expiration, so effective RTO is at least 30–60 seconds. Best for multi-region routing, not for failover within seconds.
HA Pattern Comparison
| Pattern | RTO | RPO | Cost | Best For |
|---|---|---|---|---|
| Active-Passive (hot standby) | Seconds–minutes | Near-zero | 2x infra | Cost-sensitive HA with brief downtime tolerance |
| Active-Active (symmetric) | Seconds | Zero (sync zones) | 2x infra | Mission-critical maximum uptime |
| Multi-Region Active-Active | Seconds (Front Door) to minutes (DNS) | Seconds (async) | 2x+ infra | Global applications, regional DR |
Load Balancing Service Selection
| Service | Layer | Scope | Key Capability |
|---|---|---|---|
| Azure Load Balancer | 4 (TCP/UDP) | Regional | High-throughput, low-latency, zone-redundant |
| Application Gateway | 7 (HTTP/S) | Regional | URL routing, WAF, SSL termination |
| Traffic Manager | DNS | Global | Multi-region DNS failover |
| Azure Front Door | 7 (HTTP/S) | Global edge | Fastest failover, CDN, WAF at edge |
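The selection table can be read as a two-question decision: is the traffic HTTP/S, and does it need to be routed globally? The sketch below encodes only that reading of the table. The function name and its two boolean parameters are illustrative assumptions, and real designs often combine services (for example, Front Door in front of Application Gateway).

```python
def pick_load_balancer(is_http: bool, is_global: bool) -> str:
    """Map traffic type and scope to a service from the selection table.

    Hypothetical helper for study purposes; it ignores cost, WAF needs,
    and combinations of services.
    """
    if is_global:
        # Global scope: edge-based Layer 7 for HTTP/S, DNS routing otherwise.
        return "Azure Front Door" if is_http else "Traffic Manager"
    # Regional scope: Layer 7 gateway for HTTP/S, Layer 4 otherwise.
    return "Application Gateway" if is_http else "Azure Load Balancer"


if __name__ == "__main__":
    print(pick_load_balancer(is_http=True, is_global=True))    # Azure Front Door
    print(pick_load_balancer(is_http=False, is_global=False))  # Azure Load Balancer
```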
Availability Zone Deployment Patterns
VM Deployment Options
| Option | Protection Level | Use Case |
|---|---|---|
| No redundancy | None | Dev/test only |
| Availability Set | Fault domain + update domain within a single datacenter | Single-datacenter HA |
| Availability Zone | Zone-level (full datacenter) | Regional HA (recommended) |
| VMSS across zones | Zone-level + auto-scaling | Elastic zonal HA |
Key Rule: Availability Sets protect within a single datacenter; Availability Zones protect across datacenters in a region. For true regional HA, always use Availability Zones over Availability Sets.
Database HA Comparison
| Service | Zone Redundancy | Geo-Replication | Max SLA |
|---|---|---|---|
| Azure SQL Database (Business Critical) | Synchronous (RPO = 0) | Asynchronous (RPO ≈ 5s) | 99.995% |
| Azure Cosmos DB | Multi-region writes, auto failover | Multi-region (configurable consistency) | 99.999% |
| Azure DB for PostgreSQL (Flexible) | Zone-redundant standby | Read replicas | 99.99% |
| Azure Storage (ZRS) | Synchronous across 3 zones | RA-GZRS for geographic | 99.9999999999% durability (12 nines), not an availability SLA |
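An SLA percentage is easier to compare once it is converted into allowed downtime. The sketch below does that conversion for a 365-day year. The function name is an illustrative assumption, and the result is a simple upper bound on downtime, not a statement of how Azure measures or credits its SLAs.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes_per_year(sla_percent: float) -> float:
    """Return the downtime budget implied by an availability percentage."""
    return (1 - sla_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for sla in (99.995, 99.999):
        minutes = max_downtime_minutes_per_year(sla)
        print(f"{sla}% -> about {minutes:.1f} minutes/year")
```

With these inputs, 99.995% works out to roughly 26 minutes per year and 99.999% to roughly 5 minutes per year.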
Traffic Manager vs. Azure Front Door
Traffic Manager effective RTO = health check detection time + DNS TTL expiration
- Default health check: 30-second interval
- DNS TTL: 60 seconds by default
- Effective RTO with default settings: ~2–3 minutes
Azure Front Door failover: Detects origin failure within seconds and immediately routes to healthy origin — no DNS TTL delay. Required when RTO < 1 minute for multi-region scenarios.
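The Traffic Manager formula above can be worked through numerically. This sketch assumes detection time is simply the probe interval multiplied by the number of consecutive failed probes needed, which ignores probe timeouts and client-side DNS caching beyond the TTL. The function and parameter names are illustrative.

```python
def traffic_manager_rto_seconds(
    probe_interval_s: int,
    failures_to_detect: int,
    dns_ttl_s: int,
) -> int:
    """Estimate effective RTO = health check detection time + DNS TTL expiration."""
    detection_s = probe_interval_s * failures_to_detect
    return detection_s + dns_ttl_s


if __name__ == "__main__":
    # 30-second probes, 3 consecutive failures, 60-second TTL.
    rto = traffic_manager_rto_seconds(30, 3, 60)
    print(rto, "seconds")  # 150 seconds
```

Under these assumptions the estimate is 150 seconds, which falls inside the 2–3 minute range quoted above.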
Active-Active Sizing
For an active-active deployment where primary handles 70% load and secondary 30%:
- Incorrect: Size secondary for 30% load only — cannot absorb 100% during failover
- Correct: Size BOTH regions to handle 100% load independently
This is the fundamental active-active sizing requirement — each region must be capable of handling full production load independently.
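The sizing rule can be checked with simple arithmetic: after losing any one region, the survivor must carry the whole load. The sketch below assumes a two-region deployment and a hypothetical total load of 1,000 requests per second; the `Region` class and `survives_failover` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    capacity_rps: float  # requests/second the region can serve on its own


def survives_failover(regions: list[Region], total_load_rps: float) -> bool:
    """True if every region can carry the full load alone (two-region case)."""
    return all(r.capacity_rps >= total_load_rps for r in regions)


if __name__ == "__main__":
    total = 1000.0  # hypothetical production load

    # Incorrect: each region sized only for its normal 70% / 30% share.
    by_share = [Region("primary", 700.0), Region("secondary", 300.0)]
    print(survives_failover(by_share, total))  # False

    # Correct: both regions sized for 100% of the load.
    full = [Region("primary", 1000.0), Region("secondary", 1000.0)]
    print(survives_failover(full, total))  # True
```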
Hands-On: Deploy Zone-Redundant Application
Step 1: Deploy VMs to Availability Zones
- Create Virtual Machines > set Availability options to Availability zone
- Select Zone 1 for first VM, Zone 2 for second VM
- Deploy VMs with same size, OS, and application configuration
Step 2: Configure Azure Load Balancer (Standard)
- Create Load Balancer > Standard tier (required for zone redundancy)
- Configure Backend pool: Add both VMs (from Zone 1 and Zone 2)
- Create Health probe:
  - Protocol: HTTP
  - Port: 80, Path: /health
  - Interval: 15 seconds, Unhealthy threshold: 2
- Create Load balancing rule: Frontend IP → Backend pool, Port 80
- Test: Access load balancer IP — routes to both VMs; if one VM fails, probe removes it
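To see how the probe settings in Step 2 translate into removal time, the following sketch simulates a sequence of probe results. It is a simplified model, not the Load Balancer's actual implementation: it assumes one probe per interval and that any successful probe resets the failure count. The function name is illustrative.

```python
from typing import Iterable, Optional


def seconds_until_removed(
    probe_results: Iterable[bool],
    interval_s: int,
    unhealthy_threshold: int,
) -> Optional[int]:
    """Return elapsed seconds (one interval per probe) when the backend is
    removed from rotation, or None if the threshold is never reached.

    probe_results: True for a healthy probe response, False for a failure.
    """
    consecutive_failures = 0
    for probe_number, healthy in enumerate(probe_results, start=1):
        if healthy:
            consecutive_failures = 0  # a success resets the streak
        else:
            consecutive_failures += 1
        if consecutive_failures >= unhealthy_threshold:
            return probe_number * interval_s
    return None


if __name__ == "__main__":
    # Step 2 settings: 15-second interval, unhealthy threshold of 2.
    print(seconds_until_removed([False, False], 15, 2))               # 30
    print(seconds_until_removed([False, True, False, False], 15, 2))  # 60
```

The first call reproduces threshold × interval (2 × 15 s = 30 s). The second shows why an intermittent failure can take longer to trigger removal.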
Step 3: Configure Azure Front Door for Global HA
- Navigate to Front Door and CDN profiles > Create > Front Door
- Add Origins (backend resources):
  - Primary: App Service in East US
  - Secondary: App Service in West US
- Health probe: Enabled, path /health, interval 30 seconds
- Add Routes: Pattern /*, HTTPS only
- Front Door routes to nearest healthy origin automatically
Step 4: Configure Traffic Manager (Priority Routing)
- Navigate to Traffic Manager profiles > Create
- Routing method: Priority
- Add endpoints:
  - Primary: Azure endpoint, priority = 1
  - Secondary: Azure endpoint, priority = 2
- Configure health check: HTTPS to /health endpoint
- Monitor: Traffic Manager detects failure after 3 consecutive failures (default)
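Priority routing from Step 4 can be pictured as "answer with the healthy endpoint that has the lowest priority number". The sketch below models only that selection logic. The `Endpoint` class and `resolve` function are hypothetical names, and returning `None` when nothing is healthy is a simplification chosen for this example rather than documented Traffic Manager behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    priority: int        # lower number = preferred (1 = primary)
    healthy: bool = True


def resolve(endpoints: list[Endpoint]) -> Optional[str]:
    """Return the name of the healthy endpoint with the lowest priority number."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None  # simplification: no healthy endpoint to return
    return min(healthy, key=lambda e: e.priority).name


if __name__ == "__main__":
    endpoints = [Endpoint("primary", 1), Endpoint("secondary", 2)]
    print(resolve(endpoints))      # primary

    endpoints[0].healthy = False   # health check marks the primary as failed
    print(resolve(endpoints))      # secondary
```

Clients only see the switch once their cached DNS answer expires, which is why the TTL contributes to the effective RTO.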
Step 5: Enable Zone-Redundant SQL Database
- Navigate to SQL Databases > Create
- Service tier: Business Critical (or General Purpose for lower SLA)
- Enable Zone redundant database in Compute + Storage settings
- Verify in database Properties: Zone redundant = True
- SLA: 99.995% for Business Critical + zone redundancy
AZ-305 Exam Focus
AZ-305 tests your ability to select the right load balancing and HA services for specific requirements. The most common errors are confusing Availability Sets with Availability Zones, and recommending Traffic Manager when the scenario needs failover within seconds, which calls for Front Door.
Exam Trap
Availability Sets Provide Full HA: Availability Sets protect against planned maintenance and hardware failures within a single datacenter. They do NOT protect against zone-level outages. For true regional HA (protection against a full datacenter failure), VMs must use Availability Zones.
Exam Trap
Traffic Manager for Sub-Second Failover: Traffic Manager is DNS-based — failover takes at minimum 30–60 seconds (health check + TTL). For scenarios requiring RTO under 1 minute for multi-region failover, Azure Front Door is the correct answer. Front Door detects failures in seconds and reroutes at the edge.
Exam Trap
Geo-Replication = Zero Data Loss: Geo-replication for Azure SQL Database is asynchronous — RPO is approximately 5 seconds, not zero. Zone redundancy within a region uses synchronous replication (RPO = 0). Don't confuse the two. For zero data loss across zone failure, use zone redundancy; for regional DR, geo-replication with accepted RPO.
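The practical meaning of an RPO becomes clearer when it is multiplied by a write rate. The sketch below uses a hypothetical workload of 200 writes per second to compare the two replication modes. The function name and the workload figure are assumptions for illustration, and the result is a rough upper bound, not a measured value.

```python
def writes_at_risk(write_rate_per_s: float, rpo_s: float) -> float:
    """Estimate how many recent writes could be lost for a given RPO."""
    return write_rate_per_s * rpo_s


if __name__ == "__main__":
    rate = 200.0  # hypothetical writes per second

    print(writes_at_risk(rate, rpo_s=0))  # zone redundancy (synchronous): 0.0
    print(writes_at_risk(rate, rpo_s=5))  # geo-replication (asynchronous): 1000.0
```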
Exam Trap
Failover Groups Are Instant: Failover groups automatically detect primary failure and redirect traffic, but the application must implement connection retry logic. There is typically a brief connection interruption during failover. Applications that don't retry connections may experience errors even after failover completes.
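The connection retry logic mentioned above is often implemented as a retry loop with exponential backoff. The following is a minimal, library-agnostic sketch: `call_with_retry` and its parameters are hypothetical, `ConnectionError` stands in for whatever transient error a real database driver raises, and production code would normally add jitter and an overall time limit.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    state = {"calls": 0}

    def query() -> str:
        # Simulated connection that fails twice while failover completes.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("failover in progress")
        return "ok"

    print(call_with_retry(query, base_delay_s=0.01))  # ok
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.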
Exam Tip
Load Balancer Health Probe Math: If health probe interval is 15 seconds and unhealthy threshold is 2, a VM is removed from rotation in 30 seconds (2 × 15s). The exam sometimes gives scenarios with unexpected failover times — calculate threshold × interval to diagnose the cause.
Must Memorize
ZRS Behavior on Zone Failure: ZRS (Zone-Redundant Storage) synchronously replicates to 3 zones. If a zone fails, the other 2 zones retain complete data copies — all data remains accessible with zero data loss. This is a synchronous operation, not a backup — recovery is automatic with no restore process needed.
Q: What is the difference between Availability Sets and Availability Zones for VM HA?
Q: When should you use Azure Front Door instead of Traffic Manager for global HA?
Q: What is the RPO for Azure SQL Database with zone redundancy vs. geo-replication?
Q: In an active-active deployment across two regions, how should each region be sized?
Q: Why does Traffic Manager failover typically take 2–3 minutes with default settings?
Q: A company uses ZRS for Azure Blob Storage. One availability zone fails completely. How much data is lost?