AZ-305 Learning Portal
Objective 3.2 · 45 min · High priority · Tags: availability-zones, load-balancer, application-gateway, traffic-manager, azure-front-door, active-active, availability-sets, zrs

3.2 — Design for High Availability

Design high availability solutions using Availability Zones, load balancing services, active-active patterns, and zone-redundant infrastructure to maximize uptime and resilience across Azure workloads.

Concept — What & Why

High Availability Architecture

Availability Zones: Physically separate datacenters within an Azure region, each with independent power, cooling, and networking. Every region that supports zones has at least three. Zone-redundant services replicate data synchronously across zones, so a zone failure causes no data loss (RPO = 0). VMs, SQL Database, and many other services support zonal or zone-redundant deployment.

Availability Sets: A logical grouping of VMs within a single datacenter that spreads them across fault domains (hardware sharing a common power source and network switch) and update domains (groups that can be rebooted together during planned maintenance). Protects against planned maintenance and hardware failures WITHIN that datacenter. Does NOT protect against datacenter or zone-level outages.

Azure Load Balancer: Layer 4 (TCP/UDP) load balancing within a region. Uses health probes to detect unhealthy backends and automatically removes them from rotation. Removal time = unhealthy threshold × probe interval. Standard tier required for zone-redundant deployment.

Azure Front Door: Microsoft's edge-based global load balancing and CDN service. Routes traffic through Microsoft's global PoPs (Points of Presence) to the lowest-latency healthy origin. Provides WAF at the edge and automatic origin failover that does not wait for DNS caches to expire, which makes it significantly faster than Traffic Manager for global HA.

Azure Traffic Manager: DNS-based global traffic routing across regions/endpoints. Routing methods include Priority (primary/secondary), Performance (nearest), Weighted, and Geographic. Failover time is health-check detection plus DNS TTL expiry, roughly 2–3 minutes with default settings. Best for multi-region routing, not for near-instant failover.

HA Pattern Comparison

Pattern | RTO | RPO | Cost | Best For
Active-Passive (hot standby) | Seconds to minutes | Near-zero | 2x infra | Cost-sensitive HA with brief downtime tolerance
Active-Active (symmetric) | Seconds | Zero (synchronous zones) | 2x infra | Mission-critical maximum uptime
Multi-Region Active-Active | Seconds (Front Door) to minutes (DNS) | Seconds (asynchronous replication) | 2x+ infra | Global applications, regional DR

Load Balancing Service Selection

Service | Layer | Scope | Key Capability
Azure Load Balancer | 4 (TCP/UDP) | Regional | High throughput, low latency, zone-redundant
Application Gateway | 7 (HTTP/S) | Regional | URL routing, WAF, SSL termination
Traffic Manager | DNS | Global | Multi-region DNS failover
Azure Front Door | 7 (HTTP/S) | Global edge | Fastest failover, CDN, WAF at edge
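
The selection table reduces to two questions: does the workload need Layer 7 (HTTP/S) features, and must traffic be spread across regions? The sketch below encodes only that table. `LbRequirement` and `choose_load_balancer` are hypothetical names for illustration, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LbRequirement:
    """Hypothetical requirement model for this sketch."""
    http_aware: bool    # needs Layer 7 features (URL routing, WAF, TLS termination)
    multi_region: bool  # traffic must be distributed across regions


def choose_load_balancer(req: LbRequirement) -> str:
    """Map the two requirements onto the four services in the selection table."""
    if req.multi_region:
        # Global scope: an HTTP(S) edge proxy, or DNS steering for any protocol
        return "Azure Front Door" if req.http_aware else "Traffic Manager"
    # Regional scope: Layer 7 vs. Layer 4
    return "Application Gateway" if req.http_aware else "Azure Load Balancer"


print(choose_load_balancer(LbRequirement(http_aware=True, multi_region=True)))  # Azure Front Door
```

Real architectures often layer these services, for example Front Door in front of regional Application Gateways, so treat this as a first-pass filter rather than a complete decision procedure.
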
Deep Dive — How It Works

Availability Zone Deployment Patterns

VM Deployment Options

Option | Protection Level | Use Case
No redundancy | None | Dev/test only
Availability Set | Fault domains + update domains within one datacenter | Single-datacenter HA
Availability Zones | Zone-level (full datacenter) | Regional HA (recommended)
VMSS across zones | Zone-level + autoscaling | Elastic zonal HA

Key Rule: Availability Sets protect within a single datacenter; Availability Zones protect across zones. For true regional HA, use Availability Zones rather than Availability Sets wherever the region supports them.
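
The key rule can be checked mechanically. The sketch below assumes a simple model: each VM is either pinned to a zone ("1", "2", "3") or has no zone (None), as with an Availability Set. A deployment is guaranteed to survive any single-zone outage only if its VMs span at least two pinned zones. The function name is hypothetical.

```python
from typing import Optional, Sequence


def survives_zone_outage(vm_zones: Sequence[Optional[str]]) -> bool:
    """True if at least one VM is guaranteed to remain after any single zone fails.

    Each entry is the zone a VM is pinned to, or None for a VM deployed without
    a zone (no redundancy or an Availability Set). Unpinned VMs are not counted,
    because their datacenter could sit in any zone.
    """
    pinned_zones = {zone for zone in vm_zones if zone is not None}
    return len(pinned_zones) >= 2


print(survives_zone_outage([None, None, None]))  # False: Availability Set in one datacenter
print(survives_zone_outage(["1", "2"]))          # True: one VM in each of two zones
```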

Database HA Comparison

Service | Zone Redundancy | Geo-Replication | Max SLA
Azure SQL Database (Business Critical) | Synchronous (RPO = 0) | Asynchronous (RPO ≈ 5 s) | 99.995%
Azure Cosmos DB | Zone-redundant replicas within each region | Multi-region writes, automatic failover (configurable consistency) | 99.999% (multi-region)
Azure DB for PostgreSQL (Flexible Server) | Zone-redundant standby | Read replicas | 99.99% (zone-redundant HA)
Azure Storage (ZRS) | Synchronous across 3 zones | RA-GZRS for geographic | 99.9% availability; designed durability ≥ 99.9999999999% (12 nines)
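
An SLA percentage is easier to compare once it is converted into allowed downtime. The sketch below assumes a 30-day month. Each Azure service's SLA document defines its own measurement period and exclusions, so the figures are approximations, not contractual values.

```python
def allowed_downtime_minutes(sla_percent: float, days: float = 30.0) -> float:
    """Maximum downtime per period implied by an availability percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for name, sla in [("SQL Database BC + zone redundancy", 99.995), ("Cosmos DB multi-region", 99.999)]:
    print(f"{name}: {allowed_downtime_minutes(sla):.2f} min per 30 days")
# SQL Database BC + zone redundancy: 2.16 min per 30 days
# Cosmos DB multi-region: 0.43 min per 30 days
```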

Traffic Manager vs. Azure Front Door

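Traffic Manager effective RTO = health-check detection time + DNS TTL expiration

  • Default health check: 30-second probing interval with 3 tolerated failures, so an endpoint is marked Degraded on the 4th consecutive failure (about 2 minutes)
  • DNS TTL: commonly 60 seconds; resolvers and clients may cache answers longer
  • Effective RTO with default settings: ~2–3 minutes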
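
The formula above can be turned into a rough estimate. This sketch assumes an endpoint is marked Degraded after `tolerated_failures + 1` consecutive failed probes, and that clients re-resolve exactly when the TTL expires. It ignores probe timeouts and resolvers that cache beyond the TTL, so real failovers can take longer. The function name is hypothetical.

```python
def traffic_manager_failover_seconds(probe_interval_s: int = 30,
                                     tolerated_failures: int = 3,
                                     dns_ttl_s: int = 60) -> int:
    """Rough worst-case time before clients stop resolving a failed endpoint."""
    # Detection: the failure must persist for (tolerated_failures + 1) probes in a row.
    detection_s = (tolerated_failures + 1) * probe_interval_s
    # Propagation: cached DNS answers keep pointing at the failed endpoint until TTL expiry.
    return detection_s + dns_ttl_s


print(traffic_manager_failover_seconds())  # 180: default-style settings, about 3 minutes
print(traffic_manager_failover_seconds(probe_interval_s=10, tolerated_failures=0, dns_ttl_s=10))  # 20
```

Even aggressive settings leave failover dependent on client DNS caching, which is why the exam steers sub-minute requirements toward Front Door.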

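Azure Front Door failover: Detection time depends on the origin group's health-probe interval and sample settings. Once an origin is marked unhealthy, however, the edge immediately routes requests to a healthy origin with no DNS TTL delay. This makes Front Door the right choice when multi-region RTO must be under 1 minute.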

Active-Active Sizing

For an active-active deployment where primary handles 70% load and secondary 30%:

  • Incorrect: Size secondary for 30% load only — cannot absorb 100% during failover
  • Correct: Size BOTH regions to handle 100% load independently

This is the fundamental active-active sizing requirement — each region must be capable of handling full production load independently.
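
The sizing rule is simple arithmetic: the regions that survive must carry the whole peak. The sketch below uses a hypothetical peak of 10,000 requests per second to show the shortfall of the incorrect 70/30 sizing. The function name is illustrative.

```python
def per_region_capacity(peak_load: float, regions: int, regions_lost: int = 1) -> float:
    """Capacity each region needs so the surviving regions can carry the full peak."""
    if regions_lost >= regions:
        raise ValueError("at least one region must survive")
    return peak_load / (regions - regions_lost)


peak = 10_000                      # hypothetical peak load, requests per second
required = per_region_capacity(peak, regions=2)
secondary_normal = peak * 30 // 100  # the 30% share the secondary handles day to day
print(required)                      # 10000.0: each region sized for 100% of peak
print(required - secondary_normal)   # 7000.0: shortfall if sized only for its 30% share
```

The same function shows why adding regions lowers the per-region requirement: with three regions and one lost, each region needs half the peak.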

Hands-On Lab

Hands-On: Deploy Zone-Redundant Application

Step 1: Deploy VMs to Availability Zones

  1. Create Virtual Machines > set Availability options to Availability zone
  2. Select Zone 1 for first VM, Zone 2 for second VM
  3. Deploy VMs with same size, OS, and application configuration
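
The portal steps above can also be scripted with the Azure CLI's `az vm create`, whose `--zone` option pins a VM to a zone. The sketch below only builds the argument lists, one per zone. The resource group, name prefix, image alias, and VM size are placeholder assumptions; adjust them before running.

```python
import shlex


def zonal_vm_commands(resource_group: str, base_name: str, zones: list[str],
                      image: str = "Ubuntu2204", size: str = "Standard_D2s_v5") -> list[list[str]]:
    """One `az vm create` invocation per zone, identical apart from name and zone."""
    commands = []
    for zone in zones:
        commands.append([
            "az", "vm", "create",
            "--resource-group", resource_group,
            "--name", f"{base_name}-z{zone}",  # hypothetical naming convention
            "--image", image,                  # same OS image on every VM
            "--size", size,                    # same size on every VM
            "--zone", zone,                    # pin this VM to one availability zone
            "--generate-ssh-keys",
        ])
    return commands


for cmd in zonal_vm_commands("rg-ha-lab", "web", ["1", "2"]):
    print(shlex.join(cmd))
```

To run the commands instead of printing them, pass each list to `subprocess.run(cmd, check=True)` from a shell that is already signed in with `az login`.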

Step 2: Configure Azure Load Balancer (Standard)

  1. Create Load Balancer > Standard tier (required for zone redundancy)
  2. Configure Backend pool: Add both VMs (from Zone 1 and Zone 2)
  3. Create Health probe:
    • Protocol: HTTP
    • Port: 80, Path: /health
    • Interval: 15 seconds, Unhealthy threshold: 2
  4. Create Load balancing rule: Frontend IP → Backend pool, Port 80
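  5. Test: Browse to the load balancer's frontend IP; requests are distributed across both VMs. Stop the web service on one VM and the probe removes it from rotation within about 30 seconds (2 × 15 s).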
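
The probe in step 3 expects each VM to answer `/health` on port 80. Azure Load Balancer HTTP probes treat only an HTTP 200 response as healthy. Below is a minimal standard-library sketch of such an endpoint. The readiness flag is hard-coded as an assumption, and a real check would test the application's dependencies.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BODIES = {200: b"OK", 404: b"Not Found", 503: b"Unavailable"}


def health_status(path: str, app_ready: bool = True) -> int:
    """Status code for a request path; the probe counts only 200 as healthy."""
    if path != "/health":
        return 404
    return 200 if app_ready else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # app_ready defaults to True here; a real handler would check dependencies.
        status = health_status(self.path)
        body = BODIES[status]
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 80) -> None:
    """Serve until interrupted; binding to port 80 usually needs elevated privileges."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Returning 503 when the app is not ready lets the load balancer drain a VM that is running but unable to serve, not only one that is powered off.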

Step 3: Configure Azure Front Door for Global HA

  1. Navigate to Front Door and CDN profiles > Create > Front Door
  2. Add Origins (backend resources):
    • Primary: App Service in East US
    • Secondary: App Service in West US
    • Health probe: Enabled, path /health, interval 30 seconds
  3. Add Routes: Pattern /*, HTTPS only
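  4. With equal priority and weight, Front Door sends each request to the lowest-latency healthy origin. To send traffic to the secondary only during failover, give it a lower priority (a higher priority number).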
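
As a simplified model of Front Door origin selection, the sketch first filters to healthy origins. It then keeps only the best (lowest-numbered) priority, and finally keeps origins whose latency is within the origin group's latency sensitivity of the fastest one. Weighted splitting among the survivors is not modeled. The class, function, and latency figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Origin:
    name: str
    healthy: bool
    priority: int      # lower value = preferred
    latency_ms: float  # edge-to-origin latency (illustrative values)


def eligible_origins(origins: list[Origin], latency_sensitivity_ms: float = 0.0) -> list[str]:
    """Origins a request may go to: healthy, best priority, within latency sensitivity."""
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        return []
    best_priority = min(o.priority for o in healthy)
    candidates = [o for o in healthy if o.priority == best_priority]
    fastest = min(o.latency_ms for o in candidates)
    # Weights (not modeled) would split traffic among the origins returned here.
    return [o.name for o in candidates if o.latency_ms <= fastest + latency_sensitivity_ms]


east = Origin("eastus-app", healthy=True, priority=1, latency_ms=20.0)
west = Origin("westus-app", healthy=True, priority=1, latency_ms=70.0)
print(eligible_origins([east, west]))  # ['eastus-app']
east.healthy = False                   # simulate the primary failing its health probes
print(eligible_origins([east, west]))  # ['westus-app']
```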

Step 4: Configure Traffic Manager (Priority Routing)

  1. Navigate to Traffic Manager profiles > Create
  2. Routing method: Priority
  3. Add endpoints:
    • Primary: Azure endpoint, priority = 1
    • Secondary: Azure endpoint, priority = 2
  4. Configure health check: HTTPS to /health endpoint
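  5. Monitor: with the default tolerated number of failures (3), Traffic Manager marks an endpoint Degraded after 4 consecutive failed health checks and stops returning it in DNS responses.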
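
Priority routing combined with the tolerated-failures rule can be modeled in a few lines. The consecutive-failure counter and function names are hypothetical. The sketch returns None when every endpoint is degraded, because the real service's behavior in that case is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    priority: int                  # 1 = highest priority, as in step 3
    consecutive_failures: int = 0  # failed health checks in a row (hypothetical counter)


def is_degraded(ep: Endpoint, tolerated_failures: int = 3) -> bool:
    # With tolerated_failures = 0, a single failed check marks the endpoint unhealthy.
    return ep.consecutive_failures > tolerated_failures


def resolve_priority(endpoints: list[Endpoint], tolerated_failures: int = 3) -> Optional[str]:
    """Endpoint a DNS query would be answered with under Priority routing."""
    healthy = [ep for ep in endpoints if not is_degraded(ep, tolerated_failures)]
    if not healthy:
        return None  # all-degraded behavior of the real service is not modeled
    return min(healthy, key=lambda ep: ep.priority).name


primary = Endpoint("eastus-app", priority=1, consecutive_failures=3)
secondary = Endpoint("westus-app", priority=2)
print(resolve_priority([primary, secondary]))  # eastus-app: 3 failures are still tolerated
primary.consecutive_failures = 4
print(resolve_priority([primary, secondary]))  # westus-app
```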

Step 5: Enable Zone-Redundant SQL Database

  1. Navigate to SQL Databases > Create
  2. Service tier: Business Critical (or General Purpose for lower SLA)
  3. Enable Zone redundant database in Compute + Storage settings
  4. Verify in database Properties: Zone redundant = True
  5. SLA: 99.995% for Business Critical + zone redundancy
Exam Angle — What AZ-305 Tests

AZ-305 Exam Focus

AZ-305 tests your ability to select the right load balancing and HA services for specific requirements. The most common errors are confusing Availability Sets with Availability Zones, and recommending Traffic Manager for fast (sub-minute) failover when Front Door is required.

Exam Trap

Availability Sets Provide Full HA: Availability Sets protect against planned maintenance and hardware failures within a single datacenter. They do NOT protect against zone-level outages. For true regional HA (protection against a full datacenter failure), VMs must use Availability Zones.

Exam Trap

Traffic Manager for Sub-Minute Failover: Traffic Manager is DNS-based, so failover takes health-check detection time plus DNS TTL expiry, roughly 2–3 minutes with default settings. For scenarios requiring multi-region RTO under 1 minute, Azure Front Door is the correct answer. Once a health probe marks an origin unhealthy, Front Door reroutes at the edge without waiting for DNS caches.

Exam Trap

Geo-Replication = Zero Data Loss: Geo-replication for Azure SQL Database is asynchronous, so RPO is approximately 5 seconds, not zero. Zone redundancy within a region uses synchronous replication (RPO = 0). Don't confuse the two. For zero data loss during a zone failure, use zone redundancy. For regional DR, use geo-replication and accept its small, non-zero RPO.

Exam Trap

Failover Groups Are Instant: Failover groups automatically detect primary failure and redirect traffic, but the application must implement connection retry logic. There is typically a brief connection interruption during failover. Applications that don't retry connections may experience errors even after failover completes.
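
Connection retry logic is usually implemented as capped exponential backoff with jitter. The sketch below is plain Python and not tied to any database driver. Which exceptions count as transient depends on the driver you use, so `ConnectionError` and `TimeoutError` are placeholder assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T],
               attempts: int = 5,
               base_delay_s: float = 0.5,
               max_delay_s: float = 8.0,
               retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call operation, retrying transient failures with capped exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # give up and surface the last error to the caller
            # Delays of 0.5 s, 1 s, 2 s, 4 s, ... capped at max_delay_s
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Random jitter keeps many clients from reconnecting in lockstep after failover
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

In application code, wrap the connect call, for example `with_retry(lambda: open_connection(listener_endpoint))`. Here `open_connection` is a hypothetical stand-in for your driver's connect function.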

Exam Tip

Load Balancer Health Probe Math: If health probe interval is 15 seconds and unhealthy threshold is 2, a VM is removed from rotation in 30 seconds (2 × 15s). The exam sometimes gives scenarios with unexpected failover times — calculate threshold × interval to diagnose the cause.
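
Threshold × interval is the worst case. If the backend fails just before a scheduled probe, removal arrives almost one interval sooner. The sketch below computes that window while ignoring probe timeouts. It also works backward from an observed removal time to the threshold it implies. Both function names are hypothetical.

```python
import math


def probe_removal_window(interval_s: float, unhealthy_threshold: int) -> tuple:
    """Approximate (earliest, latest) seconds from backend failure to removal.

    Latest: the failure happens just after a successful probe, so removal comes
    threshold x interval later. Earliest: it happens just before a probe, which
    saves almost one interval. Probe timeouts are ignored.
    """
    latest = unhealthy_threshold * interval_s
    earliest = (unhealthy_threshold - 1) * interval_s
    return earliest, latest


def implied_threshold(observed_removal_s: float, interval_s: float) -> int:
    """Diagnose an unexpected failover time: smallest threshold consistent with it."""
    return math.ceil(observed_removal_s / interval_s)


print(probe_removal_window(15, 2))  # (15, 30)
print(implied_threshold(75, 15))    # 5
```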

Must Memorize

ZRS Behavior on Zone Failure: ZRS (Zone-Redundant Storage) synchronously replicates to 3 zones. If a zone fails, the other 2 zones retain complete data copies — all data remains accessible with zero data loss. This is a synchronous operation, not a backup — recovery is automatic with no restore process needed.

Q: What is the difference between Availability Sets and Availability Zones for VM HA?

Q: When should you use Azure Front Door instead of Traffic Manager for global HA?

Q: What is the RPO for Azure SQL Database with zone redundancy vs. geo-replication?

Q: In an active-active deployment across two regions, how should each region be sized?

Q: Why does Traffic Manager failover take roughly 2–3 minutes with default settings?

Q: A company uses ZRS for Azure Blob Storage. One availability zone fails completely. How much data is lost?

Sources & Further Reading