Backup and Disaster Recovery Fundamentals
Recovery Time Objective (RTO)
The maximum acceptable downtime: how long the business can tolerate service unavailability. Lower RTO requires faster failover mechanisms (Site Recovery, active-active) and costs more. Achievable via replication (minutes) or restore-from-backup (hours to days).
Recovery Point Objective (RPO)
The maximum acceptable data loss measured in time: how much data the business can afford to lose. Lower RPO requires more frequent snapshots or continuous replication. Crash-consistent recovery points every 5 minutes (Site Recovery); application-consistent every 1–24 hours.
Azure Backup
A managed backup service for VMs, SQL Server databases, on-premises servers, file shares, and blobs. Provides incremental backups, instant restore snapshots (1–5 day retention), soft delete (14–180 days), immutable vault, and cross-region restore (CRR) for multi-region compliance.
Azure Site Recovery
Orchestrates replication and failover for VMs (Azure-to-Azure, on-premises to Azure, VMware, Hyper-V, physical). Provides crash-consistent recovery points every 5 minutes and application-consistent every 1–24 hours. Non-disruptive test failover validates DR without affecting production.
Recovery Services Vault
The central management resource for both Azure Backup and Site Recovery. Stores backup data and replication metadata, manages retention policies, provides soft delete protection, and supports immutable vault configuration for compliance.
Backup vs. Replication: When to Use Which
| Approach | RTO | RPO | Cost | Best For |
|---|---|---|---|---|
| Backup only | Hours–Days | Hours (daily) | Low | Non-critical, compliance archiving |
| Site Recovery replication | Minutes (less than 1 hour SLA) | 5 minutes (crash-consistent) | Medium | Mission-critical workloads |
| Active-active + replication | Seconds | Zero (sync) | High | Highest-criticality, zero tolerance |
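The table can be read as a lookup: pick the cheapest approach whose RTO and RPO both fit the requirement. The following is a minimal sketch of that reasoning. The minute values are illustrative ceilings loosely derived from the table rows (they are assumptions, not SLA figures), and `Approach` and `cheapest_approach` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    rto_minutes: float  # assumed worst-case recovery time
    rpo_minutes: float  # assumed worst-case data loss
    cost_rank: int      # 1 = cheapest


# Illustrative ceilings only; real values depend on the workload and region.
APPROACHES = [
    Approach("Backup only", rto_minutes=48 * 60, rpo_minutes=24 * 60, cost_rank=1),
    Approach("Site Recovery replication", rto_minutes=60, rpo_minutes=5, cost_rank=2),
    Approach("Active-active + replication", rto_minutes=1, rpo_minutes=0, cost_rank=3),
]


def cheapest_approach(required_rto_minutes, required_rpo_minutes):
    """Return the lowest-cost approach that meets both the RTO and the RPO."""
    candidates = [
        a for a in APPROACHES
        if a.rto_minutes <= required_rto_minutes
        and a.rpo_minutes <= required_rpo_minutes
    ]
    if not candidates:
        raise ValueError("no listed approach meets the stated RTO/RPO")
    return min(candidates, key=lambda a: a.cost_rank).name
```

For example, a requirement of a 1-hour RTO with a 15-minute RPO rules out backup alone under these assumed ceilings and selects Site Recovery replication.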
Recovery Point Types
| Type | Interval | Best For |
|---|---|---|
| Crash-consistent | Every 5 minutes (automatic) | Stateless apps, filesystems |
| Application-consistent | Every 1–24 hours | Databases (SQL, SAP HANA) |
| Instant restore snapshots | On backup schedule | Fast local VM restore |
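To make the interval column concrete, the worst-case data loss at failure time is the gap back to the most recent recovery point. The sketch below assumes recovery points are taken on a fixed schedule starting at `first_point`; the function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss(first_point, interval, failure_time):
    """Return the time between the last recovery point and the failure.

    Assumes points are taken at first_point, first_point + interval, and so on.
    """
    if failure_time < first_point:
        raise ValueError("failure precedes the first recovery point")
    elapsed = failure_time - first_point
    # The remainder is how far the failure falls past the latest point.
    return elapsed % interval


start = datetime(2024, 1, 1, 8, 0)
failure = datetime(2024, 1, 1, 10, 7)
crash_loss = data_loss(start, timedelta(minutes=5), failure)  # 2 minutes
app_loss = data_loss(start, timedelta(hours=4), failure)      # 2 hours 7 minutes
```

With a 5-minute crash-consistent interval the loss here is 2 minutes; with a 4-hour application-consistent interval the same failure loses 2 hours 7 minutes of changes.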
DR Architecture Patterns
Recovery Plan Design for Multi-Tier Applications
Recovery plans support up to 7 groups that fail over sequentially. The database must be online before the app tier connects, and the app tier must be healthy before the web tier receives traffic. Between groups, the plan can run runbooks or validation scripts, or pause for a manual action.
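The ordering rule can be sketched as a plain data structure: an ordered list of groups that is flattened into sequential steps. This is an illustration only and does not call any Azure API; `failover_order` and the VM names are hypothetical.

```python
MAX_GROUPS = 7  # group limit stated in the text above


def failover_order(groups):
    """Flatten ordered groups into sequential (group_number, vm) steps."""
    if not 1 <= len(groups) <= MAX_GROUPS:
        raise ValueError(f"a recovery plan needs 1 to {MAX_GROUPS} groups")
    steps = []
    for number, vms in enumerate(groups, start=1):
        if not vms:
            raise ValueError(f"group {number} is empty")
        # Every VM in a group is handled before the next group starts.
        steps.extend((number, vm) for vm in vms)
    return steps


# Database tier first, then application tier, then web tier.
plan = [["sql-01", "sql-02"], ["app-01"], ["web-01", "web-02"]]
steps = failover_order(plan)
```

Here `steps` begins with the database VMs in group 1 and ends with the web VMs in group 3, which mirrors the dependency order described above.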
Azure Backup Key Features
| Feature | Description | Design Implication |
|---|---|---|
| Instant Restore | VM snapshots (1–5 days) retained locally | Fastest restore; separate from vault backup |
| Soft Delete | 14–180 day recovery window for deleted backups | Protect against accidental deletion |
| Immutable Vault | Lock vault to prevent modification or deletion | Compliance (ransomware protection) |
| Cross-Region Restore (CRR) | Restore to secondary region | Regional DR; must be explicitly enabled; increases cost |
| Multi-Tier Retention | Daily/weekly/monthly/yearly policies | Compliance archiving (7-year, 10-year) |
CRR Important Note: Cross-Region Restore must be explicitly enabled on the vault — it is NOT enabled by default. Enabling CRR increases vault storage cost because backup data is replicated to the secondary region.
Site Recovery: Extensions Are NOT Replicated
Site Recovery replicates VM disk data but does NOT replicate VM extensions (SQL IaaS Extension, monitoring agents, antivirus, custom script extensions). These must be installed post-failover via:
- Runbooks in the recovery plan (automated)
- Manual installation scripts
- Azure Policy DeployIfNotExists on the target resource group
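A post-failover runbook typically needs to know which extensions are missing on the failed-over VM. The sketch below shows only that comparison step, assuming the two extension lists have already been collected by some other means; the function name and extension labels are hypothetical.

```python
def missing_extensions(source_extensions, failover_extensions):
    """Return extensions present on the source VM but absent after failover."""
    return sorted(set(source_extensions) - set(failover_extensions))


# Illustrative labels; a real inventory would come from the VM's configuration.
source = ["SqlIaasExtension", "AzureMonitorWindowsAgent", "CustomScriptExtension"]
after_failover = []
to_install = missing_extensions(source, after_failover)
```

Because Site Recovery replicates disks and not extensions, `to_install` in this example contains all three entries, and the runbook would reinstall each of them.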
Cost-Optimized Compliance Backup Strategy
For 7-year retention compliance with minimal cost:
- Daily retention: 30 days (short-term, interactive)
- Weekly retention: 52 weeks (1 year)
- Monthly retention: 12 months
- Yearly retention (archive): 7 years at archive tier pricing (significantly cheaper)
This tiered approach avoids paying standard-tier prices for data that is accessed only about once a year for audits.
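A rough count shows why the tiered policy is cheaper. The sketch below compares the number of retained recovery points with a naive policy that keeps every daily backup for 7 years. It counts points, not stored bytes, and treats the tiers as non-overlapping, so it is an upper bound under those assumptions.

```python
# Retention counts taken from the list above.
retention = {"daily": 30, "weekly": 52, "monthly": 12, "yearly": 7}

# Upper bound on recovery points kept by the tiered policy.
tiered_points = sum(retention.values())

# Naive alternative: keep one daily point for the full 7 years.
daily_only_points = 365 * retention["yearly"]

# Fraction of recovery points the tiered policy avoids keeping.
reduction = 1 - tiered_points / daily_only_points
```

The tiered policy keeps at most 101 recovery points against 2,555 for the daily-only policy, a reduction of roughly 96 percent in points retained.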
Hands-On: Configure Azure Backup and Site Recovery
Step 1: Create Recovery Services Vault
- Navigate to Recovery Services vaults > Create
- Configure name, subscription, resource group, region
- Review + create
Step 2: Configure Vault Settings
- Open vault > Settings > Backup Configuration:
  - Storage redundancy: Geo-Redundant (GRS) for DR capability
  - Enable Soft Delete (configure 14–180 days)
  - Enable Immutable vault for ransomware protection
  - Enable Cross-Region Restore if multi-region restore is required
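Before applying these settings it can help to check them against the documented ranges. The following is a minimal sketch of such a check in plain Python, not an Azure SDK call. The parameter names and the redundancy string are hypothetical, and the rule that Cross-Region Restore needs geo-redundant storage is stated here as an assumption about the vault.

```python
def validate_vault_settings(redundancy, soft_delete_days, cross_region_restore):
    """Return a list of problems found in the proposed vault settings."""
    problems = []
    # Soft delete range as given in this guide.
    if not 14 <= soft_delete_days <= 180:
        problems.append("soft delete retention must be 14-180 days")
    # Assumption: CRR is only available on a geo-redundant vault.
    if cross_region_restore and redundancy != "GeoRedundant":
        problems.append("Cross-Region Restore needs geo-redundant storage")
    return problems


ok = validate_vault_settings("GeoRedundant", 14, True)
bad = validate_vault_settings("LocallyRedundant", 7, True)
```

An empty list means the proposed settings pass both checks; in the second call both rules are violated, so two problems are reported.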
Step 3: Enable VM Backup
- Open vault > Backup > Azure > Virtual machine
- Select VMs to protect
- Create backup policy:
  - Frequency: Daily
  - Retention: 30 days daily, 52 weeks weekly, 12 months monthly, 7 years yearly
  - Instant restore: 3 days
- Click Enable Backup; the initial backup runs at the next scheduled time (use Backup now to start it immediately)
Step 4: Configure Site Recovery for Azure VM
- Open vault > Site Recovery > Prepare infrastructure
- Source region: e.g., East US; Target region: e.g., West US
- Configure replication policy:
  - Recovery point retention: 24 hours
  - App-consistent snapshots: Every 4 hours (requires VSS enabled on Windows VMs)
- Select VMs and enable replication
- Create Recovery Plan:
  - Group 1: Database VMs
  - Group 2: Application VMs
  - Group 3: Web VMs
  - Add runbooks between groups for health validation
Step 5: Run Test Failover (Non-Disruptive)
- Open Recovery Plan > Test failover
- Choose: Latest processed recovery point (fastest failover)
- Select target virtual network
- Click OK — Azure creates test VMs in target region without interrupting production
- Validate: Test connectivity, application functionality
- Click Cleanup test failover to remove test VMs
AZ-305 Exam Focus
AZ-305 tests your ability to design backup + DR solutions that meet stated RTO and RPO requirements. The exam frequently tests understanding of the difference between backup and replication, soft delete vs. immutable vault, and recovery point selection decisions.
Exam Trap
Backups Equal Disaster Recovery: Backups protect against data loss but do NOT provide fast failover. Restoring a VM from backup takes hours. Replication-based approaches (Site Recovery) achieve RTO in under 1 hour. If a scenario has tight RTO requirements (30 minutes, 1 hour), backup alone is insufficient.
Exam Trap
Site Recovery Replicates All VM Configuration: Site Recovery does NOT replicate VM extensions (SQL IaaS Extension, monitoring agents, antivirus). These must be installed post-failover via runbooks in the recovery plan. Forgetting this causes services to fail to start after failover.
Exam Trap
CRR Is Default: Cross-Region Restore is NOT enabled by default on Recovery Services vaults. It must be explicitly enabled and increases storage cost. Enable CRR only for mission-critical workloads requiring regional failover capability from backup data.
Exam Trap
Soft Delete Prevents All Loss: Soft delete retains deleted backups for 14–180 days but does NOT prevent permanent deletion after the window expires. Immutable vault adds a time-based lock that prevents modification or deletion during a configured retention period — required for ransomware protection scenarios.
Exam Tip
Recovery Point Selection for Failover: "Latest" recovery point has lowest RPO but higher RTO (must process pending replication data). "Latest processed" has slightly higher RPO but significantly lower RTO (uses already-processed data). In unplanned outages where minimizing downtime is critical, "Latest processed" is the better choice.
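The trade-off in this tip reduces to a two-way choice driven by what the scenario prioritizes. The sketch below encodes that choice; the function name and the priority labels are hypothetical, and the returned strings are the two option names used in the tip.

```python
def choose_recovery_point(priority):
    """Map a failover priority to the recovery point option from the tip above."""
    options = {
        "min_downtime": "Latest processed",  # lower RTO, slightly higher RPO
        "min_data_loss": "Latest",           # lower RPO, higher RTO
    }
    if priority not in options:
        raise ValueError("priority must be 'min_downtime' or 'min_data_loss'")
    return options[priority]
```

In an unplanned outage where downtime matters most, `choose_recovery_point("min_downtime")` selects "Latest processed", matching the guidance above.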
Must Memorize
App-Consistent vs. Crash-Consistent: Use app-consistent snapshots for databases (SQL Server, SAP HANA) to ensure a consistent database state. Use crash-consistent snapshots for stateless VMs and filesystems. App-consistent snapshots rely on VSS on Windows (VSS must be enabled) and can be taken at most once per hour. Crash-consistent snapshots are taken automatically every 5 minutes.
Q: What is the difference between RTO and RPO?
Q: When is Azure Backup alone insufficient for disaster recovery?
Q: What is the difference between soft delete and immutable vault in Recovery Services?
Q: In a Site Recovery recovery plan, why is the database group placed first?
Q: What is Site Recovery test failover and why is it important?
Q: Why does Site Recovery require runbooks or manual steps to install VM extensions after failover?