AZ-305 Learning Portal
Objective 2.2 | 40 min | High priority
Tags: cosmos-db, blob-storage, data-lake, partition-key, consistency-levels, ttl, ra-grs, lifecycle-management

2.2 — Design Data Storage Solutions for Semi-Structured and Unstructured Data

Design non-relational storage solutions for JSON documents, key-value data, graph structures, and unstructured blobs using Azure Cosmos DB and Azure Blob Storage, selecting appropriate partition keys, consistency levels, throughput models, and access tiers.

Concept — What & Why

Azure Non-Relational Storage Services

Azure Cosmos DB: A globally distributed NoSQL database with multi-API support (NoSQL, formerly Core SQL; MongoDB; Cassandra; Table; Gremlin). Offers single-digit millisecond latency, an availability SLA of up to 99.999% for multi-region accounts, configurable consistency levels, and optional multi-region writes. Supports change feed for real-time data processing.

Partition Key: The property used to distribute data across logical partitions in Cosmos DB. It must be high-cardinality (many unique values), evenly distributed, and aligned with primary access patterns. A poor partition key causes hot partitions and request throttling. It cannot be changed after container creation.

Cosmos DB Consistency Levels: Trade-offs between latency and data consistency: Strong (linearizable, slowest), Bounded Staleness (reads lag writes by at most a configured number of versions or time interval), Session (default, consistent within a session), Consistent Prefix (reads never see out-of-order writes), Eventual (fastest, weakest). Session consistency is recommended for most applications.

Azure Blob Storage: Object storage for unstructured data (files, images, videos, logs). Access tiers: Hot (frequent access), Cool (infrequent access, 30-day minimum retention), Cold (rarely accessed but still online, 90-day minimum), and Archive (rare access, 180-day minimum, offline until rehydrated). Lifecycle policies automate tier transitions based on blob age, such as days since last modification.

Azure Data Lake Storage Gen2: Blob Storage with hierarchical namespace (HNS) enabled. Provides POSIX-style ACLs and directory-level operations, and integrates with analytics engines (Spark, Synapse, HDInsight). It is the usual choice for big data analytics pipelines.

Service Selection Matrix

Data Type | Use Case | Recommended Service
JSON documents | User profiles, product catalogs | Cosmos DB Core SQL or MongoDB API
Key-value | Session state, IoT telemetry | Cosmos DB Table API or Azure Table Storage
Graph | Social networks, knowledge graphs | Cosmos DB Gremlin API
Files, images, videos | Media, backups, archives | Azure Blob Storage
Big data analytics | Spark, Hadoop, Synapse | Azure Data Lake Storage Gen2
Full-text search | E-commerce discovery | Azure AI Search

Blob Storage Redundancy Options

Option | Copies | Protection | Use Case
LRS | 3 (same datacenter) | Hardware failure | Dev/test
ZRS | 3 (across availability zones) | Zone failure | HA within region
GRS | 6 (3 primary + 3 secondary) | Regional failure | DR; secondary readable only after failover
RA-GRS | 6 + readable secondary | Regional failure | Geo-distributed reads without failover
GZRS / RA-GZRS | 6 (3 across zones + 3 secondary) | Zone + regional failure | Maximum resilience
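
The redundancy table above can be read as a small decision function. The following is a minimal sketch, not an official selection tool; the function name and its three boolean parameters are hypothetical and exist only for illustration.

```python
def choose_redundancy(zone_resilient: bool, geo_resilient: bool,
                      read_secondary: bool = False) -> str:
    """Pick a Blob Storage redundancy option from three yes/no requirements.

    Hypothetical helper that mirrors the table above.
    """
    if not geo_resilient:
        if read_secondary:
            # A readable secondary only exists for geo-replicated accounts.
            raise ValueError("read_secondary requires geo_resilient=True")
        return "ZRS" if zone_resilient else "LRS"
    base = "GZRS" if zone_resilient else "GRS"
    # The RA- prefix adds a readable secondary endpoint before failover.
    return f"RA-{base}" if read_secondary else base


print(choose_redundancy(zone_resilient=False, geo_resilient=True, read_secondary=True))
```

The point of the sketch is that the six options are combinations of only three independent requirements: zone protection, regional protection, and whether the secondary must be readable before failover.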
Deep Dive — How It Works

Cosmos DB Design Patterns

Partition Key Selection Criteria

A good partition key must satisfy ALL of the following:

  1. High cardinality — Many unique values (user ID, order ID) prevents hot partitions
  2. Even distribution — Requests spread evenly across logical partitions
  3. Access pattern alignment — Your primary query filter should be the partition key
  4. Immutability — Cannot be changed; choose a property that won't be updated

Anti-patterns: Using region (only 3–5 values), status (only a few states), or timestamp (hot partition on the latest time bucket) creates severe throttling.
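
To make the cardinality and distribution criteria concrete, the sketch below compares candidate keys on a made-up data set. Everything in it is an assumption for illustration: the sample orders, the 60/30/10 regional split, and the `key_stats` helper. It measures only how items spread across logical partitions, not request traffic, which also matters in practice.

```python
from collections import Counter


def key_stats(items, key):
    """Return (distinct values, share of items in the largest logical partition)."""
    counts = Counter(item[key] for item in items)
    return len(counts), max(counts.values()) / len(items)


def region_for(i):
    """Hypothetical skew: 60% US, 30% EU, 10% APAC."""
    bucket = i % 10
    if bucket < 6:
        return "US"
    if bucket < 9:
        return "EU"
    return "APAC"


# Hypothetical sample: 1,000 orders placed by 200 users.
orders = [
    {"orderId": f"o-{i}", "userId": f"u-{i % 200}", "region": region_for(i)}
    for i in range(1000)
]

for candidate in ("orderId", "userId", "region"):
    distinct, hottest = key_stats(orders, candidate)
    print(f"{candidate:8s} distinct={distinct:5d} hottest partition={hottest:.1%}")
```

With this sample, `region` yields only 3 logical partitions and the largest holds 60% of the items, while `userId` and `orderId` spread the same data across hundreds of partitions.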

Throughput Models Comparison

Model | Pricing | Best For
Provisioned (Manual) | Fixed RU/s, minimum charge | Predictable, steady workloads
Serverless | Per-RU consumed, no minimum | Bursty, intermittent, dev/test
Autoscale | Scales between 10% and 100% of max RU/s | Variable with spikes, production

Decision rule: If traffic is zero for significant periods → Serverless. If traffic spikes unpredictably but there's always baseline traffic → Autoscale. If traffic is stable and predictable → Provisioned.
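
The decision rule can be written out as a short function, together with the autoscale range from the comparison table. This is a sketch under simplifying assumptions: the two boolean inputs and both function names are hypothetical, and a real decision would also weigh cost estimates for the expected RU consumption.

```python
def choose_throughput_model(has_long_idle_periods: bool, is_predictable: bool) -> str:
    """Apply the decision rule: idle periods first, then predictability."""
    if has_long_idle_periods:
        return "Serverless"
    if is_predictable:
        return "Provisioned"
    return "Autoscale"


def autoscale_range(max_ru_per_sec: int) -> tuple[int, int]:
    """Autoscale moves between 10% and 100% of the configured maximum RU/s."""
    return max_ru_per_sec // 10, max_ru_per_sec


print(choose_throughput_model(has_long_idle_periods=False, is_predictable=False))
print(autoscale_range(4000))
```

For example, an autoscale maximum of 4,000 RU/s gives a scaling range of 400 to 4,000 RU/s.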

Blob Storage Access Tier Strategy

Lifecycle policy example: Blob last modified >30 days → move to Cool. Last modified >90 days → move to Archive. Last modified >7 years → delete.

Archive tier limitations:

  • Rehydration can take up to 15 hours (Standard priority) or under 1 hour for small objects (High priority)
  • Early deletion penalty if deleted or moved out of Archive before 180 days
  • Poor fit for data read more than a few times per year, since every read requires rehydration
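
The lifecycle policy example at the top of this section maps a blob's age to a tier. A minimal sketch of that mapping follows; `tier_for_age` is a hypothetical helper, it assumes the blob starts in Hot, and it treats 7 years as 7 × 365 days.

```python
DELETE_AFTER_DAYS = 7 * 365  # 7 years, ignoring leap days


def tier_for_age(days_since_modified: int) -> str:
    """Return where the example policy would place a blob of the given age."""
    if days_since_modified > DELETE_AFTER_DAYS:
        return "Deleted"
    if days_since_modified > 90:
        return "Archive"
    if days_since_modified > 30:
        return "Cool"
    return "Hot"


for age in (10, 45, 400, 3000):
    print(age, tier_for_age(age))
```

The thresholds are strict "greater than" comparisons, so a blob modified exactly 30 days ago is still in Hot.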

Cosmos DB Consistency Level Selection

Consistency | Latency | When to Choose
Strong | Highest | Financial transactions, global inventory
Bounded Staleness | High | Gaming leaderboards, social feeds
Session | Low (default) | Most applications; per-session consistency
Consistent Prefix | Low | Order-sensitive event streams
Eventual | Lowest | Click tracking, telemetry, non-critical data
Hands-On Lab

Hands-On: Create Cosmos DB Container with Optimal Partition Key

Step 1: Create Cosmos DB Account

  1. Navigate to Azure Cosmos DB > Create
  2. Select API: Azure Cosmos DB for NoSQL (formerly Core (SQL)) for JSON documents queried with SQL
  3. Configure:
    • Account name: Globally unique
    • Consistency level: Session (default)
    • Geo-redundancy: Enable for HA
    • Multi-region writes: Enable for active-active
  4. Review and create

Step 2: Create Database and Container

  1. Open Cosmos DB account > Data Explorer > New Container
  2. Database ID: Create or select
  3. Container ID: Enter meaningful name (e.g., user-profiles)
  4. Partition key: Enter /userId (high-cardinality property)
  5. Throughput: Autoscale with max 4,000 RU/s (recommended)
  6. Enable TTL if items should auto-expire (e.g., session data)

Step 3: Configure Blob Storage Lifecycle Management

  1. Open Storage Account > Lifecycle management
  2. Click Add a rule > Code view and enter:
    {
      "rules": [{
        "name": "tier-and-delete",
        "type": "Lifecycle",
        "definition": {
          "actions": {
            "baseBlob": {
              "tierToCool": {"daysAfterModificationGreaterThan": 30},
              "tierToArchive": {"daysAfterModificationGreaterThan": 90},
              "delete": {"daysAfterModificationGreaterThan": 2555}
            }
          },
          "filters": {"blobTypes": ["blockBlob"]}
        }
      }]
    }
    
  3. Save and apply — transitions occur automatically
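
Before saving a rule like the one in this step, it can help to check that the thresholds are in a sensible order. The sketch below parses a copy of the same JSON with the standard library only; the helper names are hypothetical, and it does not call any Azure API.

```python
import json

# Copy of the rule from this step; in practice it could be read from a file.
POLICY_JSON = """
{
  "rules": [{
    "name": "tier-and-delete",
    "type": "Lifecycle",
    "definition": {
      "actions": {
        "baseBlob": {
          "tierToCool": {"daysAfterModificationGreaterThan": 30},
          "tierToArchive": {"daysAfterModificationGreaterThan": 90},
          "delete": {"daysAfterModificationGreaterThan": 2555}
        }
      },
      "filters": {"blobTypes": ["blockBlob"]}
    }
  }]
}
"""


def base_blob_thresholds(policy: dict, rule_name: str) -> dict:
    """Map each baseBlob action of the named rule to its day threshold."""
    for rule in policy["rules"]:
        if rule["name"] == rule_name:
            actions = rule["definition"]["actions"]["baseBlob"]
            return {
                action: settings["daysAfterModificationGreaterThan"]
                for action, settings in actions.items()
            }
    raise KeyError(rule_name)


def thresholds_are_ordered(thresholds: dict) -> bool:
    """True when Cool comes before Archive, and Archive before delete."""
    return thresholds["tierToCool"] < thresholds["tierToArchive"] < thresholds["delete"]


thresholds = base_blob_thresholds(json.loads(POLICY_JSON), "tier-and-delete")
print(thresholds, thresholds_are_ordered(thresholds))
```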

Step 4: Query Cosmos DB with SQL API

  1. Open Data Explorer > select container > New SQL Query
  2. Example queries:
    -- Efficient: filters on partition key
    SELECT * FROM c WHERE c.userId = 'user-123'
    
    -- Cross-partition: avoid in hot paths
    SELECT * FROM c WHERE c.region = 'US'
    
  3. Click Execute Query — note RU consumption for each query
  4. Cross-partition queries consume significantly more RUs
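
Why the second query costs more can be illustrated with a toy routing model. This is a simplified sketch, not how Cosmos DB is implemented: the partition count, the SHA-256 based hash, and both function names are assumptions chosen for illustration.

```python
import hashlib
from typing import Optional

PHYSICAL_PARTITIONS = 4  # hypothetical number of physical partitions


def partition_for(partition_key_value: str) -> int:
    """Route a partition key value to one partition with a stable hash (toy model)."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def partitions_scanned(partition_key_value: Optional[str]) -> set:
    """Partitions a query must visit; None means no partition key filter."""
    if partition_key_value is not None:
        return {partition_for(partition_key_value)}
    # Without the partition key, the query fans out to every partition.
    return set(range(PHYSICAL_PARTITIONS))


print(len(partitions_scanned("user-123")))  # filter on c.userId
print(len(partitions_scanned(None)))        # filter on c.region only
```

In this model the `c.userId` query visits one partition while the `c.region` query visits all of them, which is the intuition behind the higher RU charge observed in Data Explorer.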
Exam Angle — What AZ-305 Tests

AZ-305 Exam Focus

AZ-305 tests your ability to select the right non-relational service, consistency level, partition key strategy, and blob tier for a given scenario. Partition key selection and the serverless vs. provisioned vs. autoscale decision are the highest-frequency topics.

Exam Trap

Wrong Partition Key Selection: Any property with low cardinality (region, status, tier) causes hot partitions — a few logical partitions receive all requests while others sit idle, causing throttling. Always select high-cardinality properties (user ID, order ID, device ID) as partition keys.

Exam Trap

Assuming Cosmos DB Serverless Is Always Cheaper: Serverless has no minimum charge, but for predictable, continuous workloads with high RU/s consumption, Provisioned or Autoscale is cheaper. Serverless excels for bursty or intermittent workloads with significant idle periods.

Exam Trap

Archive Tier as Primary Storage: Archive tier requires rehydration (1–15 hours) before data is accessible. It is not suitable as primary storage for data accessed more than a few times per year. Use Archive only for compliance archiving or rarely accessed backup data.

Exam Trap

GRS vs. RA-GRS for Geo-Distributed Reads: GRS replicates to a secondary region, but the secondary is only readable after a failover. RA-GRS provides a readable secondary endpoint before failover occurs. Use RA-GRS when applications need to read from secondary regions without failover.

Exam Tip

Session Consistency Is the Sweet Spot: For most applications, Session consistency provides the right balance — each client session sees its own writes immediately (read-your-writes within a session). Strong consistency adds significant latency. Eventual consistency can cause surprising data discrepancies. Default to Session unless there's a specific reason to change it.

Must Memorize

TTL for Auto-Expiry: Use Cosmos DB's Time-to-Live (TTL) setting on a container to automatically delete items after a specified duration. This is distinct from Azure Blob lifecycle policies. TTL is the answer for IoT sensor data, session tokens, or any data with a natural expiration period.
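
The interaction between the container-level default and a per-item override is easy to get wrong, so here is a minimal sketch of the expiry rule as commonly documented: no container default means TTL is off, `-1` means never expire, and an item-level `ttl` overrides the container default when TTL is enabled. The function and parameter names are hypothetical, and the service deletes expired items in the background rather than at the exact second.

```python
from typing import Optional


def is_expired(last_modified_ts: int, now_ts: int,
               container_default_ttl: Optional[int],
               item_ttl: Optional[int] = None) -> bool:
    """Decide whether an item is past its TTL (all times in seconds)."""
    if container_default_ttl is None:
        # TTL is not enabled on the container, so an item-level ttl is ignored.
        return False
    effective = item_ttl if item_ttl is not None else container_default_ttl
    if effective == -1:
        # -1 means the item never expires.
        return False
    return now_ts - last_modified_ts >= effective


# Session token written at t=0 in a container with a one-hour default TTL.
print(is_expired(0, 3601, container_default_ttl=3600))
```

Because the clock starts at the item's last modification, updating an item resets its time to live.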

Review Questions

Q: What makes a good Cosmos DB partition key?

Q: When should you use Cosmos DB Serverless vs. Autoscale throughput?

Q: What is the difference between GRS and RA-GRS for Blob Storage?

Q: What is Cosmos DB TTL and when should you use it?

Q: Which Cosmos DB consistency level is recommended for most applications?

Q: When should you use Azure Data Lake Storage Gen2 instead of regular Blob Storage?
