Python Examples - Module 7: Pagination & Result Shaping¶
Overview¶
This directory contains comprehensive Python examples demonstrating pagination and result shaping techniques in Azure AI Search using the azure-search-documents SDK.
Prerequisites¶
Python Environment¶
# Python 3.8 or higher
python --version
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install required packages
pip install azure-search-documents
pip install python-dotenv
pip install pandas # For data analysis examples
pip install matplotlib # For visualization examples
Environment Configuration¶
Create a .env file in this directory:
SEARCH_ENDPOINT=https://your-search-service.search.windows.net
SEARCH_API_KEY=your-api-key
INDEX_NAME=hotels-sample
Azure AI Search Setup¶
- Active Azure AI Search service
- Sample data index (hotels-sample recommended)
- Valid API keys and endpoint URLs
Examples Overview¶
Core Pagination Examples¶
- 01_basic_pagination.py - Skip/top pagination fundamentals with error handling
- 02_field_selection.py - Field selection optimization and context-based strategies
- 03_hit_highlighting.py - Hit highlighting implementation with custom tags
- 04_result_counting.py - Smart counting strategies with caching
- 05_range_pagination.py - Range-based pagination for large datasets
Advanced Examples¶
- 06_search_after_pattern.py - Search after pattern for deep pagination
- 07_advanced_shaping.py - Combining multiple result shaping techniques
- 08_performance_optimization.py - Production-ready pagination with monitoring
Quick Start¶
1. Basic Usage¶
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
import os
from dotenv import load_dotenv
# Load configuration
load_dotenv()
# Initialize client
client = SearchClient(
endpoint=os.getenv('SEARCH_ENDPOINT'),
index_name=os.getenv('INDEX_NAME'),
credential=AzureKeyCredential(os.getenv('SEARCH_API_KEY'))
)
# Basic pagination
results = client.search(
search_text="luxury",
skip=0,
top=10,
include_total_count=True
)
print(f"Total matches: {results.get_count()}")
for result in results:
print(f"Hotel: {result['hotelName']}, Rating: {result.get('rating', 'N/A')}")
2. Running Examples¶
# Run individual examples
python 01_basic_pagination.py
python 02_field_selection.py
# Run all examples
python run_all_examples.py
Example Details¶
01_basic_pagination.py¶
Features:
- Skip/top pagination implementation
- Page navigation (first, next, previous, last)
- Error handling and validation
- Performance monitoring
- Pagination state management
Key Classes:
- BasicPaginator: Core pagination functionality
- PaginationState: State management
- PaginationMetrics: Performance tracking
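The navigation math behind those classes is simple enough to sketch inline; the class and attribute names below are illustrative, not the ones used in the example file:
import math
from dataclasses import dataclass
@dataclass
class PageState:
    page: int = 0
    page_size: int = 20
    total: int = 0
    @property
    def last_page(self) -> int:
        # Zero-based index of the final page for the current total
        return max(math.ceil(self.total / self.page_size) - 1, 0)
    def next_page(self) -> int:
        return min(self.page + 1, self.last_page)
    def previous_page(self) -> int:
        return max(self.page - 1, 0)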
02_field_selection.py¶
Features:
- Field selection optimization
- Context-based field selection
- Response size analysis
- Performance comparison
- Dynamic field selection
Key Classes:
- FieldSelector: Field selection management
- FieldSelectionPresets: Common field combinations
- ResponseAnalyzer: Size and performance analysis
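To get a rough sense of what the response analysis measures, the sketch below runs the same query with and without select and compares the serialized payload sizes; the field names follow the hotels-sample convention used in these examples and are assumptions:
import json
def compare_response_sizes(client, query):
    """Approximate the payload saving from field selection."""
    full = list(client.search(search_text=query, top=10))
    trimmed = list(client.search(
        search_text=query,
        top=10,
        select=['hotelId', 'hotelName', 'rating']  # assumed field names
    ))
    # Serialize to JSON as a proxy for over-the-wire size
    full_bytes = len(json.dumps(full, default=str))
    trimmed_bytes = len(json.dumps(trimmed, default=str))
    print(f"Full documents: {full_bytes} bytes, selected fields: {trimmed_bytes} bytes")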
03_hit_highlighting.py¶
Features:
- Hit highlighting implementation
- Custom highlighting tags
- Multi-field highlighting
- Highlight processing utilities
- Performance optimization
Key Classes:
- HitHighlighter: Highlighting functionality
- HighlightProcessor: Result processing
- HighlightUtils: Utility functions
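A minimal sketch of the highlighting call that 03_hit_highlighting.py wraps: highlight_fields, highlight_pre_tag, and highlight_post_tag are search() parameters in the azure-search-documents SDK, while the field names are assumptions based on the hotels-sample convention used here:
def search_with_highlights(client, query):
    """Print the first highlighted description snippet for each hit."""
    results = client.search(
        search_text=query,
        highlight_fields="description",  # comma-separated list of fields
        highlight_pre_tag="<mark>",
        highlight_post_tag="</mark>",
        top=10
    )
    for doc in results:
        snippets = (doc.get('@search.highlights') or {}).get('description', [])
        print(doc['hotelName'], snippets[:1])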
04_result_counting.py¶
Features:
- Smart counting strategies
- Count caching
- Performance optimization
- Conditional counting
- Count formatting
Key Classes:
- ResultCounter: Counting functionality
- CountCache: Caching implementation
- CountingStrategy: Strategy pattern
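The counting strategy boils down to asking for the total only when it is needed; a minimal sketch (the caching and formatting layers live in the example file):
def count_results(client, query, need_exact_count=True):
    """Return one page of documents plus the total match count when requested."""
    results = client.search(
        search_text=query,
        top=10,
        include_total_count=need_exact_count
    )
    docs = list(results)
    # get_count() is only meaningful when include_total_count was requested
    total = results.get_count() if need_exact_count else None
    return docs, total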
05_range_pagination.py¶
Features:
- Range-based pagination
- Filter-based navigation
- State management
- Performance optimization
- Large dataset handling
Key Classes:
- RangePaginator: Range pagination
- RangeNavigator: Navigation logic
- RangeStateManager: State tracking
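The core idea is to sort on a unique, filterable key and filter past the last value seen, rather than growing skip indefinitely. A minimal sketch, assuming hotelId is filterable and sortable (the field name follows the convention used in these examples):
def iterate_by_range(client, query, page_size=50):
    """Yield all matches by paging on hotelId ranges instead of skip."""
    last_id = None
    while True:
        kwargs = {
            'search_text': query,
            'order_by': ['hotelId asc'],
            'top': page_size,
        }
        if last_id is not None:
            # Resume after the last key seen on the previous page
            kwargs['filter'] = f"hotelId gt '{last_id}'"
        page = list(client.search(**kwargs))
        if not page:
            break
        yield from page
        last_id = page[-1]['hotelId']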
Common Patterns¶
Pagination with Error Handling¶
import asyncio
class SafePaginator:
    def __init__(self, client, page_size=20, max_retries=3):
        self.client = client
        self.page_size = page_size
        self.max_retries = max_retries
    async def search_page(self, query, page_number, retry_count=0):
        try:
            results = self.client.search(
                search_text=query,
                skip=page_number * self.page_size,
                top=self.page_size
            )
            return list(results)
        except Exception:
            if retry_count < self.max_retries:
                await asyncio.sleep(2 ** retry_count)  # Exponential backoff
                return await self.search_page(query, page_number, retry_count + 1)
            raise
Field Selection Optimization¶
class OptimizedFieldSelector:
    PRESETS = {
        'list_view': ['hotelId', 'hotelName', 'rating'],
        'detail_view': ['hotelId', 'hotelName', 'description', 'rating', 'location'],
        'map_view': ['hotelId', 'hotelName', 'location', 'rating']
    }
    def __init__(self, client):
        self.client = client
    def search_with_context(self, query, context='list_view', **kwargs):
        fields = self.PRESETS.get(context, self.PRESETS['list_view'])
        return self.client.search(
            search_text=query,
            select=fields,
            **kwargs
        )
Performance Monitoring¶
import time
from functools import wraps
def monitor_performance(func):
@wraps(func)
def wrapper(*args, **kwargs):
start_time = time.time()
try:
result = func(*args, **kwargs)
duration = (time.time() - start_time) * 1000
print(f"{func.__name__} completed in {duration:.2f}ms")
return result
except Exception as e:
duration = (time.time() - start_time) * 1000
print(f"{func.__name__} failed after {duration:.2f}ms: {e}")
raise
return wrapper
@monitor_performance
def search_with_monitoring(client, query, **kwargs):
return list(client.search(search_text=query, **kwargs))
Testing and Validation¶
Unit Tests¶
import unittest
from unittest.mock import Mock, patch
class TestPagination(unittest.TestCase):
def setUp(self):
self.mock_client = Mock()
self.paginator = BasicPaginator(self.mock_client)
def test_basic_pagination(self):
# Mock search results
self.mock_client.search.return_value = [
{'hotelId': '1', 'hotelName': 'Hotel 1'},
{'hotelId': '2', 'hotelName': 'Hotel 2'}
]
result = self.paginator.search_page("test", 0)
self.assertEqual(len(result), 2)
self.mock_client.search.assert_called_once()
if __name__ == '__main__':
unittest.main()
Integration Tests¶
def test_integration():
"""Test with real Azure AI Search service"""
client = SearchClient(endpoint, index_name, credential)
# Test basic search
results = list(client.search("*", top=5))
assert len(results) <= 5
# Test pagination
page1 = list(client.search("*", skip=0, top=5))
page2 = list(client.search("*", skip=5, top=5))
# Ensure different results (assuming enough data)
if len(page1) == 5 and len(page2) > 0:
assert page1[0]['hotelId'] != page2[0]['hotelId']
Performance Optimization¶
Async Support¶
import asyncio
from azure.search.documents.aio import SearchClient as AsyncSearchClient
class AsyncPaginator:
    def __init__(self, client, page_size=20):
        # The caller owns the client lifecycle, e.g. `async with AsyncSearchClient(...) as client:`
        self.client = client
        self.page_size = page_size
    async def search_page(self, query, page_number):
        # The async client's search() is a coroutine and must be awaited before iterating
        results = await self.client.search(
            search_text=query,
            skip=page_number * self.page_size,
            top=self.page_size
        )
        return [doc async for doc in results]
    async def search_multiple_pages(self, query, page_count):
        tasks = [
            self.search_page(query, page)
            for page in range(page_count)
        ]
        return await asyncio.gather(*tasks)
Caching Implementation¶
import hashlib
import json
class CachedPaginator:
    def __init__(self, client, page_size=20, cache_size=128):
        self.client = client
        self.page_size = page_size
        self.cache_size = cache_size
        self._cache = {}
    def _cache_key(self, query, skip, top, **kwargs):
        """Generate a stable cache key from the search parameters"""
        params = {'query': query, 'skip': skip, 'top': top, **kwargs}
        key_str = json.dumps(params, sort_keys=True)
        return hashlib.md5(key_str.encode()).hexdigest()
    def search_page(self, query, page_number, **kwargs):
        skip = page_number * self.page_size
        cache_key = self._cache_key(query, skip, self.page_size, **kwargs)
        if cache_key not in self._cache:
            if len(self._cache) >= self.cache_size:
                # Evict the oldest entry (dicts preserve insertion order)
                self._cache.pop(next(iter(self._cache)))
            self._cache[cache_key] = list(self.client.search(
                search_text=query,
                skip=skip,
                top=self.page_size,
                **kwargs
            ))
        return self._cache[cache_key]
Troubleshooting¶
Common Issues¶
Connection Problems¶
def test_connection():
try:
client = SearchClient(endpoint, index_name, credential)
results = list(client.search("*", top=1))
print("✅ Connection successful")
return True
except Exception as e:
print(f"❌ Connection failed: {e}")
return False
Rate Limiting¶
import time
import random
def handle_rate_limiting(func, max_retries=3):
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if "429" in str(e) and attempt < max_retries - 1:
wait_time = (2 ** attempt) + random.uniform(0, 1)
time.sleep(wait_time)
continue
raise e
return wrapper
Memory Management¶
def memory_efficient_pagination(client, query, batch_size=100):
    """Generator for memory-efficient pagination.
    Note: Azure AI Search caps skip at 100,000, so switch to a range or
    search-after style pattern for anything deeper than that.
    """
    skip = 0
while True:
results = list(client.search(
search_text=query,
skip=skip,
top=batch_size
))
if not results:
break
for result in results:
yield result
if len(results) < batch_size:
break
skip += batch_size
Best Practices¶
Code Organization¶
- Use classes for complex pagination logic
- Implement proper error handling
- Add logging for debugging
- Use type hints for better code documentation
- Follow PEP 8 style guidelines
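A small sketch of those conventions applied together (the logger name and validation limits are illustrative; 1,000 is the service's maximum top per request):
import logging
from typing import Any, Dict, List
logger = logging.getLogger("pagination_examples")
def fetch_page(client, query: str, page: int, page_size: int = 20) -> List[Dict[str, Any]]:
    """Fetch a single result page with input validation and debug logging."""
    if page < 0 or not 1 <= page_size <= 1000:
        raise ValueError("page must be >= 0 and page_size between 1 and 1000")
    logger.debug("Fetching page %d (size %d) for query %r", page, page_size, query)
    return list(client.search(search_text=query, skip=page * page_size, top=page_size))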
Performance¶
- Use appropriate page sizes (10-50 for UI, up to 1000 for APIs)
- Implement caching for frequently accessed data
- Use async operations for concurrent requests
- Monitor and log performance metrics
- Set reasonable timeouts
Error Handling¶
- Implement retry logic with exponential backoff
- Handle rate limiting gracefully
- Validate input parameters
- Provide meaningful error messages
- Log errors for debugging
Contributing¶
To contribute to these examples:
1. Follow the existing code style
2. Add comprehensive docstrings
3. Include error handling
4. Add unit tests
5. Update documentation
Next Steps¶
After exploring these examples:
1. Try the interactive Jupyter notebooks
2. Implement pagination in your application
3. Explore advanced patterns in other modules
4. Contribute improvements and new examples