
Python Code Samples - Module 3: Index Management

This directory contains focused Python examples for index management operations in Azure AI Search. Each file demonstrates a specific aspect of index management with clear, production-ready code.

📁 File Structure

python/
├── README.md                           # This file
├── 01_create_basic_index.py           # Basic index creation
├── 02_schema_design.py                # Advanced schema design patterns
├── 03_data_ingestion.py               # Document upload strategies
├── 04_index_operations.py             # Index management operations
├── 05_performance_optimization.py     # Performance tuning techniques
└── 06_error_handling.py               # Robust error handling patterns

🚀 Quick Start

Prerequisites

  1. Environment Setup (a client-setup sketch follows these steps):

    # Set environment variables
    export AZURE_SEARCH_SERVICE_ENDPOINT="https://your-service.search.windows.net"
    export AZURE_SEARCH_ADMIN_KEY="your-admin-api-key"
    
    # Install dependencies
    pip install azure-search-documents python-dotenv
    

  2. Run Prerequisites Setup:

    cd ../
    python setup_prerequisites.py
    
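Each numbered example builds its clients from these environment variables. A minimal sketch of that setup, assuming python-dotenv is used and with an illustrative index name (the sample files define their own helpers):

# Illustrative client setup; not copied from the sample files
import os

from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient

load_dotenv()  # pick up AZURE_SEARCH_* values from a local .env file if present

endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
credential = AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"])

# Index-level operations: create, update, inspect, and delete indexes
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

# Document-level operations against a single index: upload, merge, delete, query
search_client = SearchClient(endpoint=endpoint, index_name="my-index", credential=credential)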

Running Examples

# Basic index creation
python 01_create_basic_index.py

# Advanced schema design
python 02_schema_design.py

# Data ingestion strategies
python 03_data_ingestion.py

# Continue with other examples...

📚 Example Categories

1. Basic Index Creation (01_create_basic_index.py)

Focus: Fundamental index creation concepts

What you'll learn:

  • Creating a SearchIndexClient
  • Defining field types and attributes
  • Basic index creation and validation
  • Testing index functionality

Key concepts:

# Basic field definition
SimpleField(name="id", type=SearchFieldDataType.String, key=True)
SearchableField(name="title", type=SearchFieldDataType.String)

# Index creation
index = SearchIndex(name="my-index", fields=fields)
result = index_client.create_index(index)
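
The fragment above omits setup; a hedged end-to-end sketch of the same idea (field and index names are illustrative, not necessarily those used in 01_create_basic_index.py):

# Illustrative end-to-end index creation
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchableField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
)

index_client = SearchIndexClient(
    endpoint=os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"]),
)

fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SimpleField(name="rating", type=SearchFieldDataType.Int32, filterable=True, sortable=True),
]

index = SearchIndex(name="my-index", fields=fields)
result = index_client.create_index(index)
print(f"Created index: {result.name}")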

2. Schema Design (02_schema_design.py)

Focus: Advanced schema design patterns and best practices

What you'll learn:

  • Field type selection strategies
  • Attribute optimization for performance
  • Complex field structures
  • Schema design patterns for different use cases

Key concepts:

# Complex field with nested structure
ComplexField(name="author", fields=[
    SimpleField(name="name", type=SearchFieldDataType.String),
    SimpleField(name="email", type=SearchFieldDataType.String)
])

# Collection fields
SimpleField(name="tags", type=SearchFieldDataType.Collection(SearchFieldDataType.String))

3. Data Ingestion (03_data_ingestion.py)

Focus: Efficient document upload and management strategies

What you'll learn:

  • Single vs batch document uploads
  • Large dataset handling techniques
  • Upload optimization strategies
  • Progress tracking and monitoring

Key concepts:

# Batch upload with error handling
result = search_client.upload_documents(documents)
successful = sum(1 for r in result if r.succeeded)

# Large dataset processing
for batch in create_batches(large_dataset, batch_size=100):
    upload_batch(batch)
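
A hedged sketch of how those two ideas fit together, with simple progress tracking; create_batches and the batch size are illustrative, not necessarily the helpers used in 03_data_ingestion.py:

# Batched upload with progress reporting
def create_batches(documents, batch_size=100):
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

def upload_in_batches(search_client, documents, batch_size=100):
    total_succeeded = 0
    for i, batch in enumerate(create_batches(documents, batch_size), start=1):
        results = search_client.upload_documents(batch)
        succeeded = sum(1 for r in results if r.succeeded)
        total_succeeded += succeeded
        print(f"Batch {i}: {succeeded}/{len(batch)} documents indexed")
    return total_succeeded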

4. Index Operations (04_index_operations.py)

Focus: Index lifecycle management operations

What you'll learn:

  • Listing and inspecting indexes
  • Getting index statistics
  • Updating index schemas
  • Index deletion and cleanup

Key concepts:

# List indexes
indexes = list(index_client.list_indexes())

# Get index details
index = index_client.get_index("my-index")

# Update schema
updated_index = SearchIndex(name="my-index", fields=new_fields)
index_client.create_or_update_index(updated_index)
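
Statistics and cleanup from the list above are not shown in the fragment; a hedged sketch (the "test-" naming convention is an assumption, not something the sample enforces):

# Inspect and clean up indexes
def inspect_indexes(index_client):
    for name in index_client.list_index_names():
        stats = index_client.get_index_statistics(name)
        print(f"{name}: {stats}")  # includes document count and storage size

def cleanup_test_indexes(index_client, prefix="test-"):
    for name in list(index_client.list_index_names()):
        if name.startswith(prefix):
            index_client.delete_index(name)
            print(f"Deleted {name}")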

5. Performance Optimization (05_performance_optimization.py)

Focus: Performance tuning and optimization techniques

What you'll learn:

  • Batch size optimization
  • Parallel upload strategies
  • Memory management techniques
  • Performance monitoring and metrics

Key concepts:

# Optimal batch sizing
def get_optimal_batch_size(document_size):
    if document_size < 1024:  # 1KB
        return 1000
    elif document_size < 10240:  # 10KB
        return 500
    else:
        return 100

# Parallel processing
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(upload_batch, batch) for batch in batches]

# Custom analyzer
custom_analyzer = CustomAnalyzer(
    name="my_analyzer",
    tokenizer_name="standard",
    token_filters=["lowercase", "stop"]
)

# Scoring profile
scoring_profile = ScoringProfile(
    name="boost_recent",
    text_weights=TextWeights(weights={"title": 2.0})
)
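
Performance monitoring can start as plain timing around the upload loop; a minimal sketch, not the metrics code the sample may ship:

# Rough throughput measurement for a set of prepared batches
import time

def timed_upload(search_client, batches):
    start = time.perf_counter()
    uploaded = 0
    for batch in batches:
        results = search_client.upload_documents(batch)
        uploaded += sum(1 for r in results if r.succeeded)
    elapsed = time.perf_counter() - start
    print(f"Indexed {uploaded} documents in {elapsed:.1f}s ({uploaded / elapsed:.0f} docs/sec)")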

6. Error Handling (06_error_handling.py)

Focus: Robust error handling and recovery patterns

What you'll learn:

  • Common error scenarios and solutions
  • Retry strategies with exponential backoff
  • Partial failure handling
  • Graceful degradation techniques

Key concepts:

# Retry with exponential backoff
@retry(max_attempts=3, backoff_factor=2.0)
def upload_with_retry(documents):
    return search_client.upload_documents(documents)

# Comprehensive error handling
try:
    result = upload_documents(docs)
except HttpResponseError as e:
    handle_http_error(e)
except Exception as e:
    handle_general_error(e)
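
The @retry decorator above is not part of the Azure SDK; the sample presumably defines something similar. A minimal sketch of such a decorator, assuming transient failures surface as HttpResponseError:

# Simple retry decorator with exponential backoff (illustrative implementation)
import functools
import time

from azure.core.exceptions import HttpResponseError

def retry(max_attempts=3, backoff_factor=2.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = 1.0
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except HttpResponseError:
                    if attempt == max_attempts:
                        raise  # out of attempts, surface the error
                    time.sleep(delay)
                    delay *= backoff_factor
        return wrapper
    return decorator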

🎯 Learning Paths

1. Beginner Path (Sequential)

Follow the numbered sequence for structured learning:

python 01_create_basic_index.py      # Start here
python 02_schema_design.py           # Learn schema design
python 03_data_ingestion.py          # Master data upload
python 04_index_operations.py        # Index management
# Continue through all examples...

2. Topic-Focused Path

Jump to specific areas of interest:

# Focus on performance
python 05_performance_optimization.py

# Focus on error handling
python 06_error_handling.py

# Focus on index operations
python 04_index_operations.py

3. Problem-Solving Path

Start with common scenarios:

# "I need to create an index"
python 01_create_basic_index.py

# "I need to upload lots of data"
python 03_data_ingestion.py

# "My uploads are failing"
python 06_error_handling.py

🔧 Code Features

Production-Ready Patterns

  • ✅ Comprehensive error handling
  • ✅ Input validation and sanitization
  • ✅ Proper resource cleanup
  • ✅ Logging and monitoring integration

Performance Optimizations

  • ✅ Efficient batch processing
  • ✅ Memory-conscious data handling
  • ✅ Connection pooling and reuse
  • ✅ Parallel processing where appropriate

Best Practices

  • ✅ Environment variable configuration
  • ✅ Secure credential management
  • ✅ Clear code structure and documentation
  • ✅ Modular, reusable functions

🚨 Common Issues and Solutions

Issue 1: Import Errors

# Problem: Cannot import azure.search.documents
# Solution: Install the correct package
pip install azure-search-documents==11.4.0

Issue 2: Authentication Errors

# Problem: 403 Forbidden errors
# Solution: Use admin key for index operations
credential = AzureKeyCredential(admin_key)  # Not query key!

Issue 3: Index Already Exists

# Problem: Index creation fails because it exists
# Solution: Use create_or_update_index
result = index_client.create_or_update_index(index)  # Safe
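
If you prefer to keep create_index, a hedged alternative is to check for the index by name first (the index name here is illustrative):

# Alternative: skip creation when the index already exists
if "my-index" not in list(index_client.list_index_names()):
    index_client.create_index(index)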

Issue 4: Document Upload Failures

# Problem: Some documents fail to upload
# Solution: Check individual results
for result in upload_results:
    if not result.succeeded:
        print(f"Failed: {result.key} - {result.error_message}")

💡 Tips for Success

Development Workflow

  1. Start Simple: Begin with basic examples and add complexity
  2. Test Frequently: Run examples with small datasets first
  3. Handle Errors: Always implement proper error handling
  4. Monitor Performance: Track upload speeds and success rates
  5. Clean Up: Delete test indexes when done

Debugging Techniques

  1. Enable Logging: Use Python logging for detailed output (see the sketch after this list)
  2. Check Responses: Examine HTTP response codes and messages
  3. Validate Data: Ensure documents match your schema
  4. Test Incrementally: Upload small batches to isolate issues
  5. Use Try-Catch: Wrap operations in appropriate exception handling
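
For the logging tip above, the Azure SDK plugs into standard Python logging; a minimal sketch (index name is illustrative):

# Enable verbose Azure SDK logging during debugging
import logging
import os
import sys

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger("azure").setLevel(logging.DEBUG)  # azure-* libraries log under "azure"

# logging_enable=True opts this client into HTTP request/response logging
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"],
    index_name="my-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"]),
    logging_enable=True,
)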

Performance Tips

  1. Batch Operations: Always use batch uploads for multiple documents
  2. Optimize Batch Size: Adjust based on document size and complexity
  3. Use Parallel Processing: For large datasets, consider parallel uploads
  4. Monitor Resources: Watch memory usage during large operations
  5. Connection Reuse: Reuse clients instead of creating new ones

🚀 Next Steps

After mastering these Python examples:

  1. ✅ Complete All Examples: Work through each file systematically
  2. 🔬 Experiment: Modify examples to work with your own data
  3. 📝 Practice: Complete the module exercises
  4. 🌐 Explore Other Languages: Try C#, JavaScript, or REST examples
  5. 🏗️ Build Applications: Apply concepts to real-world projects
  6. 📚 Continue Learning: Move to Module 4: Simple Queries and Filters

Ready to master Azure AI Search index management with Python? 🐍✨

Start with 01_create_basic_index.py and begin your journey!