# Module 10: Analyzers & Scoring - Code Samples
This directory contains comprehensive code samples demonstrating text analysis and scoring techniques in Azure AI Search.
## Overview
These code samples cover:
- Built-in Analyzer Usage: Working with language-specific and specialized analyzers
- Custom Analyzer Creation: Building analyzers with tokenizers, filters, and character filters
- Scoring Profile Implementation: Creating and testing custom scoring algorithms
- Performance Testing: Measuring and optimizing analyzer and scoring performance
- Advanced Techniques: N-gram analyzers, phonetic matching, and multi-language support
## Sample Categories

### 1. Analyzer Configuration and Testing
- 01_builtin_analyzers: Compare and test built-in analyzers
- 02_custom_analyzers: Create and configure custom text analysis pipelines
- 03_analyzer_testing: Comprehensive testing and validation frameworks
- 04_ngram_autocomplete: N-gram analyzers for autocomplete functionality
### 2. Scoring Profile Implementation
- 05_basic_scoring: Field weights and basic scoring profiles
- 06_advanced_scoring: Complex scoring with multiple functions
- 07_location_scoring: Geographic distance-based scoring
- 08_performance_optimization: Scoring profile performance tuning
## Language Support
Code samples are provided in multiple programming languages:
- Python: Using the `azure-search-documents` SDK
- JavaScript/Node.js: Using the `@azure/search-documents` SDK
- C#: Using the `Azure.Search.Documents` NuGet package
- REST API: Direct HTTP requests with curl and HTTP files
## Prerequisites
Before running these samples:
- Azure AI Search Service: Standard tier or higher for custom analyzers
- Admin API Key: Required for creating indexes and analyzers
- Development Environment: Appropriate SDK installed for your language
- Sample Data: Test documents for analyzer and scoring validation
## Quick Start

### Python Setup

Install the Python SDK: `pip install azure-search-documents`

### JavaScript Setup

Install the Node.js SDK: `npm install @azure/search-documents`

### C# Setup

Add the NuGet package: `dotnet add package Azure.Search.Documents`
## Configuration
Create a configuration file with your Azure AI Search service details:
### Python (config.py)

```python
SEARCH_SERVICE_NAME = "your-search-service"
SEARCH_ADMIN_KEY = "your-admin-key"
SEARCH_QUERY_KEY = "your-query-key"
SEARCH_INDEX_NAME = "analyzer-test-index"
```
### JavaScript (config.js)

```javascript
module.exports = {
  searchServiceName: "your-search-service",
  adminKey: "your-admin-key",
  queryKey: "your-query-key",
  indexName: "analyzer-test-index"
};
```
### C# (appsettings.json)

```json
{
  "SearchServiceName": "your-search-service",
  "SearchAdminKey": "your-admin-key",
  "SearchQueryKey": "your-query-key",
  "SearchIndexName": "analyzer-test-index"
}
```
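Whichever configuration you use, the service name resolves to the standard Azure AI Search endpoint URL. A minimal sketch in Python:

```python
# Build the service endpoint from the configured service name.
# Azure AI Search endpoints follow the https://<name>.search.windows.net pattern.
SEARCH_SERVICE_NAME = "your-search-service"
SEARCH_ENDPOINT = f"https://{SEARCH_SERVICE_NAME}.search.windows.net"
print(SEARCH_ENDPOINT)
```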
## Sample Structure
Each code sample includes:
- Main Implementation: Core functionality demonstration
- Configuration: Index schema and analyzer definitions
- Test Data: Sample documents for testing
- Validation: Methods to verify expected behavior
- Documentation: Inline comments and explanations
## Running the Samples

### Individual Samples

Each sample can be run independently:

```shell
# Python
python 01_builtin_analyzers.py

# JavaScript
node 01_builtin_analyzers.js

# C#
dotnet run 01_BuiltinAnalyzers.cs
```
### Complete Test Suite

Run all samples in sequence:
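No runner script is listed here, but a minimal sketch (assuming the Python samples keep the `NN_name.py` naming convention above) could collect and run them in numeric order:

```python
# Hypothetical runner: find the numbered Python samples and run them in order.
import glob
import subprocess

def find_samples(pattern="0[1-8]_*.py"):
    # Sorting keeps the numeric prefix order (01_, 02_, ...).
    return sorted(glob.glob(pattern))

def run_all(pattern="0[1-8]_*.py"):
    for path in find_samples(pattern):
        print(f"=== {path} ===")
        subprocess.run(["python", path], check=True)
```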
## Sample Descriptions

### 01_builtin_analyzers
- Compare standard, English, keyword, and simple analyzers
- Test tokenization differences with various text inputs
- Demonstrate language-specific analyzer behavior
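To build intuition for what this sample demonstrates, the tokenization differences can be approximated locally. This is an illustrative sketch only, not the service's actual Lucene implementation:

```python
# Rough local approximations of three built-in analyzers, to show why
# analyzer choice changes which queries match a field.
import re

def keyword_analyze(text):
    # keyword: the whole input becomes one token (useful for IDs and codes)
    return [text]

def simple_analyze(text):
    # simple: lowercase, then split on every non-letter character
    return [t for t in re.split(r"[^a-zA-Z]+", text.lower()) if t]

def whitespace_analyze(text):
    # whitespace: split on whitespace only, case preserved
    return text.split()

sample = "Wi-Fi enabled, USB-C port"
print(keyword_analyze(sample))     # one token
print(simple_analyze(sample))      # ['wi', 'fi', 'enabled', 'usb', 'c', 'port']
print(whitespace_analyze(sample))  # ['Wi-Fi', 'enabled,', 'USB-C', 'port']
```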
### 02_custom_analyzers
- Create custom analyzers with character filters, tokenizers, and token filters
- Implement domain-specific text processing
- Test HTML stripping, synonym mapping, and stop word removal
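For orientation, a custom analyzer is declared in the index definition by naming a tokenizer plus optional character and token filters. A sketch of that shape, where `html_clean_analyzer` is a hypothetical name while `html_strip`, `standard_v2`, `lowercase`, and `asciifolding` are built-in component identifiers:

```python
# Sketch of a custom analyzer entry in an index definition.
import json

custom_analyzer = {
    "name": "html_clean_analyzer",  # hypothetical analyzer name
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "charFilters": ["html_strip"],                  # strip HTML before tokenizing
    "tokenizer": "standard_v2",                     # standard Lucene tokenization
    "tokenFilters": ["lowercase", "asciifolding"],  # normalize case and accents
}
print(json.dumps(custom_analyzer, indent=2))
```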
### 03_analyzer_testing
- Comprehensive analyzer testing framework
- Automated validation of tokenization results
- Performance benchmarking and comparison tools
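The validation step boils down to comparing expected and actual token lists. A minimal, hypothetical helper of the kind such a framework might use:

```python
# Hypothetical validation helper: compare the tokens an analyzer produced
# against an expected list and report any differences.
def validate_tokens(case_name, actual, expected):
    missing = [t for t in expected if t not in actual]
    unexpected = [t for t in actual if t not in expected]
    ok = not missing and not unexpected
    status = "PASS" if ok else f"FAIL (missing={missing}, unexpected={unexpected})"
    print(f"{case_name}: {status}")
    return ok
```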
### 04_ngram_autocomplete
- Edge n-gram tokenizer for autocomplete functionality
- Separate index and search analyzers
- Autocomplete query implementation and testing
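Conceptually, an edge n-gram tokenizer emits every prefix of a token between a minimum and maximum length, which is what makes prefix autocomplete cheap at query time. A pure-Python sketch of that expansion:

```python
def edge_ngrams(token, min_gram=2, max_gram=5):
    """Emit the leading prefixes of token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("search"))  # ['se', 'sea', 'sear', 'searc']
```

Indexing these prefixes while analyzing the query text with a plain analyzer is why this sample configures separate index and search analyzers.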
### 05_basic_scoring
- Field weight configuration and testing
- Simple scoring profile implementation
- Result ranking comparison with and without scoring
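The field-weight portion of a scoring profile is just a named weights map in the index definition; queries then opt in by profile name. A sketch of the shape, with a hypothetical profile name and field names:

```python
# Sketch of a weighted-fields scoring profile as it appears in an index
# definition. "boost-title" and the field names are hypothetical.
scoring_profile = {
    "name": "boost-title",
    "text": {
        "weights": {
            "title": 5.0,        # matches in title count 5x
            "tags": 3.0,
            "description": 1.0,  # baseline weight
        }
    },
}
print(scoring_profile["name"])
```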
### 06_advanced_scoring
- Multiple scoring functions (freshness, magnitude, distance)
- Function aggregation strategies
- Complex business logic implementation
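A scoring profile can combine several functions, with `functionAggregation` controlling how their boosts are merged into the base relevance score. A sketch, with hypothetical field names and boost values:

```python
# Sketch of a scoring profile combining freshness and magnitude functions.
# Field names and boosts are hypothetical; "sum" adds the function boosts
# to the base relevance score.
advanced_profile = {
    "name": "fresh-and-popular",
    "functionAggregation": "sum",
    "functions": [
        {
            "type": "freshness",
            "fieldName": "lastUpdated",
            "boost": 2,
            "interpolation": "quadratic",
            "freshness": {"boostingDuration": "P30D"},  # favor the last 30 days
        },
        {
            "type": "magnitude",
            "fieldName": "rating",
            "boost": 3,
            "magnitude": {"boostingRangeStart": 0, "boostingRangeEnd": 5},
        },
    ],
}
```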
### 07_location_scoring
- Geographic distance-based scoring
- Location parameter handling
- Restaurant/business finder implementation
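The distance function boosts documents near a reference point that the query supplies at search time. A sketch of the function definition and the matching query-time parameter, with hypothetical names and values:

```python
# Sketch of a distance scoring function. "currentLocation" is the parameter
# name the query must supply; the boost tapers off by boostingDistance
# kilometers from the reference point.
distance_function = {
    "type": "distance",
    "fieldName": "location",
    "boost": 5,
    "distance": {
        "referencePointParameter": "currentLocation",
        "boostingDistance": 10,
    },
}

# At query time the caller passes the point as lon,lat, e.g. the REST
# query string: scoringParameter=currentLocation--122.335,47.610
```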
### 08_performance_optimization
- Performance measurement and monitoring
- Analyzer and scoring profile optimization
- A/B testing framework for configuration comparison
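An A/B comparison of two configurations needs a consistent timing harness. A minimal, hypothetical sketch: pass in a callable that executes one query against a given configuration, and compare the averages:

```python
# Hypothetical micro-benchmark helper for comparing query configurations.
import time

def time_queries(run_query, queries, repeats=5):
    """Return average seconds per query for a callable run_query(q)."""
    start = time.perf_counter()
    for _ in range(repeats):
        for q in queries:
            run_query(q)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(queries))
```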
## Best Practices Demonstrated

### Analyzer Design
- Start with built-in analyzers before creating custom ones
- Use appropriate analyzers for different field types
- Test thoroughly with representative data
- Monitor performance impact
### Scoring Profile Design
- Balance field weights appropriately
- Use scoring functions judiciously
- Test with real user queries
- Monitor relevance metrics
### Performance Optimization
- Measure baseline performance
- Use separate index/search analyzers when beneficial
- Apply complex analysis selectively
- Implement caching strategies
## Troubleshooting

### Common Issues
- Analyzer Not Found: Ensure analyzer is defined before field reference
- Invalid Tokens: Use Analyze API to debug tokenization
- Poor Performance: Simplify analyzers or use selective application
- Scoring Not Applied: Verify scoring profile parameter in queries
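On the last issue: scoring profiles are never applied implicitly; the query must name the profile and supply any parameters its functions require. A sketch of a REST search request body, with a hypothetical profile name and parameter value:

```python
# Sketch of a search request body that opts in to a scoring profile.
# "my-profile" and the location value are hypothetical and must match
# what the index actually defines.
import json

query_body = {
    "search": "waterfront restaurant",
    "scoringProfile": "my-profile",
    "scoringParameters": ["currentLocation--122.335,47.610"],
}
print(json.dumps(query_body))
```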
### Debugging Tools
- Analyze API: Test text processing step by step
- Performance Monitoring: Measure indexing and query performance
- Validation Scripts: Automated testing of expected behavior
- Logging: Detailed operation logging for troubleshooting
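The Analyze API mentioned above takes a text sample plus an analyzer name (or an explicit tokenizer/filter combination) and returns the tokens produced, which makes it the quickest way to debug tokenization. A sketch of the request payload; the endpoint path shown in the comment is illustrative of the REST shape:

```python
# Sketch of an Analyze API request body. POST it to:
#   https://<service>.search.windows.net/indexes/<index>/analyze?api-version=<version>
# The response lists each token with its offsets and position.
analyze_request = {
    "text": "The Quick <b>Brown</b> Fox",
    "analyzer": "standard.lucene",
}
```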
## Additional Resources
- Azure AI Search Analyzers Documentation
- Scoring Profiles Documentation
- REST API Reference
- SDK Documentation
## Contributing
When adding new samples:
- Follow the established naming convention
- Include comprehensive documentation
- Add validation and error handling
- Test with multiple data scenarios
- Update this README with sample descriptions
These code samples provide practical, hands-on experience with text analysis and scoring in Azure AI Search, demonstrating real-world implementation patterns and best practices.