# Module 10: Analyzers & Scoring - Code Samples
This directory contains comprehensive code samples demonstrating text analysis and scoring techniques in Azure AI Search.
## Overview
These code samples cover:
- Built-in Analyzer Usage: Working with language-specific and specialized analyzers
- Custom Analyzer Creation: Building analyzers with tokenizers, filters, and character filters
- Scoring Profile Implementation: Creating and testing custom scoring algorithms
- Performance Testing: Measuring and optimizing analyzer and scoring performance
- Advanced Techniques: N-gram analyzers, phonetic matching, and multi-language support
## Sample Categories

### 1. Analyzer Configuration and Testing
- 01_builtin_analyzers: Compare and test built-in analyzers
- 02_custom_analyzers: Create and configure custom text analysis pipelines
- 03_analyzer_testing: Comprehensive testing and validation frameworks
- 04_ngram_autocomplete: N-gram analyzers for autocomplete functionality
### 2. Scoring Profile Implementation
- 05_basic_scoring: Field weights and basic scoring profiles
- 06_advanced_scoring: Complex scoring with multiple functions
- 07_location_scoring: Geographic distance-based scoring
- 08_performance_optimization: Scoring profile performance tuning
## Language Support
Code samples are provided in multiple programming languages:
- Python: Using the `azure-search-documents` SDK
- JavaScript/Node.js: Using the `@azure/search-documents` SDK
- C#: Using the `Azure.Search.Documents` NuGet package
- REST API: Direct HTTP requests with curl and HTTP files
## Prerequisites
Before running these samples:
- Azure AI Search Service: Standard tier or higher for custom analyzers
- Admin API Key: Required for creating indexes and analyzers
- Development Environment: Appropriate SDK installed for your language
- Sample Data: Test documents for analyzer and scoring validation
## Quick Start

### Python Setup

Install the Python SDK: `pip install azure-search-documents`

### JavaScript Setup

Install the Node.js SDK: `npm install @azure/search-documents`

### C# Setup

Add the NuGet package: `dotnet add package Azure.Search.Documents`
## Configuration
Create a configuration file with your Azure AI Search service details:
### Python (config.py)

```python
SEARCH_SERVICE_NAME = "your-search-service"
SEARCH_ADMIN_KEY = "your-admin-key"
SEARCH_QUERY_KEY = "your-query-key"
SEARCH_INDEX_NAME = "analyzer-test-index"
```
### JavaScript (config.js)

```javascript
module.exports = {
  searchServiceName: "your-search-service",
  adminKey: "your-admin-key",
  queryKey: "your-query-key",
  indexName: "analyzer-test-index"
};
```
### C# (appsettings.json)

```json
{
  "SearchServiceName": "your-search-service",
  "SearchAdminKey": "your-admin-key",
  "SearchQueryKey": "your-query-key",
  "SearchIndexName": "analyzer-test-index"
}
```
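Whichever configuration you use, the service name resolves to the standard Azure AI Search endpoint URL. A minimal sketch in Python:

```python
# Build the service endpoint from the configured service name.
# Azure AI Search endpoints follow the https://<name>.search.windows.net pattern.
SEARCH_SERVICE_NAME = "your-search-service"
SEARCH_ENDPOINT = f"https://{SEARCH_SERVICE_NAME}.search.windows.net"
print(SEARCH_ENDPOINT)
```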
## Sample Structure
Each code sample includes:
- Main Implementation: Core functionality demonstration
- Configuration: Index schema and analyzer definitions
- Test Data: Sample documents for testing
- Validation: Methods to verify expected behavior
- Documentation: Inline comments and explanations
## Running the Samples

### Individual Samples

Each sample can be run independently:

```shell
# Python
python 01_builtin_analyzers.py

# JavaScript
node 01_builtin_analyzers.js

# C#
dotnet run 01_BuiltinAnalyzers.cs
```
### Complete Test Suite

Run all samples in sequence:
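No runner script is listed here, but a minimal sketch (assuming the Python samples keep the `NN_name.py` naming convention above) could collect and run them in numeric order:

```python
# Hypothetical runner: find the numbered Python samples and run them in order.
import glob
import subprocess

def find_samples(pattern="0[1-8]_*.py"):
    # Sorting keeps the numeric prefix order (01_, 02_, ...).
    return sorted(glob.glob(pattern))

def run_all(pattern="0[1-8]_*.py"):
    for path in find_samples(pattern):
        print(f"=== {path} ===")
        subprocess.run(["python", path], check=True)
```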
## Sample Descriptions

### 01_builtin_analyzers
- Compare standard, English, keyword, and simple analyzers
- Test tokenization differences with various text inputs
- Demonstrate language-specific analyzer behavior
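To build intuition for what this sample demonstrates, the tokenization differences can be approximated locally. This is an illustrative sketch only, not the service's actual Lucene implementation:

```python
# Rough local approximations of three built-in analyzers, to show why
# analyzer choice changes which queries match a field.
import re

def keyword_analyze(text):
    # keyword: the whole input becomes one token (useful for IDs and codes)
    return [text]

def simple_analyze(text):
    # simple: lowercase, then split on every non-letter character
    return [t for t in re.split(r"[^a-zA-Z]+", text.lower()) if t]

def whitespace_analyze(text):
    # whitespace: split on whitespace only, case preserved
    return text.split()

sample = "Wi-Fi enabled, USB-C port"
print(keyword_analyze(sample))     # one token
print(simple_analyze(sample))      # ['wi', 'fi', 'enabled', 'usb', 'c', 'port']
print(whitespace_analyze(sample))  # ['Wi-Fi', 'enabled,', 'USB-C', 'port']
```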
### 02_custom_analyzers
- Create custom analyzers with character filters, tokenizers, and token filters
- Implement domain-specific text processing
- Test HTML stripping, synonym mapping, and stop word removal
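For orientation, a custom analyzer is declared in the index definition by naming a tokenizer plus optional character and token filters. A sketch of that shape, where `html_clean_analyzer` is a hypothetical name while `html_strip`, `standard_v2`, `lowercase`, and `asciifolding` are built-in component identifiers:

```python
# Sketch of a custom analyzer entry in an index definition.
import json

custom_analyzer = {
    "name": "html_clean_analyzer",  # hypothetical analyzer name
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "charFilters": ["html_strip"],                  # strip HTML before tokenizing
    "tokenizer": "standard_v2",                     # standard Lucene tokenization
    "tokenFilters": ["lowercase", "asciifolding"],  # normalize case and accents
}
print(json.dumps(custom_analyzer, indent=2))
```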
### 03_analyzer_testing
- Comprehensive analyzer testing framework
- Automated validation of tokenization results
- Performance benchmarking and comparison tools
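The validation step boils down to comparing expected and actual token lists. A minimal, hypothetical helper of the kind such a framework might use:

```python
# Hypothetical validation helper: compare the tokens an analyzer produced
# against an expected list and report any differences.
def validate_tokens(case_name, actual, expected):
    missing = [t for t in expected if t not in actual]
    unexpected = [t for t in actual if t not in expected]
    ok = not missing and not unexpected
    status = "PASS" if ok else f"FAIL (missing={missing}, unexpected={unexpected})"
    print(f"{case_name}: {status}")
    return ok
```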
### 04_ngram_autocomplete
- Edge n-gram tokenizer for autocomplete functionality
- Separate index and search analyzers
- Autocomplete query implementation and testing
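Conceptually, an edge n-gram tokenizer emits every prefix of a token between a minimum and maximum length, which is what makes prefix autocomplete cheap at query time. A pure-Python sketch of that expansion:

```python
def edge_ngrams(token, min_gram=2, max_gram=5):
    """Emit the leading prefixes of token, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("search"))  # ['se', 'sea', 'sear', 'searc']
```

Indexing these prefixes while analyzing the query text with a plain analyzer is why this sample configures separate index and search analyzers.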
### 05_basic_scoring
- Field weight configuration and testing
- Simple scoring profile implementation
- Result ranking comparison with and without scoring
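The field-weight portion of a scoring profile is just a named weights map in the index definition; queries then opt in by profile name. A sketch of the shape, with a hypothetical profile name and field names:

```python
# Sketch of a weighted-fields scoring profile as it appears in an index
# definition. "boost-title" and the field names are hypothetical.
scoring_profile = {
    "name": "boost-title",
    "text": {
        "weights": {
            "title": 5.0,        # matches in title count 5x
            "tags": 3.0,
            "description": 1.0,  # baseline weight
        }
    },
}
print(scoring_profile["name"])
```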
### 06_advanced_scoring
- Multiple scoring functions (freshness, magnitude, distance)
- Function aggregation strategies
- Complex business logic implementation
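A scoring profile can combine several functions, with `functionAggregation` controlling how their boosts are merged into the base relevance score. A sketch, with hypothetical field names and boost values:

```python
# Sketch of a scoring profile combining freshness and magnitude functions.
# Field names and boosts are hypothetical; "sum" adds the function boosts
# to the base relevance score.
advanced_profile = {
    "name": "fresh-and-popular",
    "functionAggregation": "sum",
    "functions": [
        {
            "type": "freshness",
            "fieldName": "lastUpdated",
            "boost": 2,
            "interpolation": "quadratic",
            "freshness": {"boostingDuration": "P30D"},  # favor the last 30 days
        },
        {
            "type": "magnitude",
            "fieldName": "rating",
            "boost": 3,
            "magnitude": {"boostingRangeStart": 0, "boostingRangeEnd": 5},
        },
    ],
}
```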
### 07_location_scoring
- Geographic distance-based scoring
- Location parameter handling
- Restaurant/business finder implementation
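The distance function boosts documents near a reference point that the query supplies at search time. A sketch of the function definition and the matching query-time parameter, with hypothetical names and values:

```python
# Sketch of a distance scoring function. "currentLocation" is the parameter
# name the query must supply; the boost tapers off by boostingDistance
# kilometers from the reference point.
distance_function = {
    "type": "distance",
    "fieldName": "location",
    "boost": 5,
    "distance": {
        "referencePointParameter": "currentLocation",
        "boostingDistance": 10,
    },
}

# At query time the caller passes the point as lon,lat, e.g. the REST
# query string: scoringParameter=currentLocation--122.335,47.610
```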
### 08_performance_optimization
- Performance measurement and monitoring
- Analyzer and scoring profile optimization
- A/B testing framework for configuration comparison
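An A/B comparison of two configurations needs a consistent timing harness. A minimal, hypothetical sketch: pass in a callable that executes one query against a given configuration, and compare the averages:

```python
# Hypothetical micro-benchmark helper for comparing query configurations.
import time

def time_queries(run_query, queries, repeats=5):
    """Return average seconds per query for a callable run_query(q)."""
    start = time.perf_counter()
    for _ in range(repeats):
        for q in queries:
            run_query(q)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(queries))
```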
## Best Practices Demonstrated

### Analyzer Design
- Start with built-in analyzers before creating custom ones
- Use appropriate analyzers for different field types
- Test thoroughly with representative data
- Monitor performance impact
### Scoring Profile Design
- Balance field weights appropriately
- Use scoring functions judiciously
- Test with real user queries
- Monitor relevance metrics
### Performance Optimization
- Measure baseline performance
- Use separate index/search analyzers when beneficial
- Apply complex analysis selectively
- Implement caching strategies
## Troubleshooting

### Common Issues
- Analyzer Not Found: Ensure analyzer is defined before field reference
- Invalid Tokens: Use Analyze API to debug tokenization
- Poor Performance: Simplify analyzers or use selective application
- Scoring Not Applied: Verify scoring profile parameter in queries
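On the last issue: scoring profiles are never applied implicitly; the query must name the profile and supply any parameters its functions require. A sketch of a REST search request body, with a hypothetical profile name and parameter value:

```python
# Sketch of a search request body that opts in to a scoring profile.
# "my-profile" and the location value are hypothetical and must match
# what the index actually defines.
import json

query_body = {
    "search": "waterfront restaurant",
    "scoringProfile": "my-profile",
    "scoringParameters": ["currentLocation--122.335,47.610"],
}
print(json.dumps(query_body))
```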
### Debugging Tools
- Analyze API: Test text processing step by step
- Performance Monitoring: Measure indexing and query performance
- Validation Scripts: Automated testing of expected behavior
- Logging: Detailed operation logging for troubleshooting
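The Analyze API mentioned above takes a text sample plus an analyzer name (or an explicit tokenizer/filter combination) and returns the tokens produced, which makes it the quickest way to debug tokenization. A sketch of the request payload; the endpoint path shown in the comment is illustrative of the REST shape:

```python
# Sketch of an Analyze API request body. POST it to:
#   https://<service>.search.windows.net/indexes/<index>/analyze?api-version=<version>
# The response lists each token with its offsets and position.
analyze_request = {
    "text": "The Quick <b>Brown</b> Fox",
    "analyzer": "standard.lucene",
}
```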
## Additional Resources
- Azure AI Search Analyzers Documentation
- Scoring Profiles Documentation
- REST API Reference
- SDK Documentation
## Contributing
When adding new samples:
- Follow the established naming convention
- Include comprehensive documentation
- Add validation and error handling
- Test with multiple data scenarios
- Update this README with sample descriptions
These code samples provide practical, hands-on experience with text analysis and scoring in Azure AI Search, demonstrating real-world implementation patterns and best practices.