Exercise 3: Basic Scoring Implementation¶
📋 Exercise Details¶
- Difficulty: Beginner
- Duration: 60-90 minutes
- Skills: Scoring profiles, field weights, magnitude functions, business logic
🎯 Objective¶
Learn to implement and test basic scoring profiles in Azure AI Search by creating field weight configurations and simple scoring functions that align with business objectives and improve search relevance.
📚 Prerequisites¶
- Completed Exercises 1-2
- Understanding of search relevance concepts
- Basic knowledge of business metrics (ratings, popularity, recency)
- Familiarity with JSON configuration syntax
🛠️ Instructions¶
Step 1: Design Scoring Requirements¶
Define business requirements for your scoring system:
Business Scenarios¶
- Content Discovery Platform: Boost high-quality, recent content
- E-commerce Product Search: Prioritize popular, well-rated products
- Knowledge Base Search: Emphasize authoritative, frequently accessed articles
- News/Blog Platform: Balance relevance with freshness and engagement
Scoring Factors¶
- Content Quality: Ratings, reviews, expert validation
- Popularity: View counts, downloads, shares
- Freshness: Publication date, last updated
- Authority: Author reputation, source credibility
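Before building anything, it can help to capture these requirements as data so that each scoring profile you create in Step 2 traces back to an explicit business goal. The snippet below is one illustrative way to record that mapping; the profile names anticipate the profiles defined in Step 2, and the emphasis values are assumptions, not prescribed settings:
# Illustrative requirements map (assumed values): which scoring factors each
# business scenario emphasizes, and which Step 2 profile is meant to serve it.
scoring_requirements = {
    "content_discovery": {"profile": "content_quality",    "emphasis": ["quality", "freshness"]},
    "ecommerce_search":  {"profile": "popularity_boost",   "emphasis": ["popularity", "quality"]},
    "knowledge_base":    {"profile": "balanced_relevance", "emphasis": ["authority", "popularity"]},
    "news_platform":     {"profile": "freshness_priority", "emphasis": ["freshness", "popularity"]},
}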
Step 2: Create Scoring Profile Index¶
Design an index schema with scoring-relevant fields:
{
"name": "content-scoring-demo",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"searchable": false
},
{
"name": "title",
"type": "Edm.String",
"searchable": true,
"analyzer": "en.microsoft"
},
{
"name": "content",
"type": "Edm.String",
"searchable": true,
"analyzer": "en.microsoft"
},
{
"name": "category",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"facetable": true
},
{
"name": "author",
"type": "Edm.String",
"searchable": true,
"filterable": true
},
{
"name": "publishDate",
"type": "Edm.DateTimeOffset",
"filterable": true,
"sortable": true
},
{
"name": "lastModified",
"type": "Edm.DateTimeOffset",
"filterable": true,
"sortable": true
},
{
"name": "rating",
"type": "Edm.Double",
"filterable": true,
"sortable": true,
"facetable": true
},
{
"name": "viewCount",
"type": "Edm.Int32",
"filterable": true,
"sortable": true
},
{
"name": "likeCount",
"type": "Edm.Int32",
"filterable": true,
"sortable": true
},
{
"name": "commentCount",
"type": "Edm.Int32",
"filterable": true,
"sortable": true
},
{
"name": "difficulty",
"type": "Edm.String",
"filterable": true,
"facetable": true
},
{
"name": "tags",
"type": "Collection(Edm.String)",
"searchable": true,
"filterable": true,
"facetable": true
}
],
"scoringProfiles": [
{
"name": "content_quality",
"text": {
"weights": {
"title": 4.0,
"content": 1.0,
"category": 2.0,
"author": 1.5,
"tags": 2.5
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "rating",
"boost": 2.5,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 1.0,
"boostingRangeEnd": 5.0,
"constantBoostBeyondRange": true
}
}
],
"functionAggregation": "sum"
},
{
"name": "popularity_boost",
"text": {
"weights": {
"title": 3.0,
"content": 1.0,
"category": 2.0
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "viewCount",
"boost": 1.8,
"interpolation": "logarithmic",
"magnitude": {
"boostingRangeStart": 1,
"boostingRangeEnd": 100000,
"constantBoostBeyondRange": true
}
},
{
"type": "magnitude",
"fieldName": "likeCount",
"boost": 1.5,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 0,
"boostingRangeEnd": 1000,
"constantBoostBeyondRange": true
}
}
],
"functionAggregation": "sum"
},
{
"name": "freshness_priority",
"text": {
"weights": {
"title": 3.0,
"content": 1.0,
"category": 1.5
}
},
"functions": [
{
"type": "freshness",
"fieldName": "publishDate",
"boost": 2.0,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P30D"
}
},
{
"type": "freshness",
"fieldName": "lastModified",
"boost": 1.3,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P7D"
}
}
],
"functionAggregation": "sum"
},
{
"name": "balanced_relevance",
"text": {
"weights": {
"title": 4.0,
"content": 1.0,
"category": 2.0,
"author": 1.5,
"tags": 2.0
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "rating",
"boost": 1.8,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 3.0,
"boostingRangeEnd": 5.0,
"constantBoostBeyondRange": true
}
},
{
"type": "magnitude",
"fieldName": "viewCount",
"boost": 1.4,
"interpolation": "logarithmic",
"magnitude": {
"boostingRangeStart": 10,
"boostingRangeEnd": 10000,
"constantBoostBeyondRange": true
}
},
{
"type": "freshness",
"fieldName": "publishDate",
"boost": 1.2,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P90D"
}
}
],
"functionAggregation": "sum"
}
]
}
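To create the index from this JSON definition, one option is to send it to the REST API directly. The following is a minimal sketch using the requests library; it assumes the endpoint and admin key are available as environment variables (SEARCH_ENDPOINT, SEARCH_ADMIN_KEY), that the definition above is saved as content-scoring-demo.json, and that the api-version shown is available on your service:
import json
import os
import requests

# Assumed configuration; adapt to however you manage credentials.
endpoint = os.environ["SEARCH_ENDPOINT"]      # e.g. https://<service>.search.windows.net
admin_key = os.environ["SEARCH_ADMIN_KEY"]

with open("content-scoring-demo.json") as f:  # the index definition shown above
    index_definition = json.load(f)

# PUT creates the index, or updates it if it already exists.
response = requests.put(
    f"{endpoint}/indexes/{index_definition['name']}",
    params={"api-version": "2023-11-01"},
    headers={"Content-Type": "application/json", "api-key": admin_key},
    json=index_definition,
)
response.raise_for_status()
print(f"Index '{index_definition['name']}' is ready.")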
Step 3: Generate Diverse Test Data¶
Create test documents with varied scoring characteristics:
from datetime import datetime, timedelta
import random
def generate_test_documents(count=20):
"""Generate diverse test documents for scoring validation."""
categories = ["Technology", "Science", "Business", "Education", "Health", "Entertainment"]
authors = ["Dr. Sarah Johnson", "Prof. Michael Chen", "Emily Rodriguez", "James Wilson", "Lisa Thompson"]
difficulties = ["Beginner", "Intermediate", "Advanced", "Expert"]
base_date = datetime.now()
documents = []
for i in range(1, count + 1):
# Vary publication dates
days_ago = random.randint(1, 365)
publish_date = base_date - timedelta(days=days_ago)
# Vary last modified (some content updated recently)
if random.random() < 0.3: # 30% chance of recent update
last_modified = base_date - timedelta(days=random.randint(1, 30))
else:
last_modified = publish_date + timedelta(days=random.randint(1, 10))
# Generate realistic metrics
base_views = random.randint(50, 50000)
rating = round(random.uniform(2.5, 5.0), 1)
like_count = int(base_views * random.uniform(0.01, 0.1))
comment_count = int(base_views * random.uniform(0.005, 0.05))
doc = {
"id": str(i),
"title": f"Article {i}: {random.choice(['Advanced', 'Introduction to', 'Complete Guide to', 'Best Practices for'])} {random.choice(['Machine Learning', 'Data Science', 'Cloud Computing', 'Web Development', 'Artificial Intelligence'])}",
"content": f"This is a comprehensive article about {random.choice(categories).lower()} topics. It covers fundamental concepts, practical applications, and real-world examples. The content is designed for {random.choice(difficulties).lower()} level readers.",
"category": random.choice(categories),
"author": random.choice(authors),
"publishDate": publish_date.isoformat() + "Z",
"lastModified": last_modified.isoformat() + "Z",
"rating": rating,
"viewCount": base_views,
"likeCount": like_count,
"commentCount": comment_count,
"difficulty": random.choice(difficulties),
"tags": random.sample(["tutorial", "guide", "reference", "example", "best-practices", "advanced", "beginner"], k=random.randint(2, 4))
}
documents.append(doc)
return documents
# Generate test data
test_documents = generate_test_documents(25)
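With the test data generated, upload it to the index before running any queries. A minimal sketch using the azure-search-documents SDK, assuming the same environment variables as above:
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Assumed configuration; reuse the client setup from Exercises 1-2 if you already have one.
search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="content-scoring-demo",
    credential=AzureKeyCredential(os.environ["SEARCH_ADMIN_KEY"]),
)

# Upload the generated documents and report any per-document failures.
upload_result = search_client.upload_documents(documents=test_documents)
failed = [r for r in upload_result if not r.succeeded]
print(f"Uploaded {len(test_documents) - len(failed)} documents, {len(failed)} failed.")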
Step 4: Implement Scoring Profile Testing¶
Create a comprehensive testing framework:
class ScoringProfileTester:
def __init__(self, search_client, index_name):
self.search_client = search_client
self.index_name = index_name
def test_scoring_profile(self, query, profile_name=None, top=10):
"""Test a specific scoring profile with a query."""
search_params = {
"search_text": query,
"top": top,
"include_total_count": True,
"select": ["id", "title", "category", "author", "rating", "viewCount", "publishDate"]
}
if profile_name:
search_params["scoring_profile"] = profile_name
try:
results = list(self.search_client.search(**search_params))
return {
"profile": profile_name or "default",
"query": query,
"results": results,
"count": len(results)
}
except Exception as e:
return {
"profile": profile_name or "default",
"query": query,
"error": str(e),
"results": [],
"count": 0
}
def compare_scoring_profiles(self, query, profiles, top=5):
"""Compare multiple scoring profiles with the same query."""
print(f"\n🔍 Query: '{query}'")
print("=" * 60)
all_results = {}
for profile in profiles:
result = self.test_scoring_profile(query, profile, top)
all_results[profile or "default"] = result
print(f"\n📊 {profile or 'Default'} Scoring:")
if result.get("error"):
print(f" ❌ Error: {result['error']}")
continue
if not result["results"]:
print(" No results found")
continue
for i, doc in enumerate(result["results"], 1):
title = doc.get("title", "No title")
score = doc.get("@search.score", 0)
rating = doc.get("rating", 0)
views = doc.get("viewCount", 0)
category = doc.get("category", "Unknown")
print(f" {i}. {title[:50]}...")
print(f" Score: {score:.3f}, Rating: {rating}, Views: {views}, Category: {category}")
return all_results
def analyze_scoring_impact(self, query, profiles):
"""Analyze the impact of different scoring profiles."""
results = self.compare_scoring_profiles(query, profiles)
print(f"\n📈 Scoring Impact Analysis:")
print("-" * 40)
# Compare result ordering
default_ids = [doc.get("id") for doc in results.get("default", {}).get("results", [])]
for profile_name, result in results.items():
if profile_name == "default" or result.get("error"):
continue
profile_ids = [doc.get("id") for doc in result.get("results", [])]
if profile_ids != default_ids:
print(f" ✅ {profile_name}: Changed result ordering")
# Calculate position changes
position_changes = []
for doc_id in profile_ids[:5]: # Top 5 results
default_pos = default_ids.index(doc_id) + 1 if doc_id in default_ids else None
profile_pos = profile_ids.index(doc_id) + 1
if default_pos:
change = default_pos - profile_pos
if change != 0:
position_changes.append(f"ID {doc_id}: {change:+d} positions")
if position_changes:
print(f" Position changes: {', '.join(position_changes)}")
else:
print(f" ⚠️ {profile_name}: No change in result ordering")
return results
def validate_business_logic(self, test_cases):
"""Validate that scoring profiles implement business logic correctly."""
print(f"\n✅ Business Logic Validation")
print("=" * 60)
passed = 0
failed = 0
for test_case in test_cases:
query = test_case["query"]
profile = test_case["profile"]
expected_behavior = test_case["expected_behavior"]
validation_func = test_case.get("validation_function")
print(f"\n📋 Testing: {query} with {profile}")
print(f"Expected: {expected_behavior}")
result = self.test_scoring_profile(query, profile, top=10)
if result.get("error"):
print(f" ❌ FAIL: {result['error']}")
failed += 1
continue
if validation_func:
success = validation_func(result["results"])
if success:
print(f" ✅ PASS: Business logic validated")
passed += 1
else:
print(f" ❌ FAIL: Business logic not working as expected")
failed += 1
else:
# Basic validation: check if results are returned
if result["results"]:
print(f" ✅ PASS: Results returned")
passed += 1
else:
print(f" ❌ FAIL: No results returned")
failed += 1
print(f"\nValidation Summary: {passed} passed, {failed} failed")
return passed, failed
# Example validation functions
def validate_rating_boost(results):
"""Validate that higher-rated content appears first."""
if len(results) < 2:
return True
ratings = [doc.get("rating", 0) for doc in results[:3]]
return ratings == sorted(ratings, reverse=True)
def validate_popularity_boost(results):
"""Validate that popular content (high view count) is boosted."""
if len(results) < 2:
return True
view_counts = [doc.get("viewCount", 0) for doc in results[:3]]
    # Loose check: the top result should have more views than the third result
    return view_counts[0] > view_counts[2] if len(view_counts) >= 3 else True
def validate_freshness_boost(results):
"""Validate that recent content is boosted."""
if len(results) < 2:
return True
from datetime import datetime
dates = []
for doc in results[:3]:
date_str = doc.get("publishDate", "")
if date_str:
try:
date = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
dates.append(date)
            except ValueError:
continue
if len(dates) < 2:
return True
# Check if generally trending towards more recent dates
return dates[0] >= dates[-1]
Step 5: Comprehensive Testing Scenarios¶
Create test scenarios that validate business requirements:
def run_comprehensive_scoring_tests():
"""Run comprehensive scoring profile tests."""
    import time  # used to pause between queries below
    # Initialize tester (assuming search_client is configured)
tester = ScoringProfileTester(search_client, "content-scoring-demo")
# Test queries representing different search intents
test_queries = [
"machine learning tutorial",
"data science guide",
"artificial intelligence",
"web development",
"cloud computing"
]
# Scoring profiles to test
profiles = [None, "content_quality", "popularity_boost", "freshness_priority", "balanced_relevance"]
print("🚀 Comprehensive Scoring Profile Testing")
print("=" * 60)
# Test each query with all profiles
for query in test_queries:
results = tester.analyze_scoring_impact(query, profiles)
# Brief pause between queries
time.sleep(2)
# Business logic validation
validation_test_cases = [
{
"query": "machine learning",
"profile": "content_quality",
"expected_behavior": "Higher-rated content should rank higher",
"validation_function": validate_rating_boost
},
{
"query": "data science",
"profile": "popularity_boost",
"expected_behavior": "Popular content should be boosted",
"validation_function": validate_popularity_boost
},
{
"query": "artificial intelligence",
"profile": "freshness_priority",
"expected_behavior": "Recent content should rank higher",
"validation_function": validate_freshness_boost
},
{
"query": "web development",
"profile": "balanced_relevance",
"expected_behavior": "Balanced scoring should work",
"validation_function": None # Basic validation only
}
]
passed, failed = tester.validate_business_logic(validation_test_cases)
print(f"\n🎯 Overall Test Results:")
print(f"✅ Passed: {passed}")
print(f"❌ Failed: {failed}")
print(f"Success Rate: {passed/(passed+failed)*100:.1f}%" if (passed+failed) > 0 else "No tests run")
# Run the tests
run_comprehensive_scoring_tests()
✅ Validation¶
Expected Outcomes¶
Document your findings for each scoring profile:
- Content Quality Profile
    - Higher-rated content ranks higher
    - Rating boost function works correctly
    - Field weights prioritize title and category
- Popularity Boost Profile
    - High view count content gets boosted
    - Logarithmic interpolation smooths extreme values
    - Like count provides additional signal
- Freshness Priority Profile
    - Recent content ranks higher
    - Recent updates also boost relevance
    - Freshness functions work as expected
- Balanced Relevance Profile
    - Combines multiple factors effectively
    - No single factor dominates results
    - Provides good overall relevance
Performance Metrics¶
Measure and document:
def measure_scoring_performance():
"""Measure performance impact of scoring profiles."""
    import time
    # Reuse the tester pattern from Step 4 (assumes search_client is configured)
    tester = ScoringProfileTester(search_client, "content-scoring-demo")
query = "machine learning"
profiles = [None, "content_quality", "popularity_boost", "balanced_relevance"]
iterations = 10
print("⏱️ Performance Impact Analysis")
print("=" * 40)
for profile in profiles:
latencies = []
for _ in range(iterations):
start_time = time.time()
result = tester.test_scoring_profile(query, profile, top=20)
end_time = time.time()
latencies.append((end_time - start_time) * 1000) # Convert to ms
avg_latency = sum(latencies) / len(latencies)
min_latency = min(latencies)
max_latency = max(latencies)
profile_name = profile or "Default"
print(f"{profile_name}:")
print(f" Average: {avg_latency:.2f}ms")
print(f" Min: {min_latency:.2f}ms")
print(f" Max: {max_latency:.2f}ms")
measure_scoring_performance()
Validation Checklist¶
- [ ] Index created with scoring profiles successfully
- [ ] Test documents uploaded and indexed
- [ ] All scoring profiles return results
- [ ] Field weights affect result ordering
- [ ] Magnitude functions boost appropriate content
- [ ] Freshness functions prioritize recent content
- [ ] Business logic validation passes
- [ ] Performance impact is acceptable
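The first few checklist items can be verified programmatically. A minimal sketch, assuming the azure-search-documents SDK and the search_client configured earlier (the endpoint and key variable names are assumptions):
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

# Assumed configuration, matching the earlier setup.
index_client = SearchIndexClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["SEARCH_ADMIN_KEY"]),
)

# Confirm the scoring profiles exist on the live index definition.
index = index_client.get_index("content-scoring-demo")
print("Scoring profiles:", [profile.name for profile in index.scoring_profiles])

# Confirm the test documents were indexed.
print("Document count:", search_client.get_document_count())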
🚀 Extensions¶
Extension 1: A/B Testing Framework¶
Implement statistical A/B testing to compare scoring profiles:
def ab_test_scoring_profiles(profile_a, profile_b, test_queries, sample_size=100):
"""Run A/B test comparing two scoring profiles."""
import random
    from scipy import stats
    # Assumes search_client is configured as in earlier steps
    tester = ScoringProfileTester(search_client, "content-scoring-demo")
results_a = []
results_b = []
for _ in range(sample_size):
query = random.choice(test_queries)
# Test profile A
result_a = tester.test_scoring_profile(query, profile_a, top=5)
if result_a["results"]:
avg_score_a = sum(doc.get("@search.score", 0) for doc in result_a["results"]) / len(result_a["results"])
results_a.append(avg_score_a)
# Test profile B
result_b = tester.test_scoring_profile(query, profile_b, top=5)
if result_b["results"]:
avg_score_b = sum(doc.get("@search.score", 0) for doc in result_b["results"]) / len(result_b["results"])
results_b.append(avg_score_b)
# Statistical analysis
if results_a and results_b:
t_stat, p_value = stats.ttest_ind(results_a, results_b)
print(f"A/B Test Results:")
print(f"Profile A ({profile_a}): {len(results_a)} samples, avg score: {sum(results_a)/len(results_a):.3f}")
print(f"Profile B ({profile_b}): {len(results_b)} samples, avg score: {sum(results_b)/len(results_b):.3f}")
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")
print(f"Significant difference: {'Yes' if p_value < 0.05 else 'No'}")
Extension 2: Dynamic Scoring¶
Implement dynamic scoring based on user context:
def dynamic_scoring_profile(user_context):
"""Generate scoring profile based on user context."""
base_weights = {
"title": 3.0,
"content": 1.0,
"category": 2.0
}
# Adjust weights based on user preferences
if user_context.get("prefers_recent_content"):
# Boost freshness for users who prefer recent content
freshness_boost = 2.0
else:
freshness_boost = 1.0
if user_context.get("experience_level") == "beginner":
# Boost highly-rated content for beginners
rating_boost = 2.5
else:
rating_boost = 1.5
return {
"text_weights": base_weights,
"freshness_boost": freshness_boost,
"rating_boost": rating_boost
}
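Because scoring profiles are defined on the index rather than per query, one way to use this dynamic configuration is to map the user context onto one of the profiles created in Step 2 at query time. A minimal sketch, assuming the search_client from earlier; the selection thresholds are illustrative:
def search_with_dynamic_scoring(search_client, query, user_context, top=10):
    """Choose a predefined scoring profile based on the dynamic configuration."""
    config = dynamic_scoring_profile(user_context)

    # Map the generated boosts onto the profiles created in Step 2 (assumed mapping).
    if config["freshness_boost"] >= 2.0:
        profile = "freshness_priority"
    elif config["rating_boost"] >= 2.5:
        profile = "content_quality"
    else:
        profile = "balanced_relevance"

    return search_client.search(search_text=query, scoring_profile=profile, top=top)

# Example: a beginner who prefers recent content.
results = search_with_dynamic_scoring(
    search_client,
    "machine learning tutorial",
    {"prefers_recent_content": True, "experience_level": "beginner"},
)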
Extension 3: Machine Learning Integration¶
Use ML to optimize scoring weights:
def optimize_scoring_weights(click_data, conversion_data):
"""Use machine learning to optimize scoring profile weights."""
from sklearn.linear_model import LinearRegression
import numpy as np
# Features: field matches, rating, view count, freshness
# Target: click-through rate or conversion rate
features = []
targets = []
for interaction in click_data:
feature_vector = [
interaction["title_match_score"],
interaction["content_match_score"],
interaction["rating"],
interaction["view_count"],
interaction["days_since_publish"]
]
features.append(feature_vector)
targets.append(interaction["clicked"])
# Train model
model = LinearRegression()
model.fit(np.array(features), np.array(targets))
# Extract optimized weights
coefficients = model.coef_
optimized_weights = {
"title": max(1.0, coefficients[0] * 4.0),
"content": 1.0, # Base weight
"rating_boost": max(1.0, coefficients[2] * 2.0),
"popularity_boost": max(1.0, coefficients[3] * 1.5)
}
return optimized_weights
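The optimized weights still need to be written back to the index definition to take effect. A minimal sketch using the SDK index client (assuming the index_client from the validation step, and that the title/content weights map onto the balanced_relevance profile):
def apply_optimized_weights(index_client, index_name, optimized_weights, profile_name="balanced_relevance"):
    """Write ML-derived field weights back into an existing scoring profile."""
    index = index_client.get_index(index_name)

    for profile in index.scoring_profiles:
        if profile.name == profile_name and profile.text_weights:
            # Only the text weights are updated here; function boosts would need
            # a similar (assumed) mapping from the model coefficients.
            profile.text_weights.weights["title"] = optimized_weights["title"]
            profile.text_weights.weights["content"] = optimized_weights["content"]

    # Updating scoring profiles changes ranking only; documents do not need reindexing.
    index_client.create_or_update_index(index)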
💡 Solutions¶
Key Implementation Insights¶
- Field Weight Strategy:
    - Title fields typically get the highest weights (3-5x)
    - Content gets the base weight (1.0)
    - Category and tags get medium weights (1.5-2.5x)
    - Author/metadata gets lower weights (1.0-1.5x)
- Scoring Function Design:
    - Use linear interpolation for ratings (clear, bounded scale)
    - Use logarithmic interpolation for view counts (wide range)
    - Set boosting ranges based on your data distribution
    - Combine functions with sum aggregation for additive effects
- Business Logic Alignment:
    - Quality-focused: emphasize ratings and expert content
    - Popularity-focused: boost view counts and engagement
    - Freshness-focused: prioritize recent and updated content
    - Balanced: combine multiple factors with moderate weights
Common Pitfalls¶
- Over-boosting: Using extreme boost values that dominate relevance
- Narrow ranges: Setting boosting ranges too narrow for your data
- Ignoring performance: Complex scoring profiles can impact query latency
- Static configuration: Not updating profiles based on user behavior
Best Practices¶
- Start with simple field weights before adding functions
- Test with real data and user queries
- Monitor performance impact of scoring profiles
- Use A/B testing to validate improvements
- Regularly review and update based on analytics
Next Exercise: Autocomplete System - Implement n-gram analyzers for autocomplete functionality.