Exercise 3: Basic Scoring Implementation¶
📋 Exercise Details¶
- Difficulty: Beginner
- Duration: 60-90 minutes
- Skills: Scoring profiles, field weights, magnitude functions, business logic
🎯 Objective¶
Learn to implement and test basic scoring profiles in Azure AI Search by creating field weight configurations and simple scoring functions that align with business objectives and improve search relevance.
📚 Prerequisites¶
- Completed Exercises 1-2
- Understanding of search relevance concepts
- Basic knowledge of business metrics (ratings, popularity, recency)
- Familiarity with JSON configuration syntax
🛠️ Instructions¶
Step 1: Design Scoring Requirements¶
Define business requirements for your scoring system:
Business Scenarios¶
- Content Discovery Platform: Boost high-quality, recent content
- E-commerce Product Search: Prioritize popular, well-rated products
- Knowledge Base Search: Emphasize authoritative, frequently accessed articles
- News/Blog Platform: Balance relevance with freshness and engagement
Scoring Factors¶
- Content Quality: Ratings, reviews, expert validation
- Popularity: View counts, downloads, shares
- Freshness: Publication date, last updated
- Authority: Author reputation, source credibility
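Before building anything, it can help to capture these requirements as data so that each scoring profile you create in Step 2 traces back to an explicit business goal. The snippet below is one illustrative way to record that mapping; the profile names anticipate the profiles defined in Step 2, and the emphasis values are assumptions, not prescribed settings:
# Illustrative requirements map (assumed values): which scoring factors each
# business scenario emphasizes, and which Step 2 profile is meant to serve it.
scoring_requirements = {
    "content_discovery": {"profile": "content_quality",    "emphasis": ["quality", "freshness"]},
    "ecommerce_search":  {"profile": "popularity_boost",   "emphasis": ["popularity", "quality"]},
    "knowledge_base":    {"profile": "balanced_relevance", "emphasis": ["authority", "popularity"]},
    "news_platform":     {"profile": "freshness_priority", "emphasis": ["freshness", "popularity"]},
}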
Step 2: Create Scoring Profile Index¶
Design an index schema with scoring-relevant fields:
{
"name": "content-scoring-demo",
"fields": [
{
"name": "id",
"type": "Edm.String",
"key": true,
"searchable": false
},
{
"name": "title",
"type": "Edm.String",
"searchable": true,
"analyzer": "en.microsoft"
},
{
"name": "content",
"type": "Edm.String",
"searchable": true,
"analyzer": "en.microsoft"
},
{
"name": "category",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"facetable": true
},
{
"name": "author",
"type": "Edm.String",
"searchable": true,
"filterable": true
},
{
"name": "publishDate",
"type": "Edm.DateTimeOffset",
"filterable": true,
"sortable": true
},
{
"name": "lastModified",
"type": "Edm.DateTimeOffset",
"filterable": true,
"sortable": true
},
{
"name": "rating",
"type": "Edm.Double",
"filterable": true,
"sortable": true,
"facetable": true
},
{
"name": "viewCount",
"type": "Edm.Int32",
"filterable": true,
"sortable": true
},
{
"name": "likeCount",
"type": "Edm.Int32",
"filterable": true,
"sortable": true
},
{
"name": "commentCount",
"type": "Edm.Int32",
"filterable": true,
"sortable": true
},
{
"name": "difficulty",
"type": "Edm.String",
"filterable": true,
"facetable": true
},
{
"name": "tags",
"type": "Collection(Edm.String)",
"searchable": true,
"filterable": true,
"facetable": true
}
],
"scoringProfiles": [
{
"name": "content_quality",
"text": {
"weights": {
"title": 4.0,
"content": 1.0,
"category": 2.0,
"author": 1.5,
"tags": 2.5
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "rating",
"boost": 2.5,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 1.0,
"boostingRangeEnd": 5.0,
"constantBoostBeyondRange": true
}
}
],
"functionAggregation": "sum"
},
{
"name": "popularity_boost",
"text": {
"weights": {
"title": 3.0,
"content": 1.0,
"category": 2.0
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "viewCount",
"boost": 1.8,
"interpolation": "logarithmic",
"magnitude": {
"boostingRangeStart": 1,
"boostingRangeEnd": 100000,
"constantBoostBeyondRange": true
}
},
{
"type": "magnitude",
"fieldName": "likeCount",
"boost": 1.5,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 0,
"boostingRangeEnd": 1000,
"constantBoostBeyondRange": true
}
}
],
"functionAggregation": "sum"
},
{
"name": "freshness_priority",
"text": {
"weights": {
"title": 3.0,
"content": 1.0,
"category": 1.5
}
},
"functions": [
{
"type": "freshness",
"fieldName": "publishDate",
"boost": 2.0,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P30D"
}
},
{
"type": "freshness",
"fieldName": "lastModified",
"boost": 1.3,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P7D"
}
}
],
"functionAggregation": "sum"
},
{
"name": "balanced_relevance",
"text": {
"weights": {
"title": 4.0,
"content": 1.0,
"category": 2.0,
"author": 1.5,
"tags": 2.0
}
},
"functions": [
{
"type": "magnitude",
"fieldName": "rating",
"boost": 1.8,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 3.0,
"boostingRangeEnd": 5.0,
"constantBoostBeyondRange": true
}
},
{
"type": "magnitude",
"fieldName": "viewCount",
"boost": 1.4,
"interpolation": "logarithmic",
"magnitude": {
"boostingRangeStart": 10,
"boostingRangeEnd": 10000,
"constantBoostBeyondRange": true
}
},
{
"type": "freshness",
"fieldName": "publishDate",
"boost": 1.2,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P90D"
}
}
],
"functionAggregation": "sum"
}
]
}
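To create the index from this JSON definition, one option is to send it to the REST API directly. The following is a minimal sketch using the requests library; it assumes the endpoint and admin key are available as environment variables (SEARCH_ENDPOINT, SEARCH_ADMIN_KEY), that the definition above is saved as content-scoring-demo.json, and that the api-version shown is available on your service:
import json
import os
import requests

# Assumed configuration; adapt to however you manage credentials.
endpoint = os.environ["SEARCH_ENDPOINT"]      # e.g. https://<service>.search.windows.net
admin_key = os.environ["SEARCH_ADMIN_KEY"]

with open("content-scoring-demo.json") as f:  # the index definition shown above
    index_definition = json.load(f)

# PUT creates the index, or updates it if it already exists.
response = requests.put(
    f"{endpoint}/indexes/{index_definition['name']}",
    params={"api-version": "2023-11-01"},
    headers={"Content-Type": "application/json", "api-key": admin_key},
    json=index_definition,
)
response.raise_for_status()
print(f"Index '{index_definition['name']}' is ready.")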
Step 3: Generate Diverse Test Data¶
Create test documents with varied scoring characteristics:
from datetime import datetime, timedelta
import random
def generate_test_documents(count=20):
"""Generate diverse test documents for scoring validation."""
categories = ["Technology", "Science", "Business", "Education", "Health", "Entertainment"]
authors = ["Dr. Sarah Johnson", "Prof. Michael Chen", "Emily Rodriguez", "James Wilson", "Lisa Thompson"]
difficulties = ["Beginner", "Intermediate", "Advanced", "Expert"]
base_date = datetime.now()
documents = []
for i in range(1, count + 1):
# Vary publication dates
days_ago = random.randint(1, 365)
publish_date = base_date - timedelta(days=days_ago)
# Vary last modified (some content updated recently)
if random.random() < 0.3: # 30% chance of recent update
last_modified = base_date - timedelta(days=random.randint(1, 30))
else:
last_modified = publish_date + timedelta(days=random.randint(1, 10))
# Generate realistic metrics
base_views = random.randint(50, 50000)
rating = round(random.uniform(2.5, 5.0), 1)
like_count = int(base_views * random.uniform(0.01, 0.1))
comment_count = int(base_views * random.uniform(0.005, 0.05))
doc = {
"id": str(i),
"title": f"Article {i}: {random.choice(['Advanced', 'Introduction to', 'Complete Guide to', 'Best Practices for'])} {random.choice(['Machine Learning', 'Data Science', 'Cloud Computing', 'Web Development', 'Artificial Intelligence'])}",
"content": f"This is a comprehensive article about {random.choice(categories).lower()} topics. It covers fundamental concepts, practical applications, and real-world examples. The content is designed for {random.choice(difficulties).lower()} level readers.",
"category": random.choice(categories),
"author": random.choice(authors),
"publishDate": publish_date.isoformat() + "Z",
"lastModified": last_modified.isoformat() + "Z",
"rating": rating,
"viewCount": base_views,
"likeCount": like_count,
"commentCount": comment_count,
"difficulty": random.choice(difficulties),
"tags": random.sample(["tutorial", "guide", "reference", "example", "best-practices", "advanced", "beginner"], k=random.randint(2, 4))
}
documents.append(doc)
return documents
# Generate test data
test_documents = generate_test_documents(25)
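With the test data generated, upload it to the index before running any queries. A minimal sketch using the azure-search-documents SDK, assuming the same environment variables as above:
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Assumed configuration; reuse the client setup from Exercises 1-2 if you already have one.
search_client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="content-scoring-demo",
    credential=AzureKeyCredential(os.environ["SEARCH_ADMIN_KEY"]),
)

# Upload the generated documents and report any per-document failures.
upload_result = search_client.upload_documents(documents=test_documents)
failed = [r for r in upload_result if not r.succeeded]
print(f"Uploaded {len(test_documents) - len(failed)} documents, {len(failed)} failed.")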
Step 4: Implement Scoring Profile Testing¶
Create a comprehensive testing framework:
class ScoringProfileTester:
def __init__(self, search_client, index_name):
self.search_client = search_client
self.index_name = index_name
def test_scoring_profile(self, query, profile_name=None, top=10):
"""Test a specific scoring profile with a query."""
search_params = {
"search_text": query,
"top": top,
"include_total_count": True,
"select": ["id", "title", "category", "author", "rating", "viewCount", "publishDate"]
}
if profile_name:
search_params["scoring_profile"] = profile_name
try:
results = list(self.search_client.search(**search_params))
return {
"profile": profile_name or "default",
"query": query,
"results": results,
"count": len(results)
}
except Exception as e:
return {
"profile": profile_name or "default",
"query": query,
"error": str(e),
"results": [],
"count": 0
}
def compare_scoring_profiles(self, query, profiles, top=5):
"""Compare multiple scoring profiles with the same query."""
print(f"\n🔍 Query: '{query}'")
print("=" * 60)
all_results = {}
for profile in profiles:
result = self.test_scoring_profile(query, profile, top)
all_results[profile or "default"] = result
print(f"\n📊 {profile or 'Default'} Scoring:")
if result.get("error"):
print(f" ❌ Error: {result['error']}")
continue
if not result["results"]:
print(" No results found")
continue
for i, doc in enumerate(result["results"], 1):
title = doc.get("title", "No title")
score = doc.get("@search.score", 0)
rating = doc.get("rating", 0)
views = doc.get("viewCount", 0)
category = doc.get("category", "Unknown")
print(f" {i}. {title[:50]}...")
print(f" Score: {score:.3f}, Rating: {rating}, Views: {views}, Category: {category}")
return all_results
def analyze_scoring_impact(self, query, profiles):
"""Analyze the impact of different scoring profiles."""
results = self.compare_scoring_profiles(query, profiles)
print(f"\n📈 Scoring Impact Analysis:")
print("-" * 40)
# Compare result ordering
default_ids = [doc.get("id") for doc in results.get("default", {}).get("results", [])]
for profile_name, result in results.items():
if profile_name == "default" or result.get("error"):
continue
profile_ids = [doc.get("id") for doc in result.get("results", [])]
if profile_ids != default_ids:
print(f" ✅ {profile_name}: Changed result ordering")
# Calculate position changes
position_changes = []
for doc_id in profile_ids[:5]: # Top 5 results
default_pos = default_ids.index(doc_id) + 1 if doc_id in default_ids else None
profile_pos = profile_ids.index(doc_id) + 1
if default_pos:
change = default_pos - profile_pos
if change != 0:
position_changes.append(f"ID {doc_id}: {change:+d} positions")
if position_changes:
print(f" Position changes: {', '.join(position_changes)}")
else:
print(f" ⚠️ {profile_name}: No change in result ordering")
return results
def validate_business_logic(self, test_cases):
"""Validate that scoring profiles implement business logic correctly."""
print(f"\n✅ Business Logic Validation")
print("=" * 60)
passed = 0
failed = 0
for test_case in test_cases:
query = test_case["query"]
profile = test_case["profile"]
expected_behavior = test_case["expected_behavior"]
validation_func = test_case.get("validation_function")
print(f"\n📋 Testing: {query} with {profile}")
print(f"Expected: {expected_behavior}")
result = self.test_scoring_profile(query, profile, top=10)
if result.get("error"):
print(f" ❌ FAIL: {result['error']}")
failed += 1
continue
if validation_func:
success = validation_func(result["results"])
if success:
print(f" ✅ PASS: Business logic validated")
passed += 1
else:
print(f" ❌ FAIL: Business logic not working as expected")
failed += 1
else:
# Basic validation: check if results are returned
if result["results"]:
print(f" ✅ PASS: Results returned")
passed += 1
else:
print(f" ❌ FAIL: No results returned")
failed += 1
print(f"\nValidation Summary: {passed} passed, {failed} failed")
return passed, failed
# Example validation functions
def validate_rating_boost(results):
"""Validate that higher-rated content appears first."""
if len(results) < 2:
return True
ratings = [doc.get("rating", 0) for doc in results[:3]]
return ratings == sorted(ratings, reverse=True)
def validate_popularity_boost(results):
"""Validate that popular content (high view count) is boosted."""
if len(results) < 2:
return True
view_counts = [doc.get("viewCount", 0) for doc in results[:3]]
    # Loose check: the top result should have more views than the third result
    return view_counts[0] > view_counts[2] if len(view_counts) >= 3 else True
def validate_freshness_boost(results):
"""Validate that recent content is boosted."""
if len(results) < 2:
return True
from datetime import datetime
dates = []
for doc in results[:3]:
date_str = doc.get("publishDate", "")
if date_str:
try:
date = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
dates.append(date)
            except ValueError:
continue
if len(dates) < 2:
return True
# Check if generally trending towards more recent dates
return dates[0] >= dates[-1]
Step 5: Comprehensive Testing Scenarios¶
Create test scenarios that validate business requirements:
def run_comprehensive_scoring_tests():
"""Run comprehensive scoring profile tests."""
    import time  # used to pause between queries below
    # Initialize tester (assuming search_client is configured)
tester = ScoringProfileTester(search_client, "content-scoring-demo")
# Test queries representing different search intents
test_queries = [
"machine learning tutorial",
"data science guide",
"artificial intelligence",
"web development",
"cloud computing"
]
# Scoring profiles to test
profiles = [None, "content_quality", "popularity_boost", "freshness_priority", "balanced_relevance"]
print("🚀 Comprehensive Scoring Profile Testing")
print("=" * 60)
# Test each query with all profiles
for query in test_queries:
results = tester.analyze_scoring_impact(query, profiles)
# Brief pause between queries
time.sleep(2)
# Business logic validation
validation_test_cases = [
{
"query": "machine learning",
"profile": "content_quality",
"expected_behavior": "Higher-rated content should rank higher",
"validation_function": validate_rating_boost
},
{
"query": "data science",
"profile": "popularity_boost",
"expected_behavior": "Popular content should be boosted",
"validation_function": validate_popularity_boost
},
{
"query": "artificial intelligence",
"profile": "freshness_priority",
"expected_behavior": "Recent content should rank higher",
"validation_function": validate_freshness_boost
},
{
"query": "web development",
"profile": "balanced_relevance",
"expected_behavior": "Balanced scoring should work",
"validation_function": None # Basic validation only
}
]
passed, failed = tester.validate_business_logic(validation_test_cases)
print(f"\n🎯 Overall Test Results:")
print(f"✅ Passed: {passed}")
print(f"❌ Failed: {failed}")
print(f"Success Rate: {passed/(passed+failed)*100:.1f}%" if (passed+failed) > 0 else "No tests run")
# Run the tests
run_comprehensive_scoring_tests()
✅ Validation¶
Expected Outcomes¶
Document your findings for each scoring profile:
- Content Quality Profile
    - Higher-rated content ranks higher
    - Rating boost function works correctly
    - Field weights prioritize title and category
- Popularity Boost Profile
    - High view count content gets boosted
    - Logarithmic interpolation smooths extreme values
    - Like count provides additional signal
- Freshness Priority Profile
    - Recent content ranks higher
    - Recent updates also boost relevance
    - Freshness functions work as expected
- Balanced Relevance Profile
    - Combines multiple factors effectively
    - No single factor dominates results
    - Provides good overall relevance
Performance Metrics¶
Measure and document:
def measure_scoring_performance():
"""Measure performance impact of scoring profiles."""
    import time
    # Reuse the tester pattern from Step 4 (assumes search_client is configured)
    tester = ScoringProfileTester(search_client, "content-scoring-demo")
query = "machine learning"
profiles = [None, "content_quality", "popularity_boost", "balanced_relevance"]
iterations = 10
print("⏱️ Performance Impact Analysis")
print("=" * 40)
for profile in profiles:
latencies = []
for _ in range(iterations):
start_time = time.time()
result = tester.test_scoring_profile(query, profile, top=20)
end_time = time.time()
latencies.append((end_time - start_time) * 1000) # Convert to ms
avg_latency = sum(latencies) / len(latencies)
min_latency = min(latencies)
max_latency = max(latencies)
profile_name = profile or "Default"
print(f"{profile_name}:")
print(f" Average: {avg_latency:.2f}ms")
print(f" Min: {min_latency:.2f}ms")
print(f" Max: {max_latency:.2f}ms")
measure_scoring_performance()
Validation Checklist¶
- [ ] Index created with scoring profiles successfully
- [ ] Test documents uploaded and indexed
- [ ] All scoring profiles return results
- [ ] Field weights affect result ordering
- [ ] Magnitude functions boost appropriate content
- [ ] Freshness functions prioritize recent content
- [ ] Business logic validation passes
- [ ] Performance impact is acceptable
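The first few checklist items can be verified programmatically. A minimal sketch, assuming the azure-search-documents SDK and the search_client configured earlier (the endpoint and key variable names are assumptions):
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient

# Assumed configuration, matching the earlier setup.
index_client = SearchIndexClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["SEARCH_ADMIN_KEY"]),
)

# Confirm the scoring profiles exist on the live index definition.
index = index_client.get_index("content-scoring-demo")
print("Scoring profiles:", [profile.name for profile in index.scoring_profiles])

# Confirm the test documents were indexed.
print("Document count:", search_client.get_document_count())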
🚀 Extensions¶
Extension 1: A/B Testing Framework¶
Implement statistical A/B testing to compare scoring profiles:
def ab_test_scoring_profiles(profile_a, profile_b, test_queries, sample_size=100):
"""Run A/B test comparing two scoring profiles."""
import random
    from scipy import stats
    # Assumes search_client is configured as in earlier steps
    tester = ScoringProfileTester(search_client, "content-scoring-demo")
results_a = []
results_b = []
for _ in range(sample_size):
query = random.choice(test_queries)
# Test profile A
result_a = tester.test_scoring_profile(query, profile_a, top=5)
if result_a["results"]:
avg_score_a = sum(doc.get("@search.score", 0) for doc in result_a["results"]) / len(result_a["results"])
results_a.append(avg_score_a)
# Test profile B
result_b = tester.test_scoring_profile(query, profile_b, top=5)
if result_b["results"]:
avg_score_b = sum(doc.get("@search.score", 0) for doc in result_b["results"]) / len(result_b["results"])
results_b.append(avg_score_b)
# Statistical analysis
if results_a and results_b:
t_stat, p_value = stats.ttest_ind(results_a, results_b)
print(f"A/B Test Results:")
print(f"Profile A ({profile_a}): {len(results_a)} samples, avg score: {sum(results_a)/len(results_a):.3f}")
print(f"Profile B ({profile_b}): {len(results_b)} samples, avg score: {sum(results_b)/len(results_b):.3f}")
print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")
print(f"Significant difference: {'Yes' if p_value < 0.05 else 'No'}")
Extension 2: Dynamic Scoring¶
Implement dynamic scoring based on user context:
def dynamic_scoring_profile(user_context):
"""Generate scoring profile based on user context."""
base_weights = {
"title": 3.0,
"content": 1.0,
"category": 2.0
}
# Adjust weights based on user preferences
if user_context.get("prefers_recent_content"):
# Boost freshness for users who prefer recent content
freshness_boost = 2.0
else:
freshness_boost = 1.0
if user_context.get("experience_level") == "beginner":
# Boost highly-rated content for beginners
rating_boost = 2.5
else:
rating_boost = 1.5
return {
"text_weights": base_weights,
"freshness_boost": freshness_boost,
"rating_boost": rating_boost
}
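Because scoring profiles are defined on the index rather than per query, one way to use this dynamic configuration is to map the user context onto one of the profiles created in Step 2 at query time. A minimal sketch, assuming the search_client from earlier; the selection thresholds are illustrative:
def search_with_dynamic_scoring(search_client, query, user_context, top=10):
    """Choose a predefined scoring profile based on the dynamic configuration."""
    config = dynamic_scoring_profile(user_context)

    # Map the generated boosts onto the profiles created in Step 2 (assumed mapping).
    if config["freshness_boost"] >= 2.0:
        profile = "freshness_priority"
    elif config["rating_boost"] >= 2.5:
        profile = "content_quality"
    else:
        profile = "balanced_relevance"

    return search_client.search(search_text=query, scoring_profile=profile, top=top)

# Example: a beginner who prefers recent content.
results = search_with_dynamic_scoring(
    search_client,
    "machine learning tutorial",
    {"prefers_recent_content": True, "experience_level": "beginner"},
)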
Extension 3: Machine Learning Integration¶
Use ML to optimize scoring weights:
def optimize_scoring_weights(click_data, conversion_data):
"""Use machine learning to optimize scoring profile weights."""
from sklearn.linear_model import LinearRegression
import numpy as np
# Features: field matches, rating, view count, freshness
# Target: click-through rate or conversion rate
features = []
targets = []
for interaction in click_data:
feature_vector = [
interaction["title_match_score"],
interaction["content_match_score"],
interaction["rating"],
interaction["view_count"],
interaction["days_since_publish"]
]
features.append(feature_vector)
targets.append(interaction["clicked"])
# Train model
model = LinearRegression()
model.fit(np.array(features), np.array(targets))
# Extract optimized weights
coefficients = model.coef_
optimized_weights = {
"title": max(1.0, coefficients[0] * 4.0),
"content": 1.0, # Base weight
"rating_boost": max(1.0, coefficients[2] * 2.0),
"popularity_boost": max(1.0, coefficients[3] * 1.5)
}
return optimized_weights
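The optimized weights still need to be written back to the index definition to take effect. A minimal sketch using the SDK index client (assuming the index_client from the validation step, and that the title/content weights map onto the balanced_relevance profile):
def apply_optimized_weights(index_client, index_name, optimized_weights, profile_name="balanced_relevance"):
    """Write ML-derived field weights back into an existing scoring profile."""
    index = index_client.get_index(index_name)

    for profile in index.scoring_profiles:
        if profile.name == profile_name and profile.text_weights:
            # Only the text weights are updated here; function boosts would need
            # a similar (assumed) mapping from the model coefficients.
            profile.text_weights.weights["title"] = optimized_weights["title"]
            profile.text_weights.weights["content"] = optimized_weights["content"]

    # Updating scoring profiles changes ranking only; documents do not need reindexing.
    index_client.create_or_update_index(index)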
💡 Solutions¶
Key Implementation Insights¶
- Field Weight Strategy:
    - Title fields typically get the highest weights (3-5x)
    - Content gets the base weight (1.0)
    - Category and tags get medium weights (1.5-2.5x)
    - Author/metadata gets lower weights (1.0-1.5x)
- Scoring Function Design:
    - Use linear interpolation for ratings (clear, bounded scale)
    - Use logarithmic interpolation for view counts (wide range)
    - Set boosting ranges based on your data distribution
    - Combine functions with sum aggregation for additive effects
- Business Logic Alignment:
    - Quality-focused: emphasize ratings and expert content
    - Popularity-focused: boost view counts and engagement
    - Freshness-focused: prioritize recent and updated content
    - Balanced: combine multiple factors with moderate weights
Common Pitfalls¶
- Over-boosting: Using extreme boost values that dominate relevance
- Narrow ranges: Setting boosting ranges too narrow for your data
- Ignoring performance: Complex scoring profiles can impact query latency
- Static configuration: Not updating profiles based on user behavior
Best Practices¶
- Start with simple field weights before adding functions
- Test with real data and user queries
- Monitor performance impact of scoring profiles
- Use A/B testing to validate improvements
- Regularly review and update based on analytics
Next Exercise: Autocomplete System - Implement n-gram analyzers for autocomplete functionality.