
Exercise 3: Basic Scoring Implementation

📋 Exercise Details

  • Difficulty: Beginner
  • Duration: 60-90 minutes
  • Skills: Scoring profiles, field weights, magnitude functions, business logic

🎯 Objective

Learn to implement and test basic scoring profiles in Azure AI Search by creating field weight configurations and simple scoring functions that align with business objectives and improve search relevance.

📚 Prerequisites

  • Completed Exercises 1-2
  • Understanding of search relevance concepts
  • Basic knowledge of business metrics (ratings, popularity, recency)
  • Familiarity with JSON configuration syntax

🛠️ Instructions

Step 1: Design Scoring Requirements

Define business requirements for your scoring system:

Business Scenarios

  1. Content Discovery Platform: Boost high-quality, recent content
  2. E-commerce Product Search: Prioritize popular, well-rated products
  3. Knowledge Base Search: Emphasize authoritative, frequently accessed articles
  4. News/Blog Platform: Balance relevance with freshness and engagement

Scoring Factors

  • Content Quality: Ratings, reviews, expert validation
  • Popularity: View counts, downloads, shares
  • Freshness: Publication date, last updated
  • Authority: Author reputation, source credibility

Step 2: Create Scoring Profile Index

Design an index schema with scoring-relevant fields:

{
  "name": "content-scoring-demo",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true,
      "searchable": false
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "en.microsoft"
    },
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "en.microsoft"
    },
    {
      "name": "category",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "facetable": true
    },
    {
      "name": "author",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true
    },
    {
      "name": "publishDate",
      "type": "Edm.DateTimeOffset",
      "filterable": true,
      "sortable": true
    },
    {
      "name": "lastModified",
      "type": "Edm.DateTimeOffset",
      "filterable": true,
      "sortable": true
    },
    {
      "name": "rating",
      "type": "Edm.Double",
      "filterable": true,
      "sortable": true,
      "facetable": true
    },
    {
      "name": "viewCount",
      "type": "Edm.Int32",
      "filterable": true,
      "sortable": true
    },
    {
      "name": "likeCount",
      "type": "Edm.Int32",
      "filterable": true,
      "sortable": true
    },
    {
      "name": "commentCount",
      "type": "Edm.Int32",
      "filterable": true,
      "sortable": true
    },
    {
      "name": "difficulty",
      "type": "Edm.String",
      "filterable": true,
      "facetable": true
    },
    {
      "name": "tags",
      "type": "Collection(Edm.String)",
      "searchable": true,
      "filterable": true,
      "facetable": true
    }
  ],
  "scoringProfiles": [
    {
      "name": "content_quality",
      "text": {
        "weights": {
          "title": 4.0,
          "content": 1.0,
          "category": 2.0,
          "author": 1.5,
          "tags": 2.5
        }
      },
      "functions": [
        {
          "type": "magnitude",
          "fieldName": "rating",
          "boost": 2.5,
          "interpolation": "linear",
          "magnitude": {
            "boostingRangeStart": 1.0,
            "boostingRangeEnd": 5.0,
            "constantBoostBeyondRange": true
          }
        }
      ],
      "functionAggregation": "sum"
    },
    {
      "name": "popularity_boost",
      "text": {
        "weights": {
          "title": 3.0,
          "content": 1.0,
          "category": 2.0
        }
      },
      "functions": [
        {
          "type": "magnitude",
          "fieldName": "viewCount",
          "boost": 1.8,
          "interpolation": "logarithmic",
          "magnitude": {
            "boostingRangeStart": 1,
            "boostingRangeEnd": 100000,
            "constantBoostBeyondRange": true
          }
        },
        {
          "type": "magnitude",
          "fieldName": "likeCount",
          "boost": 1.5,
          "interpolation": "linear",
          "magnitude": {
            "boostingRangeStart": 0,
            "boostingRangeEnd": 1000,
            "constantBoostBeyondRange": true
          }
        }
      ],
      "functionAggregation": "sum"
    },
    {
      "name": "freshness_priority",
      "text": {
        "weights": {
          "title": 3.0,
          "content": 1.0,
          "category": 1.5
        }
      },
      "functions": [
        {
          "type": "freshness",
          "fieldName": "publishDate",
          "boost": 2.0,
          "interpolation": "linear",
          "freshness": {
            "boostingDuration": "P30D"
          }
        },
        {
          "type": "freshness",
          "fieldName": "lastModified",
          "boost": 1.3,
          "interpolation": "linear",
          "freshness": {
            "boostingDuration": "P7D"
          }
        }
      ],
      "functionAggregation": "sum"
    },
    {
      "name": "balanced_relevance",
      "text": {
        "weights": {
          "title": 4.0,
          "content": 1.0,
          "category": 2.0,
          "author": 1.5,
          "tags": 2.0
        }
      },
      "functions": [
        {
          "type": "magnitude",
          "fieldName": "rating",
          "boost": 1.8,
          "interpolation": "linear",
          "magnitude": {
            "boostingRangeStart": 3.0,
            "boostingRangeEnd": 5.0,
            "constantBoostBeyondRange": true
          }
        },
        {
          "type": "magnitude",
          "fieldName": "viewCount",
          "boost": 1.4,
          "interpolation": "logarithmic",
          "magnitude": {
            "boostingRangeStart": 10,
            "boostingRangeEnd": 10000,
            "constantBoostBeyondRange": true
          }
        },
        {
          "type": "freshness",
          "fieldName": "publishDate",
          "boost": 1.2,
          "interpolation": "linear",
          "freshness": {
            "boostingDuration": "P90D"
          }
        }
      ],
      "functionAggregation": "sum"
    }
  ]
}
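
To create the index from this schema, send it to the service with a PUT request. The snippet below is a minimal sketch using the REST API; the service endpoint, admin key, and schema file name are placeholders you must supply yourself.

import json
import requests

# Placeholder values -- replace with your own service details.
SERVICE_ENDPOINT = "https://<your-service>.search.windows.net"
API_KEY = "<your-admin-api-key>"
API_VERSION = "2023-11-01"

def create_index_from_json(schema_path="content-scoring-demo.json"):
    """Create or update the index from the JSON schema above via the REST API."""
    with open(schema_path) as f:
        schema = json.load(f)

    url = f"{SERVICE_ENDPOINT}/indexes/{schema['name']}?api-version={API_VERSION}"
    headers = {"Content-Type": "application/json", "api-key": API_KEY}

    # PUT creates the index if it does not exist, or updates it in place if it does.
    response = requests.put(url, headers=headers, json=schema)
    response.raise_for_status()
    print(f"Index '{schema['name']}' created or updated.")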

Step 3: Generate Diverse Test Data

Create test documents with varied scoring characteristics:

from datetime import datetime, timedelta
import random

def generate_test_documents(count=20):
    """Generate diverse test documents for scoring validation."""

    categories = ["Technology", "Science", "Business", "Education", "Health", "Entertainment"]
    authors = ["Dr. Sarah Johnson", "Prof. Michael Chen", "Emily Rodriguez", "James Wilson", "Lisa Thompson"]
    difficulties = ["Beginner", "Intermediate", "Advanced", "Expert"]

    base_date = datetime.now()
    documents = []

    for i in range(1, count + 1):
        # Vary publication dates
        days_ago = random.randint(1, 365)
        publish_date = base_date - timedelta(days=days_ago)

        # Vary last modified (some content updated recently)
        if random.random() < 0.3:  # 30% chance of recent update
            last_modified = base_date - timedelta(days=random.randint(1, 30))
        else:
            last_modified = publish_date + timedelta(days=random.randint(1, 10))

        # Generate realistic metrics
        base_views = random.randint(50, 50000)
        rating = round(random.uniform(2.5, 5.0), 1)
        like_count = int(base_views * random.uniform(0.01, 0.1))
        comment_count = int(base_views * random.uniform(0.005, 0.05))

        doc = {
            "id": str(i),
            "title": f"Article {i}: {random.choice(['Advanced', 'Introduction to', 'Complete Guide to', 'Best Practices for'])} {random.choice(['Machine Learning', 'Data Science', 'Cloud Computing', 'Web Development', 'Artificial Intelligence'])}",
            "content": f"This is a comprehensive article about {random.choice(categories).lower()} topics. It covers fundamental concepts, practical applications, and real-world examples. The content is designed for {random.choice(difficulties).lower()} level readers.",
            "category": random.choice(categories),
            "author": random.choice(authors),
            "publishDate": publish_date.isoformat() + "Z",
            "lastModified": last_modified.isoformat() + "Z",
            "rating": rating,
            "viewCount": base_views,
            "likeCount": like_count,
            "commentCount": comment_count,
            "difficulty": random.choice(difficulties),
            "tags": random.sample(["tutorial", "guide", "reference", "example", "best-practices", "advanced", "beginner"], k=random.randint(2, 4))
        }

        documents.append(doc)

    return documents

# Generate test data
test_documents = generate_test_documents(25)
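
With the documents generated, upload them to the index. This is a sketch using the azure-search-documents SDK; the endpoint and key are placeholders, and the same search_client is reused in the testing steps below.

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholder service details -- substitute your own endpoint and admin key.
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="content-scoring-demo",
    credential=AzureKeyCredential("<your-admin-api-key>"),
)

# 25 documents fit comfortably in a single upload batch.
upload_result = search_client.upload_documents(documents=test_documents)
failed = [r for r in upload_result if not r.succeeded]
print(f"Uploaded {len(upload_result) - len(failed)} documents, {len(failed)} failures.")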

Step 4: Implement Scoring Profile Testing

Create a comprehensive testing framework:

class ScoringProfileTester:
    def __init__(self, search_client, index_name):
        self.search_client = search_client
        self.index_name = index_name

    def test_scoring_profile(self, query, profile_name=None, top=10):
        """Test a specific scoring profile with a query."""

        search_params = {
            "search_text": query,
            "top": top,
            "include_total_count": True,
            "select": ["id", "title", "category", "author", "rating", "viewCount", "publishDate"]
        }

        if profile_name:
            search_params["scoring_profile"] = profile_name

        try:
            results = list(self.search_client.search(**search_params))
            return {
                "profile": profile_name or "default",
                "query": query,
                "results": results,
                "count": len(results)
            }
        except Exception as e:
            return {
                "profile": profile_name or "default",
                "query": query,
                "error": str(e),
                "results": [],
                "count": 0
            }

    def compare_scoring_profiles(self, query, profiles, top=5):
        """Compare multiple scoring profiles with the same query."""

        print(f"\n🔍 Query: '{query}'")
        print("=" * 60)

        all_results = {}

        for profile in profiles:
            result = self.test_scoring_profile(query, profile, top)
            all_results[profile or "default"] = result

            print(f"\n📊 {profile or 'Default'} Scoring:")

            if result.get("error"):
                print(f"  ❌ Error: {result['error']}")
                continue

            if not result["results"]:
                print("  No results found")
                continue

            for i, doc in enumerate(result["results"], 1):
                title = doc.get("title", "No title")
                score = doc.get("@search.score", 0)
                rating = doc.get("rating", 0)
                views = doc.get("viewCount", 0)
                category = doc.get("category", "Unknown")

                print(f"  {i}. {title[:50]}...")
                print(f"     Score: {score:.3f}, Rating: {rating}, Views: {views}, Category: {category}")

        return all_results

    def analyze_scoring_impact(self, query, profiles):
        """Analyze the impact of different scoring profiles."""

        results = self.compare_scoring_profiles(query, profiles)

        print(f"\n📈 Scoring Impact Analysis:")
        print("-" * 40)

        # Compare result ordering
        default_ids = [doc.get("id") for doc in results.get("default", {}).get("results", [])]

        for profile_name, result in results.items():
            if profile_name == "default" or result.get("error"):
                continue

            profile_ids = [doc.get("id") for doc in result.get("results", [])]

            if profile_ids != default_ids:
                print(f"  ✅ {profile_name}: Changed result ordering")

                # Calculate position changes
                position_changes = []
                for doc_id in profile_ids[:5]:  # Top 5 results
                    default_pos = default_ids.index(doc_id) + 1 if doc_id in default_ids else None
                    profile_pos = profile_ids.index(doc_id) + 1

                    if default_pos:
                        change = default_pos - profile_pos
                        if change != 0:
                            position_changes.append(f"ID {doc_id}: {change:+d} positions")

                if position_changes:
                    print(f"    Position changes: {', '.join(position_changes)}")
            else:
                print(f"  ⚠️ {profile_name}: No change in result ordering")

        return results

    def validate_business_logic(self, test_cases):
        """Validate that scoring profiles implement business logic correctly."""

        print(f"\n✅ Business Logic Validation")
        print("=" * 60)

        passed = 0
        failed = 0

        for test_case in test_cases:
            query = test_case["query"]
            profile = test_case["profile"]
            expected_behavior = test_case["expected_behavior"]
            validation_func = test_case.get("validation_function")

            print(f"\n📋 Testing: {query} with {profile}")
            print(f"Expected: {expected_behavior}")

            result = self.test_scoring_profile(query, profile, top=10)

            if result.get("error"):
                print(f"  ❌ FAIL: {result['error']}")
                failed += 1
                continue

            if validation_func:
                success = validation_func(result["results"])
                if success:
                    print(f"  ✅ PASS: Business logic validated")
                    passed += 1
                else:
                    print(f"  ❌ FAIL: Business logic not working as expected")
                    failed += 1
            else:
                # Basic validation: check if results are returned
                if result["results"]:
                    print(f"  ✅ PASS: Results returned")
                    passed += 1
                else:
                    print(f"  ❌ FAIL: No results returned")
                    failed += 1

        print(f"\nValidation Summary: {passed} passed, {failed} failed")
        return passed, failed

# Example validation functions
def validate_rating_boost(results):
    """Validate that higher-rated content appears first."""
    if len(results) < 2:
        return True

    ratings = [doc.get("rating", 0) for doc in results[:3]]
    return ratings == sorted(ratings, reverse=True)

def validate_popularity_boost(results):
    """Validate that popular content (high view count) is boosted."""
    if len(results) < 2:
        return True

    view_counts = [doc.get("viewCount", 0) for doc in results[:3]]
    # Loose check: the top result should have more views than the third result
    # (exact ordering is not required, since text relevance still contributes).
    return view_counts[0] > view_counts[2] if len(view_counts) >= 3 else True

def validate_freshness_boost(results):
    """Validate that recent content is boosted."""
    if len(results) < 2:
        return True

    from datetime import datetime

    dates = []
    for doc in results[:3]:
        date_str = doc.get("publishDate", "")
        if date_str:
            try:
                date = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
                dates.append(date)
            except ValueError:
                continue

    if len(dates) < 2:
        return True

    # Check if generally trending towards more recent dates
    return dates[0] >= dates[-1]
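
A quick usage sketch for the framework, reusing the search_client configured in Step 3 (an assumption if you set yours up differently):

# Smoke-test the framework before running the full scenarios in Step 5.
tester = ScoringProfileTester(search_client, "content-scoring-demo")
tester.compare_scoring_profiles(
    "machine learning tutorial",
    profiles=[None, "content_quality", "freshness_priority"],
    top=3,
)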

Step 5: Comprehensive Testing Scenarios

Create test scenarios that validate business requirements:

def run_comprehensive_scoring_tests():
    """Run comprehensive scoring profile tests."""

    import time  # used for the pause between queries below

    # Initialize tester (assuming search_client is configured)
    tester = ScoringProfileTester(search_client, "content-scoring-demo")

    # Test queries representing different search intents
    test_queries = [
        "machine learning tutorial",
        "data science guide",
        "artificial intelligence",
        "web development",
        "cloud computing"
    ]

    # Scoring profiles to test
    profiles = [None, "content_quality", "popularity_boost", "freshness_priority", "balanced_relevance"]

    print("🚀 Comprehensive Scoring Profile Testing")
    print("=" * 60)

    # Test each query with all profiles
    for query in test_queries:
        results = tester.analyze_scoring_impact(query, profiles)

        # Brief pause between queries
        time.sleep(2)

    # Business logic validation
    validation_test_cases = [
        {
            "query": "machine learning",
            "profile": "content_quality",
            "expected_behavior": "Higher-rated content should rank higher",
            "validation_function": validate_rating_boost
        },
        {
            "query": "data science",
            "profile": "popularity_boost",
            "expected_behavior": "Popular content should be boosted",
            "validation_function": validate_popularity_boost
        },
        {
            "query": "artificial intelligence",
            "profile": "freshness_priority",
            "expected_behavior": "Recent content should rank higher",
            "validation_function": validate_freshness_boost
        },
        {
            "query": "web development",
            "profile": "balanced_relevance",
            "expected_behavior": "Balanced scoring should work",
            "validation_function": None  # Basic validation only
        }
    ]

    passed, failed = tester.validate_business_logic(validation_test_cases)

    print(f"\n🎯 Overall Test Results:")
    print(f"✅ Passed: {passed}")
    print(f"❌ Failed: {failed}")
    print(f"Success Rate: {passed/(passed+failed)*100:.1f}%" if (passed+failed) > 0 else "No tests run")

# Run the tests
run_comprehensive_scoring_tests()

✅ Validation

Expected Outcomes

Document your findings for each scoring profile:

  1. Content Quality Profile
     • Higher-rated content ranks higher
     • Rating boost function works correctly
     • Field weights prioritize title and category

  2. Popularity Boost Profile
     • High view count content gets boosted
     • Logarithmic interpolation smooths extreme values
     • Like count provides an additional signal

  3. Freshness Priority Profile
     • Recent content ranks higher
     • Recent updates also boost relevance
     • Freshness functions work as expected

  4. Balanced Relevance Profile
     • Combines multiple factors effectively
     • No single factor dominates results
     • Provides good overall relevance

Performance Metrics

Measure and document:

def measure_scoring_performance():
    """Measure performance impact of scoring profiles."""

    import time

    # Reuse the tester from Step 4 (assumes search_client is configured)
    tester = ScoringProfileTester(search_client, "content-scoring-demo")

    query = "machine learning"
    profiles = [None, "content_quality", "popularity_boost", "balanced_relevance"]
    iterations = 10

    print("⏱️ Performance Impact Analysis")
    print("=" * 40)

    for profile in profiles:
        latencies = []

        for _ in range(iterations):
            start_time = time.time()

            result = tester.test_scoring_profile(query, profile, top=20)

            end_time = time.time()
            latencies.append((end_time - start_time) * 1000)  # Convert to ms

        avg_latency = sum(latencies) / len(latencies)
        min_latency = min(latencies)
        max_latency = max(latencies)

        profile_name = profile or "Default"
        print(f"{profile_name}:")
        print(f"  Average: {avg_latency:.2f}ms")
        print(f"  Min: {min_latency:.2f}ms")
        print(f"  Max: {max_latency:.2f}ms")

measure_scoring_performance()

Validation Checklist

  • [ ] Index created with scoring profiles successfully
  • [ ] Test documents uploaded and indexed
  • [ ] All scoring profiles return results
  • [ ] Field weights affect result ordering
  • [ ] Magnitude functions boost appropriate content
  • [ ] Freshness functions prioritize recent content
  • [ ] Business logic validation passes
  • [ ] Performance impact is acceptable

🚀 Extensions

Extension 1: A/B Testing Framework

Implement statistical A/B testing to compare scoring profiles:

def ab_test_scoring_profiles(profile_a, profile_b, test_queries, sample_size=100):
    """Run A/B test comparing two scoring profiles."""

    import random
    from scipy import stats

    results_a = []
    results_b = []

    for _ in range(sample_size):
        query = random.choice(test_queries)

        # Test profile A
        result_a = tester.test_scoring_profile(query, profile_a, top=5)
        if result_a["results"]:
            avg_score_a = sum(doc.get("@search.score", 0) for doc in result_a["results"]) / len(result_a["results"])
            results_a.append(avg_score_a)

        # Test profile B
        result_b = tester.test_scoring_profile(query, profile_b, top=5)
        if result_b["results"]:
            avg_score_b = sum(doc.get("@search.score", 0) for doc in result_b["results"]) / len(result_b["results"])
            results_b.append(avg_score_b)

    # Statistical analysis
    if results_a and results_b:
        t_stat, p_value = stats.ttest_ind(results_a, results_b)

        print(f"A/B Test Results:")
        print(f"Profile A ({profile_a}): {len(results_a)} samples, avg score: {sum(results_a)/len(results_a):.3f}")
        print(f"Profile B ({profile_b}): {len(results_b)} samples, avg score: {sum(results_b)/len(results_b):.3f}")
        print(f"T-statistic: {t_stat:.3f}")
        print(f"P-value: {p_value:.3f}")
        print(f"Significant difference: {'Yes' if p_value < 0.05 else 'No'}")
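
A usage sketch (the query list and sample size here are illustrative):

sample_queries = ["machine learning", "data science", "cloud computing"]
ab_test_scoring_profiles("content_quality", "balanced_relevance", sample_queries, sample_size=50)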

Extension 2: Dynamic Scoring

Implement dynamic scoring based on user context:

def dynamic_scoring_profile(user_context):
    """Generate scoring profile based on user context."""

    base_weights = {
        "title": 3.0,
        "content": 1.0,
        "category": 2.0
    }

    # Adjust weights based on user preferences
    if user_context.get("prefers_recent_content"):
        # Boost freshness for users who prefer recent content
        freshness_boost = 2.0
    else:
        freshness_boost = 1.0

    if user_context.get("experience_level") == "beginner":
        # Boost highly-rated content for beginners
        rating_boost = 2.5
    else:
        rating_boost = 1.5

    return {
        "text_weights": base_weights,
        "freshness_boost": freshness_boost,
        "rating_boost": rating_boost
    }
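
The returned values are not themselves an Azure scoring profile; one simple way to apply them (a sketch that assumes the profiles defined in Step 2) is to map the user context onto one of the predefined profiles at query time:

def search_with_context(search_client, query, user_context, top=10):
    """Choose a predefined scoring profile from user context and run the query."""
    if user_context.get("prefers_recent_content"):
        profile = "freshness_priority"
    elif user_context.get("experience_level") == "beginner":
        profile = "content_quality"  # beginners benefit from highly-rated content
    else:
        profile = "balanced_relevance"

    return list(search_client.search(search_text=query, scoring_profile=profile, top=top))

# Example: a beginner gets the quality-oriented profile.
results = search_with_context(search_client, "machine learning", {"experience_level": "beginner"})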

Extension 3: Machine Learning Integration

Use ML to optimize scoring weights:

def optimize_scoring_weights(click_data, conversion_data):
    """Use machine learning to optimize scoring profile weights."""

    from sklearn.linear_model import LinearRegression
    import numpy as np

    # Features: field matches, rating, view count, freshness
    # Target: click-through rate or conversion rate

    features = []
    targets = []

    for interaction in click_data:
        feature_vector = [
            interaction["title_match_score"],
            interaction["content_match_score"],
            interaction["rating"],
            interaction["view_count"],
            interaction["days_since_publish"]
        ]
        features.append(feature_vector)
        targets.append(interaction["clicked"])

    # Train model
    model = LinearRegression()
    model.fit(np.array(features), np.array(targets))

    # Extract optimized weights
    coefficients = model.coef_

    optimized_weights = {
        "title": max(1.0, coefficients[0] * 4.0),
        "content": 1.0,  # Base weight
        "rating_boost": max(1.0, coefficients[2] * 2.0),
        "popularity_boost": max(1.0, coefficients[3] * 1.5)
    }

    return optimized_weights
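
To act on the learned weights, fold them back into the index definition and re-issue the index update from Step 2. A sketch, using the schema dict from Step 2 as an assumed input:

def apply_optimized_weights(schema, optimized_weights):
    """Write learned field weights back into the content_quality scoring profile."""
    for profile in schema["scoringProfiles"]:
        if profile["name"] == "content_quality":
            profile["text"]["weights"]["title"] = optimized_weights["title"]
            profile["text"]["weights"]["content"] = optimized_weights["content"]
    return schema  # then PUT the updated schema back to the service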

💡 Solutions

Key Implementation Insights

  1. Field Weight Strategy:
     • Title fields typically get the highest weights (3-5x)
     • Content gets the base weight (1.0)
     • Category and tags get medium weights (1.5-2.5x)
     • Author/metadata gets lower weights (1.0-1.5x)

  2. Scoring Function Design:
     • Use linear interpolation for ratings (clear, bounded scale)
     • Use logarithmic interpolation for view counts (wide range)
     • Set boosting ranges based on your data distribution
     • Combine functions with sum aggregation for additive effects

  3. Business Logic Alignment:
     • Quality-focused: emphasize ratings and expert content
     • Popularity-focused: boost view counts and engagement
     • Freshness-focused: prioritize recent and updated content
     • Balanced: combine multiple factors with moderate weights

Common Pitfalls

  • Over-boosting: Using extreme boost values that dominate relevance
  • Narrow ranges: Setting boosting ranges too narrow for your data
  • Ignoring performance: Complex scoring profiles can impact query latency
  • Static configuration: Not updating profiles based on user behavior

Best Practices

  • Start with simple field weights before adding functions
  • Test with real data and user queries
  • Monitor performance impact of scoring profiles
  • Use A/B testing to validate improvements
  • Regularly review and update based on analytics

Next Exercise: Autocomplete System - Implement n-gram analyzers for autocomplete functionality.