Schema Issues Troubleshooting - Module 3: Index Management

Schema Design Problems

Issue: Field attribute conflicts

Symptoms:

  • Index creation fails with validation errors
  • Unexpected behavior with field operations
  • Performance issues with certain field combinations

Common Causes:

  • Incompatible field attribute combinations
  • Key fields with inappropriate attributes
  • Collection fields with unsupported attributes

Solutions:

// Invalid: A key field must be retrievable
{
  "name": "id",
  "type": "Edm.String",
  "key": true,
  "retrievable": false  // ❌ This will fail
}

// Valid: Key field configuration
{
  "name": "id",
  "type": "Edm.String",
  "key": true,
  "retrievable": true
}
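
A local pre-flight check can catch key field mistakes before the index definition is sent to the service. The following is a minimal sketch, not the service's own validation; it assumes three rules (exactly one key field, key type `Edm.String`, key not marked non-retrievable), and `check_key_field` is a hypothetical helper name.

```python
def check_key_field(fields):
    """Return a list of problems with the key field in a list of field dicts."""
    problems = []
    keys = [f for f in fields if f.get("key") is True]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    for f in keys:
        if f.get("type") != "Edm.String":
            problems.append(
                f"key field '{f['name']}' must be Edm.String, got {f.get('type')}"
            )
        # Assumption: an omitted 'retrievable' defaults to true,
        # so only an explicit false is flagged.
        if f.get("retrievable") is False:
            problems.append(f"key field '{f['name']}' must be retrievable")
    return problems
```

An empty list means the sketch found nothing to report; the service may still reject the schema for reasons this check does not cover.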

Issue: Incorrect data type selection

Symptoms:

  • Data conversion errors during indexing
  • Unexpected query behavior
  • Performance issues

Common Causes:

  • Using string for numeric data
  • Wrong date/time format specification
  • Inappropriate collection types

Solutions:

  1. Numeric Data: Use appropriate numeric types

// Wrong
{"name": "price", "type": "Edm.String"}

// Correct
{"name": "price", "type": "Edm.Double"}

  2. Date/Time Data: Use DateTimeOffset for dates

    // Wrong
    {"name": "created", "type": "Edm.String"}
    
    // Correct
    {"name": "created", "type": "Edm.DateTimeOffset"}
    

  3. Boolean Data: Use Boolean type for true/false values

    // Wrong
    {"name": "isActive", "type": "Edm.String"}
    
    // Correct
    {"name": "isActive", "type": "Edm.Boolean"}
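
One way to avoid string-typed numbers, dates, and flags is to derive a suggested type from representative source values before writing the schema. This is a rough sketch: `suggest_edm_type` is a hypothetical helper, the sample record is made up, and the mapping covers only the types discussed above.

```python
from datetime import datetime


def suggest_edm_type(value):
    """Suggest an Edm type name for one sample Python value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "Edm.Boolean"
    if isinstance(value, int):
        # Values outside the signed 32-bit range need Edm.Int64.
        return "Edm.Int32" if -2**31 <= value < 2**31 else "Edm.Int64"
    if isinstance(value, float):
        return "Edm.Double"
    if isinstance(value, datetime):
        return "Edm.DateTimeOffset"
    if isinstance(value, str):
        return "Edm.String"
    raise TypeError(f"no suggestion for {type(value).__name__}")


# Hypothetical sample record used only for illustration.
sample = {
    "price": 19.99,
    "isActive": True,
    "created": datetime(2024, 7, 1),
    "views": 3_000_000_000,
}
suggested = {name: suggest_edm_type(value) for name, value in sample.items()}
```

A single sample can mislead (for example, a small integer in a field that later exceeds the 32-bit range), so the suggestion is a starting point for review rather than a final answer.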
    

Field Naming Issues

Issue: Invalid field names

Symptoms:

  • Schema validation errors
  • Index creation failures
  • API operation failures

Common Causes:

  • Field names starting with numbers
  • Special characters in field names
  • Reserved keywords used as field names
  • Case sensitivity issues

Solutions:

// Invalid field names
{
  "name": "2ndField",     // ❌ Cannot start with number
  "name": "field-name",   // ❌ Hyphens not allowed
  "name": "field name",   // ❌ Spaces not allowed
  "name": "type"          // ❌ Reserved keyword
}

// Valid field names
{
  "name": "secondField",  // ✅ Starts with letter
  "name": "field_name",   // ✅ Underscores allowed
  "name": "fieldName",    // ✅ CamelCase allowed
  "name": "documentType"  // ✅ Descriptive name
}

Field Naming Best Practices

  • Start with a letter
  • Use only letters, numbers, and underscores
  • Avoid reserved keywords
  • Use descriptive, meaningful names
  • Be consistent with naming conventions
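
The character rules above can be checked with a short regular expression before a schema is submitted. This sketch assumes the following rules: the first character is a letter, the rest are letters, digits, or underscores, the name is at most 128 characters long, and it does not start with `azureSearch`. It does not check for reserved keywords, and `is_valid_field_name` is a hypothetical helper name.

```python
import re

# Assumed rules: a letter first, then letters, digits, or underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_field_name(name):
    """Return True if the name passes the assumed character and length rules."""
    return (
        len(name) <= 128
        and not name.startswith("azureSearch")
        and _NAME_PATTERN.fullmatch(name) is not None
    )


candidates = ["2ndField", "field-name", "field name", "secondField", "field_name"]
results = {name: is_valid_field_name(name) for name in candidates}
```

Running the check over every field name in a schema file gives quick feedback without a round trip to the service.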

Analyzer Configuration Issues

Issue: Incorrect analyzer selection

Symptoms:

  • Poor search results
  • Unexpected tokenization behavior
  • Language-specific search problems

Common Causes:

  • Wrong language analyzer
  • Inappropriate analyzer for content type
  • Missing custom analyzer configuration

Solutions:

  1. Language-Specific Content:

// For English content
{
  "name": "description",
  "type": "Edm.String",
  "searchable": true,
  "analyzer": "en.lucene"
}

// For multilingual content
{
  "name": "description",
  "type": "Edm.String",
  "searchable": true,
  "analyzer": "standard.lucene"
}

  2. Exact Match Fields:
    // For exact matching (like product codes)
    {
      "name": "productCode",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "keyword"
    }
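
To see why analyzer choice matters for values such as product codes, it can help to compare how the same text is split under two strategies. The functions below are rough approximations written for illustration only; they are not the Lucene analyzers themselves, and real output should be confirmed with the Analyze API.

```python
import re


def keyword_like(text):
    """Approximate a keyword-style analyzer: the whole value is one token."""
    return [text]


def standard_like(text):
    """Approximate a standard-style analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


code = "ABC-123"
keyword_tokens = keyword_like(code)
standard_tokens = standard_like(code)
```

With the keyword-style approach the code stays intact as a single term, while the standard-style approach breaks it into parts that can match unrelated documents.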
    

Issue: Custom analyzer problems

Symptoms:

  • Analyzer not found errors
  • Unexpected tokenization results
  • Performance issues

Common Causes:

  • Analyzer not defined in index
  • Incorrect analyzer configuration
  • Circular dependencies in analyzer definition

Solutions:

  1. Define custom analyzers properly:

{
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "my_custom_analyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": ["lowercase", "asciifolding"]
    }
  ],
  "fields": [
    {
      "name": "content",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "my_custom_analyzer"
    }
  ]
}
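
"Analyzer not found" errors usually mean a field references a name that is not defined in the index. A quick local check can list such references. This sketch makes two assumptions: names containing a dot (such as `en.lucene`) are built-in, and the short list in `builtin` covers the remaining built-in names you use. `undefined_analyzers` is a hypothetical helper name.

```python
def undefined_analyzers(
    index_definition,
    builtin=("keyword", "simple", "stop", "whitespace", "pattern"),
):
    """Return (field, analyzer) pairs whose analyzer is neither custom nor built-in."""
    custom = {a["name"] for a in index_definition.get("analyzers", [])}
    missing = []
    for field in index_definition.get("fields", []):
        analyzer = field.get("analyzer")
        if analyzer is None or analyzer in custom or analyzer in builtin:
            continue
        # Assumption: dotted names such as "en.lucene" are built-in analyzers.
        if "." in analyzer:
            continue
        missing.append((field["name"], analyzer))
    return missing
```

The check only inspects top-level fields; subfields of complex types would need a recursive walk.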

Complex Field Issues

Issue: Complex field configuration errors

Symptoms:

  • Schema validation failures
  • Nested object indexing problems
  • Query failures on complex fields

Common Causes:

  • Incorrect complex field structure
  • Missing sub-field definitions
  • Inappropriate attributes on complex fields

Solutions:

// Correct complex field definition
{
  "name": "address",
  "type": "Edm.ComplexType",
  "fields": [
    {
      "name": "street",
      "type": "Edm.String",
      "searchable": true
    },
    {
      "name": "city",
      "type": "Edm.String",
      "filterable": true,
      "facetable": true
    },
    {
      "name": "zipCode",
      "type": "Edm.String",
      "filterable": true
    }
  ]
}
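
Queries address sub-fields by path (for example `address/city`), so listing every leaf path is a simple way to confirm that the structure matches what the application expects. The sketch below also flags attributes set directly on a complex field, on the assumption that attributes belong on sub-fields only. Both function names are hypothetical.

```python
ATTRIBUTES = ("searchable", "filterable", "sortable", "facetable", "retrievable", "key")


def field_paths(fields, prefix=""):
    """Yield (path, type) for every leaf field, descending into complex types."""
    for field in fields:
        path = prefix + field["name"]
        if "fields" in field:
            yield from field_paths(field["fields"], prefix=path + "/")
        else:
            yield path, field["type"]


def attributes_on_complex(fields, prefix=""):
    """Yield (path, attribute) for attributes set directly on a complex field."""
    for field in fields:
        path = prefix + field["name"]
        if "fields" in field:
            for attr in ATTRIBUTES:
                if attr in field:
                    yield path, attr
            yield from attributes_on_complex(field["fields"], prefix=path + "/")


address = {
    "name": "address",
    "type": "Edm.ComplexType",
    "fields": [
        {"name": "street", "type": "Edm.String", "searchable": True},
        {"name": "city", "type": "Edm.String", "filterable": True},
        {"name": "zipCode", "type": "Edm.String", "filterable": True},
    ],
}
paths = list(field_paths([address]))
```

For the `address` definition above, `paths` contains `address/street`, `address/city`, and `address/zipCode`, each with type `Edm.String`.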

Issue: Collection field problems

Symptoms:

  • Array data not indexing correctly
  • Filter operations failing on collections
  • Unexpected query results

Common Causes:

  • Incorrect collection type specification
  • Missing collection syntax
  • Inappropriate operations on collections

Solutions:

// String collection
{
  "name": "tags",
  "type": "Collection(Edm.String)",
  "searchable": true,
  "filterable": true,
  "facetable": true
}

// Complex type collection
{
  "name": "reviews",
  "type": "Collection(Edm.ComplexType)",
  "fields": [
    {
      "name": "rating",
      "type": "Edm.Int32",
      "filterable": true
    },
    {
      "name": "comment",
      "type": "Edm.String",
      "searchable": true
    }
  ]
}
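
Filters on collection fields typically fail because they compare the collection as if it were a single value; OData filters use `any` with a range variable instead. The helpers below are a small sketch that builds such filter strings for the `tags` and `reviews` fields defined above. The function names are hypothetical.

```python
def quote(value):
    """Quote a string literal for an OData filter; single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def any_equals(collection, value):
    """Filter matching documents whose string collection contains value."""
    return f"{collection}/any(x: x eq {quote(value)})"


def any_subfield_at_least(collection, subfield, minimum):
    """Filter matching documents where some element's numeric subfield is >= minimum."""
    return f"{collection}/any(x: x/{subfield} ge {minimum})"


tag_filter = any_equals("tags", "wifi")
rating_filter = any_subfield_at_least("reviews", "rating", 4)
```

The resulting strings go in the `filter` parameter of a query, and the fields involved must be marked filterable.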

Schema Evolution Issues

Issue: Cannot modify existing field

Symptoms:

  • Schema update operations fail
  • Field modification errors
  • Index update rejections

Common Causes:

  • Attempting to change immutable field properties
  • Modifying field types
  • Changing key field configuration

Solutions:

  1. Understand Immutable Properties:
     • Field name cannot be changed
     • Field type cannot be changed
     • Key field designation cannot be changed

  2. Workaround Strategies:
     • Add a new field with the desired configuration
     • Migrate data to the new field
     • Remove the old field if possible (removing a field from an existing index generally requires rebuilding it)
     • Consider creating a new index for major changes
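
The "add a new field, then migrate" workaround can be sketched as a function that builds a merge batch copying converted values from the old field into the new one. The field name `priceValue`, the sample documents, and the helper `build_merge_batch` are hypothetical; the payload shape follows the document upload format shown under Field Testing.

```python
def build_merge_batch(documents, key, old_field, new_field, convert):
    """Build an indexing payload that writes convert(old value) into new_field."""
    actions = []
    for doc in documents:
        if doc.get(old_field) is None:
            continue  # nothing to migrate for this document
        actions.append(
            {
                "@search.action": "merge",
                key: doc[key],
                new_field: convert(doc[old_field]),
            }
        )
    return {"value": actions}


# Hypothetical documents read back from the existing index.
docs = [{"id": "1", "price": "19.99"}, {"id": "2", "price": None}]
batch = build_merge_batch(docs, "id", "price", "priceValue", float)
```

Using the `merge` action updates only the listed fields, so the rest of each document is left as it is. Conversion errors (for example, a non-numeric string) surface locally before anything is sent.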

Issue: Schema version compatibility

Symptoms:

  • Application errors after schema updates
  • Unexpected query behavior
  • Data access issues

Common Causes:

  • Breaking changes in schema
  • Application code not updated
  • Cached schema information

Solutions:

  1. Plan Schema Changes:
     • Analyze impact of changes
     • Update application code
     • Test in development environment
     • Implement gradual rollout

  2. Backward Compatibility:
     • Add new fields as optional
     • Maintain old field names when possible
     • Use field aliases if supported
     • Document breaking changes
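
Impact analysis can start with a simple diff between the old and new field lists. The sketch below reports removed fields, type changes, and key changes; what counts as "breaking" here is an assumption, and `breaking_changes` is a hypothetical helper name. Newly added fields are treated as non-breaking and are not reported.

```python
def breaking_changes(old_fields, new_fields):
    """List changes between two field lists that are likely to break existing clients."""
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}
    changes = []
    for name, field in old.items():
        if name not in new:
            changes.append(f"removed field: {name}")
            continue
        if field["type"] != new[name]["type"]:
            changes.append(
                f"type changed: {name} {field['type']} -> {new[name]['type']}"
            )
        if bool(field.get("key")) != bool(new[name].get("key")):
            changes.append(f"key designation changed: {name}")
    return changes
```

Running this in a build pipeline against the deployed schema gives an early warning before an update is attempted.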

Issue: Poor query performance due to schema design

Symptoms:

  • Slow query response times
  • High resource utilization
  • Timeout errors

Common Causes:

  • Too many searchable fields
  • Inappropriate field attributes
  • Large text fields with full indexing
  • Excessive facetable fields

Solutions:

  1. Optimize Field Attributes:

// Before: Over-attributed field
{
  "name": "description",
  "type": "Edm.String",
  "searchable": true,
  "filterable": true,    // ❌ Unnecessary if not filtering
  "sortable": true,      // ❌ Unnecessary if not sorting
  "facetable": true      // ❌ Unnecessary if not faceting
}

// After: Optimized field
// Attributes omitted from a REST API field definition may default to true,
// so disable the unused ones explicitly.
{
  "name": "description",
  "type": "Edm.String",
  "searchable": true,    // ✅ Only necessary attribute
  "filterable": false,
  "sortable": false,
  "facetable": false
}

  2. Reduce Index Size:
     • Remove unnecessary fields
     • Use appropriate data types
     • Optimize text field storage
     • Consider field-specific analyzers
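
Finding over-attributed fields by hand is tedious on larger schemas. The sketch below compares the attributes enabled on each field with the fields the application actually filters, sorts, or facets on. It assumes an omitted attribute counts as enabled; the `used` mapping is something you would compile from your own query code, and `unused_attributes` is a hypothetical helper name.

```python
def unused_attributes(fields, used):
    """Report (field, attribute) pairs that are enabled but never used.

    `used` maps an attribute name to the set of field names that need it.
    """
    report = []
    for field in fields:
        for attr in ("filterable", "sortable", "facetable"):
            # Assumption: an omitted attribute counts as enabled.
            enabled = field.get(attr, True)
            if enabled and field["name"] not in used.get(attr, set()):
                report.append((field["name"], attr))
    return report
```

Each reported pair is a candidate for an explicit `false` in the field definition.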

Diagnostic Techniques

Schema Validation

PUT https://[service-name].search.windows.net/indexes/[index-name]?api-version=2024-07-01
{
  "name": "[index-name]",  // Must match the index name in the URL
  "fields": [
    // Your field definitions
  ]
}

Field Testing

Test individual fields with sample data:

POST https://[service-name].search.windows.net/indexes/[index-name]/docs/index?api-version=2024-07-01
{
  "value": [
    {
      "@search.action": "upload",
      "id": "test1",
      "testField": "sample value"
    }
  ]
}

Analyzer Testing

Test analyzer behavior:

POST https://[service-name].search.windows.net/indexes/[index-name]/analyze?api-version=2024-07-01
{
  "text": "sample text to analyze",
  "analyzer": "en.lucene"
}
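
The same analyzer test can be scripted. The sketch below only builds the request and parses a response body; sending it (for example with `urllib.request.urlopen`) is left to the caller. The service name, index name, and key are placeholders, the `api-key` header is the usual key-based authentication header, and the response shape `{"tokens": [{"token": ...}]}` is an assumption to verify against your service. Both function names are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(service, index, api_key, text, analyzer,
                          api_version="2024-07-01"):
    """Build (but do not send) a POST request for the analyze endpoint."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/analyze?api-version={api_version}"
    )
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def token_strings(response_body):
    """Extract token text from a response of the assumed form {"tokens": [...]}."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]
```

Comparing `token_strings` output for two analyzers on the same text makes tokenization differences easy to spot.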

Common Schema Patterns

E-commerce Product Schema

{
  "name": "products",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "name", "type": "Edm.String", "searchable": true, "analyzer": "en.lucene"},
    {"name": "description", "type": "Edm.String", "searchable": true},
    {"name": "category", "type": "Edm.String", "filterable": true, "facetable": true},
    {"name": "price", "type": "Edm.Double", "filterable": true, "sortable": true},
    {"name": "inStock", "type": "Edm.Boolean", "filterable": true},
    {"name": "tags", "type": "Collection(Edm.String)", "searchable": true, "facetable": true}
  ]
}

Document Management Schema

{
  "name": "documents",
  "fields": [
    {"name": "id", "type": "Edm.String", "key": true},
    {"name": "title", "type": "Edm.String", "searchable": true},
    {"name": "content", "type": "Edm.String", "searchable": true},
    {"name": "author", "type": "Edm.String", "filterable": true, "facetable": true},
    {"name": "created", "type": "Edm.DateTimeOffset", "filterable": true, "sortable": true},
    {"name": "fileType", "type": "Edm.String", "filterable": true, "facetable": true},
    {"name": "size", "type": "Edm.Int64", "filterable": true, "sortable": true}
  ]
}

Prevention Strategies

Schema Design Review

  • Conduct thorough schema reviews before implementation
  • Validate with representative data
  • Test performance with expected data volumes
  • Document design decisions and rationale

Testing Procedures

  • Test schema with various data scenarios
  • Validate field attribute combinations
  • Test analyzer behavior with sample content
  • Perform load testing with realistic data
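
Testing a schema against varied data can be partly automated by checking sample documents against the declared field types before they are uploaded. This is a minimal sketch that works on Python objects prior to JSON serialization; the type mapping is deliberately small and `type_mismatches` is a hypothetical helper name.

```python
from datetime import datetime

# Python types accepted for each Edm type in this sketch (not exhaustive).
_ACCEPTED = {
    "Edm.String": (str,),
    "Edm.Int32": (int,),
    "Edm.Int64": (int,),
    "Edm.Double": (int, float),
    "Edm.Boolean": (bool,),
    "Edm.DateTimeOffset": (datetime,),
}


def _matches(value, edm):
    if isinstance(value, bool):
        return edm == "Edm.Boolean"  # bool is an int subclass, so handle it first
    return isinstance(value, _ACCEPTED.get(edm, ()))


def type_mismatches(document, fields):
    """Return descriptions of values that do not fit the declared field types."""
    types = {f["name"]: f["type"] for f in fields}
    problems = []
    for name, value in document.items():
        edm = types.get(name)
        if edm is None:
            problems.append(f"{name}: not in schema")
        elif value is None:
            continue
        elif edm.startswith("Collection("):
            element = edm[len("Collection("):-1]
            if not isinstance(value, list) or not all(
                _matches(v, element) for v in value
            ):
                problems.append(f"{name}: expected {edm}")
        elif not _matches(value, edm):
            problems.append(f"{name}: expected {edm}, got {type(value).__name__}")
    return problems
```

Running it over a representative sample highlights fields whose source data does not match the schema, such as prices stored as strings.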

Documentation

  • Maintain comprehensive schema documentation
  • Document field purposes and usage
  • Keep track of schema evolution history
  • Document known limitations and workarounds

By following these troubleshooting guidelines and prevention strategies, you can avoid common schema issues and maintain robust, performant search indexes.