[E2E Testing] Add comprehensive test coverage for routing strategies and filters #667

@Xunzhuo

📋 Overview

This issue tracks the implementation of missing E2E test cases to achieve comprehensive coverage of semantic-router's core functionality. The current E2E test suite covers basic functionality, PII detection, jailbreak detection, domain classification, and semantic cache, but lacks coverage for several critical routing strategies and filters.

🎯 Scope

1️⃣ Routing Strategies (High Priority ⭐⭐⭐⭐⭐)

1.1 Keyword Routing

File: src/semantic-router/pkg/classification/keyword_classifier.go

Test Coverage Needed:

  • ✅ OR operator - any keyword matches
  • ✅ AND operator - all keywords must match
  • ✅ NOR operator - no keywords match
  • ✅ Case-sensitive vs case-insensitive matching
  • ✅ Regex pattern matching
  • ✅ Word boundary detection
  • ✅ Priority over embedding and intent-based routing

Example Test Data:

{
  "test_cases": [
    {
      "description": "OR operator - urgent request",
      "query": "I need urgent help with my account",
      "expected_category": "urgent_request",
      "expected_confidence": 1.0,
      "matched_keywords": ["urgent"]
    },
    {
      "description": "AND operator - sensitive data",
      "query": "My SSN and credit card were stolen",
      "expected_category": "sensitive_data",
      "expected_confidence": 1.0,
      "matched_keywords": ["SSN", "credit card"]
    }
  ]
}

Reference Config: config/intelligent-routing/in-tree/keyword.yaml
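A minimal Go sketch of the operator semantics listed above, useful as an oracle when writing the test cases. The `Rule` type and `Match` function are hypothetical and are not the API of keyword_classifier.go. The sketch assumes literal keywords with word-boundary matching and does not cover the regex pattern case.

```go
package main

import "regexp"

// Rule is a hypothetical keyword rule; field names are illustrative only.
type Rule struct {
	Category      string
	Operator      string // "OR", "AND", or "NOR"
	Keywords      []string
	CaseSensitive bool
}

// Match reports whether query satisfies the rule and which keywords hit.
// Keywords are treated as literals and must appear on word boundaries.
func Match(rule Rule, query string) (bool, []string) {
	var matched []string
	for _, kw := range rule.Keywords {
		pattern := `\b` + regexp.QuoteMeta(kw) + `\b`
		if !rule.CaseSensitive {
			pattern = `(?i)` + pattern
		}
		if regexp.MustCompile(pattern).MatchString(query) {
			matched = append(matched, kw)
		}
	}
	switch rule.Operator {
	case "OR": // any keyword matches
		return len(matched) > 0, matched
	case "AND": // all keywords must match
		return len(matched) == len(rule.Keywords), matched
	case "NOR": // no keyword matches
		return len(matched) == 0, matched
	default:
		return false, nil
	}
}
```

With this sketch, the AND example above matches both "SSN" and "credit card", while a query containing "insurgently" does not match the keyword "urgent" because of the word-boundary check.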


1.2 Embedding Routing

File: src/semantic-router/pkg/classification/embedding_classifier.go

Test Coverage Needed:

  • ✅ Semantic similarity matching with embeddings
  • ✅ Mean vs Max aggregation methods
  • ✅ Similarity threshold validation
  • ✅ Embedding model selection (auto, qwen3, gemma, bert)
  • ✅ Matryoshka dimensions (768, 512, 256, 128)
  • ✅ Quality vs Latency priority in auto mode

Example Test Data:

{
  "test_cases": [
    {
      "description": "Mean aggregation - technical query",
      "query": "How to implement async/await in Python?",
      "keywords": ["programming", "coding", "software"],
      "aggregation_method": "mean",
      "threshold": 0.7,
      "expected_category": "technical",
      "expected_similarity": 0.85
    },
    {
      "description": "Max aggregation - high similarity",
      "query": "What is machine learning?",
      "keywords": ["AI", "ML", "neural networks"],
      "aggregation_method": "max",
      "threshold": 0.8,
      "expected_category": "ai"
    }
  ]
}

Reference Config: config/intelligent-routing/in-tree/embedding.yaml
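To make the mean vs max distinction concrete, here is a small Go sketch. It assumes cosine similarity between a query embedding and each keyword embedding; the function names are hypothetical, and the actual similarity metric used by embedding_classifier.go should be confirmed against the source.

```go
package main

import "math"

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// have different lengths, are empty, or have zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Aggregate combines the query's similarity to every keyword embedding.
// method "max" keeps the best single similarity; anything else averages.
func Aggregate(query []float64, keywords [][]float64, method string) float64 {
	if len(keywords) == 0 {
		return 0
	}
	sum, best := 0.0, math.Inf(-1)
	for _, kw := range keywords {
		s := cosine(query, kw)
		sum += s
		if s > best {
			best = s
		}
	}
	if method == "max" {
		return best
	}
	return sum / float64(len(keywords))
}

// MatchesRule applies the similarity threshold to the aggregated score.
func MatchesRule(query []float64, keywords [][]float64, method string, threshold float64) bool {
	return Aggregate(query, keywords, method) >= threshold
}
```

The same inputs can pass under max and fail under mean: a query identical to one keyword and orthogonal to another scores 1.0 with max but 0.5 with mean. A test pair like that is a cheap way to prove the configured aggregation method is actually honored.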


1.3 MCP Routing (Model Context Protocol)

File: src/semantic-router/pkg/classification/mcp_classifier.go

Test Coverage Needed:

  • ✅ MCP Stdio transport (process communication)
  • ✅ MCP HTTP transport (API calls)
  • ✅ Custom classification logic via external MCP servers
  • ✅ Model and reasoning decision from MCP response
  • ✅ Fallback to in-tree classifier on MCP failure
  • ✅ Probability distribution with with_probabilities parameter

Example Test Data:

{
  "test_cases": [
    {
      "description": "MCP stdio - regex classifier",
      "mcp_server": "server_keyword.py",
      "transport": "stdio",
      "query": "urgent: fix production bug",
      "expected_category": "urgent",
      "expected_model": "gpt-oss",
      "expected_use_reasoning": true
    },
    {
      "description": "MCP HTTP - embedding classifier",
      "mcp_server": "http://localhost:8080",
      "transport": "http",
      "query": "Explain quantum computing",
      "expected_category": "science"
    }
  ]
}

Reference Servers: examples/mcp-classifier-server/


1.4 Hybrid Routing

Test Coverage Needed:

  • ✅ Priority order: Keyword → Embedding → Intent-based → MCP
  • ✅ Fallback chain when high-priority methods fail
  • ✅ Combined strategy with multiple routing methods enabled
  • ✅ Confidence fusion from multiple classifiers

Example Test Data:

{
  "test_cases": [
    {
      "description": "Keyword takes priority over embedding",
      "query": "urgent: machine learning question",
      "keyword_match": "urgent_request",
      "embedding_match": "ai",
      "expected_category": "urgent_request",
      "expected_method": "keyword"
    },
    {
      "description": "Fallback to embedding when keyword fails",
      "query": "What is deep learning?",
      "keyword_match": null,
      "embedding_match": "ai",
      "expected_category": "ai",
      "expected_method": "embedding"
    }
  ]
}
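The priority and fallback behavior above can be modeled as a first-match chain. The Go sketch below is only a model for reasoning about expected results: `Classifier`, `Route`, and the two stubs are hypothetical stand-ins, not the router's real types, and confidence fusion is not covered.

```go
package main

import "strings"

// Result is a hypothetical routing outcome.
type Result struct {
	Category string
	Method   string
}

// Classifier pairs a method name with a function that returns a category
// and whether it matched.
type Classifier struct {
	Method   string
	Classify func(query string) (string, bool)
}

// Route tries each classifier in priority order and returns the first match.
func Route(query string, chain []Classifier) (Result, bool) {
	for _, c := range chain {
		if category, ok := c.Classify(query); ok {
			return Result{Category: category, Method: c.Method}, true
		}
	}
	return Result{}, false
}

// keywordStub stands in for keyword routing: it matches only "urgent".
func keywordStub(query string) (string, bool) {
	if strings.Contains(strings.ToLower(query), "urgent") {
		return "urgent_request", true
	}
	return "", false
}

// embeddingStub stands in for embedding routing: it always answers "ai".
func embeddingStub(_ string) (string, bool) {
	return "ai", true
}
```

With the chain ordered keyword then embedding, the two example cases above resolve to `urgent_request` via `keyword` and `ai` via `embedding` respectively.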

1.5 Entropy-Based Routing

File: src/semantic-router/pkg/utils/entropy/entropy.go

Test Coverage Needed:

  • ✅ Shannon entropy and normalized entropy calculation
  • ✅ Uncertainty levels: very_high, high, medium, low, very_low
  • ✅ Reasoning decision based on entropy
  • ✅ Weighted decision for high uncertainty (top-2 categories)
  • ✅ Confidence adjustment based on uncertainty

Example Test Data:

{
  "test_cases": [
    {
      "description": "Very high entropy - enable reasoning",
      "probabilities": [0.25, 0.25, 0.25, 0.25],
      "expected_uncertainty": "very_high",
      "expected_use_reasoning": true,
      "expected_confidence": 0.3
    },
    {
      "description": "Very low entropy - trust classification",
      "probabilities": [0.95, 0.02, 0.02, 0.01],
      "expected_uncertainty": "very_low",
      "expected_use_reasoning": false,
      "expected_confidence": 0.90
    }
  ]
}
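For reference when choosing expected values, Shannon entropy is H = -Σ p·log2(p), and normalized entropy divides H by log2(n) for n categories. The Go sketch below computes both; the function names are hypothetical. The thresholds that map a normalized value to very_high ... very_low are defined in entropy.go and are deliberately not reproduced here.

```go
package main

import "math"

// ShannonEntropy returns the entropy of p in bits.
// Zero-probability entries contribute nothing to the sum.
func ShannonEntropy(p []float64) float64 {
	h := 0.0
	for _, x := range p {
		if x > 0 {
			h -= x * math.Log2(x)
		}
	}
	return h
}

// NormalizedEntropy scales the entropy into [0, 1] by dividing by log2(n),
// the maximum entropy of a distribution over n categories.
func NormalizedEntropy(p []float64) float64 {
	if len(p) < 2 {
		return 0
	}
	return ShannonEntropy(p) / math.Log2(float64(len(p)))
}
```

The uniform distribution in the first example has entropy 2 bits and normalized entropy 1.0, the maximum possible. The peaked distribution in the second example comes out at roughly 0.36 bits, or about 0.18 normalized.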

2️⃣ Filter Tests (High Priority ⭐⭐⭐⭐)

2.1 ReasoningControl Filter

File: src/semantic-router/pkg/extproc/req_filter_reason.go

Test Coverage Needed:

  • ✅ Enable/disable reasoning with enableReasoning
  • ✅ Reasoning effort levels: low, medium, high
  • ✅ Reasoning families: gpt-oss, deepseek, qwen3, claude
  • chat_template_kwargs for different model families
  • reasoning_effort parameter (OpenAI-style)
  • maxReasoningSteps limit

Example Config:

filters:
- type: ReasoningControl
  enabled: true
  config:
    reasonFamily: "gpt-oss"
    enableReasoning: true
    reasoningEffort: "high"
    maxReasoningSteps: 15

2.2 ToolSelection Filter

Test Coverage Needed:

  • ✅ Top-K tool selection
  • ✅ Similarity threshold filtering
  • ✅ Tools database loading from toolsDBPath
  • ✅ Fallback strategy with fallbackToEmpty
  • ✅ Category/tag-based tool filtering

Example Config:

filters:
- type: ToolSelection
  enabled: true
  config:
    toolsDBPath: "tools.json"
    topK: 3
    similarityThreshold: 0.7
    fallbackToEmpty: false

Reference: examples/semanticroute/tool-selection-example.yaml
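A sketch of how `topK` and `similarityThreshold` might interact, assuming the threshold is applied first and the survivors are ranked by similarity. `ScoredTool` and `SelectTools` are hypothetical names; the real filter's ordering and its `fallbackToEmpty` behavior should be verified against the implementation rather than taken from this sketch.

```go
package main

import "sort"

// ScoredTool is a hypothetical tool candidate with its similarity to the query.
type ScoredTool struct {
	Name       string
	Similarity float64
}

// SelectTools drops candidates below threshold, sorts the rest by descending
// similarity, and returns at most topK of them. A negative topK means no limit.
func SelectTools(candidates []ScoredTool, topK int, threshold float64) []ScoredTool {
	kept := make([]ScoredTool, 0, len(candidates))
	for _, c := range candidates {
		if c.Similarity >= threshold {
			kept = append(kept, c)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Similarity > kept[j].Similarity
	})
	if topK >= 0 && len(kept) > topK {
		kept = kept[:topK]
	}
	return kept
}
```

Useful test inputs include one where more than `topK` tools clear the threshold and one where none do, since the second case is where `fallbackToEmpty` would come into play.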


2.3 Filter Chain Combination

Test Coverage Needed:

  • ✅ Multiple filter execution order
  • ✅ Filter short-circuit (e.g., PIIDetection blocks subsequent filters)
  • ✅ Filter independence (configs don't interfere)
  • ✅ Performance impact of multiple filters

Example Chain:

filters:
- type: PIIDetection
- type: PromptGuard
- type: SemanticCache
- type: ReasoningControl
- type: ToolSelection
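One way to assert execution order and short-circuit behavior is to record which filters ran. The Go sketch below models that idea with hypothetical `Filter` and `RunChain` types. Whether the real chain stops at the first blocking filter is exactly what the E2E test should establish, so treat this as a model of the expectation, not a description of the implementation.

```go
package main

// Filter is a hypothetical filter: Apply returns true when it blocks the request.
type Filter struct {
	Type  string
	Apply func(prompt string) bool
}

// RunChain executes filters in order and stops at the first one that blocks.
// It returns the types that ran and the type that blocked ("" if none did).
func RunChain(filters []Filter, prompt string) (executed []string, blockedBy string) {
	for _, f := range filters {
		executed = append(executed, f.Type)
		if f.Apply(prompt) {
			return executed, f.Type
		}
	}
	return executed, ""
}
```

For the example chain above, a prompt that PIIDetection blocks should yield an executed list containing only `PIIDetection`, while a clean prompt should yield all five types in configured order.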

3️⃣ Cache Tests (Medium Priority ⭐⭐⭐)

3.1 Different Cache Backends

File: src/semantic-router/pkg/cache/

Test Coverage Needed:

  • ✅ InMemory cache performance
  • ✅ Milvus cache with vector database
  • ✅ Hybrid cache (HNSW + Milvus)
  • ✅ TTL expiration mechanism
  • ✅ Eviction strategy when maxEntries reached

3.2 Different Embedding Models for Cache

Test Coverage Needed:

  • ✅ BERT (fast, 384-dim)
  • ✅ Qwen3 (high quality, 1024-dim, 32K context)
  • ✅ Gemma (balanced, 768-dim, 8K context)
  • ✅ Matryoshka dimensions impact on cache hit rate

4️⃣ Performance & Concurrency Tests (Medium Priority ⭐⭐⭐)

4.1 Concurrent Requests

Test Coverage Needed:

  • ✅ 100 concurrent classification requests
  • ✅ Thread safety of classifiers
  • ✅ Resource contention (cache, model loading)
  • ✅ QPS (queries per second) benchmarking
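A minimal harness for the concurrency cases could look like the Go sketch below. `RunConcurrent`, `QPS`, and `ErrUnavailable` are hypothetical names, and the `classify` callback is a placeholder for whatever client call the existing E2E framework provides. Running such tests with `go test -race` helps surface thread-safety problems in any Go code exercised in-process.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrUnavailable is a hypothetical sentinel error used by stub classifiers.
var ErrUnavailable = errors.New("model service unavailable")

// RunConcurrent calls classify n times in parallel and returns the number of
// failed calls together with the total wall-clock time.
func RunConcurrent(n int, classify func(query string) error) (failures int, elapsed time.Duration) {
	var wg sync.WaitGroup
	var mu sync.Mutex
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := classify("What is machine learning?"); err != nil {
				mu.Lock()
				failures++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return failures, time.Since(start)
}

// QPS converts a request count and a duration in seconds to queries per second.
func QPS(n int, seconds float64) float64 {
	if seconds <= 0 {
		return 0
	}
	return float64(n) / seconds
}
```

A test would call `RunConcurrent(100, classify)`, assert zero failures, and report `QPS(100, elapsed.Seconds())` as the benchmark figure.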

4.2 Long Text Handling

Test Coverage Needed:

  • ✅ 32K context with Qwen3
  • ✅ 8K context with Gemma
  • ✅ Token limit handling and truncation

5️⃣ Edge Cases & Error Handling (Low Priority ⭐⭐)

5.1 Configuration Errors

Test Coverage Needed:

  • ✅ Invalid category mapping
  • ✅ Invalid threshold values
  • ✅ Missing defaultModel
  • ✅ Invalid filter configurations

5.2 Network Errors

Test Coverage Needed:

  • ✅ Model service unavailable
  • ✅ MCP server timeout
  • ✅ Milvus connection failure
  • ✅ Network timeout handling

📁 Implementation Structure

Suggested file organization:

e2e/testcases/
├── keyword_routing.go
├── embedding_routing.go
├── mcp_routing.go
├── hybrid_routing.go
├── entropy_routing.go
├── reasoning_control.go
├── tool_selection.go
├── filter_chain.go
├── cache_backends.go
├── concurrent_requests.go
└── testdata/
    ├── keyword_routing_cases.json
    ├── embedding_routing_cases.json
    ├── mcp_routing_cases.json
    ├── hybrid_routing_cases.json
    ├── entropy_routing_cases.json
    ├── reasoning_control_cases.json
    ├── tool_selection_cases.json
    └── ...
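Loading the JSON test data can be done with the standard library. The Go sketch below decodes the keyword routing format shown in section 1.1; the struct and function names are hypothetical, while the JSON field names are taken from that example.

```go
package main

import "encoding/json"

// KeywordCase mirrors one entry of the keyword routing example data.
type KeywordCase struct {
	Description        string   `json:"description"`
	Query              string   `json:"query"`
	ExpectedCategory   string   `json:"expected_category"`
	ExpectedConfidence float64  `json:"expected_confidence"`
	MatchedKeywords    []string `json:"matched_keywords"`
}

// KeywordCaseFile is the top-level object holding the "test_cases" array.
type KeywordCaseFile struct {
	TestCases []KeywordCase `json:"test_cases"`
}

// ParseKeywordCases decodes the raw JSON bytes of a test data file.
func ParseKeywordCases(data []byte) ([]KeywordCase, error) {
	var file KeywordCaseFile
	if err := json.Unmarshal(data, &file); err != nil {
		return nil, err
	}
	return file.TestCases, nil
}
```

A test would read `testdata/keyword_routing_cases.json` with `os.ReadFile`, pass the bytes to `ParseKeywordCases`, and iterate over the result, one subtest per case.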

🎯 Acceptance Criteria

  • All test cases pass consistently
  • Test data is stored in JSON files for maintainability
  • Tests follow existing E2E framework patterns
  • Documentation is updated with new test coverage
  • CI/CD pipeline includes new tests

📚 References

  • Existing E2E tests: e2e/testcases/
  • Keyword routing docs: website/docs/tutorials/intelligent-route/keyword-routing.md
  • MCP classification docs: website/docs/tutorials/mcp-classification/overview.md
  • Example configs: examples/semanticroute/
  • In-tree configs: config/intelligent-routing/in-tree/

🤝 Contributing

This is a great opportunity for new contributors! Each test case can be implemented independently. Feel free to:

  1. Pick any test case from the list above
  2. Comment on this issue to claim it
  3. Submit a PR with your implementation

For questions or guidance, please comment on this issue or join our community discussions.
