[E2E Testing] Add comprehensive test coverage for routing strategies and filters #667

## 📋 Overview

This issue tracks the E2E test cases still needed for comprehensive coverage of semantic-router's core functionality. The current E2E suite covers basic functionality, PII detection, jailbreak detection, domain classification, and semantic caching. It does not yet cover several critical routing strategies and filters.

## 🎯 Scope

### 1️⃣ Routing Strategies (High Priority ⭐⭐⭐⭐⭐)

#### 1.1 Keyword Routing
**File**: `src/semantic-router/pkg/classification/keyword_classifier.go`

**Test Coverage Needed**:
- ✅ OR operator - any keyword matches
- ✅ AND operator - all keywords must match
- ✅ NOR operator - no keywords match
- ✅ Case-sensitive vs case-insensitive matching
- ✅ Regex pattern matching
- ✅ Word boundary detection
- ✅ Priority over embedding and intent-based routing

**Example Test Data**:
```json
{
  "test_cases": [
    {
      "description": "OR operator - urgent request",
      "query": "I need urgent help with my account",
      "expected_category": "urgent_request",
      "expected_confidence": 1.0,
      "matched_keywords": ["urgent"]
    },
    {
      "description": "AND operator - sensitive data",
      "query": "My SSN and credit card were stolen",
      "expected_category": "sensitive_data",
      "expected_confidence": 1.0,
      "matched_keywords": ["SSN", "credit card"]
    }
  ]
}
```

**Reference Config**: `config/intelligent-routing/in-tree/keyword.yaml`

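The Go sketch below shows the operator semantics a keyword test would assert. The `KeywordRule` type and `Match` method are hypothetical and do not mirror `keyword_classifier.go`. Keywords are treated as literals (escaped with `regexp.QuoteMeta`) and wrapped in `\b` word boundaries. A regex-pattern rule would skip the escaping step.

```go
package main

import (
	"fmt"
	"regexp"
)

// KeywordRule is a hypothetical, simplified stand-in for a keyword routing rule.
type KeywordRule struct {
	Category      string
	Operator      string // "OR", "AND", or "NOR"
	Keywords      []string
	CaseSensitive bool
}

// compile turns a literal keyword into a word-boundary regex, optionally case-insensitive.
func compile(kw string, caseSensitive bool) *regexp.Regexp {
	pattern := `\b` + regexp.QuoteMeta(kw) + `\b`
	if !caseSensitive {
		pattern = `(?i)` + pattern
	}
	return regexp.MustCompile(pattern)
}

// Match reports whether the query satisfies the rule and which keywords matched.
func (r KeywordRule) Match(query string) (bool, []string) {
	var matched []string
	for _, kw := range r.Keywords {
		if compile(kw, r.CaseSensitive).MatchString(query) {
			matched = append(matched, kw)
		}
	}
	switch r.Operator {
	case "AND":
		return len(matched) == len(r.Keywords), matched
	case "NOR":
		return len(matched) == 0, nil
	default: // "OR"
		return len(matched) > 0, matched
	}
}

func main() {
	urgent := KeywordRule{Category: "urgent_request", Operator: "OR", Keywords: []string{"urgent", "asap"}}
	ok, kws := urgent.Match("I need urgent help with my account")
	fmt.Println(ok, kws) // true [urgent]
}
```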
---

#### 1.2 Embedding Routing
**File**: `src/semantic-router/pkg/classification/embedding_classifier.go`

**Test Coverage Needed**:
- ✅ Semantic similarity matching with embeddings
- ✅ Mean vs Max aggregation methods
- ✅ Similarity threshold validation
- ✅ Embedding model selection (auto, qwen3, gemma, bert)
- ✅ Matryoshka dimensions (768, 512, 256, 128)
- ✅ Quality vs Latency priority in auto mode

**Example Test Data**:
```json
{
  "test_cases": [
    {
      "description": "Mean aggregation - technical query",
      "query": "How to implement async/await in Python?",
      "keywords": ["programming", "coding", "software"],
      "aggregation_method": "mean",
      "threshold": 0.7,
      "expected_category": "technical",
      "expected_similarity": 0.85
    },
    {
      "description": "Max aggregation - high similarity",
      "query": "What is machine learning?",
      "keywords": ["AI", "ML", "neural networks"],
      "aggregation_method": "max",
      "threshold": 0.8,
      "expected_category": "ai"
    }
  ]
}
```

**Reference Config**: `config/intelligent-routing/in-tree/embedding.yaml`

---

#### 1.3 MCP Routing (Model Context Protocol)
**File**: `src/semantic-router/pkg/classification/mcp_classifier.go`

**Test Coverage Needed**:
- ✅ MCP Stdio transport (process communication)
- ✅ MCP HTTP transport (API calls)
- ✅ Custom classification logic via external MCP servers
- ✅ Model selection and reasoning decision taken from the MCP response
- ✅ Fallback to in-tree classifier on MCP failure
- ✅ Probability distribution returned when the `with_probabilities` parameter is set

**Example Test Data**:
```json
{
  "test_cases": [
    {
      "description": "MCP stdio - regex classifier",
      "mcp_server": "server_keyword.py",
      "transport": "stdio",
      "query": "urgent: fix production bug",
      "expected_category": "urgent",
      "expected_model": "gpt-oss",
      "expected_use_reasoning": true
    },
    {
      "description": "MCP HTTP - embedding classifier",
      "mcp_server": "http://localhost:8080",
      "transport": "http",
      "query": "Explain quantum computing",
      "expected_category": "science"
    }
  ]
}
```

**Reference Servers**: `examples/mcp-classifier-server/`

---

#### 1.4 Hybrid Routing
**Test Coverage Needed**:
- ✅ Priority order: Keyword → Embedding → Intent-based → MCP
- ✅ Fallback chain when higher-priority methods fail or find no match
- ✅ Combined strategy with multiple routing methods enabled
- ✅ Confidence fusion from multiple classifiers

**Example Test Data**:
```json
{
  "test_cases": [
    {
      "description": "Keyword takes priority over embedding",
      "query": "urgent: machine learning question",
      "keyword_match": "urgent_request",
      "embedding_match": "ai",
      "expected_category": "urgent_request",
      "expected_method": "keyword"
    },
    {
      "description": "Fallback to embedding when keyword fails",
      "query": "What is deep learning?",
      "keyword_match": null,
      "embedding_match": "ai",
      "expected_category": "ai",
      "expected_method": "embedding"
    }
  ]
}
```

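A priority chain reduces to "the first stage that matches wins." The sketch below assumes each routing method can report "no match." It uses string-matching stubs in place of the real keyword and embedding classifiers. `stage` and `route` are hypothetical names. The sketch reproduces the two example cases above. Intent-based and MCP stages would be appended to the slice, in that order.

```go
package main

import (
	"fmt"
	"strings"
)

// stage is one routing method in a hypothetical priority chain; it returns
// the matched category, or ok=false when it has no match.
type stage struct {
	method   string
	classify func(query string) (category string, ok bool)
}

// route walks the stages in priority order and returns the first match.
func route(stages []stage, query string) (category, method string) {
	for _, s := range stages {
		if c, ok := s.classify(query); ok {
			return c, s.method
		}
	}
	return "default", "none"
}

func main() {
	stages := []stage{
		{"keyword", func(q string) (string, bool) {
			return "urgent_request", strings.Contains(strings.ToLower(q), "urgent")
		}},
		{"embedding", func(q string) (string, bool) {
			// Stub: pretend any "learning" query embeds close to the "ai" category.
			return "ai", strings.Contains(strings.ToLower(q), "learning")
		}},
	}
	fmt.Println(route(stages, "urgent: machine learning question")) // urgent_request keyword
	fmt.Println(route(stages, "What is deep learning?"))            // ai embedding
}
```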
---

#### 1.5 Entropy-Based Routing
**File**: `src/semantic-router/pkg/utils/entropy/entropy.go`

**Test Coverage Needed**:
- ✅ Shannon entropy and normalized entropy calculation
- ✅ Uncertainty levels: very_high, high, medium, low, very_low
- ✅ Reasoning decision based on entropy
- ✅ Weighted decision for high uncertainty (top-2 categories)
- ✅ Confidence adjustment based on uncertainty

**Example Test Data**:
```json
{
  "test_cases": [
    {
      "description": "Very high entropy - enable reasoning",
      "probabilities": [0.25, 0.25, 0.25, 0.25],
      "expected_uncertainty": "very_high",
      "expected_use_reasoning": true,
      "expected_confidence": 0.3
    },
    {
      "description": "Very low entropy - trust classification",
      "probabilities": [0.95, 0.02, 0.02, 0.01],
      "expected_uncertainty": "very_low",
      "expected_use_reasoning": false,
      "expected_confidence": 0.90
    }
  ]
}
```

---

### 2️⃣ Filter Tests (High Priority ⭐⭐⭐⭐)

#### 2.1 ReasoningControl Filter
**File**: `src/semantic-router/pkg/extproc/req_filter_reason.go`

**Test Coverage Needed**:
- ✅ Enable/disable reasoning with `enableReasoning`
- ✅ Reasoning effort levels: low, medium, high
- ✅ Reasoning families: gpt-oss, deepseek, qwen3, claude
- ✅ `chat_template_kwargs` for different model families
- ✅ `reasoning_effort` parameter (OpenAI-style)
- ✅ `maxReasoningSteps` limit

**Example Config**:
```yaml
filters:
- type: ReasoningControl
  enabled: true
  config:
    reasonFamily: "gpt-oss"
    enableReasoning: true
    reasoningEffort: "high"
    maxReasoningSteps: 15
```

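Reasoning families differ mainly in which request field they set. The mapping below is an assumption based on common vLLM conventions:

- gpt-oss uses `reasoning_effort`.
- qwen3 uses `chat_template_kwargs` with `enable_thinking`.
- deepseek uses `chat_template_kwargs` with `thinking`.

Check it against the router's configured reasoning families before asserting on it. Claude is omitted. `applyReasoning` is a hypothetical helper that an E2E test could mirror when it checks the body forwarded upstream.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// applyReasoning mutates an OpenAI-style request body for a reasoning family.
// The family-to-parameter mapping is an assumption, not taken from req_filter_reason.go.
func applyReasoning(body map[string]any, family string, enable bool, effort string) {
	switch family {
	case "gpt-oss":
		if enable {
			body["reasoning_effort"] = effort // OpenAI-style effort level
		}
	case "qwen3":
		body["chat_template_kwargs"] = map[string]any{"enable_thinking": enable}
	case "deepseek":
		body["chat_template_kwargs"] = map[string]any{"thinking": enable}
	}
}

func main() {
	body := map[string]any{"model": "gpt-oss", "messages": []any{}}
	applyReasoning(body, "gpt-oss", true, "high")
	out, _ := json.Marshal(body) // map keys are marshaled in sorted order
	fmt.Println(string(out))     // {"messages":[],"model":"gpt-oss","reasoning_effort":"high"}
}
```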
---

#### 2.2 ToolSelection Filter
**Test Coverage Needed**:
- ✅ Top-K tool selection
- ✅ Similarity threshold filtering
- ✅ Tools database loading from `toolsDBPath`
- ✅ Fallback strategy with `fallbackToEmpty`
- ✅ Category/tag-based tool filtering

**Example Config**:
```yaml
filters:
- type: ToolSelection
  enabled: true
  config:
    toolsDBPath: "tools.json"
    topK: 3
    similarityThreshold: 0.7
    fallbackToEmpty: false
```

**Reference**: `examples/semanticroute/tool-selection-example.yaml`

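This sketch applies the selection rules in the order a test would assert them: filter by `similarityThreshold`, rank by similarity, then keep `topK`. `Tool` and `selectTools` are hypothetical, and similarities are precomputed rather than embedded. One further assumption concerns the filter's semantics: `fallbackToEmpty: false` is treated as "pass the request's original tools through unchanged."

```go
package main

import (
	"fmt"
	"sort"
)

// Tool is a hypothetical tools-database entry with a precomputed query similarity.
type Tool struct {
	Name       string
	Similarity float64
}

// selectTools keeps tools at or above the threshold, sorted by similarity and
// truncated to topK. If nothing qualifies, fallbackToEmpty=true returns no tools;
// otherwise (by assumption) the original tools are passed through.
func selectTools(tools []Tool, topK int, threshold float64, fallbackToEmpty bool) []Tool {
	var kept []Tool
	for _, t := range tools {
		if t.Similarity >= threshold {
			kept = append(kept, t)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Similarity > kept[j].Similarity })
	if len(kept) > topK {
		kept = kept[:topK]
	}
	if len(kept) == 0 && !fallbackToEmpty {
		return tools
	}
	return kept
}

func main() {
	tools := []Tool{{"get_weather", 0.91}, {"search_web", 0.75}, {"send_email", 0.40}, {"calculator", 0.72}, {"translate", 0.71}}
	for _, t := range selectTools(tools, 3, 0.7, false) {
		fmt.Println(t.Name) // get_weather, search_web, calculator
	}
}
```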
---

#### 2.3 Filter Chain Combination
**Test Coverage Needed**:
- ✅ Execution order of multiple filters
- ✅ Filter short-circuiting (e.g., when PIIDetection blocks a request, subsequent filters do not run)
- ✅ Filter independence (one filter's config does not affect another's)
- ✅ Latency overhead of chaining multiple filters

**Example Chain**:
```yaml
filters:
- type: PIIDetection
- type: PromptGuard
- type: SemanticCache
- type: ReasoningControl
- type: ToolSelection
```

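Ordering and short-circuiting are easiest to assert if the chain records which filters actually ran. `Filter` and `runChain` are hypothetical, and the PIIDetection stub simply blocks any request that contains "SSN." A real E2E test would observe the same effect through the router's response.

```go
package main

import (
	"fmt"
	"strings"
)

// Filter is a hypothetical request filter: it may rewrite the request, or
// return block=true to stop the chain.
type Filter struct {
	Name  string
	Apply func(req string) (out string, block bool)
}

// runChain executes filters in order and records which ones ran, so a test can
// assert both execution order and short-circuiting.
func runChain(filters []Filter, req string) (executed []string, blocked bool) {
	for _, f := range filters {
		executed = append(executed, f.Name)
		var block bool
		req, block = f.Apply(req)
		if block {
			return executed, true
		}
	}
	return executed, false
}

func main() {
	pass := func(r string) (string, bool) { return r, false }
	chain := []Filter{
		{"PIIDetection", func(r string) (string, bool) { return r, strings.Contains(r, "SSN") }},
		{"PromptGuard", pass},
		{"SemanticCache", pass},
	}
	fmt.Println(runChain(chain, "My SSN is 123-45-6789")) // [PIIDetection] true
	fmt.Println(runChain(chain, "What is Go?"))           // [PIIDetection PromptGuard SemanticCache] false
}
```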
---

### 3️⃣ Cache Tests (Medium Priority ⭐⭐⭐)

#### 3.1 Different Cache Backends
**File**: `src/semantic-router/pkg/cache/`

**Test Coverage Needed**:
- ✅ InMemory cache performance
- ✅ Milvus cache with vector database
- ✅ Hybrid cache (HNSW + Milvus)
- ✅ TTL expiration mechanism
- ✅ Eviction strategy when `maxEntries` reached

---

#### 3.2 Different Embedding Models for Cache
**Test Coverage Needed**:
- ✅ BERT (fast, 384-dim)
- ✅ Qwen3 (high quality, 1024-dim, 32K context)
- ✅ Gemma (balanced, 768-dim, 8K context)
- ✅ Matryoshka dimensions impact on cache hit rate

---

### 4️⃣ Performance & Concurrency Tests (Medium Priority ⭐⭐⭐)

#### 4.1 Concurrent Requests
**Test Coverage Needed**:
- ✅ 100 concurrent classification requests
- ✅ Thread safety of classifiers
- ✅ Resource contention (cache, model loading)
- ✅ QPS (queries per second) benchmarking

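A load harness can be built from `sync.WaitGroup` and an atomic counter. Here `classify` is a hypothetical stub that sleeps 5 ms in place of a real request to the router. Running such a harness under `go test -race` is what surfaces data races in shared classifier state.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// classify is a hypothetical stand-in for one request to the router.
func classify(query string) string {
	time.Sleep(5 * time.Millisecond)
	return "technical"
}

// loadTest fires n concurrent requests and returns the success count and observed QPS.
func loadTest(n int, fn func(string) string) (successes int64, qps float64) {
	var wg sync.WaitGroup
	var succeeded int64
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if fn(fmt.Sprintf("query %d", i)) != "" {
				atomic.AddInt64(&succeeded, 1)
			}
		}(i)
	}
	wg.Wait()
	elapsed := time.Since(start).Seconds()
	return atomic.LoadInt64(&succeeded), float64(n) / elapsed
}

func main() {
	s, qps := loadTest(100, classify)
	fmt.Printf("%d/100 succeeded, %.0f QPS\n", s, qps)
}
```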
---

#### 4.2 Long Text Handling
**Test Coverage Needed**:
- ✅ 32K context with Qwen3
- ✅ 8K context with Gemma
- ✅ Token limit handling and truncation

---

### 5️⃣ Edge Cases & Error Handling (Low Priority ⭐⭐)

#### 5.1 Configuration Errors
**Test Coverage Needed**:
- ✅ Invalid category mapping
- ✅ Invalid threshold values
- ✅ Missing defaultModel
- ✅ Invalid filter configurations

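Negative configuration tests pair each malformed config with the error it should produce. `RouterConfig` and `validate` are hypothetical, and their field names are assumptions rather than the router's schema. `errors.Join` (Go 1.20+) collects every problem, so a single test can assert several failures at once.

```go
package main

import (
	"errors"
	"fmt"
)

// RouterConfig is a hypothetical, trimmed-down config for illustrating invalid cases.
type RouterConfig struct {
	DefaultModel string
	Models       []string
	Categories   map[string]string // category -> model
	Threshold    float64
}

// validate returns nil for a valid config, or all problems joined into one error.
func validate(c RouterConfig) error {
	var errs []error
	if c.DefaultModel == "" {
		errs = append(errs, errors.New("defaultModel is required"))
	}
	if c.Threshold < 0 || c.Threshold > 1 {
		errs = append(errs, fmt.Errorf("threshold %.2f out of range [0, 1]", c.Threshold))
	}
	known := map[string]bool{}
	for _, m := range c.Models {
		known[m] = true
	}
	for cat, m := range c.Categories {
		if !known[m] {
			errs = append(errs, fmt.Errorf("category %q maps to unknown model %q", cat, m))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	bad := RouterConfig{Models: []string{"qwen3"}, Categories: map[string]string{"math": "gpt-x"}, Threshold: 1.5}
	fmt.Println(validate(bad))
}
```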
---

#### 5.2 Network Errors
**Test Coverage Needed**:
- ✅ Model service unavailable
- ✅ MCP server timeout
- ✅ Milvus connection failure
- ✅ Network timeout handling

---

## 📁 Implementation Structure

Suggested file organization:
```
e2e/testcases/
├── keyword_routing.go
├── embedding_routing.go
├── mcp_routing.go
├── hybrid_routing.go
├── entropy_routing.go
├── reasoning_control.go
├── tool_selection.go
├── filter_chain.go
├── cache_backends.go
├── concurrent_requests.go
└── testdata/
    ├── keyword_routing_cases.json
    ├── embedding_routing_cases.json
    ├── mcp_routing_cases.json
    ├── hybrid_routing_cases.json
    ├── entropy_routing_cases.json
    ├── reasoning_control_cases.json
    ├── tool_selection_cases.json
    └── ...
```

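Loading the JSON test data into typed structs keeps the Go test files table-driven. The struct mirrors the keyword-routing example above. `KeywordCase` and `loadCases` are hypothetical helpers, not part of the existing E2E framework. The sample is inlined so that the snippet runs on its own; a real test would read `testdata/keyword_routing_cases.json` with `os.ReadFile`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KeywordCase mirrors one entry of the keyword-routing test data.
type KeywordCase struct {
	Description        string   `json:"description"`
	Query              string   `json:"query"`
	ExpectedCategory   string   `json:"expected_category"`
	ExpectedConfidence float64  `json:"expected_confidence"`
	MatchedKeywords    []string `json:"matched_keywords"`
}

type caseFile struct {
	TestCases []KeywordCase `json:"test_cases"`
}

// loadCases decodes the contents of a testdata file into typed test cases.
func loadCases(data []byte) ([]KeywordCase, error) {
	var f caseFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("parse test cases: %w", err)
	}
	return f.TestCases, nil
}

const sample = `{"test_cases": [{"description": "OR operator - urgent request",
  "query": "I need urgent help with my account", "expected_category": "urgent_request",
  "expected_confidence": 1.0, "matched_keywords": ["urgent"]}]}`

func main() {
	cases, err := loadCases([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, tc := range cases {
		fmt.Printf("%s -> %s\n", tc.Description, tc.ExpectedCategory) // OR operator - urgent request -> urgent_request
	}
}
```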
## 🎯 Acceptance Criteria

- [ ] All test cases pass consistently
- [ ] Test data is stored in JSON files for maintainability
- [ ] Tests follow existing E2E framework patterns
- [ ] Documentation is updated with new test coverage
- [ ] CI/CD pipeline includes new tests

## 📚 References

- Existing E2E tests: `e2e/testcases/`
- Keyword routing docs: `website/docs/tutorials/intelligent-route/keyword-routing.md`
- MCP classification docs: `website/docs/tutorials/mcp-classification/overview.md`
- Example configs: `examples/semanticroute/`
- In-tree configs: `config/intelligent-routing/in-tree/`

## 🤝 Contributing

This is a great opportunity for new contributors! Each test case can be implemented independently. Feel free to:
1. Pick any test case from the list above
2. Comment on this issue to claim it
3. Submit a PR with your implementation

For questions or guidance, please comment on this issue or join our community discussions.
