-
Notifications
You must be signed in to change notification settings - Fork 296
Description
Summary
E2E tests for the signal-decision engine revealed that the domain classifier is returning empty strings or incorrect classifications for obvious queries. This affects both new E2E tests and pre-existing domain classification tests.
Severity
High (Functionality)
Symptom
- ❌ Returns empty strings for clear domain queries (e.g., "Explain photosynthesis" -> empty instead of "biology")
- ❌ Returns wrong classifications (e.g., "Tell me a joke" -> "psychology" instead of "other")
- ✅ Some obvious cases work (e.g., "What is 15 * 23?" -> "math")
Test Results
Tests affected:
decision-fallback-behavior: 40% accuracy (2/5 pass)plugin-config-variations: 50% accuracy (3/6 pass)- Pre-existing
domain-classify: 66+ misclassifications
Example Failures
| Query | Expected | Actual | Status |
|---|---|---|---|
| "Explain photosynthesis" | biology | (empty) | ❌ FAIL |
| "Tell me a joke" | other | psychology | ❌ FAIL |
| "What is 15 * 23?" | math | math | ✅ PASS |
| "Describe the water cycle" | biology | (empty) | ❌ FAIL |
| "How do I solve quadratic equations?" | math | (empty) | ❌ FAIL |
Steps to Reproduce
Run the E2E test workflows:
Option 1: Run AIBrix/AI Gateway tests
# Trigger integration-test-k8s.yml workflow
# This runs tests for AIBrix and AI Gateway profilesOption 2: Run Dynamic Config tests
# Trigger integration-test-dynamic-config.yml workflow
# This runs tests for Dynamic Config profileOr run locally:
# AIBrix profile
make e2e-test E2E_PROFILE=aibrix
# AI Gateway profile
make e2e-test E2E_PROFILE=ai-gateway
# Dynamic Config profile
make e2e-test E2E_PROFILE=dynamic-configAll profiles will show the same domain classification issues.
Root Cause
This appears to be an embedding-based signal issue. The domain classification signal needs to be working correctly and covered in tests.
Acceptance Criteria
- Obvious domain queries return correct classifications (e.g., "photosynthesis" -> biology)
- Generic queries fallback to "other" domain correctly
- Empty string classifications are eliminated
-
decision-fallback-behaviortest reaches 100% accuracy (5/5 pass) -
plugin-config-variationstest reaches 100% accuracy (6/6 pass) - Pre-existing
domain-classifytest shows <10% misclassification rate (currently 66+) - Add unit tests for domain classification with various query types
- Verify embedding-based domain classification signal is working correctly
Impact
This is a high-priority functionality issue as it breaks domain-based routing and decision-making. Queries are not being routed to the appropriate domain-specific handlers, leading to suboptimal responses.
Additional Context
The high misclassification rate (66+) in pre-existing tests suggests this may be a regression or a long-standing issue that needs immediate attention.
Related Issues
Part of the signal-decision engine backend issues affecting all deployment profiles (AIBrix, AI Gateway, Dynamic Config).