Skip to content

[Test] Domain Classifier Returns Empty or Wrong Classifications #714

@Xunzhuo

Description

@Xunzhuo

Summary

E2E tests for the signal-decision engine revealed that the domain classifier is returning empty strings or incorrect classifications for obvious queries. This affects both new E2E tests and pre-existing domain classification tests.

Severity

High (Functionality)

Symptom

  • ❌ Returns empty strings for clear domain queries (e.g., "Explain photosynthesis" -> empty instead of "biology")
  • ❌ Returns wrong classifications (e.g., "Tell me a joke" -> "psychology" instead of "other")
  • ✅ Some obvious cases work (e.g., "What is 15 * 23?" -> "math")

Test Results

Tests affected:

  • decision-fallback-behavior: 40% accuracy (2/5 pass)
  • plugin-config-variations: 50% accuracy (3/6 pass)
  • Pre-existing domain-classify: 66+ misclassifications

Example Failures

Query Expected Actual Status
"Explain photosynthesis" biology (empty) ❌ FAIL
"Tell me a joke" other psychology ❌ FAIL
"What is 15 * 23?" math math ✅ PASS
"Describe the water cycle" biology (empty) ❌ FAIL
"How do I solve quadratic equations?" math (empty) ❌ FAIL

Steps to Reproduce

Run the E2E test workflows:

Option 1: Run AIBrix/AI Gateway tests

# Trigger integration-test-k8s.yml workflow
# This runs tests for AIBrix and AI Gateway profiles

Option 2: Run Dynamic Config tests

# Trigger integration-test-dynamic-config.yml workflow
# This runs tests for Dynamic Config profile

Or run locally:

# AIBrix profile
make e2e-test E2E_PROFILE=aibrix

# AI Gateway profile
make e2e-test E2E_PROFILE=ai-gateway

# Dynamic Config profile
make e2e-test E2E_PROFILE=dynamic-config

All profiles will show the same domain classification issues.

Root Cause

This appears to be an embedding-based signal issue. The domain classification signal needs to be working correctly and covered in tests.

Acceptance Criteria

  • Obvious domain queries return correct classifications (e.g., "photosynthesis" -> biology)
  • Generic queries fallback to "other" domain correctly
  • Empty string classifications are eliminated
  • decision-fallback-behavior test reaches 100% accuracy (5/5 pass)
  • plugin-config-variations test reaches 100% accuracy (6/6 pass)
  • Pre-existing domain-classify test shows <10% misclassification rate (currently 66+)
  • Add unit tests for domain classification with various query types
  • Verify embedding-based domain classification signal is working correctly

Impact

This is a high-priority functionality issue as it breaks domain-based routing and decision-making. Queries are not being routed to the appropriate domain-specific handlers, leading to suboptimal responses.

Additional Context

The high misclassification rate (66+) in pre-existing tests suggests this may be a regression or a long-standing issue that needs immediate attention.

Related Issues

Part of the signal-decision engine backend issues affecting all deployment profiles (AIBrix, AI Gateway, Dynamic Config).

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions