Open
Description
Describe the bug
The k8s install guide does not work as documented.
To Reproduce
Follow the steps in the guide.
Expected behavior
The testing step should succeed.
Screenshots
Here are the results of the troubleshooting steps from the guide:
$ kubectl get gateway vllm-semantic-router -n vllm-semantic-router-system
...
status:
  conditions:
  - lastTransitionTime: "2025-10-09T19:33:02Z"
    message: The Gateway has been scheduled by Envoy Gateway
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-10-09T19:33:02Z"
    message: No addresses have been assigned to the Gateway
    observedGeneration: 1
    reason: AddressNotAssigned
    status: "False"
    type: Programmed
$ kubectl get svc -n envoy-gateway-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
envoy-gateway ClusterIP 10.96.200.166 <none> 18000/TCP,18001/TCP,18002/TCP,19001/TCP,9443/TCP 27m
envoy-ratelimit ClusterIP 10.96.173.10 <none> 8081/TCP,19001/TCP 19m
envoy-vllm-semantic-router-system-vllm-semantic-router-16c03415 LoadBalancer 10.96.79.79 <pending> 80:30187/TCP 16m
$ kubectl describe inferencepool vllm-semantic-router -n vllm-semantic-router-system
...
Status:
  Parent:
    Conditions:
      Last Transition Time: 2025-10-09T19:32:50Z
      Message: InferencePool has been Accepted by controller ai-gateway-controller: InferencePool reconciled successfully
      Observed Generation: 1
      Reason: Accepted
      Status: True
      Type: Accepted
      Last Transition Time: 2025-10-09T19:32:50Z
      Message: Reference resolution by controller ai-gateway-controller: All references resolved successfully
      Observed Generation: 1
      Reason: ResolvedRefs
      Status: True
      Type: ResolvedRefs
    Parent Ref:
      Group: gateway.networking.k8s.io
      Kind: Gateway
      Name: vllm-semantic-router
      Namespace: vllm-semantic-router-system
$ kubectl logs -n envoy-ai-gateway-system deployment/ai-gateway-controller
...
mantic-router/rule/0/match/0/*", "resource": "struct_value:{fields:{key:\"kind\" value:{string_value:\"HTTPRoute\"}} fields:{key:\"name\" value:{string_value:\"vllm-semantic-router\"}} fields:{key:\"namespace\" value:{string_value:\"vllm-semantic-router-system\"}}}"}
2025-10-09T19:32:51Z INFO envoy-gateway-extension-server Added extproc-uds cluster to the list of clusters
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server Skipping non-AIGatewayRoute HTTPRoute cluster modification {"namespace": "vllm-semantic-router-system", "name": "vllm-semantic-router"}
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server non-ai-gateway cluster name {"cluster_name": "vllm-semantic-router-system/vllm-semantic-router"}
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server patching listener with inference pool filters {"listener": "vllm-semantic-router-system/vllm-semantic-router/http"}
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server adding inference pool ext proc filter {"pool": "vllm-semantic-router"}
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server patching virtual host with inference pool filters {"listener": "vllm-semantic-router-system/vllm-semantic-router/http", "virtual_host": "vllm-semantic-router-system/vllm-semantic-router/http/*"}
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server no annotations found in the resource, skipping {"route": "httproute/vllm-semantic-router-system/vllm-semantic-router/rule/0/match/0/*", "resource": "struct_value:{fields:{key:\"kind\" value:{string_value:\"HTTPRoute\"}} fields:{key:\"name\" value:{string_value:\"vllm-semantic-router\"}} fields:{key:\"namespace\" value:{string_value:\"vllm-semantic-router-system\"}}}"}
2025-10-09T19:33:02Z INFO envoy-gateway-extension-server Added extproc-uds cluster to the list of clusters
$ kubectl get pods -n vllm-semantic-router-system
NAME READY STATUS RESTARTS AGE
semantic-router-768c85f796-z5knd 1/1 Running 0 52m
$ kubectl logs -n vllm-semantic-router-system deployment/semantic-router
...
{"level":"info","ts":"2025-10-09T19:52:52.602101969Z","caller":"observability/logging.go:140","msg":"Started processing a new request"}
{"level":"info","ts":"2025-10-09T19:52:52.602207219Z","caller":"observability/logging.go:140","msg":"Received request headers"}
{"level":"info","ts":"2025-10-09T19:52:52.602647302Z","caller":"observability/logging.go:140","msg":"Received request body {\n \"model\": \"auto\",\n \"messages\": [\n {\"role\": \"user\", \"content\": \"What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?\"}\n ]\n }"}
{"level":"info","ts":"2025-10-09T19:52:52.60273776Z","caller":"observability/logging.go:140","msg":"Original model: auto"}
{"level":"info","ts":"2025-10-09T19:52:52.74982476Z","caller":"observability/logging.go:140","msg":"Jailbreak classification result: {0 0.9999995}"}
{"level":"info","ts":"2025-10-09T19:52:52.749874344Z","caller":"observability/logging.go:140","msg":"BENIGN: 'benign' (confidence: 1.000, threshold: 0.700)"}
{"level":"info","ts":"2025-10-09T19:52:52.749879594Z","caller":"observability/logging.go:140","msg":"No jailbreak detected in request content"}
{"level":"info","ts":"2025-10-09T19:52:53.058280302Z","caller":"observability/logging.go:140","msg":"Using Auto Model Selection"}
{"level":"info","ts":"2025-10-09T19:52:53.198474594Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716"}
{"level":"info","ts":"2025-10-09T19:52:53.198574177Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-09T19:52:53.198581886Z","caller":"observability/logging.go:140","msg":"Selected model openai/gpt-oss-20b for category math with score 1.0000"}
{"level":"info","ts":"2025-10-09T19:52:53.349998469Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716"}
{"level":"info","ts":"2025-10-09T19:52:53.350044886Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-09T19:52:53.350054136Z","caller":"observability/logging.go:140","msg":"Routing to model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492282594Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716, entropy_available=true"}
{"level":"info","ts":"2025-10-09T19:52:53.492429011Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math), reasoning_decision: use=true, confidence=0.923, reason=very_low_uncertainty_trust_classification"}
{"level":"info","ts":"2025-10-09T19:52:53.492438177Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision: category='math', confidence=0.972, use_reasoning=true, reason=very_low_uncertainty_trust_classification, strategy=trust_top_category"}
{"level":"info","ts":"2025-10-09T19:52:53.492449844Z","caller":"observability/logging.go:140","msg":"Top predicted categories: [{math 0.9715921} {engineering 0.014712552} {chemistry 0.010437389}]"}
{"level":"info","ts":"2025-10-09T19:52:53.492453844Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision for this query: true on [openai/gpt-oss-20b] model (confidence: 0.923, reason: very_low_uncertainty_trust_classification)"}
{"level":"info","ts":"2025-10-09T19:52:53.492468844Z","caller":"observability/logging.go:140","msg":"Selected endpoint address: 127.0.0.1:8000 for model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492559219Z","caller":"observability/logging.go:140","msg":"Applied reasoning mode (enabled: true) with effort (high) to model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492585094Z","caller":"observability/logging.go:140","msg":"Use new model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492598677Z","caller":"observability/logging.go:136","msg":"routing_decision","request_id":"a293fc18-8c17-4b68-8538-8354b1d7c57e","original_model":"auto","selected_model":"openai/gpt-oss-20b","category":"math","reasoning_enabled":true,"routing_latency_ms":889,"event":"routing_decision","reason_code":"auto_routing","reasoning_effort":"high","selected_endpoint":"127.0.0.1:8000"}
{"level":"info","ts":"2025-10-09T19:52:53.494066177Z","caller":"observability/logging.go:140","msg":"Stream ended gracefully"}
Based on the troubleshooting steps above, the semantic router appears to make a routing decision, but the request is not routed to the selected endpoint:
curl -i -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }'
HTTP/1.1 503 Service Unavailable
content-length: 167
content-type: text/plain
date: Thu, 09 Oct 2025 19:52:53 GMT
It is unclear how the proxy routes the request to the selected endpoint, given that the instructions do not include a step for deploying a model server.
Additional context
I used the port-forwarding method for testing. The Envoy service is being port-forwarded:
$ kubectl port-forward -n envoy-gateway-system svc/$ENVOY_SERVICE 8080:80
Forwarding from 127.0.0.1:8080 -> 10080
Forwarding from [::1]:8080 -> 10080
Handling connection for 8080
Handling connection for 8080
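For anyone reproducing this, `$ENVOY_SERVICE` has to resolve to the generated Envoy Service shown in the `kubectl get svc` output. The sketch below shows one way it might be looked up. It assumes the Service carries Envoy Gateway's `gateway.envoyproxy.io/owning-gateway-namespace` and `gateway.envoyproxy.io/owning-gateway-name` labels; the helper name `find_envoy_service` is hypothetical and the guide's own command may differ.

```shell
#!/usr/bin/env bash
# Look up the Envoy Service that Envoy Gateway created for a given Gateway.
# Arguments: <gateway-namespace> <gateway-name>
# Assumption: the Service is labelled with Envoy Gateway's owning-gateway
# labels and lives in the envoy-gateway-system namespace.
find_envoy_service() {
  local gw_namespace="$1" gw_name="$2"
  kubectl get svc -n envoy-gateway-system \
    --selector="gateway.envoyproxy.io/owning-gateway-namespace=${gw_namespace},gateway.envoyproxy.io/owning-gateway-name=${gw_name}" \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage with the names from this report:
#   ENVOY_SERVICE=$(find_envoy_service vllm-semantic-router-system vllm-semantic-router)
#   kubectl port-forward -n envoy-gateway-system "svc/${ENVOY_SERVICE}" 8080:80
```

If the lookup returns nothing, the labels on the Service can be inspected with `kubectl get svc -n envoy-gateway-system --show-labels` and the selector adjusted.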