Docs: Kubernetes Install Guide Test Step Fails #382

@danehans

Describe the bug
The Kubernetes install guide does not work as documented: the test request at the end of the guide returns HTTP 503 Service Unavailable.

To Reproduce
Follow the steps in the guide.

Expected behavior
The test step succeeds and the request returns a chat completion from the selected model.

Screenshots
Here are the results of the troubleshooting steps from the guide:

$ kubectl get gateway vllm-semantic-router -n vllm-semantic-router-system
...
status:
  conditions:
  - lastTransitionTime: "2025-10-09T19:33:02Z"
    message: The Gateway has been scheduled by Envoy Gateway
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-10-09T19:33:02Z"
    message: No addresses have been assigned to the Gateway
    observedGeneration: 1
    reason: AddressNotAssigned
    status: "False"
    type: Programmed

$ kubectl get svc -n envoy-gateway-system
NAME                                                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                            AGE
envoy-gateway                                                     ClusterIP      10.96.200.166   <none>        18000/TCP,18001/TCP,18002/TCP,19001/TCP,9443/TCP   27m
envoy-ratelimit                                                   ClusterIP      10.96.173.10    <none>        8081/TCP,19001/TCP                                 19m
envoy-vllm-semantic-router-system-vllm-semantic-router-16c03415   LoadBalancer   10.96.79.79     <pending>     80:30187/TCP                                       16m
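The `<pending>` EXTERNAL-IP on the LoadBalancer service appears to be what the Gateway's `Programmed: False` / `AddressNotAssigned` condition above is reporting: the Gateway only gets an address once that service does, and clusters without a load-balancer implementation (a default kind cluster, for example) never assign one. Because the test below goes through `kubectl port-forward`, this probably does not explain the 503 on its own. A minimal filter for spotting services stuck in this state, assuming the default `kubectl get svc` table layout (`pending_load_balancers` is a hypothetical helper name, not part of any tool):

```shell
# Print the names of LoadBalancer services whose EXTERNAL-IP is still <pending>.
# Reads `kubectl get svc` table output on stdin; the header row is skipped.
pending_load_balancers() {
  awk 'NR > 1 && $2 == "LoadBalancer" && $4 == "<pending>" { print $1 }'
}
```

Usage: `kubectl get svc -n envoy-gateway-system | pending_load_balancers` would print envoy-vllm-semantic-router-system-vllm-semantic-router-16c03415 for the output above.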

$ kubectl describe inferencepool vllm-semantic-router -n vllm-semantic-router-system
...
Status:
  Parent:
    Conditions:
      Last Transition Time:  2025-10-09T19:32:50Z
      Message:               InferencePool has been Accepted by controller ai-gateway-controller: InferencePool reconciled successfully
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
      Last Transition Time:  2025-10-09T19:32:50Z
      Message:               Reference resolution by controller ai-gateway-controller: All references resolved successfully
      Observed Generation:   1
      Reason:                ResolvedRefs
      Status:                True
      Type:                  ResolvedRefs
    Parent Ref:
      Group:      gateway.networking.k8s.io
      Kind:       Gateway
      Name:       vllm-semantic-router
      Namespace:  vllm-semantic-router-system

$ kubectl logs -n envoy-ai-gateway-system deployment/ai-gateway-controller
...
mantic-router/rule/0/match/0/*", "resource": "struct_value:{fields:{key:\"kind\" value:{string_value:\"HTTPRoute\"}} fields:{key:\"name\" value:{string_value:\"vllm-semantic-router\"}} fields:{key:\"namespace\" value:{string_value:\"vllm-semantic-router-system\"}}}"}
2025-10-09T19:32:51Z	INFO	envoy-gateway-extension-server	Added extproc-uds cluster to the list of clusters
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	Skipping non-AIGatewayRoute HTTPRoute cluster modification	{"namespace": "vllm-semantic-router-system", "name": "vllm-semantic-router"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	non-ai-gateway cluster name	{"cluster_name": "vllm-semantic-router-system/vllm-semantic-router"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	patching listener with inference pool filters	{"listener": "vllm-semantic-router-system/vllm-semantic-router/http"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	adding inference pool ext proc filter	{"pool": "vllm-semantic-router"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	patching virtual host with inference pool filters	{"listener": "vllm-semantic-router-system/vllm-semantic-router/http", "virtual_host": "vllm-semantic-router-system/vllm-semantic-router/http/*"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	no annotations found in the resource, skipping	{"route": "httproute/vllm-semantic-router-system/vllm-semantic-router/rule/0/match/0/*", "resource": "struct_value:{fields:{key:\"kind\" value:{string_value:\"HTTPRoute\"}} fields:{key:\"name\" value:{string_value:\"vllm-semantic-router\"}} fields:{key:\"namespace\" value:{string_value:\"vllm-semantic-router-system\"}}}"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	Added extproc-uds cluster to the list of clusters

$ kubectl get pods -n vllm-semantic-router-system
NAME                               READY   STATUS    RESTARTS   AGE
semantic-router-768c85f796-z5knd   1/1     Running   0          52m

$ kubectl logs -n vllm-semantic-router-system deployment/semantic-router
...
{"level":"info","ts":"2025-10-09T19:52:52.602101969Z","caller":"observability/logging.go:140","msg":"Started processing a new request"}
{"level":"info","ts":"2025-10-09T19:52:52.602207219Z","caller":"observability/logging.go:140","msg":"Received request headers"}
{"level":"info","ts":"2025-10-09T19:52:52.602647302Z","caller":"observability/logging.go:140","msg":"Received request body {\n    \"model\": \"auto\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?\"}\n    ]\n  }"}
{"level":"info","ts":"2025-10-09T19:52:52.60273776Z","caller":"observability/logging.go:140","msg":"Original model: auto"}
{"level":"info","ts":"2025-10-09T19:52:52.74982476Z","caller":"observability/logging.go:140","msg":"Jailbreak classification result: {0 0.9999995}"}
{"level":"info","ts":"2025-10-09T19:52:52.749874344Z","caller":"observability/logging.go:140","msg":"BENIGN: 'benign' (confidence: 1.000, threshold: 0.700)"}
{"level":"info","ts":"2025-10-09T19:52:52.749879594Z","caller":"observability/logging.go:140","msg":"No jailbreak detected in request content"}
{"level":"info","ts":"2025-10-09T19:52:53.058280302Z","caller":"observability/logging.go:140","msg":"Using Auto Model Selection"}
{"level":"info","ts":"2025-10-09T19:52:53.198474594Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716"}
{"level":"info","ts":"2025-10-09T19:52:53.198574177Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-09T19:52:53.198581886Z","caller":"observability/logging.go:140","msg":"Selected model openai/gpt-oss-20b for category math with score 1.0000"}
{"level":"info","ts":"2025-10-09T19:52:53.349998469Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716"}
{"level":"info","ts":"2025-10-09T19:52:53.350044886Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-09T19:52:53.350054136Z","caller":"observability/logging.go:140","msg":"Routing to model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492282594Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716, entropy_available=true"}
{"level":"info","ts":"2025-10-09T19:52:53.492429011Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math), reasoning_decision: use=true, confidence=0.923, reason=very_low_uncertainty_trust_classification"}
{"level":"info","ts":"2025-10-09T19:52:53.492438177Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision: category='math', confidence=0.972, use_reasoning=true, reason=very_low_uncertainty_trust_classification, strategy=trust_top_category"}
{"level":"info","ts":"2025-10-09T19:52:53.492449844Z","caller":"observability/logging.go:140","msg":"Top predicted categories: [{math 0.9715921} {engineering 0.014712552} {chemistry 0.010437389}]"}
{"level":"info","ts":"2025-10-09T19:52:53.492453844Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision for this query: true on [openai/gpt-oss-20b] model (confidence: 0.923, reason: very_low_uncertainty_trust_classification)"}
{"level":"info","ts":"2025-10-09T19:52:53.492468844Z","caller":"observability/logging.go:140","msg":"Selected endpoint address: 127.0.0.1:8000 for model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492559219Z","caller":"observability/logging.go:140","msg":"Applied reasoning mode (enabled: true) with effort (high) to model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492585094Z","caller":"observability/logging.go:140","msg":"Use new model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492598677Z","caller":"observability/logging.go:136","msg":"routing_decision","request_id":"a293fc18-8c17-4b68-8538-8354b1d7c57e","original_model":"auto","selected_model":"openai/gpt-oss-20b","category":"math","reasoning_enabled":true,"routing_latency_ms":889,"event":"routing_decision","reason_code":"auto_routing","reasoning_effort":"high","selected_endpoint":"127.0.0.1:8000"}
{"level":"info","ts":"2025-10-09T19:52:53.494066177Z","caller":"observability/logging.go:140","msg":"Stream ended gracefully"}
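The `routing_decision` line is the key one: the router chose `openai/gpt-oss-20b` with `selected_endpoint` `127.0.0.1:8000`, and the stream ended about 1.5 ms later. A sketch for pulling that field out of the log, assuming the one-JSON-object-per-line format shown above (`extract_selected_endpoint` is a hypothetical helper name):

```shell
# Print the selected_endpoint value from each routing_decision log line.
# Assumes one JSON object per line, as in the semantic-router logs above.
extract_selected_endpoint() {
  grep '"msg":"routing_decision"' |
    sed -n 's/.*"selected_endpoint":"\([^"]*\)".*/\1/p'
}
```

Usage: `kubectl logs -n vllm-semantic-router-system deployment/semantic-router | extract_selected_endpoint`. For the log above it prints `127.0.0.1:8000`.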

Based on the troubleshooting output above, the semantic router makes a routing decision (model openai/gpt-oss-20b, endpoint 127.0.0.1:8000), but the request is never delivered to the selected endpoint and the gateway returns a 503:

curl -i -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }'
HTTP/1.1 503 Service Unavailable
content-length: 167
content-type: text/plain
date: Thu, 09 Oct 2025 19:52:53 GMT

It is unclear how the proxy is expected to reach the selected endpoint, since the guide does not include a step for deploying a model server.
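One possible explanation, not confirmed by the guide: the router selected `127.0.0.1:8000`, a loopback address. If Envoy forwards the request to whatever address the router selects, that connection would target the Envoy pod's own loopback interface, where no model server is listening, which would be consistent with the 503. A small check for loopback endpoints, assuming `host:port` input with IPv6 hosts in brackets (`is_loopback_endpoint` is a hypothetical helper):

```shell
# Return 0 if a host:port endpoint points at a loopback address, 1 otherwise.
# Assumes IPv6 hosts are bracketed, e.g. [::1]:8000.
is_loopback_endpoint() {
  local host="${1%:*}"   # strip the trailing :port
  case "$host" in
    127.*|localhost|'[::1]') return 0 ;;
    *) return 1 ;;
  esac
}
```

For the endpoint in the routing decision above, `is_loopback_endpoint 127.0.0.1:8000 && echo loopback` prints `loopback`.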

Additional context
I used the port-forwarding method for testing. The Envoy service was port-forwarded as follows:

$ kubectl port-forward -n envoy-gateway-system svc/$ENVOY_SERVICE 8080:80
Forwarding from 127.0.0.1:8080 -> 10080
Forwarding from [::1]:8080 -> 10080
Handling connection for 8080
Handling connection for 8080