Docs: Kubernetes Install Guide Test Step Fails #382

**Describe the bug**
The [k8s install guide](https://vllm-semantic-router.com/docs/installation/kubernetes) does not work as documented: the test request at the end of the guide returns `503 Service Unavailable`.

**To Reproduce**
Follow the steps in the guide.

**Expected behavior**
[The testing step](https://vllm-semantic-router.com/docs/installation/kubernetes#send-test-requests) should succeed.

**Screenshots**
Here are the results of the troubleshooting steps from the guide:

```sh
$ kubectl get gateway vllm-semantic-router -n vllm-semantic-router-system
...
status:
  conditions:
  - lastTransitionTime: "2025-10-09T19:33:02Z"
    message: The Gateway has been scheduled by Envoy Gateway
    observedGeneration: 1
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-10-09T19:33:02Z"
    message: No addresses have been assigned to the Gateway
    observedGeneration: 1
    reason: AddressNotAssigned
    status: "False"
    type: Programmed

$ kubectl get svc -n envoy-gateway-system
NAME                                                              TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                            AGE
envoy-gateway                                                     ClusterIP      10.96.200.166   <none>        18000/TCP,18001/TCP,18002/TCP,19001/TCP,9443/TCP   27m
envoy-ratelimit                                                   ClusterIP      10.96.173.10    <none>        8081/TCP,19001/TCP                                 19m
envoy-vllm-semantic-router-system-vllm-semantic-router-16c03415   LoadBalancer   10.96.79.79     <pending>     80:30187/TCP                                       16m

$ kubectl describe inferencepool vllm-semantic-router -n vllm-semantic-router-system
...
Status:
  Parent:
    Conditions:
      Last Transition Time:  2025-10-09T19:32:50Z
      Message:               InferencePool has been Accepted by controller ai-gateway-controller: InferencePool reconciled successfully
      Observed Generation:   1
      Reason:                Accepted
      Status:                True
      Type:                  Accepted
      Last Transition Time:  2025-10-09T19:32:50Z
      Message:               Reference resolution by controller ai-gateway-controller: All references resolved successfully
      Observed Generation:   1
      Reason:                ResolvedRefs
      Status:                True
      Type:                  ResolvedRefs
    Parent Ref:
      Group:      gateway.networking.k8s.io
      Kind:       Gateway
      Name:       vllm-semantic-router
      Namespace:  vllm-semantic-router-system

$ kubectl logs -n envoy-ai-gateway-system deployment/ai-gateway-controller
...
mantic-router/rule/0/match/0/*", "resource": "struct_value:{fields:{key:\"kind\" value:{string_value:\"HTTPRoute\"}} fields:{key:\"name\" value:{string_value:\"vllm-semantic-router\"}} fields:{key:\"namespace\" value:{string_value:\"vllm-semantic-router-system\"}}}"}
2025-10-09T19:32:51Z	INFO	envoy-gateway-extension-server	Added extproc-uds cluster to the list of clusters
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	Skipping non-AIGatewayRoute HTTPRoute cluster modification	{"namespace": "vllm-semantic-router-system", "name": "vllm-semantic-router"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	non-ai-gateway cluster name	{"cluster_name": "vllm-semantic-router-system/vllm-semantic-router"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	patching listener with inference pool filters	{"listener": "vllm-semantic-router-system/vllm-semantic-router/http"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	adding inference pool ext proc filter	{"pool": "vllm-semantic-router"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	patching virtual host with inference pool filters	{"listener": "vllm-semantic-router-system/vllm-semantic-router/http", "virtual_host": "vllm-semantic-router-system/vllm-semantic-router/http/*"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	no annotations found in the resource, skipping	{"route": "httproute/vllm-semantic-router-system/vllm-semantic-router/rule/0/match/0/*", "resource": "struct_value:{fields:{key:\"kind\" value:{string_value:\"HTTPRoute\"}} fields:{key:\"name\" value:{string_value:\"vllm-semantic-router\"}} fields:{key:\"namespace\" value:{string_value:\"vllm-semantic-router-system\"}}}"}
2025-10-09T19:33:02Z	INFO	envoy-gateway-extension-server	Added extproc-uds cluster to the list of clusters

$ kubectl get pods -n vllm-semantic-router-system
NAME                               READY   STATUS    RESTARTS   AGE
semantic-router-768c85f796-z5knd   1/1     Running   0          52m

$ kubectl logs -n vllm-semantic-router-system deployment/semantic-router
...
{"level":"info","ts":"2025-10-09T19:52:52.602101969Z","caller":"observability/logging.go:140","msg":"Started processing a new request"}
{"level":"info","ts":"2025-10-09T19:52:52.602207219Z","caller":"observability/logging.go:140","msg":"Received request headers"}
{"level":"info","ts":"2025-10-09T19:52:52.602647302Z","caller":"observability/logging.go:140","msg":"Received request body {\n    \"model\": \"auto\",\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?\"}\n    ]\n  }"}
{"level":"info","ts":"2025-10-09T19:52:52.60273776Z","caller":"observability/logging.go:140","msg":"Original model: auto"}
{"level":"info","ts":"2025-10-09T19:52:52.74982476Z","caller":"observability/logging.go:140","msg":"Jailbreak classification result: {0 0.9999995}"}
{"level":"info","ts":"2025-10-09T19:52:52.749874344Z","caller":"observability/logging.go:140","msg":"BENIGN: 'benign' (confidence: 1.000, threshold: 0.700)"}
{"level":"info","ts":"2025-10-09T19:52:52.749879594Z","caller":"observability/logging.go:140","msg":"No jailbreak detected in request content"}
{"level":"info","ts":"2025-10-09T19:52:53.058280302Z","caller":"observability/logging.go:140","msg":"Using Auto Model Selection"}
{"level":"info","ts":"2025-10-09T19:52:53.198474594Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716"}
{"level":"info","ts":"2025-10-09T19:52:53.198574177Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-09T19:52:53.198581886Z","caller":"observability/logging.go:140","msg":"Selected model openai/gpt-oss-20b for category math with score 1.0000"}
{"level":"info","ts":"2025-10-09T19:52:53.349998469Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716"}
{"level":"info","ts":"2025-10-09T19:52:53.350044886Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math)"}
{"level":"info","ts":"2025-10-09T19:52:53.350054136Z","caller":"observability/logging.go:140","msg":"Routing to model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492282594Z","caller":"observability/logging.go:140","msg":"Classification result: class=9, confidence=0.9716, entropy_available=true"}
{"level":"info","ts":"2025-10-09T19:52:53.492429011Z","caller":"observability/logging.go:140","msg":"Classified as category: math (mmlu=math), reasoning_decision: use=true, confidence=0.923, reason=very_low_uncertainty_trust_classification"}
{"level":"info","ts":"2025-10-09T19:52:53.492438177Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision: category='math', confidence=0.972, use_reasoning=true, reason=very_low_uncertainty_trust_classification, strategy=trust_top_category"}
{"level":"info","ts":"2025-10-09T19:52:53.492449844Z","caller":"observability/logging.go:140","msg":"Top predicted categories: [{math 0.9715921} {engineering 0.014712552} {chemistry 0.010437389}]"}
{"level":"info","ts":"2025-10-09T19:52:53.492453844Z","caller":"observability/logging.go:140","msg":"Entropy-based reasoning decision for this query: true on [openai/gpt-oss-20b] model (confidence: 0.923, reason: very_low_uncertainty_trust_classification)"}
{"level":"info","ts":"2025-10-09T19:52:53.492468844Z","caller":"observability/logging.go:140","msg":"Selected endpoint address: 127.0.0.1:8000 for model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492559219Z","caller":"observability/logging.go:140","msg":"Applied reasoning mode (enabled: true) with effort (high) to model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492585094Z","caller":"observability/logging.go:140","msg":"Use new model: openai/gpt-oss-20b"}
{"level":"info","ts":"2025-10-09T19:52:53.492598677Z","caller":"observability/logging.go:136","msg":"routing_decision","request_id":"a293fc18-8c17-4b68-8538-8354b1d7c57e","original_model":"auto","selected_model":"openai/gpt-oss-20b","category":"math","reasoning_enabled":true,"routing_latency_ms":889,"event":"routing_decision","reason_code":"auto_routing","reasoning_effort":"high","selected_endpoint":"127.0.0.1:8000"}
{"level":"info","ts":"2025-10-09T19:52:53.494066177Z","caller":"observability/logging.go:140","msg":"Stream ended gracefully"}
```

Based on the troubleshooting output above, the semantic router appears to make a routing decision, but the request is not routed to the selected endpoint:

```sh
curl -i -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}
    ]
  }'
HTTP/1.1 503 Service Unavailable
content-length: 167
content-type: text/plain
date: Thu, 09 Oct 2025 19:52:53 GMT
```

It's unclear how the proxy is supposed to route the request to the selected endpoint (`127.0.0.1:8000` in the router log), since the instructions do not include a step for deploying a model server.
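
To confirm which backend the router picked for each request, the `routing_decision` events can be filtered out of the router log. This is a minimal sketch, not something from the guide: `routing_decisions` is a hypothetical helper name, and the `sed` pattern assumes `selected_model` appears before `selected_endpoint` on the line, as it does in the log output above.

```sh
# Hypothetical helper: read semantic-router JSON log lines on stdin and
# print "<selected_model> -> <selected_endpoint>" for each routing_decision event.
# Assumes selected_model precedes selected_endpoint, as in the log shown above.
routing_decisions() {
  grep '"event":"routing_decision"' |
    sed -n 's/.*"selected_model":"\([^"]*\)".*"selected_endpoint":"\([^"]*\)".*/\1 -> \2/p'
}

# Usage:
#   kubectl logs -n vllm-semantic-router-system deployment/semantic-router | routing_decisions
```

For the request in this report, the result should name `127.0.0.1:8000`, an address where no model server was deployed by any step in the guide.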

**Additional context**
I used the [port-forwarding method](https://vllm-semantic-router.com/docs/installation/kubernetes#method-1-port-forwarding-recommended-for-local-testing) for testing. The Envoy service is being port-forwarded:

```sh
$ kubectl port-forward -n envoy-gateway-system svc/$ENVOY_SERVICE 8080:80
Forwarding from 127.0.0.1:8080 -> 10080
Forwarding from [::1]:8080 -> 10080
Handling connection for 8080
Handling connection for 8080
```
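
For reference, `$ENVOY_SERVICE` above refers to the Envoy proxy service. The guide may set it differently; as a sketch under the assumption that the proxy is the only `LoadBalancer` service in the `envoy-gateway-system` namespace (which matches the `kubectl get svc` output earlier in this report), it could be derived like this. `loadbalancer_services` is a hypothetical helper name.

```sh
# Hypothetical helper: read `kubectl get svc` table output on stdin and
# print the NAME column of every row whose TYPE column is LoadBalancer.
loadbalancer_services() {
  awk 'NR > 1 && $2 == "LoadBalancer" { print $1 }'
}

# Usage (assumes exactly one LoadBalancer service in the namespace):
#   ENVOY_SERVICE="$(kubectl get svc -n envoy-gateway-system | loadbalancer_services)"
```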

