Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

During hitless rollout testing, on average one early request to vLLM times out. #557

Open
smarterclayton opened this issue Mar 21, 2025 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@smarterclayton
Copy link
Contributor

smarterclayton commented Mar 21, 2025

During rollout testing for #550, I noticed that every rollout I was getting 1 503, caused by a connection timeout between the gateway and the backend. Without request tracing, it's too hard in the current logs to identify which request is actually failing correlated with the LB logs in GKE, but the timeout is recorded 3m13s after the first request is received by vLLM, and my backend timeout is 3m, so it's one of the very early requests.

The test in this case is generating load such that the p99 request latency is about 87s, so it's unlikely the request itself requires 3m to complete. I don't see a preemption record.

Deserves further investigation with request tracing on.

The startup probe delays readiness start until 2025-03-21T13:11:22Z and you can see the first request timestamp happens very close to then (EPP starts sending traffic to the backend fairly quickly due to ready -> endpointslice -> EPP propagation)

Image

GKE L7LB logs:

{
  "insertId": "pk00bbef2dvu",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
    "backendTargetProjectNumber": "projects/563452970168",
    "proxyStatus": "error=\"connection_terminated\"; details=\"backend_connection_closed\""
  },
  "httpRequest": {
    "requestMethod": "POST",
    "requestUrl": "http://localhost:8000/v1/completions",
    "requestSize": "359",
    "status": 503,
    "responseSize": "190",
    "userAgent": "Go-http-client/1.1",
    "remoteIp": "xx.xx.xx.xx:24406",
    "serverIp": "10.36.9.20:8000",
    "latency": "0.002088s",
    "protocol": "HTTP/1.1"
  },
  "resource": {
    "type": "http_external_regional_lb_rule",
    "labels": {
      "network_name": "default",
      "matched_url_path_rule": "/",
      "forwarding_rule_name": "gkegw1-o3ey-default-e2e-inference-gateway-lh8x45d0jxkw",
      "backend_name": "k8s1-3e4ad3be-default-my-pool-ip-aeff52c1-54321-26b71417",
      "backend_scope": "us-central1-a",
      "backend_scope_type": "ZONE",
      "project_id": "claytoncoleman-gke-dev",
      "backend_target_name": "gkegw1-o3ey-default-my-pool-ip-aeff52c1-54321-qysads06jk9i",
      "region": "us-central1",
      "backend_target_type": "BACKEND_SERVICE",
      "url_map_name": "gkegw1-o3ey-default-e2e-inference-gateway-stfg914qzjms",
      "backend_type": "NETWORK_ENDPOINT_GROUP",
      "target_proxy_name": "gkegw1-o3ey-default-e2e-inference-gateway-stfg914qzjms"
    }
  },
  "timestamp": "2025-03-21T13:14:35.782703Z",
  "severity": "WARNING",
  "logName": "projects/claytoncoleman-gke-dev/logs/loadbalancing.googleapis.com%2Fexternal_regional_requests",
  "receiveTimestamp": "2025-03-21T13:14:37.018539280Z"
}
@smarterclayton smarterclayton added the kind/bug Categorizes issue or PR as related to a bug. label Mar 21, 2025
@smarterclayton
Copy link
Contributor Author

One related note - in this run I observe about 1/min

Post "http://localhost:8000/v1/completions": EOF

Not sure if this related, but probably not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests

1 participant