fix(api): restore HTTP GET /health endpoint #25234

Merged
pront merged 5 commits into master from restore-http-health-endpoint
Apr 21, 2026

Conversation

@pront pront commented Apr 21, 2026

Summary

Restores the HTTP GET /health endpoint on the Vector API to avoid user disruption. The endpoint is served on the same port as the gRPC API, so existing HTTP probes (AWS ALB health checks, Kubernetes HTTP liveness/readiness probes, etc.) keep working unchanged.

  • Response matches the pre-0.55 shape: 200 {"ok":true} while serving, 503 {"ok":false} during drain.
  • HEAD /health is also accepted for load balancers that prefer it.
  • The standard gRPC health service (grpc.health.v1.Health/Check) is unchanged. Both probes share the same serving flag so they agree during shutdown.
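
The shared-flag behavior described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in src/api/grpc_server.rs: ServingState, http_health, and grpc_health are hypothetical names, and a single atomic stands in for the real pairing of tonic's HealthReporter with the HTTP serving flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical shared serving state. Cloning shares the same underlying flag,
/// so the HTTP handler and the gRPC health service can each hold a copy.
#[derive(Clone)]
pub struct ServingState {
    serving: Arc<AtomicBool>,
}

impl ServingState {
    /// Start in the serving state.
    pub fn new() -> Self {
        Self { serving: Arc::new(AtomicBool::new(true)) }
    }

    /// Called once when drain begins; every clone observes the change.
    pub fn set_not_serving(&self) {
        self.serving.store(false, Ordering::SeqCst);
    }

    /// Status code and JSON body for the HTTP probe (pre-0.55 response shape).
    pub fn http_health(&self) -> (u16, &'static str) {
        if self.serving.load(Ordering::SeqCst) {
            (200, r#"{"ok":true}"#)
        } else {
            (503, r#"{"ok":false}"#)
        }
    }

    /// Status name a gRPC health check would report for the same flag.
    pub fn grpc_health(&self) -> &'static str {
        if self.serving.load(Ordering::SeqCst) {
            "SERVING"
        } else {
            "NOT_SERVING"
        }
    }
}
```

Because both views read one flag, a single set_not_serving() call moves the HTTP probe to 503 and the gRPC probe to NOT_SERVING at the same time.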

Flagged in #25210.

No changelog fragment: no release has shipped without /health, so there is no user-visible change relative to 0.54.0. The no-changelog label applies.

Implementation

src/api/grpc_server.rs now builds the tonic Server into an axum::Router via into_router(), merges an axum router exposing GET/HEAD /health, and serves the combined router via hyper::Server over the existing TcpListener. accept_http1(true) lets plain HTTP/1.1 requests reach the axum routes while gRPC continues to use HTTP/2. set_not_serving() now flips both the gRPC HealthReporter and the shared HTTP serving flag.
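
To illustrate how one port can carry both protocols, here is a std-only sketch. It is not how hyper, tonic, or axum implement this; is_http2 and handle_http1 are hypothetical helpers, and the 404 fallback for other paths is an assumption. The only protocol fact relied on is that every HTTP/2 connection opens with a fixed 24-byte client preface, which is what lets a server tell gRPC traffic apart from a plain HTTP/1.1 probe.

```rust
/// The fixed 24-byte client preface that opens every HTTP/2 connection.
const H2_PREFACE: &[u8] = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";

/// True when the first bytes of a connection are the HTTP/2 preface,
/// i.e. the peer is speaking HTTP/2 (as gRPC clients do).
pub fn is_http2(first_bytes: &[u8]) -> bool {
    first_bytes.starts_with(H2_PREFACE)
}

/// Build a raw HTTP/1.1 response for a request line such as
/// "GET /health HTTP/1.1". `serving` is the shared serving flag.
pub fn handle_http1(request_line: &str, serving: bool) -> String {
    let mut parts = request_line.split_whitespace();
    let method = parts.next().unwrap_or("");
    let path = parts.next().unwrap_or("");

    let (status, body) = match (method, path) {
        ("GET" | "HEAD", "/health") if serving => ("200 OK", r#"{"ok":true}"#),
        ("GET" | "HEAD", "/health") => ("503 Service Unavailable", r#"{"ok":false}"#),
        // Assumption: anything else is treated as not found in this sketch.
        _ => ("404 Not Found", ""),
    };

    // HEAD gets the same status and headers as GET but no body.
    let payload = if method == "HEAD" { "" } else { body };
    format!(
        "HTTP/1.1 {}\r\ncontent-type: application/json\r\ncontent-length: {}\r\n\r\n{}",
        status,
        body.len(),
        payload
    )
}
```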

Vector configuration

Any config with the API enabled works, e.g.:

api:
  enabled: true
  address: 127.0.0.1:8686

sources:
  demo:
    type: demo_logs
    format: json

sinks:
  out:
    type: blackhole
    inputs: [demo]

How did you test this PR?

  • Added two integration tests in tests/vector_api/health.rs covering GET /health (200 + body) and HEAD /health (200). Existing gRPC grpc.health.v1.Health/Check test still passes.
  • Manual: curl -i http://127.0.0.1:8686/health returns 200 {"ok":true}; grpcurl -plaintext 127.0.0.1:8686 grpc.health.v1.Health/Check still returns SERVING.
  • Local make check-fmt, make check-clippy, make check-markdown, and make cue-build (in website/) all pass.
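
For anyone who wants to repeat the manual curl check without extra tooling, a std-only probe might look like the sketch below. The integration tests in this PR use reqwest; parse_response and probe_health here are hypothetical helpers, and the sketch assumes the server closes the connection after replying and does not use chunked encoding.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Split a raw HTTP/1.1 response into (status code, body).
/// Returns None when the header/body separator or status code is missing.
pub fn parse_response(raw: &str) -> Option<(u16, String)> {
    let (head, body) = raw.split_once("\r\n\r\n")?;
    // The status line looks like "HTTP/1.1 200 OK"; the code is the second token.
    let code: u16 = head.lines().next()?.split_whitespace().nth(1)?.parse().ok()?;
    Some((code, body.to_string()))
}

/// Send GET /health to `addr` (for example "127.0.0.1:8686") and return
/// the status code and body.
pub fn probe_health(addr: &str) -> io::Result<(u16, String)> {
    let mut stream = TcpStream::connect(addr)?;
    // "Connection: close" lets us read until EOF instead of tracking content-length.
    write!(
        stream,
        "GET /health HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        addr
    )?;
    let mut raw = String::new();
    stream.read_to_string(&mut raw)?;
    parse_response(&raw)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed HTTP response"))
}
```

With the example config above running, probe_health("127.0.0.1:8686") would be expected to yield status 200 while serving and 503 during drain.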

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

pront and others added 5 commits April 21, 2026 12:19

Convert tonic's Server to an axum Router via into_router(), then serve
over the same TcpListener via hyper::Server. Enables HTTP/1.1 acceptance
so additional HTTP routes can be added alongside gRPC on the same port.
Behavior-preserving.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Re-expose the HTTP health endpoint that was removed as part of the
GraphQL-to-gRPC migration (#24364). The endpoint matches the pre-0.55
response shape: 200 with body {"ok": true} while serving and 503 with
body {"ok": false} after set_not_serving() is called during drain.
HEAD is also handled.

gRPC clients continue to use grpc.health.v1.Health/Check; both probes
now share the same serving state so they agree during shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds two integration tests hitting the restored HTTP health endpoint
via reqwest:
- GET returns 200 with body {"ok":true}
- HEAD returns 200

Exposes harness.api_port() so tests can reach the API port directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document the HTTP GET/HEAD /health endpoint served alongside the gRPC
API, framed as compatibility with Vector 0.54.0 and earlier. Updates
the reference endpoints schema to allow HEAD, adds HEAD/GET entries
for /health in api.cue with 200/503 responses, and adds a curl example
to the API reference page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions github-actions Bot added the "work in progress" and "domain: external docs" (Anything related to Vector's external, public documentation) labels and removed the "work in progress" label Apr 21, 2026
@pront pront added the "no-changelog" (Changes in this PR do not need user-facing explanations in the release changelog) label Apr 21, 2026
@pront pront marked this pull request as ready for review April 21, 2026 16:56
@pront pront requested review from a team as code owners April 21, 2026 16:56

@thomasqueirozb thomasqueirozb left a comment

Nice! This will make upgrading simpler 🙂. We can look into removal at a later date

@pront pront added this pull request to the merge queue Apr 21, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 281bbf9d3c

Comment thread src/api/grpc_server.rs
Comment on lines +101 to +102
.into_router()
.merge(http_router(router_serving));

P2: Keep honoring gRPC deadlines after router merge

For gRPC clients that set request deadlines via grpc-timeout metadata, converting the tonic server into an axum router and serving it directly through hyper bypasses tonic's transport MakeSvc, which is where the previous serve_with_incoming_shutdown path wrapped routes in GrpcTimeout. Those deadline-bounded API calls/streams will no longer be cancelled with a deadline-exceeded status by the server, and can keep consuming server work until they otherwise finish or the client disconnects; please preserve tonic's timeout handling when serving the merged router.
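
For context on what honoring a deadline involves, the grpc-timeout value is an ASCII integer of at most 8 digits followed by a one-letter unit (H, M, S, m, u, n). The sketch below only shows how such a value maps to a duration; parse_grpc_timeout is a hypothetical helper and is not tonic's GrpcTimeout implementation.

```rust
use std::time::Duration;

/// Parse a grpc-timeout header value such as "5S" or "250m" into a Duration.
/// Returns None for values that do not match the wire format.
pub fn parse_grpc_timeout(value: &str) -> Option<Duration> {
    // The unit is the final ASCII character; everything before it is the amount.
    let split = value.len().checked_sub(1)?;
    if !value.is_char_boundary(split) {
        return None;
    }
    let (digits, unit) = value.split_at(split);

    // The amount must be 1 to 8 ASCII digits.
    if digits.is_empty() || digits.len() > 8 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let amount: u64 = digits.parse().ok()?;

    match unit {
        "H" => Some(Duration::from_secs(amount * 3600)),
        "M" => Some(Duration::from_secs(amount * 60)),
        "S" => Some(Duration::from_secs(amount)),
        "m" => Some(Duration::from_millis(amount)),
        "u" => Some(Duration::from_micros(amount)),
        "n" => Some(Duration::from_nanos(amount)),
        _ => None,
    }
}
```

A server that enforces deadlines would race each request against the parsed duration and answer with a deadline-exceeded status when the timer wins; that enforcement step is what the comment above asks to preserve.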

Merged via the queue into master with commit 1c70988 Apr 21, 2026
122 of 123 checks passed
@pront pront deleted the restore-http-health-endpoint branch April 21, 2026 17:32
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 21, 2026

Labels

  • domain: external docs (Anything related to Vector's external, public documentation)
  • no-changelog (Changes in this PR do not need user-facing explanations in the release changelog)
