Conversation

tao12345666333
Contributor

What type of PR is this?

o11y: Add TTFT and TPOT histograms for SLOs

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #121

Release Notes: Yes/No

Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
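
For context, a minimal sketch of what TTFT and TPOT histograms could look like with prometheus/client_golang. The metric names, labels, and bucket boundaries below are assumptions for illustration only; the PR's actual definitions live in src/semantic-router/pkg/metrics/metrics.go and may differ.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical definitions; names and buckets are illustrative, not the PR's.
var (
	// Time to first token (TTFT), labeled by model.
	ttftSeconds = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "llm_model_ttft_seconds",
		Help:    "Time to first token in seconds, per model.",
		Buckets: prometheus.ExponentialBuckets(0.05, 2, 12), // ~50ms to ~100s
	}, []string{"model"})

	// Time per output token (TPOT), labeled by model.
	tpotSeconds = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "llm_model_tpot_seconds",
		Help:    "Time per output token in seconds, per model.",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 12), // ~1ms to ~2s
	}, []string{"model"})
)
```

Histograms (rather than summaries) keep the raw bucket counts, so SLO quantiles can be computed afterwards with histogram_quantile over any time window.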

netlify bot commented Sep 13, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 6a6916b
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68c5d8696336a100086236bc
😎 Deploy Preview https://deploy-preview-126--vllm-semantic-router.netlify.app

github-actions bot commented Sep 13, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/extproc/metrics_integration_test.go
  • src/semantic-router/go.mod
  • src/semantic-router/pkg/extproc/processor.go
  • src/semantic-router/pkg/extproc/request_handler.go
  • src/semantic-router/pkg/extproc/request_processing_test.go
  • src/semantic-router/pkg/extproc/response_handler.go
  • src/semantic-router/pkg/extproc/testing_helpers_test.go
  • src/semantic-router/pkg/metrics/metrics.go

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

  // handleResponseHeaders processes the response headers
- func (r *OpenAIRouter) handleResponseHeaders(_ *ext_proc.ProcessingRequest_ResponseHeaders) (*ext_proc.ProcessingResponse, error) {
+ func (r *OpenAIRouter) handleResponseHeaders(_ *ext_proc.ProcessingRequest_ResponseHeaders, ctx *RequestContext) (*ext_proc.ProcessingResponse, error) {
+     // Best-effort TTFT measurement: record on first response headers if we have a start time and model
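
As an editor's illustration of what the added ctx parameter and the "best-effort" comment imply, here is a sketch of the measurement flow. The RequestContext field names and the observe callback are assumptions, not the PR's actual identifiers.

```go
package extproc

import "time"

// Sketch only: field names are assumptions used to illustrate the flow.
type RequestContext struct {
	RequestModel string    // model chosen for this request
	StartTime    time.Time // when the request was first seen
	TTFTRecorded bool      // guards against double-recording
}

// recordTTFTOnFirstHeaders records a best-effort TTFT when the first response
// headers arrive: if we know when the request started and which model served
// it, observe the elapsed time exactly once.
func recordTTFTOnFirstHeaders(ctx *RequestContext, observe func(model string, seconds float64)) {
	if ctx == nil || ctx.TTFTRecorded || ctx.RequestModel == "" || ctx.StartTime.IsZero() {
		return
	}
	observe(ctx.RequestModel, time.Since(ctx.StartTime).Seconds())
	ctx.TTFTRecorded = true
}
```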
Collaborator

For now we haven't tried streaming mode. In buffered mode, the response from the LLM has to be fully received before the router sees it. If you can add an issue to track TTFT in streaming mode, that'll be great.
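
In other words, in buffered mode the first response headers only arrive after the full completion has been generated, so the recorded TTFT effectively approximates end-to-end generation latency rather than true time-to-first-token (hence the streaming follow-up issue below). A per-token figure can still be derived in buffered mode by dividing the elapsed time by the completion token count; a sketch of that derivation, with all names assumed:

```go
package extproc

import "time"

// estimateTPOTSeconds is a sketch of a buffered-mode TPOT estimate:
// elapsed wall time divided by the number of completion tokens reported
// in the upstream response's usage field. All names are assumptions.
func estimateTPOTSeconds(start time.Time, completionTokens int) (float64, bool) {
	if completionTokens <= 0 {
		return 0, false // nothing to divide by; skip the observation
	}
	return time.Since(start).Seconds() / float64(completionTokens), true
}
```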

Contributor Author

Sure

Contributor Author

#128 for tracking

rootfs previously approved these changes Sep 13, 2025
Collaborator

rootfs commented Sep 13, 2025

@tao12345666333 the go.mod needs an update. Once the CI is green, this is ready to go. Thanks.

@rootfs rootfs merged commit 65e114c into vllm-project:main Sep 13, 2025
9 checks passed
@tao12345666333 tao12345666333 deleted the o11y-TTFT-TPOT branch September 13, 2025 21:58
Successfully merging this pull request may close these issues.

Metrics: Add TTFT and TPOT histograms for SLOs