Skip to content

Conversation

@cryo-zd
Copy link
Collaborator

@cryo-zd cryo-zd commented Sep 11, 2025

What type of PR is this?

Refactor: use worker pool for batch classification concurrency

This PR replaced per-task goroutines with a fixed-size worker pool (<= max_concurrency). This ensures accurate concurrency metrics, aligns with API semantics, and avoids overhead when batch size is large.

What this PR does / why we need it:
This PR refactors processConcurrently to address two issues:

  • The current implementation spawns one goroutine per text and increments ConcurrentGoroutines at creation time. This makes this metric does not accurately reflect the number of simultaneously running goroutines, which is limited by the semaphore to maxConcurrency.
    As a result, monitoring may show thousands of goroutines even when max_concurrency is set to 8, which diverges from the API contract that defines max_concurrency as the maximum concurrent goroutines.
  • Large batches could create excessive goroutines, adding unnecessary scheduling overhead.

link
api

Signed-off-by: cryo <zdtna412@gmail.com>
@netlify
Copy link

netlify bot commented Sep 11, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 57db53b
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68c2fcb2e03a3e0007eb610a
😎 Deploy Preview https://deploy-preview-115--vllm-semantic-router.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@github-actions
Copy link

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/api/server.go

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Copy link
Collaborator

rootfs commented Sep 11, 2025

@cryo-zd this is interesting! do you happen to have a test to see the relationship between concurrency and CPU/GPU usage?

@rootfs
Copy link
Collaborator

rootfs commented Sep 11, 2025

cc @aeft for recent semaphore refactor

@aeft
Copy link
Collaborator

aeft commented Sep 11, 2025

cc @aeft for recent semaphore refactor

looks good to me

@rootfs rootfs merged commit 35f0d70 into vllm-project:main Sep 11, 2025
9 checks passed
@rootfs
Copy link
Collaborator

rootfs commented Sep 11, 2025

@cryo-zd thanks for this work! If possible, can you follow up with a PR to share the concurrency benchmark?

@cryo-zd cryo-zd deleted the concurrent branch September 13, 2025 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants