-
Notifications
You must be signed in to change notification settings - Fork 4.4k
feat(server): refactor concurrency model and enhance task management #3257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(server): refactor concurrency model and enhance task management #3257
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done an initial pass over this and wanted to leave some early feedback (or feedback in chucks at least so there are not too many in one go) and I'll go through this in more detail.
@@ -41,10 +43,10 @@ const std::string vjson_format = "verbose_json"; | |||
const std::string vtt_format = "vtt"; | |||
|
|||
std::function<void(int)> shutdown_handler; | |||
std::atomic_flag is_terminating = ATOMIC_FLAG_INIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that std::atomic_flag
might be more appropriate here as it is guaranteed to be lockfree. And it is only used in a signal handler where we don't want any potential blocking. So I'd prefer to keep this as is.
const httplib::Request* request_ptr; // For abort callback | ||
std::atomic<AbortReason> abort_reason{AbortReason::NotAborted}; | ||
std::atomic<bool>* stop_flag_ptr{nullptr}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove empty space (not showing here but is visible using command line git). There are few more of these in the file but I won't list them all to avoid cluttering the review.
WhisperTask() = default; | ||
|
||
// Move constructor | ||
WhisperTask(WhisperTask&& other) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove trailing white space (not showing here but is visible using command line git). There are few more of these in the file but I won't list them all to avoid cluttering the review.
std::vector<std::vector<float>> pcmf32s; | ||
whisper_params params; | ||
std::string filename; | ||
const httplib::Request* request_ptr; // For abort callback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Pointers/references are preferred to be "non-leaning". There are few more of these.
|
||
class WhisperTaskQueue { | ||
public: | ||
WhisperTaskQueue(struct whisper_context* ctx, size_t n_workers = 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering about the default of 2
for n_workers
as the default value is 1 elsewhere in the code and perhaps this should be also be 1?
wparams.token_timestamps = !task.params.no_timestamps && task.params.response_format == vjson_format; | ||
wparams.no_context = task.params.no_context; | ||
wparams.suppress_nst = task.params.suppress_nst; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also add VAD parameters here similar to what is done in
whisper.cpp/examples/server/server.cpp
Lines 924 to 932 in 2a4d6db
wparams.vad = params.vad; | |
wparams.vad_model_path = params.vad_model.c_str(); | |
wparams.vad_params.threshold = params.vad_threshold; | |
wparams.vad_params.min_speech_duration_ms = params.vad_min_speech_duration_ms; | |
wparams.vad_params.min_silence_duration_ms = params.vad_min_silence_duration_ms; | |
wparams.vad_params.max_speech_duration_s = params.vad_max_speech_duration_s; | |
wparams.vad_params.speech_pad_ms = params.vad_speech_pad_ms; | |
wparams.vad_params.samples_overlap = params.vad_samples_overlap; |
std::mutex queue_mutex_; | ||
std::condition_variable queue_cv_; | ||
std::mutex whisper_mutex_; // Protect whisper context access | ||
std::atomic<bool> stop_flag_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think stop_flag_
could just be a bool as the accesses to it are all guarded by the queue_cv_
lock.
Lte's hold on this one. I ran some becnhmarks and tracing with NVIDIA Nsight and CPU is not the bottleneck |
This pull request makes significant enhancements to the server implementation in whisper.cpp, especially around concurrency and request handling. Here’s a summary of the key changes:
Major Changes
1. Multi-threaded Task Queue for Inference
WhisperTaskQueue
class, which manages a queue of inference tasks and processes them using multiple worker threads (the number of workers is now configurable via a new--workers
server argument).2. Graceful Shutdown and Robust Handling
503
for server shutdown,499
for client disconnect).3. Refactoring of Main Server Logic
POST /inference_path
) is refactored to create a fresh copy of the parameter set for each request, improving thread safety.4. Command-Line Interface and Usability Improvements
--workers N
argument to control the number of concurrent worker threads for handling requests.5. Minor Fixes and Improvements
std::atomic_flag
for a more idiomaticstd::atomic<bool>
for the termination signal.Example: New Option
This will launch the server with 4 worker threads for handling requests concurrently.