feat(server): refactor concurrency model and enhance task management #3257

Draft · wants to merge 1 commit into master

Conversation

@sachaarbonel (Contributor) commented on Jun 16, 2025

This pull request makes significant enhancements to the server implementation in whisper.cpp, especially around concurrency and request handling. Here’s a summary of the key changes:

Major Changes

1. Multi-threaded Task Queue for Inference

  • Introduces a new WhisperTaskQueue class, which manages a queue of inference tasks and processes them using multiple worker threads (the number of workers is now configurable via a new --workers server argument); a minimal sketch of this shape follows the list below.
  • All incoming POST requests for transcription are now handled asynchronously via this queue, allowing multiple requests to be processed in parallel.
  • Each task encapsulates all request information and handles its own completion signaling and abort reasons (e.g., client disconnect, server shutdown).
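
A minimal sketch of the producer/consumer shape described above; WhisperTaskQueue is the PR's class name, but the members, method names, and bodies below are illustrative assumptions, not the PR's actual code:

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative sketch only: a fixed pool of workers draining a queue of tasks.
class WhisperTaskQueue {
public:
    explicit WhisperTaskQueue(size_t n_workers) {
        for (size_t i = 0; i < n_workers; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    ~WhisperTaskQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto & w : workers_) {
            w.join();
        }
    }

    // Producer side: the HTTP handler enqueues a task and returns quickly.
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

private:
    // Consumer side: each worker pops tasks and runs them (inference happens here).
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) {
                    return;
                }
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task(); // the real task runs whisper_full(...) and signals completion
        }
    }

    std::deque<std::function<void()>> tasks_;
    std::mutex                        mutex_;
    std::condition_variable           cv_;
    bool                              stop_ = false;
    std::vector<std::thread>          workers_;
};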

2. Graceful Shutdown and Robust Handling

  • The server now handles shutdowns more gracefully: ongoing and queued tasks are notified and can be aborted with meaningful error responses (503 for server shutdown, 499 for client disconnect); see the status-mapping sketch after this list.
  • The task queue is properly destroyed and re-created when a new model is loaded or on shutdown, ensuring clean resource management.
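
A rough sketch of the status mapping described above; AbortReason and NotAborted appear in the diff, while the other enumerator names and the abort_status helper are assumptions:

// Illustrative: translate a task's abort reason into an HTTP status code.
enum class AbortReason { NotAborted, ClientDisconnected, ServerShutdown };

static int abort_status(AbortReason reason) {
    switch (reason) {
        case AbortReason::ClientDisconnected: return 499; // client went away mid-request
        case AbortReason::ServerShutdown:     return 503; // server is shutting down
        default:                              return 200; // task completed normally
    }
}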

3. Refactoring of Main Server Logic

  • The request handler for audio transcription (POST /inference_path) is refactored to create a fresh copy of the parameter set for each request, improving thread safety; a rough sketch of the per-request copy appears after this list.
  • The actual inference is now performed inside the worker threads, so the server remains responsive to new requests.
  • The code for response formatting (plain text, SRT, VTT, verbose JSON) is encapsulated within the task result logic.
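
A rough sketch of the per-request parameter copy, assuming the server's cpp-httplib routing; the struct fields, route string, and handler body are placeholders rather than the PR's code:

#include <string>
#include "httplib.h"

// Placeholder stand-in for whisper_params; the real struct has many more fields.
struct whisper_params { int n_threads = 4; std::string language = "en"; };

int main() {
    httplib::Server svr;
    whisper_params default_params; // parsed once from the command line in the real server

    svr.Post("/inference", [&](const httplib::Request & /*req*/, httplib::Response & res) {
        whisper_params task_params = default_params; // fresh copy per request, safe to mutate
        // ... override fields from the multipart form data, wrap everything in a task,
        //     enqueue it, wait for the worker to signal completion, format the response ...
        res.set_content("ok", "text/plain");
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}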

4. Command-Line Interface and Usability Improvements

  • Adds a --workers N argument to control the number of concurrent worker threads for handling requests; a parsing sketch appears after this list.
  • Usage help text and parameter parsing are updated accordingly.
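
A self-contained sketch of how such a flag is typically parsed; the parse_workers function, its default, and the validation are illustrative assumptions rather than the PR's parser:

#include <cstdlib>
#include <cstring>

// Illustrative: scan argv for "--workers N" and fall back to a default otherwise.
static int parse_workers(int argc, char ** argv, int default_workers) {
    int n_workers = default_workers;
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--workers") == 0) {
            n_workers = std::atoi(argv[i + 1]);
        }
    }
    return n_workers > 0 ? n_workers : default_workers;
}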

5. Minor Fixes and Improvements

  • Replaces std::atomic_flag with a more idiomatic std::atomic<bool> for the termination signal.
  • Reduces use of global and shared state, improving modularity and reliability.

Example: New Option

./server --workers 4

This will launch the server with 4 worker threads for handling requests concurrently.

@danbev (Collaborator) left a comment

I've done an initial pass over this and wanted to leave some early feedback (or feedback in chunks at least, so there are not too many comments in one go), and I'll go through this in more detail.

@@ -41,10 +43,10 @@ const std::string vjson_format = "verbose_json";
const std::string vtt_format = "vtt";

std::function<void(int)> shutdown_handler;
std::atomic_flag is_terminating = ATOMIC_FLAG_INIT;

I think that std::atomic_flag might be more appropriate here, as it is guaranteed to be lock-free, and it is only used in a signal handler where we don't want any potential blocking. So I'd prefer to keep this as is.
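
For reference, a minimal sketch of the signal-handler pattern being described here, reusing the shutdown_handler and is_terminating names from the hunk above; the handler body is illustrative rather than the file's exact code:

#include <atomic>
#include <csignal>
#include <cstdlib>
#include <functional>

// std::atomic_flag is the only atomic type guaranteed to be lock-free, so
// test_and_set() is safe to call from a signal handler without risking a block.
std::function<void(int)> shutdown_handler;
std::atomic_flag is_terminating = ATOMIC_FLAG_INIT;

void signal_handler(int sig) {
    if (is_terminating.test_and_set()) {
        std::_Exit(1); // a second interrupt while shutdown is in progress: exit immediately
    }
    shutdown_handler(sig); // first interrupt: request a graceful shutdown
}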

const httplib::Request* request_ptr; // For abort callback
std::atomic<AbortReason> abort_reason{AbortReason::NotAborted};
std::atomic<bool>* stop_flag_ptr{nullptr};


Nit: remove the empty space (not showing here, but visible using command-line git). There are a few more of these in the file, but I won't list them all to avoid cluttering the review.

WhisperTask() = default;

// Move constructor
WhisperTask(WhisperTask&& other) noexcept

Nit: remove the trailing whitespace (not showing here, but visible using command-line git). There are a few more of these in the file, but I won't list them all to avoid cluttering the review.

std::vector<std::vector<float>> pcmf32s;
whisper_params params;
std::string filename;
const httplib::Request* request_ptr; // For abort callback

Nit: pointers/references are preferred to be "non-leaning", i.e. const httplib::Request * request_ptr rather than const httplib::Request* request_ptr. There are a few more of these.


class WhisperTaskQueue {
public:
WhisperTaskQueue(struct whisper_context* ctx, size_t n_workers = 2)

Just wondering about the default of 2 for n_workers, as the default value is 1 elsewhere in the code; perhaps this should also be 1?

wparams.token_timestamps = !task.params.no_timestamps && task.params.response_format == vjson_format;
wparams.no_context = task.params.no_context;
wparams.suppress_nst = task.params.suppress_nst;


Perhaps also add the VAD parameters here, similar to what is done in the existing code:

wparams.vad = params.vad;
wparams.vad_model_path = params.vad_model.c_str();
wparams.vad_params.threshold = params.vad_threshold;
wparams.vad_params.min_speech_duration_ms = params.vad_min_speech_duration_ms;
wparams.vad_params.min_silence_duration_ms = params.vad_min_silence_duration_ms;
wparams.vad_params.max_speech_duration_s = params.vad_max_speech_duration_s;
wparams.vad_params.speech_pad_ms = params.vad_speech_pad_ms;
wparams.vad_params.samples_overlap = params.vad_samples_overlap;

std::mutex queue_mutex_;
std::condition_variable queue_cv_;
std::mutex whisper_mutex_; // Protect whisper context access
std::atomic<bool> stop_flag_;

I think stop_flag_ could just be a plain bool, as the accesses to it are all guarded by the lock used with queue_cv_ (queue_mutex_).
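
A minimal sketch of that suggestion; queue_mutex_, queue_cv_, and stop_flag_ are the members from the snippet above, while the class name, methods, and pending_ queue are illustrative:

#include <condition_variable>
#include <deque>
#include <mutex>

// Illustrative: stop_flag_ can be a plain bool as long as every access happens
// while holding queue_mutex_, the same mutex used with queue_cv_.
class QueueStopSketch {
public:
    void request_stop() {
        {
            std::lock_guard<std::mutex> lock(queue_mutex_);
            stop_flag_ = true;               // written under the lock
        }
        queue_cv_.notify_all();
    }

    bool wait_for_work() {
        std::unique_lock<std::mutex> lock(queue_mutex_);
        queue_cv_.wait(lock, [this] { return stop_flag_ || !pending_.empty(); }); // read under the lock
        return !stop_flag_;
    }

private:
    std::mutex              queue_mutex_;
    std::condition_variable queue_cv_;
    bool                    stop_flag_ = false; // no std::atomic needed
    std::deque<int>         pending_;           // placeholder for the real task queue
};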

@sachaarbonel (Contributor, Author)

Let's hold off on this one. I ran some benchmarks and tracing with NVIDIA Nsight, and the CPU is not the bottleneck.

@sachaarbonel marked this pull request as draft on June 18, 2025 at 08:53.