2.25.2.0-b294

@spolitov spolitov tagged this 08 Apr 05:04
Summary:
## Adding Idle Timeout to rpc::ThreadPool

### Current Situation

The existing `rpc::ThreadPool` implementation has the following behavior:
- Threads are created when needed to handle tasks
- Once created, threads run indefinitely
- There is no mechanism to terminate idle threads

This design was originally sufficient because:
1. The thread pool had limited usage in our codebase
2. The simple implementation met initial requirements
3. Long-running threads didn't pose significant resource concerns

### Emerging Need

As our usage of `rpc::ThreadPool` has expanded:
- More components now rely on this thread pool
- Usage patterns have become more variable (bursts of activity followed by idle periods)
- Resource efficiency has become more important

### Proposed Enhancement

We should add idle timeout functionality that would:
- Allow specifying a maximum idle duration for worker threads
- Automatically scale down excess threads during low activity periods
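A minimal sketch of how an idle timeout can work, assuming a single mutex and a timed wait on a condition variable. `IdleTimeoutPool`, `max_idle`, and `live_workers()` are hypothetical names used only for illustration; this is not the actual `rpc::ThreadPool` code.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical illustration only; not the real rpc::ThreadPool.
class IdleTimeoutPool {
 public:
  explicit IdleTimeoutPool(std::chrono::milliseconds max_idle)
      : max_idle_(max_idle) {}

  ~IdleTimeoutPool() {
    std::unique_lock<std::mutex> lock(mutex_);
    stopping_ = true;
    cond_.notify_all();
    // Workers are detached, so wait here until every one of them has exited.
    exit_cond_.wait(lock, [this] { return live_workers_ == 0; });
  }

  void Submit(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    tasks_.push_back(std::move(task));
    if (tasks_.size() <= idle_workers_) {
      cond_.notify_one();  // An idle worker can take the task.
    } else {
      // Threads are created only when needed.
      ++live_workers_;
      std::thread([this] { WorkerLoop(); }).detach();
    }
  }

  size_t live_workers() {
    std::lock_guard<std::mutex> lock(mutex_);
    return live_workers_;
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      if (!tasks_.empty()) {
        auto task = std::move(tasks_.front());
        tasks_.pop_front();
        lock.unlock();
        task();
        lock.lock();
        continue;
      }
      if (stopping_) break;
      ++idle_workers_;
      // Timed wait: returns false if max_idle_ elapsed with nothing to do.
      bool woke = cond_.wait_for(
          lock, max_idle_, [this] { return !tasks_.empty() || stopping_; });
      --idle_workers_;
      if (!woke) break;  // Idle timeout reached: this worker terminates.
    }
    --live_workers_;
    exit_cond_.notify_all();
  }

  const std::chrono::milliseconds max_idle_;
  std::mutex mutex_;
  std::condition_variable cond_;
  std::condition_variable exit_cond_;
  std::deque<std::function<void()>> tasks_;
  size_t idle_workers_ = 0;
  size_t live_workers_ = 0;
  bool stopping_ = false;
};
```

In this sketch, a burst of `Submit` calls grows the pool, and once the queue stays empty for longer than `max_idle` each waiting worker exits on its own, so `live_workers()` eventually drops back to zero.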

### Benefits

1. **Resource Efficiency**: Reclaims memory and system resources from unused threads
2. **Adaptive Scaling**: Better matches thread count to actual workload
3. **Consistency**: Aligns with common thread pool implementations in other systems
4. **Configurability**: Allows tuning based on specific use case requirements

## Improved Task Submission in rpc::ThreadPool

### Current Implementation

The existing task submission works as follows:
1. **Task Queueing**: All tasks are unconditionally added to a central task queue
2. **Worker Notification**: A worker thread is notified to process the task
3. **Queue Processing**: The worker retrieves the task from the queue

### New Implementation

We've optimized the workflow to be more direct:
1. **Worker Availability Check**: First look for an idle worker thread
2. **Direct Task Assignment**: If available, pass the task directly to the worker
3. **Fallback to Queueing**: Only use the task queue if no workers are immediately available

## Switched from Queue to Stack for Waiting Workers

### Problem with Queue-Based Worker Waiting

When using idle timeouts with a traditional queue approach:
1. All waiting workers form a FIFO queue
2. New tasks wake workers in the order they went to sleep
3. Tasks are therefore spread across many workers unnecessarily
4. Idle workers become harder to release, since most of them receive occasional work and rarely reach the timeout

### Stack-Based Solution

We now use a stack (LIFO) structure for waiting workers, so a new task wakes the worker that went idle most recently. This has two key advantages:
- Keeps a minimal set of workers active
- Lets other workers hit their idle timeout and terminate
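The difference between the two wake-up orders can be seen in a small single-threaded simulation. It assumes tasks arrive one at a time and each finishes before the next arrives; `SimulateSequentialTasks` and `WakeOrder` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

enum class WakeOrder { kFifo, kLifo };

// Simulates num_tasks tasks arriving one after another, each finishing before
// the next arrives. Returns how many tasks each worker handled.
// Assumes num_workers > 0.
std::vector<int> SimulateSequentialTasks(
    size_t num_workers, size_t num_tasks, WakeOrder order) {
  std::deque<size_t> waiting;  // Workers in the order they went to sleep.
  for (size_t i = 0; i < num_workers; ++i) waiting.push_back(i);

  std::vector<int> handled(num_workers, 0);
  for (size_t t = 0; t < num_tasks; ++t) {
    size_t worker;
    if (order == WakeOrder::kFifo) {
      worker = waiting.front();  // Wake the worker that has slept longest.
      waiting.pop_front();
    } else {
      worker = waiting.back();   // Wake the worker that slept most recently.
      waiting.pop_back();
    }
    ++handled[worker];
    waiting.push_back(worker);   // Task done; the worker goes back to sleep.
  }
  return handled;
}
```

With 4 workers and 8 tasks, FIFO order gives every worker 2 tasks, so none of them stays idle for long. LIFO order gives all 8 tasks to one worker, leaving the other 3 untouched and free to reach their idle timeout.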

Jira: DB-16084

Test Plan: ThreadPoolTest.TestMultiProducers

Reviewers: timur, slingam

Reviewed By: timur

Subscribers: slingam, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D43015