Is your feature request related to a problem? Please describe.
Currently, BERT processing typically runs on a fixed device (CPU or GPU), with no dynamic adaptation to query complexity. This leads to suboptimal resource utilization:
- Simple queries processed on GPU waste valuable GPU resources.
- Complex queries processed on CPU suffer from slow inference times.
- There is no mechanism to dynamically switch between CPU and GPU based on actual computational demands.
Describe the solution you'd like
Automatic CPU/GPU Switching
Implement a dynamic resource manager that:
- Estimates the computational complexity of incoming queries.
- Automatically routes simple queries to CPU and complex ones to GPU.
- Balances latency, throughput, and hardware utilization.
This could leverage profiling metrics such as token length, syntactic complexity, or historical processing times to make real-time routing decisions.
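A minimal sketch of what such a profiler could look like, in Go, assuming a hypothetical ComputeProfiler calibrated with a token threshold and a CPU latency budget (all names, fields, and thresholds are illustrative, not existing APIs):

package router // hypothetical package name

import (
    "strings"
    "time"
)

// ComputeProfiler estimates per-query cost from cheap heuristics.
// Field names and heuristics are illustrative assumptions.
type ComputeProfiler struct {
    TokenThreshold int           // queries up to this many tokens stay on CPU
    LatencyBudget  time.Duration // maximum acceptable per-query latency on CPU
    AvgCPUPerToken time.Duration // rolling average CPU time per token from past runs
}

// Complexity carries the routing inputs for a single query.
type Complexity struct {
    tokenCount  int
    expectedCPU time.Duration
    profiler    *ComputeProfiler
}

func (p *ComputeProfiler) EstimateComplexity(query string) Complexity {
    tokens := len(strings.Fields(query)) // cheap proxy; a real tokenizer would be more accurate
    return Complexity{
        tokenCount:  tokens,
        expectedCPU: time.Duration(tokens) * p.AvgCPUPerToken,
        profiler:    p,
    }
}

// IsCPUBound: short queries that comfortably fit the CPU latency budget.
func (c Complexity) IsCPUBound() bool {
    return c.tokenCount <= c.profiler.TokenThreshold && c.expectedCPU <= c.profiler.LatencyBudget
}

// IsGPUBound: queries whose expected CPU time exceeds the budget and thus
// justify the GPU transfer cost; anything in between falls back to CPU.
func (c Complexity) IsGPUBound() bool {
    return c.expectedCPU > c.profiler.LatencyBudget
}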
To reduce GPU transfer overhead and improve throughput, introduce a queue-based batching mechanism inspired by continuous batching in vLLM:
- GPU-bound queries are batched efficiently.
- Data transfers and computation are overlapped where possible.
- CPU and GPU workloads are decoupled via a shared batch queue.
Example design:
type ResourceManager struct {
    CPUProcessor *CPUProcessor
    GPUProcessor *GPUProcessor
    Profiler     *ComputeProfiler
    Queue        *BatchQueue
}

func (rm *ResourceManager) ProcessQuery(query string) (*ClassificationResult, error) {
    // 1. Estimate query complexity
    complexity := rm.Profiler.EstimateComplexity(query)

    // 2. Route based on compute bounds
    if complexity.IsCPUBound() {
        return rm.CPUProcessor.Process(query)
    } else if complexity.IsGPUBound() {
        return rm.Queue.AddToBatch(query) // Enqueue for GPU batch processing
    }

    // Fallback to CPU
    return rm.CPUProcessor.Process(query)
}
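To complement the design above, here is a rough outline of the BatchQueue it refers to, assuming a simple size-or-timeout flush policy and a hypothetical batched ProcessBatch method on GPUProcessor (again, names are illustrative rather than existing APIs; imports as in the profiler sketch):

// BatchQueue accumulates GPU-bound queries and flushes them as one batch,
// either when the batch is full or when a timeout expires.
type BatchQueue struct {
    gpu      *GPUProcessor
    maxBatch int
    maxWait  time.Duration
    pending  chan pendingQuery
}

type pendingQuery struct {
    query  string
    result chan batchResult
}

type batchResult struct {
    res *ClassificationResult
    err error
}

func NewBatchQueue(gpu *GPUProcessor, maxBatch int, maxWait time.Duration) *BatchQueue {
    q := &BatchQueue{gpu: gpu, maxBatch: maxBatch, maxWait: maxWait, pending: make(chan pendingQuery, maxBatch*4)}
    go q.run() // background loop that forms and dispatches batches
    return q
}

// AddToBatch enqueues a query and blocks until its batch has been processed.
func (q *BatchQueue) AddToBatch(query string) (*ClassificationResult, error) {
    pq := pendingQuery{query: query, result: make(chan batchResult, 1)}
    q.pending <- pq
    r := <-pq.result
    return r.res, r.err
}

func (q *BatchQueue) run() {
    for {
        batch := []pendingQuery{<-q.pending} // block until the first query arrives
        timer := time.NewTimer(q.maxWait)
    fill:
        for len(batch) < q.maxBatch {
            select {
            case pq := <-q.pending:
                batch = append(batch, pq)
            case <-timer.C:
                break fill // timeout: flush whatever we have
            }
        }
        timer.Stop()

        queries := make([]string, len(batch))
        for i, pq := range batch {
            queries[i] = pq.query
        }
        // Hypothetical batched GPU call; the real signature may differ.
        results, err := q.gpu.ProcessBatch(queries)
        for i, pq := range batch {
            if err != nil {
                pq.result <- batchResult{nil, err}
            } else {
                pq.result <- batchResult{results[i], nil}
            }
        }
    }
}

Decoupling callers from the GPU through this queue is what lets transfers and computation overlap: while one batch runs on the GPU, the next one is already being assembled.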
Describe alternatives you've considered
Static Profiling at Startup:
Run a benchmark with sample data during initialization to calibrate CPU compute-bound vs. I/O-bound thresholds based on system configuration.
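A rough sketch of how that calibration could work, reusing the hypothetical ComputeProfiler fields from the sketch above and the CPUProcessor.Process call from the example design (the sample set and latency budget are assumptions):

// CalibrateThresholds runs sample queries on CPU at startup and derives a
// token threshold from the observed per-token latency.
func CalibrateThresholds(cpu *CPUProcessor, samples []string, budget time.Duration) *ComputeProfiler {
    var totalTokens int
    var totalTime time.Duration
    for _, s := range samples {
        start := time.Now()
        _, _ = cpu.Process(s) // ignore the result; only the timing matters here
        totalTime += time.Since(start)
        totalTokens += len(strings.Fields(s))
    }
    if totalTokens == 0 {
        totalTokens = 1 // guard against an empty sample set
    }
    perToken := totalTime / time.Duration(totalTokens)
    if perToken <= 0 {
        perToken = time.Microsecond // avoid division by zero on very fast samples
    }
    return &ComputeProfiler{
        TokenThreshold: int(budget / perToken), // largest query that still fits the CPU budget
        LatencyBudget:  budget,
        AvgCPUPerToken: perToken,
    }
}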
Additional context
If this feature aligns with the project's roadmap, would it be possible to assign this issue to me? I’d appreciate the opportunity to start working on it.