Closed
Labels: bug (Something isn't working)
Description
System Info
x86_64, Debian, GPU A100
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Compile Llama 3.1 8B Instruct into a TensorRT-LLM engine
- Fill the config template with:

```shell
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
    triton_backend:tensorrtllm,triton_max_batch_size:64,decoupled_mode:False,max_beam_width:1,engine_dir:${ENGINE_PATH},max_tokens_in_paged_kv_cache:2560,max_attention_window_size:2560,kv_cache_free_gpu_mem_fraction:0.5,exclude_input_in_output:True,enable_kv_cache_reuse:False,batching_strategy:inflight_fused_batching,max_queue_delay_microseconds:0
```
- Start Triton with all_models/inflight_batcher_llm/ensemble
- Use a multi-threaded client to send multiple requests in parallel
- Check the logs
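For step 4, the kind of multi-threaded client used is roughly the sketch below. It assumes Triton's HTTP generate endpoint (`/v2/models/ensemble/generate`); the URL, prompts, and `max_tokens` value are illustrative placeholders, not taken from the report:

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

# Illustrative values; the actual client, URL, and prompts were not part of this report.
TRITON_URL = "http://localhost:8000/v2/models/ensemble/generate"
PROMPTS = [f"Question {i}: what is inflight batching?" for i in range(8)]

def build_payload(prompt: str) -> bytes:
    # Field names follow the trtllm ensemble's generate schema.
    return json.dumps({"text_input": prompt, "max_tokens": 128}).encode()

def send(prompt: str) -> str:
    req = Request(TRITON_URL, data=build_payload(prompt),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return resp.read().decode()

# Guarded so the snippet is import-safe without a running server:
# 8 concurrent requests, matching "Active Request Count":8 in the log below.
if __name__ == "__main__" and os.environ.get("RUN_TRITON_CLIENT"):
    with ThreadPoolExecutor(max_workers=8) as pool:
        for out in pool.map(send, PROMPTS):
            print(out)
```

With 8 requests in flight at once, the expectation is that inflight fused batching schedules more than one of them per iteration.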
Expected behavior
When Active Request Count is greater than 1, Scheduled Requests should also be greater than 1
Actual behavior
The log contains:
```
I1002 23:45:42.282246 136 model_instance_state.cc:1115] "{\"Active Request Count\":8,\"Iteration Counter\":6189,\"Max Request Count\":8,\"Runtime CPU Memory Usage\":3060,\"Runtime GPU Memory Usage\":1427313739,\"Runtime Pinned Memory Usage\":637534388,\"Timestamp\":\"10-02-2024 23:45:42.275675\",\"Context Requests\":0,\"Generation Requests\":1,\"MicroBatch ID\":0,\"Paused Requests\":0,\"Scheduled Requests\":1,\"Total Context Tokens\":0,\"Free KV cache blocks\":25,\"Max KV cache blocks\":40,\"Tokens per KV cache block\":64,\"Used KV cache blocks\":15,\"Reused KV cache blocks\":0}"
```
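Pulling the relevant counters out of that iteration-stats line (the JSON below is the log payload verbatim, unescaped) makes the mismatch explicit:

```python
import json

# Iteration-stats JSON exactly as printed by model_instance_state.cc above.
log_json = '{"Active Request Count":8,"Iteration Counter":6189,"Max Request Count":8,"Runtime CPU Memory Usage":3060,"Runtime GPU Memory Usage":1427313739,"Runtime Pinned Memory Usage":637534388,"Timestamp":"10-02-2024 23:45:42.275675","Context Requests":0,"Generation Requests":1,"MicroBatch ID":0,"Paused Requests":0,"Scheduled Requests":1,"Total Context Tokens":0,"Free KV cache blocks":25,"Max KV cache blocks":40,"Tokens per KV cache block":64,"Used KV cache blocks":15,"Reused KV cache blocks":0}'

stats = json.loads(log_json)
active = stats["Active Request Count"]    # 8 requests are queued in the executor
scheduled = stats["Scheduled Requests"]   # but only 1 is placed in the batch
print(f"active={active} scheduled={scheduled}")  # active=8 scheduled=1
```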
Additional notes
Why is Scheduled Requests always 1?
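One thing worth checking from those same counters: with `max_tokens_in_paged_kv_cache:2560` and 64 tokens per block, the engine has only 2560 / 64 = 40 KV cache blocks in total, which matches `"Max KV cache blocks":40` in the log. If the scheduler must reserve enough free blocks to cover each request's maximum sequence length before admitting it, a small block pool could keep concurrency at 1. This is a hypothesis about the capacity scheduling policy, not a confirmed diagnosis; the arithmetic itself, using only numbers from the config and log above:

```python
# Back-of-envelope check using only values from the fill_template command and the log.
max_tokens_in_paged_kv_cache = 2560   # from the fill_template command
tokens_per_block = 64                 # "Tokens per KV cache block" in the log

max_blocks = max_tokens_in_paged_kv_cache // tokens_per_block
print(max_blocks)  # 40, matching "Max KV cache blocks":40

# In the captured iteration: 15 blocks in use by the single scheduled request, 25 free.
used_blocks, free_blocks = 15, 25
tokens_reserved = used_blocks * tokens_per_block
print(tokens_reserved)  # 960 tokens of KV cache held by that one request
assert used_blocks + free_blocks == max_blocks
```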