Dynamic batching not working #609

@ShuaiShao93

System Info

x86_64, Debian, GPU A100

Who can help?

@byshiue @schetlur-nv

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Compile Llama 3.1 8B Instruct to a TensorRT-LLM engine
  2. Fill the config template with this command:
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt triton_backend:tensorrtllm,triton_max_batch_size:64,decoupled_mode:False,max_beam_width:1,engine_dir:${ENGINE_PATH},max_tokens_in_paged_kv_cache:2560,max_attention_window_size:2560,kv_cache_free_gpu_mem_fraction:0.5,exclude_input_in_output:True,enable_kv_cache_reuse:False,batching_strategy:inflight_fused_batching,max_queue_delay_microseconds:0
  3. Start Triton with all_models/inflight_batcher_llm/ensemble
  4. Use a multi-threaded client to send multiple requests in parallel
  5. Check the logs
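
For anyone trying to reproduce the multi-threaded client step, something along these lines should work. This is an illustrative sketch, not the script used for this report. It assumes Triton's HTTP generate endpoint on the default port 8000 and the `text_input` / `max_tokens` inputs of the ensemble model; `post_generate` and `send_parallel` are hypothetical helper names.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Triton HTTP generate route for the "ensemble" model, default port.
URL = "http://localhost:8000/v2/models/ensemble/generate"


def post_generate(prompt, url=URL, max_tokens=64):
    """Send one blocking generate request and return the decoded JSON reply."""
    body = json.dumps({"text_input": prompt, "max_tokens": max_tokens}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def send_parallel(prompts, send=post_generate, workers=8):
    """Call `send` for every prompt from a pool of threads, so several
    requests are in flight at once; results come back in prompt order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, prompts))


# Example (needs a running server):
# replies = send_parallel(["Hello"] * 8, workers=8)
```

With 8 workers, up to 8 requests are in flight at once, which is the situation where Active Request Count in the server log should rise above 1.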

Expected behavior

When Active Request Count is greater than 1, Scheduled Requests should also be greater than 1

actual behavior

The log shows Scheduled Requests at 1 while Active Request Count is 8:

I1002 23:45:42.282246 136 model_instance_state.cc:1115] "{\"Active Request Count\":8,\"Iteration Counter\":6189,\"Max Request Count\":8,\"Runtime CPU Memory Usage\":3060,\"Runtime GPU Memory Usage\":1427313739,\"Runtime Pinned Memory Usage\":637534388,\"Timestamp\":\"10-02-2024 23:45:42.275675\",\"Context Requests\":0,\"Generation Requests\":1,\"MicroBatch ID\":0,\"Paused Requests\":0,\"Scheduled Requests\":1,\"Total Context Tokens\":0,\"Free KV cache blocks\":25,\"Max KV cache blocks\":40,\"Tokens per KV cache block\":64,\"Used KV cache blocks\":15,\"Reused KV cache blocks\":0}"
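
To compare these counters across many iterations, the stats can be pulled out of each log line programmatically. A small sketch follows, using an abbreviated copy of the line above (same values, fewer fields); `parse_stats` is a hypothetical helper, and it assumes the payload always follows the first `] ` in the line.

```python
import json

# Abbreviated copy of the log line above (same values, fewer fields).
LINE = r'I1002 23:45:42.282246 136 model_instance_state.cc:1115] "{\"Active Request Count\":8,\"Max Request Count\":8,\"Scheduled Requests\":1,\"Max KV cache blocks\":40,\"Tokens per KV cache block\":64,\"Used KV cache blocks\":15}"'


def parse_stats(line):
    """Return the stats dict embedded in a model_instance_state log line."""
    quoted = line[line.index("] ") + 2:]   # JSON string literal after the prefix
    return json.loads(json.loads(quoted))  # unescape the string, then parse it


stats = parse_stats(LINE)
active = stats["Active Request Count"]
scheduled = stats["Scheduled Requests"]
# Total KV cache capacity in tokens = blocks * tokens per block.
kv_capacity_tokens = stats["Max KV cache blocks"] * stats["Tokens per KV cache block"]

print(active, scheduled, kv_capacity_tokens)  # 8 1 2560
```

The computed capacity, 40 blocks of 64 tokens = 2560 tokens, equals the `max_tokens_in_paged_kv_cache:2560` value passed to fill_template.py. Whether that limit is related to only one request being scheduled is not established here.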

additional notes

Why is Scheduled Requests always 1?
