System Info
The tensorrtllm backend doesn't work for us because of this bug: #598, so I have to use the python backend. However, it only supports detached mode, which we don't need.
Can we add support for non-detached mode?
Who can help?
@ncomly-nvidia
Information
Tasks
Reproduction
N/A
Expected behavior
N/A
actual behavior
N/A
additional notes
N/A