
[Usage]: Vllm inference slower for LoRA models #3979

Closed
akrish2011 opened this issue Apr 10, 2024 · 2 comments
Labels
usage How to use vllm

Comments

@akrish2011

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

When running LoRA-trained models with vLLM, I see lower inference speed compared to non-LoRA models. Is there anything causing this?
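For reference, a minimal sketch of how the comparison can be reproduced with vLLM's offline API, assuming a single engine serving both the base model and a LoRA adapter; the model name, adapter path, and prompts below are placeholders:

```python
import time

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model and adapter paths -- substitute your own.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"
LORA_PATH = "/path/to/lora_adapter"

prompts = ["Explain LoRA in one sentence."] * 32
sampling_params = SamplingParams(max_tokens=128)

# One engine serves both runs, so the only difference between the two
# timings is whether a LoRA adapter is attached to the requests.
llm = LLM(model=BASE_MODEL, enable_lora=True)

start = time.perf_counter()
llm.generate(prompts, sampling_params)  # base model, no adapter
base_time = time.perf_counter() - start

start = time.perf_counter()
llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("my_adapter", 1, LORA_PATH),
)
lora_time = time.perf_counter() - start

print(f"base: {base_time:.2f}s  lora: {lora_time:.2f}s")
```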

@akrish2011 akrish2011 added the usage How to use vllm label Apr 10, 2024
@jeejeelee
Contributor

jeejeelee commented Apr 11, 2024

Please refer to #2829 and #1804.

@mgoin
Sponsor Collaborator

mgoin commented Aug 2, 2024

The slowdown should be reduced by roughly 2x with the newly landed Triton kernels in #5036.

@mgoin mgoin closed this as completed Aug 2, 2024