Add multi-LoRA support for more architectures #2602
Currently, multi-LoRA supports only the Llama and Mistral architectures. We should extend this functionality to all architectures.
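For background, this is roughly what a multi-LoRA request looks like in vLLM at the time of this issue: adapters are enabled when the engine is built, and each request can name a different adapter. The model name, adapter name, and adapter path below are placeholders, and the exact API may differ between releases.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Multi-LoRA is switched on when the engine is built; at the time of
# this issue only Llama- and Mistral-family models accepted it.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

# Each request may carry a different adapter; the scheduler batches
# requests for different adapters into the same forward pass.
# "sql-lora" and the path are placeholders for a locally saved adapter.
outputs = llm.generate(
    ["Write a query listing all users created this week."],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("sql-lora", 1, "/path/to/sql-lora"),
)
print(outputs[0].outputs[0].text)
```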
Yi, Qwen, Phi, and Mixtral seem to be the most in demand right now.

One challenge will be ensuring that all allowed weight shapes are supported by the Punica kernels; we may need to investigate some sort of padding there.

Originally posted by @Yard1 in #1804 (comment)
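As a minimal sketch of the padding idea, assuming the kernels are compiled for a fixed list of supported hidden sizes: a LoRA weight whose output dimension is not in that list could be zero-padded up to the nearest supported size, since zero rows contribute nothing to the matmul and the padded region can be sliced off afterwards. The size list and helper below are hypothetical illustrations, not vLLM code.

```python
import torch
import torch.nn.functional as F

# Placeholder list: the real set of shapes the Punica BGMV kernels are
# compiled for lives in the kernel sources, not here.
SUPPORTED_HIDDEN_SIZES = [2048, 2560, 4096, 5120, 8192]

def pad_lora_b(lora_b: torch.Tensor) -> torch.Tensor:
    """Zero-pad the output dim of a LoRA B matrix [out_features, rank]
    up to the nearest supported size. Zero rows add nothing to the
    matmul result, so the padded region can be sliced off afterwards."""
    out_features = lora_b.shape[0]
    target = next((s for s in SUPPORTED_HIDDEN_SIZES if s >= out_features), None)
    if target is None:
        raise ValueError(f"no supported size >= {out_features}")
    # F.pad pads trailing dims first: (0, 0) leaves the rank dim alone,
    # (0, target - out_features) appends zero rows.
    return F.pad(lora_b, (0, 0, 0, target - out_features))
```

For example, `pad_lora_b(torch.randn(2752, 16))` returns a `[4096, 16]` tensor, padding up to the nearest size in the placeholder list.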
Comments

Is it possible to add GPT-NeoX as well?

How about ChatGLM?

@Yard1 I'm interested in extending this to other architectures; do you want to meet and talk about the problems that need to be solved to get it working?

+1, bump for Phi-3 (any Phi right now)

+1 for Qwen