Hello,
Thank you for your interesting research work.
I have trained 10 experts on top of the Phi-3 model (the datasets were selected using the clustering described in the paper). I used the TRL and PEFT libraries for training, so the checkpoints follow the structure those libraries expect.
I trained the experts with LoRA on a 4-bit quantized base model, applying the adapters to the o and qkv attention projections in each layer.
I would like to know how I can use your code to run Arrow routing, so that these experts are merged per token at every model layer.
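For reference, here is a minimal NumPy sketch of per-token Arrow routing at a single LoRA layer as I understand it: each expert's prototype is the top right singular vector of its LoRA update, each token is scored by the absolute dot product with every prototype, and the top-k scores are passed through a softmax to get mixing weights. This is not the repository's code. The function names (`arrow_prototype`, `arrow_route`, `routed_lora_delta`), the PEFT-style matrix shapes, the `top_k` value, and the omission of the LoRA scaling factor are all assumptions made for illustration.

```python
import numpy as np


def arrow_prototype(A, B):
    """Return a unit vector in input space for one LoRA expert.

    Assumed PEFT-style shapes: A is (r, d_in), B is (d_out, r), so the
    low-rank update is delta_W = B @ A with shape (d_out, d_in).
    The top right singular vector of delta_W is the input direction
    that the update amplifies the most.
    """
    delta_w = B @ A
    _, _, vt = np.linalg.svd(delta_w, full_matrices=False)
    return vt[0]


def arrow_route(x, prototypes, top_k=2):
    """Compute per-token mixing weights over experts.

    x: (n_tokens, d_in) hidden states entering the layer.
    prototypes: (n_experts, d_in), one row per expert.
    Returns (n_tokens, n_experts); each row sums to 1 and has
    at most top_k non-zero entries.
    """
    # The sign of a singular vector is arbitrary, so use the absolute value.
    scores = np.abs(x @ prototypes.T)
    top_idx = np.argsort(-scores, axis=1)[:, :top_k]
    masked = np.full_like(scores, -np.inf)
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    np.put_along_axis(masked, top_idx, top_scores, axis=1)
    # Numerically stable softmax over the selected experts only.
    masked -= masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=1, keepdims=True)


def routed_lora_delta(x, experts, weights):
    """Weighted sum of the expert LoRA outputs for every token.

    experts: list of (A, B) pairs with the shapes described above.
    The lora_alpha / r scaling factor is left out for brevity.
    """
    d_out = experts[0][1].shape[0]
    out = np.zeros((x.shape[0], d_out))
    for e, (A, B) in enumerate(experts):
        out += weights[:, [e]] * (x @ A.T @ B.T)
    return out
```

In a full model this would be repeated independently for every adapted module (here the o and qkv projections) in every layer, with the prototypes computed once from the saved adapter weights.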
I am running into some errors with the code.
Could you please explain the process step by step? I am a beginner in this field.
Thank you, and I would appreciate your response.