Hi team,
I tried QLoRA fine-tuning of a 30B LLaMA model with unsloth, but I found little improvement in speed or memory usage. The details are as follows:
seq_length=8192
batch size=1
use flash attn=true
gradient_checkpointing=true
With unsloth: (screenshot not preserved)
Without unsloth: (screenshot not preserved)
1. Speed improved by only about 3 s, which is far from the acceleration ratio mentioned in the documentation.
2. nvidia-smi: 35 GB with unsloth vs. 39 GB without; only about 10% less.
3. The loss values are abnormal.
Here is the code (screenshot not preserved):
Is there a setting I'm missing? Looking forward to your reply.
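Since the original snippet was attached as a screenshot and is not preserved, here is a minimal sketch of what an equivalent unsloth QLoRA setup with the settings above could look like. The model path, LoRA hyperparameters, dataset, and trainer arguments are assumptions for illustration, not the reporter's actual values:

```python
# Hypothetical reconstruction of the setup described above; model path,
# LoRA hyperparameters, and dataset are placeholders.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 8192  # matches seq_length above

# Load the base model in 4-bit (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/llama-30b",  # placeholder
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect
    load_in_4bit=True,
)

# Attach LoRA adapters; r/alpha values are illustrative.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,  # matches gradient_checkpointing=true
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=1,  # matches batch size above
        gradient_accumulation_steps=4,
        max_steps=60,
        fp16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```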
Hey @mxjyst. Do you have a reproducible example for the non-unsloth run? Have you tried our Colab notebooks to confirm?
Also, did you benchmark unsloth first and then HF in the same script? Unsloth patches the model first, so that would affect the HF run.
All our benchmarking code is public for everyone to verify; see our HF blog post at https://huggingface.co/blog/unsloth-trl, in which HF did third-party benchmarking. Likewise, LLaMA Factory and many others have confirmed our benchmarks.
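For a clean comparison, one option is to time each configuration in a separate Python process, so that unsloth's import-time patches cannot leak into the plain-HF baseline. A minimal sketch, assuming two stand-alone training scripts (the script names are hypothetical):

```python
# Minimal timing harness; the two training scripts are hypothetical
# stand-ins for an unsloth run and a plain Hugging Face baseline.
import subprocess
import time

for script in ("train_unsloth.py", "train_hf_baseline.py"):
    start = time.perf_counter()
    subprocess.run(["python", script], check=True)  # fresh process per run
    print(f"{script}: {time.perf_counter() - start:.1f} s wall time")
```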