Strange grad_norm spikes with rsLoRA on LLaMA-3 #577

Closed
gotzmann opened this issue Jun 2, 2024 · 5 comments

Comments

@gotzmann

gotzmann commented Jun 2, 2024

I always see unexpected grad_norm spikes when training LLaMA-3 models with Unsloth and rsLoRA:

{'loss': 1.9848, 'grad_norm': 4.210731506347656, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 2.1115, 'grad_norm': 9.386985778808594, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.8286, 'grad_norm': 2.2225828170776367, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.9464, 'grad_norm': 1.6064600944519043, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.8348, 'grad_norm': 1.2319456338882446, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.744, 'grad_norm': 0.8763050436973572, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.9628, 'grad_norm': 1.1812210083007812, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.8272, 'grad_norm': 1.1029285192489624, 'learning_rate': 1e-05, 'epoch': 0.01}

It's OK without rsLoRA:

{'loss': 1.9862, 'grad_norm': 0.5418848991394043, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.9772, 'grad_norm': 0.5150394439697266, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 2.1005, 'grad_norm': 0.5437270998954773, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 2.0131, 'grad_norm': 0.4860772490501404, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.864, 'grad_norm': 0.43185532093048096, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.9323, 'grad_norm': 0.4057900011539459, 'learning_rate': 1e-05, 'epoch': 0.0}

It was also fine when I trained LLaMA-2 models with Unsloth and rsLoRA.

So now I'm wondering: should I just give up on rsLoRA, or is it possible to fix this?

@danielhanchen
Contributor

rsLoRA scales the LoRA matrices by alpha / sqrt(rank), whilst normal LoRA scales by alpha / rank. This means rsLoRA will amplify the learning rate, so when using rsLoRA, use a learning rate like 5e-5 (or the same as in pretraining) to reduce grad spikes.
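A minimal sketch of the two scaling rules described above (the rank and alpha values are illustrative, and the function is a stand-in for the scaling factor a LoRA implementation would apply, not any particular library's internals):

```python
# Scaling factor applied to the low-rank update (B @ A) under each rule.
def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    # rsLoRA: alpha / sqrt(r); standard LoRA: alpha / r
    return alpha / (r ** 0.5) if use_rslora else alpha / r

r, alpha = 64, 64
print(lora_scaling(alpha, r, use_rslora=False))  # 1.0 -> standard LoRA
print(lora_scaling(alpha, r, use_rslora=True))   # 8.0 -> rsLoRA, 8x larger effective update
```

With rank 64 and alpha 64, the rsLoRA update is 8x larger than the plain-LoRA one, which is why the same learning rate can produce much larger grad norms.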

@gotzmann
Author

gotzmann commented Jun 5, 2024

> rsLoRA will amplify the learning rate, so when using rsLoRA, use a learning rate like 5e-5

Do you mean 5e-6?

I'm already using an even smaller value, 1e-5, so I'm not sure how a higher one would help.

@danielhanchen
Contributor

Sadly this isn't an Unsloth issue, but rather LLaMA-3 being more sensitive to rsLoRA :( Another option is to reduce alpha by 1/2.
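For example, a hypothetical Unsloth setup with the alpha halved relative to the rank (parameter names follow Unsloth's FastLanguageModel.get_peft_model; the concrete rank, alpha, and target modules here are illustrative, not values from this thread):

```python
from unsloth import FastLanguageModel

# Assumes `model` is a LLaMA-3 base model already loaded via FastLanguageModel.from_pretrained.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=32,   # halved from 64, per the suggestion above, to damp rsLoRA's larger scale
    use_rslora=True,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```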

@gotzmann gotzmann closed this as completed Jun 7, 2024
@BugReporterZ

It sounds like you shouldn't need to configure alpha at all with rsLoRA.

https://huggingface.co/blog/damjan-k/rslora

Note again that to use rsLoRA by default instead of manually changing lora_alpha, one just adds use_rslora=True to the initialization of the LoraConfig, which in the alignment-handbook codebase would be changed in alignment-handbook/src/alignment/model_utils.py:
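A minimal sketch of such a PEFT config (the rank, dropout, and task type are illustrative; the only point taken from this thread is use_rslora=True with lora_alpha left equal to the rank):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,     # keep alpha equal to the rank and let rsLoRA handle the scaling
    use_rslora=True,   # switches the scaling from alpha / r to alpha / sqrt(r)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```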

@danielhanchen
Copy link
Contributor

@BugReporterZ Yes, so the suggested alpha for rsLoRA is the same as the rank.
