Strange grad_norm spikes with rsLoRA on LLaMA-3 #577

Closed
gotzmann opened this issue Jun 2, 2024 · 5 comments

Comments

@gotzmann

gotzmann commented Jun 2, 2024

I always see unexpected grad_norm spikes when training LLaMA-3 models with Unsloth and rsLoRA:

{'loss': 1.9848, 'grad_norm': 4.210731506347656, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 2.1115, 'grad_norm': 9.386985778808594, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.8286, 'grad_norm': 2.2225828170776367, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.9464, 'grad_norm': 1.6064600944519043, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.8348, 'grad_norm': 1.2319456338882446, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.744, 'grad_norm': 0.8763050436973572, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.9628, 'grad_norm': 1.1812210083007812, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 1.8272, 'grad_norm': 1.1029285192489624, 'learning_rate': 1e-05, 'epoch': 0.01}

It's OK without rsLoRA:

{'loss': 1.9862, 'grad_norm': 0.5418848991394043, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.9772, 'grad_norm': 0.5150394439697266, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 2.1005, 'grad_norm': 0.5437270998954773, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 2.0131, 'grad_norm': 0.4860772490501404, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.864, 'grad_norm': 0.43185532093048096, 'learning_rate': 1e-05, 'epoch': 0.0}
{'loss': 1.9323, 'grad_norm': 0.4057900011539459, 'learning_rate': 1e-05, 'epoch': 0.0}

It was also fine when I trained LLaMA-2 models with Unsloth and rsLoRA.

So now I'm wondering: should I just give up on rsLoRA, or is it possible to fix this?

@danielhanchen
Contributor

rsLoRA scales the LoRA matrices by alpha / sqrt(rank), whilst normal LoRA scales by alpha / rank. This means rsLoRA will amplify the learning rate, so when using rsLoRA, use a learning rate like 5e-5 (or the same as in pretraining) to reduce grad spikes.
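A minimal sketch of the two scaling rules described above (the rank and alpha values are illustrative, and the function is a stand-in for the scaling factor a LoRA implementation would apply, not any particular library's internals):

```python
# Scaling factor applied to the low-rank update (B @ A) under each rule.
def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    # rsLoRA: alpha / sqrt(r); standard LoRA: alpha / r
    return alpha / (r ** 0.5) if use_rslora else alpha / r

r, alpha = 64, 64
print(lora_scaling(alpha, r, use_rslora=False))  # 1.0 -> standard LoRA
print(lora_scaling(alpha, r, use_rslora=True))   # 8.0 -> rsLoRA, 8x larger effective update
```

With rank 64 and alpha 64, the rsLoRA update is 8x larger than the plain-LoRA one, which is why the same learning rate can produce much larger grad norms.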

@gotzmann
Author

gotzmann commented Jun 5, 2024

> rsLoRA will amplify the learning rate, so when using rsLoRA, use a learning rate like 5e-5

Do you mean 5e-6?

I'm already using an even smaller value, 1e-5, so I'm not sure how a higher one would help.

@danielhanchen
Contributor

Sadly this isn't an Unsloth issue, but rather LLaMA-3 being more sensitive to rsLoRA :( Another option is to reduce alpha by 1/2.
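For example, a hypothetical Unsloth setup with the alpha halved relative to the rank (parameter names follow Unsloth's FastLanguageModel.get_peft_model; the concrete rank, alpha, and target modules here are illustrative, not values from this thread):

```python
from unsloth import FastLanguageModel

# Assumes `model` is a LLaMA-3 base model already loaded via FastLanguageModel.from_pretrained.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=32,   # halved from 64, per the suggestion above, to damp rsLoRA's larger scale
    use_rslora=True,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```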

@gotzmann gotzmann closed this as completed Jun 7, 2024
@BugReporterZ

It sounds like you shouldn't need to configure alpha at all with rsLoRA.

https://huggingface.co/blog/damjan-k/rslora

Note again that to use rsLoRA by default instead of manually changing lora_alpha, one just adds use_rslora=True to the initialization of the LoraConfig, which in the alignment-handbook codebase would be changed in alignment-handbook/src/alignment/model_utils.py:
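A minimal sketch of such a PEFT config (the rank, dropout, and task type are illustrative; the only point taken from this thread is use_rslora=True with lora_alpha left equal to the rank):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,     # keep alpha equal to the rank and let rsLoRA handle the scaling
    use_rslora=True,   # switches the scaling from alpha / r to alpha / sqrt(r)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```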

@danielhanchen
Copy link
Contributor

@BugReporterZ Yes, so the suggested alpha for rsLoRA is the same as the rank.
