
use a smaller LR? #4

Closed
152334H opened this issue Mar 14, 2023 · 6 comments

Comments

152334H commented Mar 14, 2023

The Karpathy constant (3e-4) currently in use might be too high? The loss for this training run stops going down once the LR rises beyond ~1e-4:

[screenshot: training loss curve]
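For context, a minimal sketch of where this hyperparameter sits in a typical Hugging Face Transformers fine-tuning setup; the constant names, paths, and values below are illustrative, not the repo's actual defaults:

```python
from transformers import TrainingArguments

# Illustrative values only: 3e-4 is the "Karpathy constant" questioned above,
# and the observation is that loss stops improving once the warmed-up LR
# exceeds roughly 1e-4.
LEARNING_RATE = 3e-4  # candidate alternatives: 1e-4, 2e-5

training_args = TrainingArguments(
    output_dir="lora-alpaca",        # placeholder output directory
    learning_rate=LEARNING_RATE,
    warmup_steps=100,                # assumed warmup schedule
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_steps=10,
)
```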

tloen (Owner) commented Mar 14, 2023

I've observed something similar, so I'm updating to 2e-5 as in the original paper. I'll check in the morning whether it works better.

152334H (Author) commented Mar 14, 2023

[screenshot: loss curves, purple = 2e-5, green = 3e-4]

This doesn't seem to actually help. The purple line is 2e-5, the green line is 3e-4.

Maybe this is just the best result obtainable when only the q_proj/v_proj parameters are adapted?
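For anyone reproducing this, here is a minimal sketch of the adapter setup being discussed, using the standard peft API; the checkpoint path, rank, and alpha are placeholders rather than the repo's actual settings:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

BASE_MODEL = "path/to/llama-7b"  # placeholder; point at your LLaMA checkpoint

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Only q_proj and v_proj get low-rank adapters here; k_proj, o_proj and the
# MLP projections stay frozen, which is the capacity limit being questioned.
lora_config = LoraConfig(
    r=4,                                  # illustrative rank (see LORA_R below)
    lora_alpha=16,                        # illustrative scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```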

tloen (Owner) commented Mar 14, 2023

Have you evaluated the model quality? I've always suspected that instruct-tuning is much less data-intensive than most people think.

tloen (Owner) commented Mar 14, 2023

It could also be worth trying to disable the int8 quantization or increase the LoRA rank. I'll check tomorrow.
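A sketch of those two toggles, assuming the usual bitsandbytes/peft int8 workflow; the checkpoint path and flag names are placeholders, not the repo's constants:

```python
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_int8_training

BASE_MODEL = "path/to/llama-7b"   # placeholder
LOAD_IN_8BIT = True               # flip to False to rule out int8 quantization

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=LOAD_IN_8BIT,                    # needs bitsandbytes when True
    device_map="auto" if LOAD_IN_8BIT else None,
)

if LOAD_IN_8BIT:
    # peft helper usually paired with 8-bit loading (freezes the base weights
    # and upcasts a few layers for numerical stability)
    model = prepare_model_for_int8_training(model)

# ...then wrap the model with the LoraConfig from the earlier sketch, raising r
# (e.g. 8 or 16) to test whether the plateau is a capacity limit rather than
# a learning-rate problem.
```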

kesar commented Mar 15, 2023

[screenshot: training loss curve, 2023-03-15]

Same here.

I only changed:
BATCH_SIZE = 256
MICRO_BATCH_SIZE = 5
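As a quick sanity check on those two numbers, assuming the script computes gradient accumulation as BATCH_SIZE // MICRO_BATCH_SIZE (a common pattern, not something confirmed here):

```python
# 256 is not divisible by 5, so the effective batch size ends up slightly
# below the nominal BATCH_SIZE when accumulation is derived this way.
BATCH_SIZE = 256
MICRO_BATCH_SIZE = 5

gradient_accumulation_steps = BATCH_SIZE // MICRO_BATCH_SIZE            # 51
effective_batch_size = gradient_accumulation_steps * MICRO_BATCH_SIZE   # 255

print(gradient_accumulation_steps, effective_batch_size)
```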

tloen (Owner) commented Mar 16, 2023

FWIW, I've been able to eke out some small gains by setting LORA_R to 8 instead of 4. Otherwise, it seems (until proven otherwise) like both learning rates are perfectly fine.
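For a rough sense of what that change buys, a small back-of-the-envelope sketch assuming standard LLaMA-7B dimensions (not measured from this repo):

```python
# Adapter size for LLaMA-7B (32 decoder layers, hidden size 4096) with
# adapters on q_proj/v_proj only. Each adapted 4096x4096 weight gains
# matrices of shape (4096, r) and (r, 4096), so trainable parameters
# scale linearly with the rank r.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
ADAPTED_MODULES = 2  # q_proj, v_proj

def lora_param_count(r: int) -> int:
    return NUM_LAYERS * ADAPTED_MODULES * 2 * HIDDEN_SIZE * r

print(lora_param_count(4))  # 2,097,152 (~2.1M trainable parameters)
print(lora_param_count(8))  # 4,194,304 (~4.2M); doubling r doubles the adapter
```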

tloen closed this as completed Mar 16, 2023