When doing further pre-training on my own data, the perplexity (ppl) is very high, for example 709. I have 3,582,619 examples and use batch size = 8, 3 epochs, and learning rate = 5e-5. Is there any advice? Thanks a lot!
The further pre-training task is masked language modeling, not (causal) language modeling, so I think perplexity may not be a good metric here.
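For reference, a minimal sketch of where the perplexity number comes from, assuming a Hugging Face `BertForMaskedLM` checkpoint (substitute your own further pre-trained model) and the usual convention that non-masked positions are labeled `-100`. Since the loss only covers the small fraction of masked tokens, `exp(loss)` for an MLM is not directly comparable to a causal LM perplexity:

```python
import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Hypothetical checkpoint name; replace with your further pre-trained model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

# Mask one token; positions labeled -100 are ignored by the loss.
labels = inputs["input_ids"].clone()
mask_pos = 4
labels[:, :] = -100
labels[0, mask_pos] = inputs["input_ids"][0, mask_pos]
inputs["input_ids"][0, mask_pos] = tokenizer.mask_token_id

with torch.no_grad():
    out = model(**inputs, labels=labels)

# "Perplexity" here is just exp(cross-entropy over the masked positions).
print("masked-LM loss:", out.loss.item())
print("ppl = exp(loss):", math.exp(out.loss.item()))
```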
Can you set your batch size larger or use gradient accumulation? You can also check the accuracy of the masked language model, as well as the loss curve, to monitor the further pre-training. A sketch of both is below.
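As a rough illustration (not the exact training script in this repo), here is how gradient accumulation and masked-token accuracy could be tracked in a PyTorch loop. It assumes `model`, `optimizer`, and `train_dataloader` are already set up for a Hugging Face-style masked-LM model whose batches label unmasked positions with `-100`; the accumulation step count of 4 is just an example value:

```python
import torch

accumulation_steps = 4  # effective batch size = 8 * 4 = 32 (assumed value)

# model, optimizer, train_dataloader: assumed to be defined elsewhere,
# e.g. a BertForMaskedLM model as in the previous sketch.
optimizer.zero_grad()

for step, batch in enumerate(train_dataloader):
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    # Scale the loss so the accumulated gradient matches one large batch.
    loss = outputs.loss / accumulation_steps
    loss.backward()

    # Masked-LM accuracy: fraction of masked tokens predicted correctly.
    preds = outputs.logits.argmax(dim=-1)
    mask = batch["labels"] != -100
    acc = (preds[mask] == batch["labels"][mask]).float().mean()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        print(f"step {step + 1}: loss={outputs.loss.item():.4f}, "
              f"masked-LM acc={acc.item():.4f}")
```

Watching the masked-LM accuracy rise and the loss curve fall is usually a better sanity check on further pre-training than the absolute perplexity value.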