When doing further pre-training on my own data, the perplexity (ppl) is very high, for example 709. I have 3,582,619 examples and use batch size = 8, 3 epochs, and learning rate = 5e-5. Is there any advice? Thanks a lot!
The further pre-training task is masked language modeling, not (causal) language modeling, so I think perplexity may not be a good metric here.
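For reference, a minimal sketch of where the perplexity number comes from, assuming a Hugging Face `BertForMaskedLM` checkpoint (substitute your own further pre-trained model) and the usual convention that non-masked positions are labeled `-100`. Since the loss only covers the small fraction of masked tokens, `exp(loss)` for an MLM is not directly comparable to a causal LM perplexity:

```python
import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

# Hypothetical checkpoint name; replace with your further pre-trained model.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

# Mask one token; positions labeled -100 are ignored by the loss.
labels = inputs["input_ids"].clone()
mask_pos = 4
labels[:, :] = -100
labels[0, mask_pos] = inputs["input_ids"][0, mask_pos]
inputs["input_ids"][0, mask_pos] = tokenizer.mask_token_id

with torch.no_grad():
    out = model(**inputs, labels=labels)

# "Perplexity" here is just exp(cross-entropy over the masked positions).
print("masked-LM loss:", out.loss.item())
print("ppl = exp(loss):", math.exp(out.loss.item()))
```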
Can you set your batch size larger or use gradient accumulation? You can also check the accuracy of the masked language model, as well as the loss curve, to monitor the further pre-training. A sketch of both is below.
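As a rough illustration (not the exact training script in this repo), here is how gradient accumulation and masked-token accuracy could be tracked in a PyTorch loop. It assumes `model`, `optimizer`, and `train_dataloader` are already set up for a Hugging Face-style masked-LM model whose batches label unmasked positions with `-100`; the accumulation step count of 4 is just an example value:

```python
import torch

accumulation_steps = 4  # effective batch size = 8 * 4 = 32 (assumed value)

# model, optimizer, train_dataloader: assumed to be defined elsewhere,
# e.g. a BertForMaskedLM model as in the previous sketch.
optimizer.zero_grad()

for step, batch in enumerate(train_dataloader):
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    # Scale the loss so the accumulated gradient matches one large batch.
    loss = outputs.loss / accumulation_steps
    loss.backward()

    # Masked-LM accuracy: fraction of masked tokens predicted correctly.
    preds = outputs.logits.argmax(dim=-1)
    mask = batch["labels"] != -100
    acc = (preds[mask] == batch["labels"][mask]).float().mean()

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
        print(f"step {step + 1}: loss={outputs.loss.item():.4f}, "
              f"masked-LM acc={acc.item():.4f}")
```

Watching the masked-LM accuracy rise and the loss curve fall is usually a better sanity check on further pre-training than the absolute perplexity value.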