Pretraining loss is increasing #124
Comments
Can I see your command line for running train_gpu.py?
I have the same issue, and below are my params.
The loss is about 5.76, which is increasing and too large.
After 200k steps, the loss is still not decreasing.
This is my command:
python3 train_gpu.py \
--corpus_info_path=save-location/corpus_info.json \
--record_info_dir=save-location/tfrecords \
--train_batch_size=4 \
--seq_len=512 \
--reuse_len=256 \
--mem_len=384 \
--perm_size=256 \
--n_layer=12 \
--d_model=512 \
--d_embed=512 \
--n_head=16 \
--d_head=64 \
--d_inner=2048 \
--untie_r=True \
--mask_alpha=6 \
--mask_beta=1 \
--num_predict=85 \
--model_dir=output-model \
--uncased=True \
--num_core_per_host=1 \
--train_steps=200000
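For reference, a minimal sketch of how these flags usually relate in the published XLNet example settings; the halving rules below are assumptions drawn from that example, not constraints verified against this version of train_gpu.py:

```python
# Rough consistency check for the permutation-LM flags above. The rules that
# reuse_len is half of seq_len and perm_size equals reuse_len are assumptions
# taken from the published XLNet example settings.
seq_len = 512
reuse_len = 256
perm_size = 256
num_predict = 85

assert reuse_len == seq_len // 2, "reuse_len is usually seq_len / 2"
assert perm_size == reuse_len, "perm_size is usually set equal to reuse_len"
assert num_predict < seq_len, "cannot predict more tokens than the sequence contains"
print("flag combination looks consistent with the example settings")
```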
After 200k steps, I tried my luck fine-tuning on just 200 sentences, purposely keeping it small to show that the model can overfit (which would prove that it is able to learn). After running with 20 different random seeds, the loss stays the same.

learning_rate = 5e-5
batch_size = 10
MAX_SEQ_LENGTH = 128

I just duplicated the same code from my notebook for fine-tuning XLNet large, https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/text-classification/72.xlnet-large.ipynb (which learns properly there).

train minibatch loop: 100%|██████████| 20/20 [00:02<00:00, 8.70it/s, accuracy=0.6, cost=0.678]
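The overfit-a-tiny-dataset check described above is a common sanity test. Below is a minimal sketch of the idea using a small stand-in Keras model; the fake data, model size, and epoch count are illustrative assumptions, not the actual XLNet fine-tuning code from the notebook:

```python
# Minimal "can the model overfit a tiny dataset?" sanity check.
# Everything here (vocab size, model, fake data) is a stand-in; the real
# check would load the pretrained XLNet checkpoint instead.
import numpy as np
import tensorflow as tf

MAX_SEQ_LENGTH = 128
NUM_SENTENCES = 200      # deliberately tiny, as in the comment above
BATCH_SIZE = 10
LEARNING_RATE = 5e-5

# Fake token ids and binary labels standing in for the 200 sentences.
x = np.random.randint(0, 32000, size=(NUM_SENTENCES, MAX_SEQ_LENGTH))
y = np.random.randint(0, 2, size=(NUM_SENTENCES,))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(32000, 64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(LEARNING_RATE),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# A model whose weights are actually learning should drive the training loss
# close to zero on a set this small; a flat loss across many random seeds
# (the symptom above) points at a broken checkpoint or training setup.
model.fit(x, y, batch_size=BATCH_SIZE, epochs=50, verbose=2)
```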
Sure!
Wow, your model seems so big 👍. What is your GPU memory?
I have 2 GPUs (Tesla V100) with 32GB of RAM each :)
I tried increasing the batch size to 32, but needed to reduce the sequence length, and tested on a very small dataset (100 sentences); surprisingly, the loss dropped to 0.4X. It seems batch size is very important here. I pretrained a BERT model before, and because of hardware limitations my batch size was very small, yet the accuracy of that pretrained model was still very good; it also fine-tuned very well and beat multilingual BERT. But for XLNet, I cannot achieve the same or similar results with a very small batch size.
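As a rough back-of-the-envelope for the trade-off mentioned above (treating activation memory as roughly proportional to batch size times sequence length, which is a simplifying assumption since attention also grows quadratically with sequence length):

```python
# Illustrative arithmetic only: if batch_size * seq_len is held roughly
# constant to stay inside the same GPU memory budget, raising the batch
# size forces the sequence length down. Numbers are examples, not measured.
old_batch, old_seq_len = 4, 512
token_budget = old_batch * old_seq_len      # tokens per step that fit before

new_batch = 32
max_new_seq_len = token_budget // new_batch
print(f"batch {new_batch} leaves roughly seq_len <= {max_new_seq_len} in the same budget")
```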
After changing the learning rate to
After 10k steps, my loss is around 4.X, which looks good to me.
Yeah, you had better decrease params such as batch size and seq_len.
Done 200k steps. Looks perfect, closing this issue.
@bzantium |
For 200k steps, it took 5 days with two 32GB Tesla V100 GPUs.
@bzantium I have 1,600,000 sentences.
Right now I am pretraining for the Malay language. I collected my own dataset from Wikipedia, social media, and public news. Everything is fine, except that the loss is increasing.
The first 12k steps are fine; after that, it increases. Is this totally normal or not?