
Could Anybody Reproduce the Results? The BERT and RoBERTa results? (Resolved) #12

Closed
Hyusheng opened this issue Dec 21, 2021 · 10 comments

Comments

@Hyusheng

Hyusheng commented Dec 21, 2021

I ran the provided scripts and only got lower F1 scores (dev_result:{'dev_F1': 61.39554434636402, 'dev_F1_ign': 59.42344205967282, 'dev_re_p': 63.68710211912444, 'dev_re_r': 59.2631664367443, 'dev_average_loss': 0.3790786044299603}).

@TimelordRi
Contributor

TimelordRi commented Dec 27, 2021

Hi! There are two probable reasons for this issue:

  1. When fine-tuning the RoBERTa-large model, a larger batch size may help you get a higher F1 score (see the sketch after this comment).
  2. We used 4 NVIDIA GeForce RTX 3090 GPUs during training, so some hyper-parameter tuning may be necessary to reproduce the results in your configuration.

Since our model and the default hyper-parameter settings are friendly to fine-tuning the BERT-base and RoBERTa-base models, it is more efficient to reproduce the results on the X-base models. We suggest you try this first.
Thx!
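
One common way to enlarge the effective batch size without more GPU memory is gradient accumulation. The snippet below is a minimal, self-contained sketch with a toy model and random data, not this repository's actual training script; the names `per_step_batch` and `accumulation_steps` are illustrative only.

```python
# Minimal sketch of gradient accumulation: several small forward/backward passes
# are accumulated before a single optimizer update, so the update behaves like a
# larger batch. Toy model and random data for illustration only.
import torch
from torch import nn

model = nn.Linear(10, 2)                        # stand-in for the actual RE model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss_fn = nn.CrossEntropyLoss()

per_step_batch = 4
accumulation_steps = 8                          # effective batch size = 4 * 8 = 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(per_step_batch, 10)         # dummy inputs
    y = torch.randint(0, 2, (per_step_batch,))  # dummy labels
    loss = loss_fn(model(x), y)
    (loss / accumulation_steps).backward()      # scale so gradients average over the full 32 examples
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                        # one update per accumulated "large" batch
        optimizer.zero_grad()
```

Accumulation only approximates a true large batch (batch-dependent statistics are still computed per small step), but it is often a practical substitute when memory is limited.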

@zxlzr zxlzr closed this as completed Jan 5, 2022
@MingYangi

When I was running CDR, I also encountered this problem: the performance was very low, around 62. However, I did not change any code, only some paths, so I do not know why this happened. Have you solved this problem?

@zxlzr
Contributor

zxlzr commented Mar 16, 2022

Hi buddy, there are many reasons for this situation. I suggest you re-check the following steps; if you have any problems, feel free to contact us:

  1. Do you use the correct pre-trained language model? For CDR, the model is SciBERT-base (a quick loading check follows this comment).

  2. Do you use the correct hyper-parameters? We set different hyper-parameters (learning rate, batch size) for different datasets, so it is necessary to tune them on the dev set. Besides, if you are using a different GPU, the batch size may also differ, which will definitely influence the results. A larger batch size may help you obtain a better F1 score.

I hope these tips help you reproduce the results.

Thx!
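
As a quick check on the first point, one can load SciBERT directly and confirm the vocabulary is the scientific one rather than vanilla BERT's. The sketch below assumes the public `allenai/scibert_scivocab_uncased` checkpoint from the Hugging Face hub; whether run_cdr.sh expects the cased or uncased variant (or a local path) is an assumption.

```python
# Sanity check that the pre-trained weights are SciBERT, not plain BERT.
from transformers import AutoModel, AutoTokenizer

model_name = "allenai/scibert_scivocab_uncased"   # public SciBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# SciBERT's scivocab keeps scientific/biomedical terms as fewer word pieces
# than bert-base-uncased, and its vocabulary size differs (~31k vs 30522).
print(tokenizer.tokenize("cisplatin-induced nephrotoxicity"))
print(model.config.vocab_size)
```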

@MingYangi

First of all, thank you very much for your reply, and for being so timely. Thank you again! Yes, for CDR I did use SciBERT, but I only have one GPU, so I changed the batch size to 2; the other hyper-parameters are unchanged and I run run_cdr.sh. Can the batch size really affect the results that much? Looking forward to your advice!

@zxlzr
Contributor

zxlzr commented Mar 16, 2022

I think the major reason may be the batch size. You can watch the loss to check whether the model converges or not; training for more steps may also yield better performance. Besides, you can use fp16 to fit a larger batch size.
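
For reference, here is a minimal sketch of fp16 (mixed-precision) training with `torch.cuda.amp`, again with a toy model and random data; the repository's own scripts may enable fp16 through a flag of their own, which is not verified here.

```python
# Mixed-precision (fp16) training roughly halves activation memory, which in turn
# allows a larger batch size on a single GPU. Toy model for illustration only.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)   # no-op pass-through on CPU

for step in range(100):
    x = torch.randn(32, 10, device=device)
    y = torch.randint(0, 2, (32,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):    # forward pass in half precision
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()                     # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```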

@MingYangi

Thanks for the advice, but what I don't understand is why the batch size would matter so much. Could it really affect the F1 by 10%+? Have you tried a similar experiment?

@zxlzr
Contributor

zxlzr commented Mar 16, 2022


Maybe deep learning is just such a hyper-parameter-sensitive methodology; we don't want this to happen either. We will try to conduct an analysis of batch size in the future.

@ZhangYi0621

Hello @MingYangi, I have the same question as you! I use 1 GPU with batch_size=4 and get F1 = 0.64, which is even lower than ATLOP with the same hyper-parameters.
So, have you achieved the reported scores?

@zxlzr
Contributor

zxlzr commented Apr 3, 2022


Hello, do you use the default experimental settings? Some other researchers have already reproduced this performance and even obtained much better results with hyper-parameter tuning (such as #13 (comment)). Maybe one of the following accounts for the gap:

Do you use SciBERT-base as the pre-trained language model?
For the CDR dataset, we use one NVIDIA V100 16GB GPU.
We evaluate our model with Ign F1 and F1; do you use the right evaluation metrics (see the sketch after this comment)?

If you have any question, feel free to contact us.
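
For readers unsure about the metrics mentioned above: Ign F1 is the F1 computed while discounting relational facts that already appear in the training annotations, so memorized training triples cannot inflate the score. The sketch below follows the DocRED-style convention with triples simplified to (head, tail, relation) tuples; whether this repository's evaluation script computes it in exactly this way is an assumption.

```python
# F1 vs Ign F1 for document-level RE, DocRED-style: Ign F1 removes correct
# predictions whose fact is already annotated in the training set from both the
# numerator and denominator of precision, while recall is left unchanged.

def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def evaluate(preds: set, golds: set, train_facts: set) -> dict:
    correct = preds & golds
    precision = len(correct) / len(preds) if preds else 0.0
    recall = len(correct) / len(golds) if golds else 0.0
    correct_in_train = correct & train_facts
    denom = len(preds) - len(correct_in_train)
    ign_precision = (len(correct) - len(correct_in_train)) / denom if denom else 0.0
    return {"F1": f1(precision, recall), "Ign F1": f1(ign_precision, recall)}

# Toy example: one of the two correct predictions is already a training fact,
# so Ign F1 (~0.67) comes out lower than plain F1 (0.80).
train_facts = {("aspirin", "ulcer", "induces")}
golds = {("aspirin", "ulcer", "induces"), ("cisplatin", "nephrotoxicity", "induces")}
preds = golds | {("aspirin", "nephrotoxicity", "induces")}
print(evaluate(preds, golds, train_facts))
```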

@zxlzr zxlzr changed the title Could Anybody Reproduce the Results? The BERT and RoBERTa results? Could Anybody Reproduce the Results? The BERT and RoBERTa results? (Resolved) Apr 3, 2022
@ZhangYi0621


Thanks for your work and reply!
I just reproduced the result by replacing train.data with train_filter.data!
