In terms of the training procedure, based on what is described in the paper, should someone use this repo to train on the SQuAD dataset starting from bert-large-uncased-whole-word-masking-finetuned-squad first, and then train on TechQA, to reach the accuracy of those checkpoints?
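For concreteness, here is a minimal sketch of what that two-stage procedure could look like with Hugging Face transformers. This is only my reading of the question, not the repo's confirmed recipe: the named public checkpoint is already SQuAD-fine-tuned, so only the TechQA stage would remain, and the training-script name in the comment is a placeholder.

```python
# Sketch of the two-stage procedure asked about above (assumptions noted inline).
# The public checkpoint below was already fine-tuned on SQuAD, so stage 1
# may already be done; stage 2 would be fine-tuning on TechQA.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Start from the SQuAD-fine-tuned checkpoint rather than plain bert-large.
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# ... then fine-tune `model` on TechQA from here, e.g. with this repo's
# training script (hypothetical step -- the exact script and arguments
# are what this issue is asking about).
```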
Did you use the default gradient_accumulation_steps of 16?
Did you train it on multiple GPUs with the same learning rate?
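For reference, the reason these two settings interact: the effective batch size is the per-GPU batch size times the number of GPUs times gradient_accumulation_steps, so changing the GPU count without adjusting the learning rate or accumulation steps changes the optimization dynamics. A small illustration with placeholder values, since the actual per-GPU batch size isn't stated here:

```python
# Effective batch size arithmetic (placeholder values, not from the README):
per_gpu_batch_size = 2            # assumed, not confirmed
n_gpus = 4                        # assumed, not confirmed
gradient_accumulation_steps = 16  # the default asked about above

effective_batch_size = per_gpu_batch_size * n_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 128 with these placeholder values
```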
Are the hyperparameters in the README the ones you used to produce checkpoint-1 and checkpoint-2 that you shared (with different seeds)?