Training to reproduce your shared checkpoint-1 & 2 #1

Open
yinyinl opened this issue Sep 27, 2022 · 0 comments
yinyinl commented Sep 27, 2022

Are the hyperparameters in the README the ones you used to produce the shared checkpoint-1 and checkpoint-2 (with different seeds)?

python run_techqa.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking-finetuned-squad \
    --do_lower_case \
    --num_train_epochs 20 \
    --learning_rate 5.5e-6 \
    --do_train \
    --train_file <PATH TO training_Q_A.json> \
    --do_eval \
    --predict_file <PATH TO dev_Q_A.json> \
    --input_corpus_file <PATH TO training_dev_technotes.json> \
    --per_gpu_train_batch_size 4 \
    --predict_batch_size 16 \
    --overwrite_output_dir \
    --output_dir <PATH TO OUTPUT FOLDER> \
    --add_doc_title_to_passage

In terms of training procedure, based on what is described in the paper, should one first train on the SQuAD dataset starting from bert-large-uncased-whole-word-masking-finetuned-squad, and then train on TechQA, in order to reach the accuracy of those checkpoints?
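
To make sure I'm understood, here is a minimal sketch of the two-stage procedure I am asking about. The squad_finetuned_ckpt directory name is my own placeholder (not something from this repository); all other flags mirror the README command above.

```bash
# Stage 1 (assumption): produce a SQuAD-fine-tuned checkpoint and save it locally,
# e.g. to a directory named squad_finetuned_ckpt/ (placeholder name).
# Note that bert-large-uncased-whole-word-masking-finetuned-squad is already SQuAD-trained,
# so the open question is whether any further SQuAD training is needed before TechQA.

# Stage 2: run the TechQA fine-tuning with the README hyperparameters unchanged,
# except that --model_name_or_path points at the stage-1 checkpoint.
python run_techqa.py \
    --model_type bert \
    --model_name_or_path squad_finetuned_ckpt \
    --do_lower_case \
    --num_train_epochs 20 \
    --learning_rate 5.5e-6 \
    --do_train \
    --train_file <PATH TO training_Q_A.json> \
    --do_eval \
    --predict_file <PATH TO dev_Q_A.json> \
    --input_corpus_file <PATH TO training_dev_technotes.json> \
    --per_gpu_train_batch_size 4 \
    --predict_batch_size 16 \
    --overwrite_output_dir \
    --output_dir <PATH TO OUTPUT FOLDER> \
    --add_doc_title_to_passage
```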

  • Did you use the default gradient_accumulation_steps of 16?
  • Did you train on multiple GPUs with the same learning rate? (See the sketch below for how these two settings affect the effective batch size.)
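
For context on why I ask: my understanding (an assumption on my part, not something stated in the README) is that the effective batch size is per_gpu_train_batch_size × gradient_accumulation_steps × number of GPUs, so a multi-GPU setup changes it:

```bash
# Effective batch size = per_gpu_train_batch_size * gradient_accumulation_steps * num_gpus
# (my assumption about how gradients are accumulated, not confirmed by the README).
echo $(( 4 * 16 * 1 ))   # 64  -> single GPU with the default accumulation of 16
echo $(( 4 * 16 * 4 ))   # 256 -> e.g. 4 GPUs (hypothetical), which might call for a different learning rate
```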