In terms of the training procedure, based on what is described in the paper, should someone use this repo to train on the SQuAD dataset starting from bert-large-uncased-whole-word-masking-finetuned-squad first, and then train on TechQA, to reach the accuracy of those checkpoints?
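For concreteness, here is a minimal sketch of what that two-stage procedure could look like with Hugging Face transformers. This is only my reading of the question, not the repo's confirmed recipe: the named public checkpoint is already SQuAD-fine-tuned, so only the TechQA stage would remain, and the training-script name in the comment is a placeholder.

```python
# Sketch of the two-stage procedure asked about above (assumptions noted inline).
# The public checkpoint below was already fine-tuned on SQuAD, so stage 1
# may already be done; stage 2 would be fine-tuning on TechQA.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Start from the SQuAD-fine-tuned checkpoint rather than plain bert-large.
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# ... then fine-tune `model` on TechQA from here, e.g. with this repo's
# training script (hypothetical step -- the exact script and arguments
# are what this issue is asking about).
```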
Did you use the default gradient_accumulation_steps of 16?
Did you train it on multiple GPUs with the same learning rate?
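For reference, the reason these two settings interact: the effective batch size is the per-GPU batch size times the number of GPUs times gradient_accumulation_steps, so changing the GPU count without adjusting the learning rate or accumulation steps changes the optimization dynamics. A small illustration with placeholder values, since the actual per-GPU batch size isn't stated here:

```python
# Effective batch size arithmetic (placeholder values, not from the README):
per_gpu_batch_size = 2            # assumed, not confirmed
n_gpus = 4                        # assumed, not confirmed
gradient_accumulation_steps = 16  # the default asked about above

effective_batch_size = per_gpu_batch_size * n_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 128 with these placeholder values
```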
Are the hyperparameters in the README the ones you used to produce checkpoint-1 and checkpoint-2 that you shared (with different seeds)?