Replicating SuperGLUE benchmark #19
Hello! I'm trying to replicate your work, and I'm currently comparing the performance of my replication to your implementation. Just to be sure, could you please provide me the exact commands you used for training the SuperGLUE tasks? I would be very grateful. Thank you!
Comments
Hi @Shamdan17, sure! Here's the exact command for training ALBERT with iPET on RTE:

```
python3 cli.py \
--method ipet \
--pattern_ids 0 1 2 3 \
--data_dir ${PATH_TO_YOUR_DATA_DIR} \
--model_type albert \
--model_name_or_path albert-xxlarge-v2 \
--task_name rte \
--output_dir ${PATH_TO_YOUR_OUTPUT_DIR} \
--do_train \
--do_eval \
--pet_per_gpu_eval_batch_size 8 \
--pet_per_gpu_train_batch_size 2 \
--pet_gradient_accumulation_steps 8 \
--pet_max_steps 250 \
--pet_max_seq_length 256 \
--sc_per_gpu_train_batch_size 2 \
--sc_per_gpu_unlabeled_batch_size 2 \
--sc_gradient_accumulation_steps 8 \
--sc_max_steps 5000 \
--sc_max_seq_length 256
```

If you want to train a model with PET rather than iPET, simply replace `--method ipet` with `--method pet`. If needed (e.g. to fit your GPU memory), you can replace the per-GPU batch size and gradient accumulation flags above with

```
--pet_per_gpu_train_batch_size 1 \
--pet_gradient_accumulation_steps 16 \
--sc_per_gpu_train_batch_size 1 \
--sc_per_gpu_unlabeled_batch_size 1 \
--sc_gradient_accumulation_steps 16 \
```

or

```
--pet_per_gpu_train_batch_size 4 \
--pet_gradient_accumulation_steps 4 \
--sc_per_gpu_train_batch_size 4 \
--sc_per_gpu_unlabeled_batch_size 4 \
--sc_gradient_accumulation_steps 4 \
```

The effective batch size (per-GPU batch size times gradient accumulation steps) is 16 in all three settings.
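For readers unfamiliar with why those settings are interchangeable: gradient accumulation sums the gradients of several small forward/backward passes before each optimizer step, so only the product of per-GPU batch size and accumulation steps matters. A generic PyTorch sketch of the idea, not the repository's actual training loop:

```python
import torch.nn.functional as F

def train_with_accumulation(model, optimizer, loader, accumulation_steps=8):
    """Generic gradient-accumulation loop: an optimizer step is taken every
    `accumulation_steps` mini-batches, so the effective batch size equals
    mini-batch size * accumulation_steps (2*8 = 1*16 = 4*4 = 16 above)."""
    model.train()
    optimizer.zero_grad()
    for step, (inputs, labels) in enumerate(loader, start=1):
        loss = F.cross_entropy(model(inputs), labels)
        # Scale the loss so the summed gradient matches one large-batch update.
        (loss / accumulation_steps).backward()
        if step % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```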
Thank you very much! I greatly appreciate it. Looking forward to your future work :)
Another question: do you not use the auxiliary LM loss for SuperGLUE?
No, we did not use the auxiliary LM loss. This would have required a batch size of at least 4 (the ratio of labeled to unlabeled examples in the original PET paper is 1:3), which was not possible on a single GPU.
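For context, a rough sketch of what that auxiliary LM objective looks like: the original PET paper combines the classification loss on labeled examples with a masked-LM loss on unlabeled examples in a weighted sum. The function, argument names and alpha value below are illustrative, not the repository's actual implementation:

```python
import torch.nn.functional as F

def pet_loss_with_aux_lm(cls_logits, labels, mlm_logits, mlm_labels, alpha=1e-4):
    """Illustrative sketch: cross-entropy on labeled examples plus a small,
    alpha-weighted masked-LM loss on unlabeled examples. With a 1:3 ratio of
    labeled to unlabeled examples, each combined batch needs at least
    1 + 3 = 4 examples, hence the batch-size constraint mentioned above."""
    ce_loss = F.cross_entropy(cls_logits, labels)
    mlm_loss = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),  # (batch * seq_len, vocab)
        mlm_labels.view(-1),                       # (batch * seq_len,)
        ignore_index=-100,                         # skip non-masked positions
    )
    return (1 - alpha) * ce_loss + alpha * mlm_loss  # alpha value is illustrative
```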
I see, that makes sense, thanks again. Just one last question (I hope), just to make sure I'm not doing something wrong: is the `sc_per_gpu_train_batch_size` flag necessary in this case? From what I saw in the code, once you are using the `use_logits` flag for distillation, you only work with the unlabeled dataloader and discard the original training dataloader. Is that correct, or is there another place where you need this flag? Thanks a lot again for your time :)
You are absolutely correct! I've used the same script for both regular training and PET/iPET, which is why I always updated both flags.
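For anyone else reading along, a minimal sketch of the distillation step being discussed: the final sequence classifier is trained only on the unlabeled set, using the soft logits produced by the PET/iPET models as targets. The function name and temperature value are illustrative, not the repository's actual API:

```python
import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Illustrative soft-label distillation: the student (final classifier) is
    trained to match the temperature-softened teacher distribution, so no
    gold labels are required, only the teacher logits on unlabeled data."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student; the T^2 factor keeps gradient
    # magnitudes comparable across temperatures (standard distillation trick).
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Toy usage with random tensors standing in for real model outputs (RTE has 2 labels):
student = torch.randn(8, 2, requires_grad=True)
teacher = torch.randn(8, 2)
loss = soft_label_distillation_loss(student, teacher)
loss.backward()
```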
Hi, as far as I can see, RTE has 5 patterns (0, 1, 2, 3, 4) in pvp.py. Is it intentional that only patterns 0, 1, 2, 3 are used in the command you mentioned above? Similarly, MultiRC has patterns 0, 1, 2, 3 and not just 0, 1, 2. Thanks!
Hi @dorost1234, for these tasks, the last pattern is always the one used by GPT-3. We originally did not include these patterns, so if you want to reproduce our main results in Table 1, you should not use them. However, if you want to reproduce the pcomb results in Table 2 (where we use a combination of our patterns and the GPT-3 pattern, which leads to better performance), you should include it (for RTE, that means passing `--pattern_ids 0 1 2 3 4`).