Unable to reproduce the PT-2 results of RTE in Table 1 #9
Comments
Thanks for your interest in our work! Our newly released code may be helpful for your problem.
Hi Xiao, I found your newly released code does help reproduce the PT-2 results. However, I am only able to reproduce them with "--prefix" instead of "--prompt". Can you clarify the main difference between them? Also, can you clarify which one is the method described in your paper? Many thanks!
Hi @CSerxy,
Hi Xiao, thanks for the quick response! I am a little confused by the --prefix part of the code. Can you help me with the question below? In your paper (P-tuning v2) you describe a model that inserts a prompt in front of each layer. For example, assume a prompt of length 5: the tunable parameter count would then be 5 * 1024 * 24, assuming a hidden size of 1024 and 24 layers. That is what you describe in the paper, right? However, when I look at your implementation, the number of parameters is doubled: the tunable parameter count is 5 * 1024 * 24 * 2 when I set --pre_seq_len=5 in model/prefix_encoder.py. I found that the doubled parameters are used to generate a length-5 key_layer and a length-5 value_layer, which differs from the model described in the paper. Do I understand correctly? I sincerely appreciate your help and look forward to your reply!
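For concreteness, here is the arithmetic behind the two counts being compared (a plain illustration using the numbers from the question above, assuming prompt length 5, hidden size 1024, and 24 layers):

```python
pre_seq_len, hidden_size, num_layers = 5, 1024, 24

# Count expected from the paper's description:
# one prompt vector per position, per layer.
paper_count = pre_seq_len * hidden_size * num_layers          # 122880

# Count observed with --pre_seq_len=5 in model/prefix_encoder.py:
# separate key and value vectors per position, per layer.
observed_count = pre_seq_len * hidden_size * num_layers * 2   # 245760

print(paper_count, observed_count)
```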
This is an implementation trick we inherit from prefix-tuning: since we do not want to change the original BERT code, we have to work around its attention interface. Originally, the keys and values of prefix tokens would be computed from their hidden states using the projection matrices K and V in each attention head. Here we directly pass learned keys and values into attention without computing them from hidden states via K and V (as prefix-tuning does), which in fact doubles the parameters of the prefix embeddings. In practice, we find it performs almost the same as the original formulation.
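The doubled-parameter layout can be sketched roughly as follows (a minimal illustration of the trick, not the repository's actual model/prefix_encoder.py; the class and argument names are assumptions):

```python
import torch

class PrefixEncoder(torch.nn.Module):
    """Sketch of a prefix encoder in the style of prefix-tuning.

    Instead of learning prefix hidden states and projecting them through
    each attention head's K and V matrices, the keys and values are
    learned directly -- hence the factor of 2 in the embedding width.
    """

    def __init__(self, pre_seq_len=5, num_layers=24, hidden_size=1024):
        super().__init__()
        # One row per prefix position; each row stores a key vector and a
        # value vector for every layer: 2 * num_layers * hidden_size numbers.
        self.embedding = torch.nn.Embedding(
            pre_seq_len, 2 * num_layers * hidden_size)

    def forward(self, prefix_ids):
        # prefix_ids: (batch, pre_seq_len)
        # returns: (batch, pre_seq_len, 2 * num_layers * hidden_size),
        # later reshaped and split into per-layer keys and values.
        return self.embedding(prefix_ids)

encoder = PrefixEncoder()
n_params = sum(p.numel() for p in encoder.parameters())
print(n_params)  # 5 * 1024 * 24 * 2 = 245760
```

The learned output is typically reshaped to (num_layers, 2, pre_seq_len, num_heads, head_dim) and fed to the frozen model as past key/value states, so the backbone's own weights never change.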
I have some questions about rebuilding the PT-2 results of RTE in Table 1.
My base model is RoBERTa-large. I trained the model for 10 epochs with the recommended hyperparameters (prompt length = 4, learning rate = 1e-2, as suggested in a previous issue).
However, I can only get roughly 58% accuracy on the RTE dev set.
I am not sure whether the factor below could cause this; I hope the authors can give me some hints. Many thanks!