Unable to reproduce the PT-2 results of RTE in Table 1 #9

Closed

CSerxy opened this issue Nov 18, 2021 · 5 comments
CSerxy commented Nov 18, 2021

I have some questions about reproducing the PT-2 results for RTE in Table 1.

My base model is RoBERTa-large; I trained it for 10 epochs with the recommended hyperparameters (prompt length = 4, learning rate = 1e-2, as suggested in a previous issue).

However, I can only get roughly 58% accuracy on the RTE dev set.

I am not sure whether any of the factors below could cause this; I hope the authors can give me some hints. Many thanks!

  1. How many epochs did you use when training on RTE?
  2. If I understand correctly, you tune both the classification head and the prompts inserted at each layer, right? In that case, does the initialization matter? A follow-up question: how did you initialize the prompts?
  3. I notice that you insert the prompts before the [CLS] token; is there a specific reason for that?
  4. Are you using the vanilla roberta-large checkpoint?
Xiao9905 (Member) commented

Thanks for your interest in our work! Our newly released code may be helpful for your problem.

CSerxy (Author) commented Dec 5, 2021

Hi Xiao,

I found that your newly released code does help to reproduce the PT-2 results. However, I am only able to reproduce them with "--prefix", not "--prompt". Can you clarify the main difference between them? Also, which one is the method described in your paper? Many thanks!

Xiao9905 (Member) commented Dec 5, 2021

Hi @CSerxy,
--prefix means P-tuning v2, while --prompt means PT (i.e., vanilla P-tuning & Prompt Tuning). We have not verified the correctness of the released code for PT; please use it at your own risk.
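
As a rough illustration of the difference (a minimal sketch, assuming PyTorch; class and variable names are illustrative, not the repository's actual code): PT prepends trainable embeddings only to the input sequence, while P-tuning v2 attaches trainable prefix vectors to every layer's attention.

```python
import torch
import torch.nn as nn

# PT (--prompt): trainable embeddings are prepended only to the input
# embedding sequence; the frozen transformer layers are otherwise unchanged.
class InputPromptSketch(nn.Module):
    def __init__(self, prompt_len=4, hidden_size=1024):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, input_embeds):  # input_embeds: (batch, seq_len, hidden)
        prefix = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([prefix, input_embeds], dim=1)

embeds = torch.randn(2, 10, 1024)
print(InputPromptSketch()(embeds).shape)  # torch.Size([2, 14, 1024])

# P-tuning v2 (--prefix) instead attaches trainable prefix vectors to the
# attention of every transformer layer (see the past_key_values discussion
# further down this thread).
```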

CSerxy (Author) commented Dec 5, 2021

Hi Xiao,

Thanks for the quick response!!

I am a little confused by the --prefix part of the code. Could you help me with the question below?

It seems that in your paper (P-tuning v2) you describe a model that inserts prompts in front of each layer. For example, assume a prompt of length 5. Then the number of tunable parameters should be 5 * 1024 * 24, assuming a hidden size of 1024 and 24 layers. This is what you described in the paper, right?

However, when I look at your implementation, I find that the number of parameters is doubled: the tunable parameter count is 5 * 1024 * 24 * 2 when I set --pre_seq_len=5 in model/prefix_encoder.py. The doubled parameters are used to generate a length-5 key_layer and a length-5 value_layer, which is different from the model described in the paper. Do I understand this correctly?

I sincerely appreciate your help and look forward to your reply!
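
For concreteness, a minimal sketch that reproduces the parameter count described above (assuming PyTorch; the class name and arguments are illustrative, not the repository's actual prefix_encoder.py):

```python
import torch
import torch.nn as nn

# One trainable vector of size (2 * num_layers * hidden_size) per prefix token,
# i.e. pre_seq_len * 1024 * 24 * 2 parameters for RoBERTa-large.
class PrefixEncoderSketch(nn.Module):
    def __init__(self, pre_seq_len=5, hidden_size=1024, num_layers=24):
        super().__init__()
        self.embedding = nn.Embedding(pre_seq_len, 2 * num_layers * hidden_size)

    def forward(self, prefix_ids):  # prefix_ids: (batch, pre_seq_len), long
        return self.embedding(prefix_ids)

enc = PrefixEncoderSketch()
print(sum(p.numel() for p in enc.parameters()))  # 245760 = 5 * 1024 * 24 * 2
```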

Xiao9905 (Member) commented Dec 7, 2021

This is an implementation trick we inherit from prefix-tuning: if we do not want to change the original BERT code, we have to leverage the past_key_values argument, which passes previously computed keys and values into the attention computation.

Originally, the keys and values of the prefix tokens would be computed from their hidden states using the K and V projection matrices of each attention head. Here we pass the keys and values in directly, without computing them from hidden states via K and V (as prefix-tuning does), which in fact doubles the parameters of the prefix embeddings. In practice, we find it performs almost the same as the original implementation.
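
For readers following this exchange, a minimal sketch of that trick (assuming PyTorch and RoBERTa-large dimensions; variable names are illustrative, not the repository's actual code): the flat prefix embeddings are reshaped into one (key, value) pair per layer in the layout past_key_values expects.

```python
import torch

batch, pre_seq_len = 8, 5
num_layers, num_heads, head_dim = 24, 16, 64       # RoBERTa-large sizes
hidden_size = num_heads * head_dim                 # 1024

# Output of the prefix encoder: (batch, pre_seq_len, 2 * num_layers * hidden)
prefix = torch.randn(batch, pre_seq_len, 2 * num_layers * hidden_size)

# Reshape into per-layer (key, value) tensors of shape
# (batch, num_heads, pre_seq_len, head_dim), the past_key_values layout.
chunks = (
    prefix.view(batch, pre_seq_len, 2 * num_layers, num_heads, head_dim)
          .permute(2, 0, 3, 1, 4)                  # (2*layers, batch, heads, len, dim)
          .split(2)                                # 24 chunks of shape (2, ...)
)
past_key_values = [(kv[0], kv[1]) for kv in chunks]

print(len(past_key_values), past_key_values[0][0].shape)
# 24 torch.Size([8, 16, 5, 64])

# The stored vectors are used directly as keys and values, so no K/V
# projection is applied to the prefix; that is why the prefix embedding
# holds twice the parameters compared to per-layer hidden states alone.
```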
