DeBERTa P-Tuning v2 speed #36
The same holds for DeBERTa V1.
I have a question about which is faster, P-Tuning v2 or fine-tuning. To my understanding, P-Tuning v2 should be slower than P-Tuning (v1), because it updates more parameters.
While P-Tuning v1 has fewer parameters to update, the evaluation of the attention mechanism for this scheme requires O((n + l)^2 · d) operations, since the l prompt tokens become part of the input sequence. For P-Tuning v2, the complexity is equal to O(n · (n + l) · d), because the prefix of length l only extends the keys and values (here n is the input length and d the hidden size).
I observed that the evaluation of P-Tuning v2 for a prefix with the length …
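To make the difference concrete, here is a minimal sketch of where the two costs come from (illustrative shapes only, not code from this repository):

```python
import torch

n, l, d = 512, 20, 768  # input length, prompt/prefix length, hidden size (made-up values)

# P-Tuning v1: prompt tokens are inserted into the input sequence, so the
# attention score matrix is (n + l) x (n + l)  ->  O((n + l)^2 * d).
q_v1 = torch.randn(n + l, d)
k_v1 = torch.randn(n + l, d)
scores_v1 = q_v1 @ k_v1.T  # shape: (n + l, n + l)

# P-Tuning v2: the prefix enters only as extra keys/values, so there are
# still n queries attending over (n + l) keys  ->  O(n * (n + l) * d).
q_v2 = torch.randn(n, d)
k_v2 = torch.cat([torch.randn(l, d), torch.randn(n, d)], dim=0)
scores_v2 = q_v2 @ k_v2.T  # shape: (n, n + l)
```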
Alright, thanks for your patient explanations; they made me realize I have gaps in my understanding of the model's architecture details, and I don't know much about DeBERTa. Maybe somebody else can help you. Best,
@kefirski Hi, thanks for your interest in our work! Sorry for the late reply.
Yes, I think this is the reason. At the time we were experimenting with P-Tuning v2, DeBERTa had not yet been officially implemented in huggingface transformers, so we implemented the past_key_value functions ourselves. We are sorry that our own implementation can be slower than the official huggingface one.
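For readers following the thread, here is a rough sketch of what such past_key_value-style prefix injection does in a single attention step (a generic illustration, not the repository's actual DeBERTa code; all names are hypothetical):

```python
import torch

def attention_with_prefix(q, k, v, prefix_k, prefix_v):
    # q, k, v: (seq_len, d); prefix_k, prefix_v: (prefix_len, d).
    # The trained prefix is concatenated onto the keys and values, the same
    # way cached past_key_values are prepended before computing attention.
    k = torch.cat([prefix_k, k], dim=0)  # (prefix_len + seq_len, d)
    v = torch.cat([prefix_v, v], dim=0)
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```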
I've observed that DeBERTaV2 with P-Tuning v2 takes significantly more time to evaluate than other methods. Have you observed such behaviour?
It even takes significantly more time than P-Tuning v1, despite the fact that v1 has a larger attention-evaluation complexity.
It seems like the issue is the ad-hoc implementation of `past_key_values` for DeBERTa, which is the only difference in the backbone-model code between v1 and v2, but I can't figure out the specific reason for it.
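One way to narrow this down would be a simple wall-clock comparison of the two attention paths; a minimal timing helper (my own sketch, nothing from the repository) could look like this:

```python
import time

def avg_seconds(fn, *args, n_runs=50):
    # Average CPU wall-clock time per call; for GPU tensors you would also
    # need torch.cuda.synchronize() around the timed region.
    fn(*args)  # warm-up so one-time setup costs are excluded
    start = time.perf_counter()
    for _ in range(n_runs):
        fn(*args)
    return (time.perf_counter() - start) / n_runs
```

Timing `attention_with_prefix` from the sketch above against a plain attention step with the prompt folded into the input would show whether the prefix path itself, or something else in the custom DeBERTa code, accounts for the gap.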