RoBERTa weights do not have encoders with norm_first: true #30

Closed · seurimas opened this issue Apr 30, 2024 · 1 comment · Fixed by #34

Comments

@seurimas (Contributor)

By tracing transformers running the same model on a fill-mask task, I was able to determine that the outputs of transformers and bert-burn diverge at the point where normalization is applied.

Furthermore, by taking the lm_head weights for roberta-base and attaching an LM head model, I was able to verify that bert-burn's results are correct with norm_first: false, but entirely wrong with norm_first: true.
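
For anyone unfamiliar with the flag, here is a minimal sketch of what norm_first controls, written independently of Burn's actual API. The helper names (layer_norm, post_norm_block, pre_norm_block, sublayer) are hypothetical stand-ins, not bert-burn code; sublayer represents either self-attention or the feed-forward block.

```rust
// Minimal sketch of the two LayerNorm placements (not bert-burn's real code).

fn layer_norm(x: &[f32]) -> Vec<f32> {
    let mean = x.iter().sum::<f32>() / x.len() as f32;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / x.len() as f32;
    x.iter().map(|v| (v - mean) / (var + 1e-12).sqrt()).collect()
}

// norm_first: false (post-norm) -- the layout the RoBERTa checkpoints were
// trained with: run the sublayer, add the residual, then normalize.
fn post_norm_block(x: &[f32], sublayer: impl Fn(&[f32]) -> Vec<f32>) -> Vec<f32> {
    let out = sublayer(x);
    let summed: Vec<f32> = x.iter().zip(&out).map(|(a, b)| a + b).collect();
    layer_norm(&summed)
}

// norm_first: true (pre-norm) -- normalize before the sublayer and leave the
// residual path untouched. Loading post-norm weights into this layout is what
// produces the wrong fill-mask results described above.
fn pre_norm_block(x: &[f32], sublayer: impl Fn(&[f32]) -> Vec<f32>) -> Vec<f32> {
    let out = sublayer(layer_norm(x).as_slice());
    x.iter().zip(&out).map(|(a, b)| a + b).collect()
}

fn main() {
    let x = vec![0.5, -1.0, 2.0, 0.25];
    let identity = |v: &[f32]| v.to_vec();
    println!("post-norm: {:?}", post_norm_block(&x, identity));
    println!("pre-norm:  {:?}", pre_norm_block(&x, identity));
}
```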

I'd be happy to provide a pull request, but I'm not sure whether other BERT models do use norm_first: true. I'm very new to machine learning and am not familiar with this family of models.

@nathanielsimard (Member)

nathanielsimard commented May 8, 2024

Yeah, I don't think RoBERTa actually uses norm_first: true; that was probably a mistake. Happy to review your PR :)
