By tracing `transformers` running the same model on a fill-mask task, I was able to determine that the execution of `transformers` and `bert-burn` diverges at the point where normalization happens.
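For anyone who wants to reproduce the reference side, something along these lines is enough (the prompt here is just an illustrative example, not the exact input I used):

```python
# Minimal fill-mask baseline with Hugging Face transformers,
# to compare token scores against bert-burn's output.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")
for pred in fill("The capital of France is <mask>."):
    print(f"{pred['token_str']!r}: {pred['score']:.4f}")
```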
Furthermore, using the `lm_head` weights for roberta-base and attaching an LM head model, I was able to verify that `bert-burn`'s results are correct with `norm_first: false`, but entirely wrong with `norm_first: true`.
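To illustrate what the flag changes: BERT and RoBERTa use the original post-norm ordering, where LayerNorm is applied *after* each residual addition, whereas pre-norm applies it *before* each sublayer. A minimal sketch using PyTorch's `nn.TransformerEncoderLayer` (which exposes the same `norm_first` flag; this is not bert-burn's API) shows that the two orderings produce different outputs even with identical weights:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two encoder layers that differ only in where LayerNorm is applied.
# post-norm (norm_first=False): x = norm(x + attn(x)); x = norm(x + ffn(x))
# pre-norm  (norm_first=True):  x = x + attn(norm(x)); x = x + ffn(norm(x))
post = nn.TransformerEncoderLayer(d_model=64, nhead=4,
                                  norm_first=False, batch_first=True).eval()
pre = nn.TransformerEncoderLayer(d_model=64, nhead=4,
                                 norm_first=True, batch_first=True).eval()

# Copy the weights so the only difference is the norm placement.
pre.load_state_dict(post.state_dict())

x = torch.randn(1, 8, 64)
with torch.no_grad():
    print(torch.allclose(post(x), pre(x)))  # False: same weights, different outputs
```

So if a checkpoint was trained post-norm, loading it with `norm_first: true` silently reorders the computation and produces garbage, which matches what I'm seeing.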
I'd be happy to provide a pull request, but I'm not sure whether other BERT models use `norm_first: true`. I'm very new to machine learning and am not familiar with this family of models.