
[WIP] Add TF BEiT Implementation #18559

Closed
wants to merge 101 commits

Conversation

@MadElf1337 commented Aug 10, 2022

Porting BEiT model from PyTorch to TensorFlow backend

What does this PR do?

Fixes #18085

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts @gante @LysandreJik @NielsRogge


@MadElf1337 (Author)

@gante @amyeroberts Here's the WIP draft of BEiT!

Please tell me if I have done anything wrong, I'll make the changes right away!

Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts (Collaborator)

Hi @MadElf1337 - thanks for opening a PR and for adding this model! Outline looks good.

As a quick overview, I see two main things that you'll want to add (alongside docs and tests):

  • # Copied from statements in the TF data2vec model definition
  • TFBeitForXxx classes

Looking forward to seeing the full PR and having this model available for our TF users :)

@MadElf1337 (Author)

@amyeroberts Sure! I'll make the changes!

@MadElf1337 (Author)

@amyeroberts @gante So I think I'm done with the model; can you look it over once while I finish writing the tests?

@gante (Member) commented Sep 5, 2022

@MadElf1337 From a quick glance, the model code looks fine 👍 As always, the devil is in the details, so you'll likely come across issues in the tests. Let us know if you get stuck on a particular test (tip: breakpoint() + comparing to PT are your friends).

Will do an in-depth review when the tests are added.

@amyeroberts (Collaborator)

@MadElf1337 As discussed on the issue #18085 here for this model, we want to copy the relevant code in data2vec to modeling_tf_beit.py, then add the necessary #Copied from statements in modeling_tf_data2vec.py, i.e. modeling_tf_beit.py and modeling_tf_data2vec.py should have the same structure and equivalent #Copied from statements as modeling_beit.py and modeling_data2vec.py. Let me know if any of this isn't clear or you need any help.

@MadElf1337 (Author)

Yeah, it was clear; I just wanted to see if the broad architecture was written correctly or not. Once I complete the tests (I'm a bit stuck on the attention output test for TF), I'll do the formatting, add the comments, and then ask for a complete review.

@amyeroberts (Collaborator)

If you follow the same structure as the pytorch data2vec vision and beit, including the copied from statements, then almost all of the architecture considerations will be taken care of for you, and it will be easier for us as reviewers.

If you need any help with the tests, let us know and we can try and lend a hand.

@MadElf1337 (Author)

Yeah, as I said, I'm just stuck on the seq_len part of the attention output for TF, since that is one thing present in data2vec but not in BEiT. So I just need to figure out that test.

@gante (Member) commented Oct 3, 2022

Hey @MadElf1337 -- we've just released a guide for TF conversions, which might come in handy for you :D

https://huggingface.co/docs/transformers/main/en/add_tensorflow_model

@MadElf1337 (Author)

Yep thanks!

Mostly done with the tests as well, just a little hiccup that will be solved soon, else I’ll make sure to ask for help!

@MadElf1337 (Author)

@gante @amyeroberts Terribly sorry for the delay, had to deal with some personal stuff that could not be avoided.

I think I'm done writing the tests and the model, can I get a review to see if I've missed anything/done anything wrong?

Thanks!

(Also, I'll add the #Copied from TFData2vec comments in the final commit.)

@MadElf1337 (Author)

@amyeroberts @gante

Can I get a review please?

@amyeroberts (Collaborator) left a comment

Thanks for the update and for implementing this first pass. Structure looks good and ready for addition of all extra pieces of work e.g. making the models importable.

Few comments:

  • TFBeitModel is missing and will need to be implemented.
  • Some small copy-pasta nits with torch and data2vec
  • I'm asking you again to implement with the #Copied from statements. I will only review again once this is done. This isn't just for completeness - it helps check that the architecture is correct and makes everything easier for both the reviewer and the implementer. As almost all of the architecture for data2vec is a copy of beit, it does not require you to write, or us to review, a new stand-alone architecture implementation. This will ensure your PR gets merged faster. If you have any questions about how to do this, please do not hesitate to ask.

>>> image = Image.open(requests.get(url, stream=True).raw)
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")
>>> model = TFBeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")
>>> inputs = feature_extractor(images=image, return_tensors="pt")
@amyeroberts (Collaborator)

Suggested change
>>> inputs = feature_extractor(images=image, return_tensors="pt")
>>> inputs = feature_extractor(images=image, return_tensors="tf")

@MadElf1337 (Author)

Done!


@MadElf1337 (Author) commented Nov 12, 2022

@amyeroberts Thanks for the review!

  1. As suggested, I've added the #Copied from... comments. (Sorry that you had to ask twice; I thought they were just comments and didn't know they were part of the review process.)

  2. I've also added the missing code and the torch references have been changed!


@amyeroberts (Collaborator)

Hi @MadElf1337 - thanks for the updates and iterating so quickly after review.

There are still a few files that need to be added for the model to be importable and fully integrated into the library. The guidelines in the document @gante shared detail these. Here's a recent model PR for reference. As the overall architecture looks good, this is the next step for this PR.


@MadElf1337 (Author)

@amyeroberts @gante So I've done everything as specified in the docs (I think); can I get a review to see if I've missed anything?

@MadElf1337 (Author)

Hey @amyeroberts @gante Can I get a review please?

@amyeroberts (Collaborator)

@MadElf1337 Thanks for the update!

The next stage for this PR is getting all of the tests running - the fun part! The tests aren't running at the moment as the models can't be imported:

E   ImportError: cannot import name 'TFBeitForImageClassification' from 'transformers' (/home/circleci/transformers/src/transformers/__init__.py)

One thing I can see that needs to be added is including the beit models in import_structure in __init__.py, e.g. here.

Some of the failing tests, e.g. check_code_quality, you can fix and/or diagnose by running make fixup locally.

Finally, the # Copied from statements should be added to the data2vec vision model in modeling_tf_data2vec_vision.py
and the ones in modeling_tf_beit.py removed.
# Copied from transformers.models.beit.modeling_tf_beit.TFBeitModelOutputWithPooling with Beit->Data2VecVision
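
For reference, here is a minimal sketch of what registering the TF classes in src/transformers/__init__.py might look like, modelled on the existing data2vec-vision entries; the exact class list is an assumption and depends on which TFBeit* classes this PR ends up adding.

# Inside the TF-only section of _import_structure in src/transformers/__init__.py:
_import_structure["models.beit"].extend(
    [
        "TFBeitForImageClassification",
        "TFBeitForMaskedImageModeling",
        "TFBeitForSemanticSegmentation",
        "TFBeitModel",
        "TFBeitPreTrainedModel",
    ]
)

# ...with the same names mirrored in the TYPE_CHECKING section further down the file:
# from .models.beit import (
#     TFBeitForImageClassification,
#     TFBeitForMaskedImageModeling,
#     TFBeitForSemanticSegmentation,
#     TFBeitModel,
#     TFBeitPreTrainedModel,
# )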

@MadElf1337 (Author)

@amyeroberts Thanks for the review!

I can see that the original repo does not have the import structure entries in __init__.py; however, I have added those to the init file in my dev branch, which is why it is showing a conflict for that file.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@MadElf1337 (Author) commented Jan 11, 2023

Hey, can I know what to do next to solve the merge conflict?

@gante (Member) commented Jan 16, 2023

Hey @MadElf1337 -- You will have to rebase your PR with main :)

  1. Get the latest main
git checkout main
git pull
  2. Rebase
git checkout your_branch
git rebase origin/main
  3. Handle conflicts manually (i.e. keep the desired changes and remove the unwanted ones in the conflicting files, and follow the instructions that git gives you)

  4. Force-push your changes (force to avoid GitHub showing a diff of 666 files)

git push -u origin your_branch -f

@MadElf1337 (Author)

There, I think I've solved the conflict, but the test errors are occurring due to errors in data2vec vision.

@amyeroberts (Collaborator)

@MadElf1337 Some of the failures are because the # Copied from statements point to a path that doesn't exist e.g.
# Copied from transformers.models.data2vec.modeling_data2vec_vision.TFData2VecVisionEmbeddings with Data2VecVision->Beit is copying the object TFData2VecVisionEmbeddings but is referring to the pytorch modeling file transformers.models.data2vec.modeling_data2vec_vision.

Note: The copied from statement should be in the modeling_tf_data2vec_vision.py file and should copy from the beit model e.g. # Copied from transformers.models.beit.modeling_tf_beit.TFBeitEmbeddings with Beit->Data2VecVision. There shouldn't be any # Copied from comments in the BEiT modeling file modeling_tf_beit.py.

If you run make fixup locally in the repo, you'll be able to reproduce the check_copies check, and it will make the check_code_quality checks pass.
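
For illustration, here is a minimal sketch of the intended direction of the comments, assuming TFBeitEmbeddings ends up defined in modeling_tf_beit.py as in this PR (the pass bodies stand in for the real implementations):

import tensorflow as tf

# modeling_tf_beit.py -- BEiT is the source of truth, so its classes carry no "# Copied from" comments.
class TFBeitEmbeddings(tf.keras.layers.Layer):
    pass  # real implementation lives in this PR's modeling_tf_beit.py


# modeling_tf_data2vec_vision.py -- data2vec-vision copies from TF BEiT, with the rename applied.
# Copied from transformers.models.beit.modeling_tf_beit.TFBeitEmbeddings with Beit->Data2VecVision
class TFData2VecVisionEmbeddings(tf.keras.layers.Layer):
    pass  # must stay an exact (renamed) copy of TFBeitEmbeddings for check_copies / make fixup to pass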

@MadElf1337 (Author)

@amyeroberts Seems like only the assertion errors remain now; how do I go about solving those?

@amyeroberts (Collaborator)

@MadElf1337 That's not completely true. As I have asked many times in the past, please look at the CircleCI errors, e.g. these ones.

The process for resolving the assertion errors is as I have mentioned in the past here and here.

@MadElf1337 (Author)

@amyeroberts Oh my bad, I overlooked the documentation errors :(

I'll fix them and the assertion errors immediately!

MadElf1337 and others added 6 commits February 3, 2024 12:23

@MadElf1337 (Author)

@amyeroberts I fixed the data2vec layer errors; now I get this for the output of the hidden_states and attentions:

{'attentions[0]': 6.446242e-05,
 'attentions[10]': 3.9696693e-05,
 'attentions[11]': 0.00011232495,
 'attentions[1]': 4.529953e-06,
 'attentions[2]': 3.1590462e-06,
 'attentions[3]': 6.765127e-06,
 'attentions[4]': 1.8686056e-05,
 'attentions[5]': 1.2725592e-05,
 'attentions[6]': 1.424551e-05,
 'attentions[7]': 8.791685e-06,
 'attentions[8]': 7.587671e-05,
 'attentions[9]': 8.711219e-05,
 'pooler_output': 8.150935e-06}


{'hidden_states[0]': 1.7166138e-05,
 'hidden_states[10]': 0.00036287308,
 'hidden_states[11]': 0.000667572,
 'hidden_states[12]': 0.0011978149,
 'hidden_states[1]': 7.6293945e-05,
 'hidden_states[2]': 5.9127808e-05,
 'hidden_states[3]': 7.9631805e-05,
 'hidden_states[4]': 0.00015258789,
 'hidden_states[5]': 0.00015258789,
 'hidden_states[6]': 0.00018310547,
 'hidden_states[7]': 0.00018310547,
 'hidden_states[8]': 0.00022506714,
 'hidden_states[9]': 0.0003066063,
 'last_hidden_state': 0.0011978149}

Additionally, I get the following warning when I run the test - Some weights of BeitModel were not initialized from the model checkpoint at microsoft/beit-base-patch16-224-pt22k and are newly initialized: ['beit.pooler.layernorm.bias', 'beit.pooler.layernorm.weight']

I'm wondering if that's why I'm getting those assertion errors?

@amyeroberts (Collaborator)

@MadElf1337

Two things:

Some weights of BeitModel were not initialized from the model checkpoint at microsoft/beit-base-patch16-224-pt22k and are newly initialized: ['beit.pooler.layernorm.bias', 'beit.pooler.layernorm.weight']

This shouldn't be happening. These weights should be loaded in when you load a checkpoint. I'd investigate this first.

I'm wondering if that's why I'm getting those assertion errors?

I don't know. You'll be able to answer that by comparing the activations of the TF and PT models and seeing whether they're similar before the pooler layer but not after.
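
As an illustration of that comparison, here is a minimal sketch, assuming the TFBeitModel added on this PR's branch is importable and using the same checkpoint and processor as elsewhere in this thread:

import numpy as np
import requests
import torch
from PIL import Image

from transformers import BeitImageProcessor, BeitModel, TFBeitModel  # TFBeitModel comes from this PR's branch

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
pt_model = BeitModel.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
tf_model = TFBeitModel.from_pretrained("microsoft/beit-base-patch16-224-pt22k", from_pt=True)

with torch.no_grad():
    pt_out = pt_model(**processor(images=image, return_tensors="pt"))
tf_out = tf_model(**processor(images=image, return_tensors="tf"))

# last_hidden_state is produced before the pooler, pooler_output after it:
# if last_hidden_state already disagrees, the problem is upstream of the pooler.
for name in ("last_hidden_state", "pooler_output"):
    pt_t, tf_t = getattr(pt_out, name, None), getattr(tf_out, name, None)
    if pt_t is None or tf_t is None:
        continue  # the pooler may be disabled depending on add_pooling_layer
    print(f"{name}: max abs diff = {np.max(np.abs(pt_t.numpy() - tf_t.numpy()))}")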

@MadElf1337 (Author) commented Feb 19, 2024

@amyeroberts Yes, I fixed the weight init issue by using the MaskedImageModeling fix mentioned in one of the issues, and I'm getting 0.00011232495 as the difference between the attentions before the pooling layer when I compute the difference using output.attentions[-1] for both PT and TF.

@amyeroberts (Collaborator)

@MadElf1337 This is an incredibly long running PR. For context, we want most of our model PRs to be open for a few weeks at most - this has been open for over a year and a half. There's been a lot of upstream changes to our TF models, in particular how they are built, which would need to be incorporated here. For example, I can see in the diff for modeling_tf_data2vec_vision.py many of these necessary methods are now being removed.

We can of course help if there are weird behaviours in the repo, or you don't know how to add something, but adding and debugging the model is ultimately the contributor's responsibility. This includes finding out why there are differences between the models, which, looking at the tests at the moment, are large. If you don't think you'll be able to resolve the conflicts and make the TF and PT models equivalent within a month, then I'd suggest closing this PR.

@MadElf1337 (Author)

@amyeroberts Yep, I'll fix everything and wrap it up now

@MadElf1337 (Author)

@amyeroberts So I went through all of the model layers, and I found out where the difference is occurring.

It's the layer before the pooler, so there must be a problem in the layernorm.

Attaching layers and differences below:

outputs.last_hidden_state
-------------2.9802322e-08
outputs.pooler_output
-------------4.7683716e-07
outputs.hidden_states_0
-------------0.0
outputs.hidden_states_1
-------------1.4901161e-08
outputs.hidden_states_2
-------------1.4901161e-08
outputs.hidden_states_3
-------------1.4901161e-08
outputs.hidden_states_4
-------------2.9802322e-08
outputs.attentions_0
-------------1.8626451e-09
outputs.attentions_1
-------------1.8626451e-09
outputs.attentions_2
-------------1.8626451e-09
outputs.attentions_3
-------------1.8626451e-09
outputs.last_hidden_state
-------------0.9371357

The second last_hidden_state listed occurs just before the final pooler_output.

@amyeroberts (Collaborator)

@MadElf1337 In your example, there are two outputs.last_hidden_state values listed. What's the difference between the two? It seems very odd that a layer norm would cause this large a difference to suddenly arise, but you can confirm by comparing the differences between the arrays for the PT and TF models before and after that layer.

@MadElf1337 (Author)

@amyeroberts Yeah, I think you're right, because the test only errors out on the layernorm, but when I compute the layerwise max abs diff, here's what I get:

embeddings: Max Absolute Difference = 1.71661376953125e-05
encoder.layer_.0: Max Absolute Difference = 10.488643646240234
encoder.layer_.1: Max Absolute Difference = 6.2569780349731445
encoder.layer_.2: Max Absolute Difference = 63.09259033203125
encoder.layer_.3: Max Absolute Difference = 182.445556640625
encoder.layer_.4: Max Absolute Difference = 138.13844299316406
encoder.layer_.5: Max Absolute Difference = 95.3775634765625
encoder.layer_.6: Max Absolute Difference = 49.490692138671875
encoder.layer_.7: Max Absolute Difference = 19.959197998046875
encoder.layer_.8: Max Absolute Difference = 33.18023681640625
encoder.layer_.9: Max Absolute Difference = 145.5604248046875
encoder.layer_.10: Max Absolute Difference = 183.478515625
encoder.layer_.11: Max Absolute Difference = 145.6247100830078
layernorm: Max Absolute Difference = 0.0010194778442382812
pooler: Max Absolute Difference = 6.467103958129883e-06

The code I'm using is this:

import numpy as np
import torch
from PIL import Image

from transformers import BeitImageProcessor
from transformers.models.beit.configuration_beit import BeitConfig
from transformers.models.beit.modeling_beit import BeitModel
from transformers.models.beit.tf_test import TFBeitModel
from transformers.models.beit.modeling_tf_beit import TFBeitModelOutputWithPooling

img = Image.open("/home/madelf1337/Projects/transformers/tests/fixtures/tests_samples/COCO/000000039769.png")

img_processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")

image1 = img_processor(images=img, return_tensors="pt")
image2 = img_processor(images=img, return_tensors="tf")

config = BeitConfig.from_pretrained(
    "microsoft/beit-base-patch16-224-pt22k", output_hidden_states=True, output_attentions=True
)

pt_outputs = BeitModel.from_pretrained("microsoft/beit-base-patch16-224-pt22k", config=config)
tf_outputs = TFBeitModel.from_pretrained("microsoft/beit-base-patch16-224-pt22k", config=config, from_pt=True)



with torch.no_grad():
    op1 = pt_outputs(**image1)
op2, tf_layerwise_outputs = tf_outputs(image2)
hidden_states = op1.hidden_states
layerwise_outputs = {}
layerwise_outputs["embeddings"] = hidden_states[0].numpy()
for i, layer_output in enumerate(hidden_states[1:]):
    layer_name = f"encoder.layer_.{i}"
    layerwise_outputs[layer_name] = layer_output.numpy()

layerwise_outputs["layernorm"] = op1.last_hidden_state.numpy()
if op1.pooler_output is not None:
    layerwise_outputs["pooler"] = op1.pooler_output.numpy()

for layer_name, pt_output in layerwise_outputs.items():
    tf_output = tf_layerwise_outputs[layer_name]
    abs_diff = np.amax(np.abs(pt_output - tf_output))
    print(f"{layer_name}: Max Absolute Difference = {abs_diff}")

Earlier I was adding stuff to the test itself so that I could see where the diff was occurring, and it errored out at the same layernorm with the message - AssertionError: outputs.last_hidden_state: Difference between torch and tf is 0.0010194778442382812 (>= 0.0002).

@amyeroberts (Collaborator)

@MadElf1337 Having large errors on the outputs of the layer doesn't tell you it's the layernorm - it tells you that the final activation differences are large. You'll need to compare the activations at each step within the layer to see where the differences are coming from.
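
One way to get those per-step activations on the PyTorch side is with forward hooks; here is a minimal sketch, where the zero tensor is a stand-in for the shared test image and the TF counterparts would need to be exposed separately, e.g. by returning intermediates from the TF layer's call():

import torch
from transformers import BeitModel

pt_model = BeitModel.from_pretrained("microsoft/beit-base-patch16-224-pt22k")
pt_model.eval()

pt_acts = {}

def save_activation(name):
    def hook(module, inputs, output):
        # attention submodules return tuples; keep the first tensor
        out = output[0] if isinstance(output, tuple) else output
        pt_acts[name] = out.detach()
    return hook

# hook every submodule inside encoder layer 0 (layernorm_before, attention, layernorm_after, ...)
for name, module in pt_model.encoder.layer[0].named_modules():
    if name:  # skip the layer module itself
        module.register_forward_hook(save_activation(f"encoder.layer.0.{name}"))

with torch.no_grad():
    pt_model(pixel_values=torch.zeros(1, 3, 224, 224))

for name, act in pt_acts.items():
    print(name, tuple(act.shape))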

@MadElf1337 (Author)

@amyeroberts Yes I've started going through each encoder layer now

@MadElf1337 (Author)

@amyeroberts I finally got everything!
I know this has been quite long, but I really want to see this through to completion!

Max absolute difference for layer embeddings: 1.71661376953125e-05

Max absolute difference for layer encoder.layer_.0: 1.71661376953125e-05

Max absolute difference for layer encoder.layer_.0.attention_output: 1.2278556823730469e-05

Max absolute difference for layer encoder.layer_.0.attention_output_w_lambda: 1.4901161193847656e-05

Max absolute difference for layer encoder.layer_.0.residual_1: 1.71661376953125e-05

Max absolute difference for layer encoder.layer_.0.layernorm_after: 8.45193862915039e-05

Max absolute difference for layer encoder.layer_.0.residual_2: 5.817413330078125e-05

Max absolute difference for layer encoder.layer_.1: 5.817413330078125e-05

Max absolute difference for layer encoder.layer_.1.attention_output: 5.781650543212891e-06

Max absolute difference for layer encoder.layer_.1.attention_output_w_lambda: 2.3484230041503906e-05

Max absolute difference for layer encoder.layer_.1.residual_1: 5.054473876953125e-05

Max absolute difference for layer encoder.layer_.1.layernorm_after: 4.100799560546875e-05

Max absolute difference for layer encoder.layer_.1.residual_2: 4.291534423828125e-05

Max absolute difference for layer encoder.layer_.2: 4.291534423828125e-05

Max absolute difference for layer encoder.layer_.2.attention_output: 1.9431114196777344e-05

Max absolute difference for layer encoder.layer_.2.attention_output_w_lambda: 5.91278076171875e-05

Max absolute difference for layer encoder.layer_.2.residual_1: 6.532669067382812e-05

Max absolute difference for layer encoder.layer_.2.layernorm_after: 5.218386650085449e-05

Max absolute difference for layer encoder.layer_.2.residual_2: 6.771087646484375e-05

Max absolute difference for layer encoder.layer_.3: 6.771087646484375e-05

Max absolute difference for layer encoder.layer_.3.attention_output: 1.5497207641601562e-05

Max absolute difference for layer encoder.layer_.3.attention_output_w_lambda: 2.7179718017578125e-05

Max absolute difference for layer encoder.layer_.3.residual_1: 5.4836273193359375e-05

Max absolute difference for layer encoder.layer_.3.layernorm_after: 4.427134990692139e-05

Max absolute difference for layer encoder.layer_.3.residual_2: 9.1552734375e-05

Max absolute difference for layer encoder.layer_.4: 9.1552734375e-05

Max absolute difference for layer encoder.layer_.4.attention_output: 2.1338462829589844e-05

Max absolute difference for layer encoder.layer_.4.attention_output_w_lambda: 2.956390380859375e-05

Max absolute difference for layer encoder.layer_.4.residual_1: 9.1552734375e-05

Max absolute difference for layer encoder.layer_.4.layernorm_after: 4.151463508605957e-05

Max absolute difference for layer encoder.layer_.4.residual_2: 0.0001220703125

Max absolute difference for layer encoder.layer_.5: 0.0001220703125

Max absolute difference for layer encoder.layer_.5.attention_output: 1.0967254638671875e-05

Max absolute difference for layer encoder.layer_.5.attention_output_w_lambda: 3.814697265625e-05

Max absolute difference for layer encoder.layer_.5.residual_1: 0.0001220703125

Max absolute difference for layer encoder.layer_.5.layernorm_after: 6.580352783203125e-05

Max absolute difference for layer encoder.layer_.5.residual_2: 0.000152587890625

Max absolute difference for layer encoder.layer_.6: 0.000152587890625

Max absolute difference for layer encoder.layer_.6.attention_output: 1.5497207641601562e-05

Max absolute difference for layer encoder.layer_.6.attention_output_w_lambda: 5.227327346801758e-05

Max absolute difference for layer encoder.layer_.6.residual_1: 0.000152587890625

Max absolute difference for layer encoder.layer_.6.layernorm_after: 5.996227264404297e-05

Max absolute difference for layer encoder.layer_.6.residual_2: 0.000152587890625

Max absolute difference for layer encoder.layer_.7: 0.000152587890625

Max absolute difference for layer encoder.layer_.7.attention_output: 2.47955322265625e-05

Max absolute difference for layer encoder.layer_.7.attention_output_w_lambda: 0.00010347366333007812

Max absolute difference for layer encoder.layer_.7.residual_1: 0.000152587890625

Max absolute difference for layer encoder.layer_.7.layernorm_after: 5.507469177246094e-05

Max absolute difference for layer encoder.layer_.7.residual_2: 0.0002384185791015625

Max absolute difference for layer encoder.layer_.8: 0.0002384185791015625

Max absolute difference for layer encoder.layer_.8.attention_output: 6.711483001708984e-05

Max absolute difference for layer encoder.layer_.8.attention_output_w_lambda: 0.00019747018814086914

Max absolute difference for layer encoder.layer_.8.residual_1: 0.0002651214599609375

Max absolute difference for layer encoder.layer_.8.layernorm_after: 5.412101745605469e-05

Max absolute difference for layer encoder.layer_.8.residual_2: 0.000263214111328125

Max absolute difference for layer encoder.layer_.9: 0.000263214111328125

Max absolute difference for layer encoder.layer_.9.attention_output: 5.364418029785156e-05

Max absolute difference for layer encoder.layer_.9.attention_output_w_lambda: 0.000217437744140625

Max absolute difference for layer encoder.layer_.9.residual_1: 0.00028967857360839844

Max absolute difference for layer encoder.layer_.9.layernorm_after: 4.7206878662109375e-05

Max absolute difference for layer encoder.layer_.9.residual_2: 0.000339508056640625

Max absolute difference for layer encoder.layer_.10: 0.000339508056640625

Max absolute difference for layer encoder.layer_.10.attention_output: 2.3066997528076172e-05

Max absolute difference for layer encoder.layer_.10.attention_output_w_lambda: 0.00016999244689941406

Max absolute difference for layer encoder.layer_.10.residual_1: 0.000339508056640625

Max absolute difference for layer encoder.layer_.10.layernorm_after: 4.2825937271118164e-05

Max absolute difference for layer encoder.layer_.10.residual_2: 0.00052642822265625

Max absolute difference for layer encoder.layer_.11: 0.00052642822265625

Max absolute difference for layer encoder.layer_.11.attention_output: 9.34600830078125e-05

Max absolute difference for layer encoder.layer_.11.attention_output_w_lambda: 0.0004742145538330078

Max absolute difference for layer encoder.layer_.11.residual_1: 0.00067901611328125

Max absolute difference for layer encoder.layer_.11.layernorm_after: 7.05718994140625e-05

Max absolute difference for layer encoder.layer_.11.residual_2: 0.0010194778442382812

Max absolute difference for layer layernorm: 0.0010194778442382812

Here are the max abs diffs across all layers, and they are not spiking across the encoder layers!

@MadElf1337 (Author)

@amyeroberts I think the error was occurring in the test because the test might be considering the base model, for which the checkpoint weights are not the correct initialization, as described by @NielsRogge on this issue.

@amyeroberts (Collaborator)

Here is the max abs diff across all layers, which is not spiking across the encoder layers!

@MadElf1337 You'll notice that it's still very high for the residual layers.

The linked initialization issue shouldn't affect the TF-PT cross tests, as whatever the weights are for the PT (randomly initialized or loaded from a checkpoint) they should be the same for the TF model.

In order for the PR to be reviewable, all the failing tests would need to be addressed.

@amyeroberts (Collaborator)

Hi @MadElf1337, I'm closing this PR.

There are a lot of upstream changes which have happened with TF models, and even updates to the BEiT model, which mean this PR is increasingly diverging and hard to reconcile with the changes upstream. Model PRs should be open on the timescale of days or weeks, and we're now approaching two years. Thanks for your efforts in porting this model. Adding models is always a very large piece of work, particularly handling compatibility between frameworks.

If you're still interested in contributing to transformers, I'd suggest looking through issues tagged with Good first issue or Good second issue and seeing if any interest you. They're far more likely to be small in scope and enable you to add something quickly into the codebase.

@MadElf1337 (Author)

@amyeroberts I understand, thanks for all the help till now!

I'll still continue with this model offline and make all the necessary adjustments. Once done, I'll add it to the Hub, and if it's a valuable contribution maybe we can revisit this PR.

Thanks for bearing with me!

Development

Successfully merging this pull request may close these issues.

Adding TF Implementation of BEiT
5 participants