[WIP] Add TF BEiT Implementation #18559
Conversation
@gante @amyeroberts Here's the WIP draft of BEiT! Please tell me if I have done anything wrong, I'll make the changes right away! Thanks!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @MadElf1337 - thanks for opening a PR and for adding this model! Outline looks good. As a quick overview, I see two main things that you'll want to add (alongside docs and tests):
Looking forward to seeing the full PR and having this model available for our TF users :)
@amyeroberts Sure! I'll make the changes!
@amyeroberts @gante So I think I'm done with the model, can you just look over it once while I finish writing the tests?
@MadElf1337 From a quick glance, the model code looks fine 👍 As always, the devil is in the details, so you'll likely come across issues in the tests. Let us know if you get stuck on a particular test. Will do an in-depth review when the tests are added.
@MadElf1337 As discussed on issue #18085, for this model we want to copy the relevant code in data2vec to
Yeah, it was clear; I just wanted to see if the broad architecture was written correctly. Once I complete the tests (I'm a bit stuck on the attention output test for TF), I'll do the formatting, add the comments and then ask for a complete review.
If you follow the same structure as the pytorch data2vec vision and beit, including the copied from statements, then almost all of the architecture considerations will be taken care of for you, and it will be easier for us as reviewers. If you need any help with the tests, let us know and we can try and lend a hand.
Yeah, as I said, I'm just stuck on the seq_len part in the attention output for TF, since that is one thing present in data2vec but not in BEiT, so I just need to figure out that test.
Hey @MadElf1337 -- we've just released a guide for TF conversions, it might come in handy :D https://huggingface.co/docs/transformers/main/en/add_tensorflow_model
Yep, thanks! Mostly done with the tests as well; just a little hiccup that will be solved soon, else I'll make sure to ask for help!
@gante @amyeroberts Terribly sorry for the delay, had to deal with some personal stuff that could not be avoided. I think I'm done writing the tests and the model, can I get a review to see if I've missed anything/done anything wrong? Thanks! (Also I'll add the comments of #Copied from TFData2vec in the final commit) |
Can I get a review please? |
Thanks for the update and for implementing this first pass. Structure looks good and ready for the addition of all the extra pieces of work, e.g. making the models importable.
A few comments:
- TFBeitModel is missing and will need to be implemented.
- Some small copy-paste nits with torch and data2vec.
- I'm asking you again to implement with the # Copied from statements. I will only review again once this is done. This isn't just for completeness - it helps check that the architecture is correct and makes everything easier for both the reviewer and the implementer. As almost all of the architecture for data2vec is a copy of beit, it does not require you to write, or us to review, a new stand-alone architecture implementation. This will ensure your PR gets merged faster. If you have any questions about how to do this, please do not hesitate to ask.
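For reference, the `# Copied from` mechanism works roughly as sketched below (the class bodies here are illustrative placeholders, not the real BEiT/data2vec modules): a class marked with the comment must stay identical to its source class, modulo the declared name substitution, and the repo's `make fix-copies` tooling re-syncs or flags any drift.

```python
# Illustrative sketch of the "# Copied from" convention (simplified; real
# transformers classes subclass Keras layers and carry much more logic).

class TFData2VecVisionSelfOutput:
    """Stand-in for the existing TF data2vec-vision layer."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def call(self, hidden_states):
        # identity pass-through, purely for illustration
        return hidden_states


# Copied from transformers.models.data2vec.modeling_tf_data2vec_vision.TFData2VecVisionSelfOutput with Data2VecVision->Beit
class TFBeitSelfOutput:
    """Must remain identical to the class above, modulo the
    Data2VecVision->Beit rename; the fix-copies check enforces this."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def call(self, hidden_states):
        # identity pass-through, purely for illustration
        return hidden_states
```

The direction of the comment follows the thread: since TFData2VecVision already exists in the library, the new TF BEiT classes would declare themselves copies of it.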
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")
>>> model = TFBeitForSemanticSegmentation.from_pretrained("microsoft/beit-base-patch16-224-pt22k-ft22k")
>>> inputs = feature_extractor(images=image, return_tensors="pt")
- >>> inputs = feature_extractor(images=image, return_tensors="pt")
+ >>> inputs = feature_extractor(images=image, return_tensors="tf")
Done!
@amyeroberts Thanks for the review!
Hi @MadElf1337 - thanks for the updates and for iterating so quickly after review. There are still a few files that need to be added for the model to be importable and fully integrated into the library. The guidelines in the document @gante linked detail these. Here's a recent model PR for reference. As the overall architecture looks good, this is the next step for this PR.
@amyeroberts @gante So I've done everything as specified in the docs(I think), can I get a review to see if I've missed anything? |
Hey @amyeroberts @gante Can I get a review please? |
@MadElf1337 Thanks for the update! The next stage for this PR is getting all of the tests running - the fun part! The tests aren't running at the moment as the models can't be imported:
One thing I can see that needs to be added is including the beit models in
Some of the tests that are failing e.g.
Finally, the
@amyeroberts Thanks for the review! I can see that the original repo does not have the import structures in __init__.py; however, I have added those to the init file in my dev branch, which is why it is showing a conflict for the same file.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hey, can I know what to do next to solve the merge conflict? |
Hey @MadElf1337 -- You will have to rebase your PR with main :)
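On a real fork, the rebase suggested above typically means fetching the upstream repository and replaying the PR branch on its main. The remote name "upstream" is an assumption; the throwaway repo below demonstrates the same mechanics end to end without touching any real remote.

```shell
# On the actual fork you would run (remote name "upstream" assumed):
#   git fetch upstream
#   git rebase upstream/main
#   git push --force-with-lease
# Sandbox demonstration of the rebase mechanics:
set -e
workdir="$(mktemp -d)"
cd "$workdir"
git init -q repo
cd repo
git config user.email "demo@example.com"
git config user.name "demo"
git checkout -qb main
echo base > file.txt && git add . && git commit -qm "base"
git checkout -qb add-tf-beit                 # the PR branch
echo model > modeling_tf_beit.py && git add . && git commit -qm "add TF BEiT"
git checkout -q main                         # simulate upstream moving on
echo change > upstream.txt && git add . && git commit -qm "upstream change"
git checkout -q add-tf-beit
git rebase -q main                           # replay the PR commit on the new main
git log --format=%s                          # newest first: PR commit sits on top
```

After the rebase, the PR branch's history is "add TF BEiT" on top of "upstream change", which is why the subsequent push to the fork needs `--force-with-lease`.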
Force-pushed from 01a38b5 to c214c57
There, I think I've solved the conflict, but the test errors are occurring due to errors in data2vecvision.
@MadElf1337 Some of the failures are because the
Note: The copied from statement should be in the
If you run
@amyeroberts It seems like now just the assertion errors remain; how do I go about solving those?
@MadElf1337 That's not completely true. As I have asked many times in the past, please look at the circle CI errors e.g. these ones. For the assertion errors, the process to resolve them is as I have mentioned in the past here and here.
@amyeroberts Oh my bad, I overlooked the documentation errors :( I'll fix them and the assertion errors immediately!
@amyeroberts I fixed the data2vec layer errors, now I get this for the output of the
Additionally, I get the following warning when I run the test - I'm wondering if that's why I'm getting those assertion errors?
Two things:
1. This shouldn't be happening. These weights should be loaded in when you load a checkpoint. I'd investigate this first.
2. I don't know. You'll be able to answer that by comparing the activations of the TF and PT models and seeing if they're similar before the pooler layer and not after.
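The layer-by-layer comparison suggested above can be sketched as follows. This is a minimal pure-Python stand-in: in practice the inputs would be the hidden-state tuples returned by running both models with `output_hidden_states=True` on identical inputs, and the first layer whose difference spikes is where to start debugging.

```python
def layerwise_max_abs_diff(pt_hidden_states, tf_hidden_states):
    """Max absolute elementwise difference per layer.

    The arguments stand in for the per-layer activation tensors of the
    PT and TF models (flattened to plain float lists for this sketch).
    """
    assert len(pt_hidden_states) == len(tf_hidden_states)
    diffs = []
    for pt_layer, tf_layer in zip(pt_hidden_states, tf_hidden_states):
        diffs.append(max(abs(p - t) for p, t in zip(pt_layer, tf_layer)))
    return diffs

# Toy demo: identical activations everywhere except "layer" 2, so the
# first spike pinpoints where the implementations start to diverge.
pt_acts = [[0.1, -0.5, 0.3], [0.2, 0.4, -0.1], [1.0, 2.0, 3.0], [0.0, 0.1, 0.2]]
tf_acts = [layer[:] for layer in pt_acts]
tf_acts[2] = [1.0, 2.5, 3.0]  # simulated divergence of 0.5 in layer 2
diffs = layerwise_max_abs_diff(pt_acts, tf_acts)
```

If the divergence appears before the pooler, the pooler is exonerated; if every layer matches and only the final output differs, the head or pooler weights are the suspect.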
@amyeroberts Yes, I fixed the weight init issue by using the MaskedImageModeling fix as mentioned in one of the issues, and I'm getting
@MadElf1337 This is an incredibly long-running PR. For context, we want most of our model PRs to be open for a few weeks at most - this has been open for over a year and a half. There have been a lot of upstream changes to our TF models, in particular how they are built, which would need to be incorporated here. For example, I can see in the diff for
We can of course help if there are weird behaviours in the repo, or you don't know how to add something, but adding and debugging the model is ultimately the contributor's responsibility. This includes finding out why there are differences between the models, which, looking at the tests at the moment, are large. If you don't think you'll be able to resolve the conflicts and make the TF and PT models equivalent within a month, then I'd suggest closing this PR.
@amyeroberts Yep, I'll fix everything and wrap it up now |
@amyeroberts So I went through the entire model's layers, and I found out where the difference is occurring. It's the layer before the pooler, so there must be a problem in the
Attaching layers and differences below:
@MadElf1337 In your example, there are two
@amyeroberts Yeah, I think you're right, because the test only errors out on layernorm, but when I computed the layerwise max abs diff, here's what I get:
The code I'm using is this:
Earlier I was adding stuff to the test itself so that I could see where the diff was occurring, and it errored out at the same layernorm with the message -
@MadElf1337 Having large errors on the outputs of the layer doesn't tell you it's the layernorm - it tells you that the final activation differences are large. You'll need to compare the activations at each step within the layer to see where the differences are coming from.
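The within-layer, step-by-step comparison described above can be sketched with a toy block whose sublayers each record their output. The decomposition and names here are illustrative, not BEiT's actual modules; the point is that the *first* step whose trace diverges is the culprit, not the step where the accumulated error finally trips an assertion.

```python
import math

def run_block_with_trace(x, gamma, beta, shift):
    """Toy encoder 'layer' split into named steps so each intermediate
    activation can be checked against the other framework's trace."""
    trace = {}
    # step 1: layernorm over the vector x
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    ln = [gamma * (v - mu) / math.sqrt(var + 1e-12) + beta for v in x]
    trace["layernorm"] = ln
    # step 2: a stand-in "projection" (shift simulates a weight mismatch)
    proj = [v + shift for v in ln]
    trace["proj"] = proj
    # step 3: residual connection
    trace["residual"] = [a + b for a, b in zip(x, proj)]
    return trace

def first_divergence(trace_a, trace_b, atol=1e-5):
    """Name of the first sublayer whose activations differ beyond atol."""
    for name in trace_a:
        worst = max(abs(a - b) for a, b in zip(trace_a[name], trace_b[name]))
        if worst > atol:
            return name
    return None

x = [0.5, -1.0, 2.0, 0.25]
trace_pt = run_block_with_trace(x, gamma=1.0, beta=0.0, shift=0.0)
trace_tf = run_block_with_trace(x, gamma=1.0, beta=0.0, shift=1e-3)  # simulated mismatch
```

Here the layernorm traces match exactly and the divergence is first detected at the "proj" step, even though the residual output also ends up wrong downstream.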
@amyeroberts Yes, I've started going through each encoder layer now.
@amyeroberts I finally got everything!
Here is the max abs diff across all layers, which is not spiking across the encoder layers!
@amyeroberts I think the error was occurring in the test because the test might be considering the base model, for which the checkpoint weights are not the correct initialization, as described by @NielsRogge on this issue.
@MadElf1337 You'll notice that it's still very high for the residual layers. The linked initialization issue shouldn't affect the TF-PT cross tests: whatever the weights are for the PT model (randomly initialized or loaded from a checkpoint), they should be the same for the TF model. In order for the PR to be reviewable, all the failing tests would need to be addressed.
Hi @MadElf1337, I'm closing this PR. There are a lot of upstream changes which have happened with TF models, and even updates to the BEiT model itself, which mean this PR is increasingly diverging and hard to reconcile with the changes upstream. Model PRs should be open on the timescale of days or weeks, and now we're approaching two years. Thanks for your efforts in porting this model. Adding models is always a very large piece of work, particularly handling compatibility between frameworks. If you're still interested in contributing to transformers, I'd suggest looking through issues tagged with
@amyeroberts I understand, thanks for all the help till now! I'll still continue with this model offline and make all the adjustments necessary. Once done, I'll add it to the Hub, and if it's a valuable contribution maybe we can revisit this PR. Thanks for bearing with me!
Porting BEiT model from PyTorch to TensorFlow backend
What does this PR do?
Fixes #18085
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts @gante @LysandreJik @NielsRogge