
Add Phi-3.5-vision #36036

Open · Dahlbomii wants to merge 33 commits into main

Conversation

@Dahlbomii (Contributor) commented Feb 4, 2025

Draft PR for now; I still need to add tests and convert to modular.

cc @Rocketknight1

Fixes #36071

@Rocketknight1 (Member) commented Feb 5, 2025

Looks good so far! The TODO list from here is:

  • Make the transformers imports relative so that we stop getting circular import errors in the CI (see the sketch after this list)
  • Add the phi3_5.md doc (you can copy the layout from a similar model and copy the text from the model card)
  • Run pip install transformers[quality], then make fixup or make style, to get the repo consistency checks green
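
To illustrate the first item, a minimal sketch of the import conversion inside src/transformers/models/phi3_v/modeling_phi3_v.py (the exact names depend on what the file actually imports):

```python
# Before (absolute imports, which can trigger circular imports in the CI):
#   from transformers.modeling_utils import PreTrainedModel
#   from transformers.utils import logging

# After (relative imports, matching other models in the repo):
from ...modeling_utils import PreTrainedModel
from ...utils import logging

logger = logging.get_logger(__name__)
```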

Once everything is green, then:

  • Add tests (you can copy them from another similar VLM and just change model names etc.)

Once the new tests are green as well, then:

  • Convert the modeling and configuration files to modular, then regenerate the modeling/config files and confirm everything passes (see the sketch below)

At that point, the PR should be finished!
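
As a sketch of that last step, a modular file might look roughly like this, assuming Phi-3.5-vision can reuse the Phi3 text stack; the Phi3V class names and overrides are illustrative, not the final API:

```python
# src/transformers/models/phi3_v/modular_phi3_v.py (illustrative sketch)
from ..phi3.configuration_phi3 import Phi3Config
from ..phi3.modeling_phi3 import Phi3ForCausalLM


class Phi3VConfig(Phi3Config):
    # vision-specific config fields (image size, projector dims, ...) would go here
    pass


class Phi3VForCausalLM(Phi3ForCausalLM):
    # vision-specific overrides (e.g. merging image features into the text
    # embeddings) would go here
    pass
```

The modeling/config files are then regenerated with the converter, e.g. python utils/modular_model_converter.py --files_to_parse src/transformers/models/phi3_v/modular_phi3_v.py (the script location may vary by version).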

@Rocketknight1 mentioned this pull request on Feb 13, 2025
@ArthurZucker (Collaborator) commented

Fixes #36166!

@Dahlbomii (Contributor, Author) commented

I've run into an issue when trying to run the model. From modeling_phi3_v.py I'm getting "AttributeError: 'DynamicCache' object has no attribute 'get_max_length'". This might be due to a change to the DynamicCache class, but I really can't tell!

@zucchini-nlp (Member) commented

@Dahlbomii hey! Yes, we deprecated get_max_length from the cache a few versions ago; now it has to be Cache.get_max_cache_shape(layer_idx).
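
A minimal sketch of the migration, assuming past_key_values is a DynamicCache (dynamic caches have no fixed maximum, so the new method returns None for them):

```python
from transformers import DynamicCache

past_key_values = DynamicCache()

# Before (deprecated; raises AttributeError on recent transformers):
#   max_cache_length = past_key_values.get_max_length()

# After:
max_cache_length = past_key_values.get_max_cache_shape()  # None for DynamicCache
```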

@Rocketknight1 (Member) commented

Woah, this means the remote code version of Phi-3.5-vision-instruct is broken on main right now and requires pinning an older version of transformers. Definitely makes this PR more urgent/important!

@Rocketknight1 (Member) commented

cc @zucchini-nlp @gante I can't figure this one out either! The test_generate_continue_from_inputs_embeds test is failing for Phi-3V, but I don't really understand how generate() handles inputs_embeds. When I added some breakpoints, it seemed to generate an input_ids tensor with shape (batch_size, 0), which then causes errors in Phi.

@zucchini-nlp (Member) commented Feb 28, 2025

Seems like Phi-3.5V was not updated for the latest changes we had in generation; it probably has its own custom generation loop. For this test, we can call super().prepare_inputs_for_generation(**kwargs) to get all the inputs processed correctly. Then image-related kwargs can be set to None if needed. You can take a look at how we did it in other VLMs; hope that solves the issue.
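
A hedged sketch of that suggestion, loosely following the pattern used by other VLMs in the library; the pixel_values name and the cache_position check are illustrative, not the exact Phi-3.5V code:

```python
def prepare_inputs_for_generation(
    self, input_ids, past_key_values=None, inputs_embeds=None,
    pixel_values=None, attention_mask=None, cache_position=None, **kwargs
):
    # Let GenerationMixin handle the standard input_ids / inputs_embeds /
    # cache slicing instead of a hand-rolled loop.
    model_inputs = super().prepare_inputs_for_generation(
        input_ids,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        cache_position=cache_position,
        **kwargs,
    )
    # Image inputs are only needed for the first (prefill) forward pass;
    # afterwards the image features already live in the KV cache.
    if cache_position is not None and cache_position[0] == 0:
        model_inputs["pixel_values"] = pixel_values
    return model_inputs
```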

@Rocketknight1 (Member) commented

@zucchini-nlp yes, that was it, sorry! They don't override generate(), but they do override prepare_inputs_for_generation() and discard inputs_embeds after the first step.
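
For context, a minimal sketch of the problematic pattern being described; this is illustrative, not the exact remote-code implementation. Because inputs_embeds is honored only on the first step, an embeds-only prompt leaves later steps falling back on an input_ids tensor that can have shape (batch_size, 0):

```python
def prepare_inputs_for_generation(self, input_ids, past_key_values=None,
                                  inputs_embeds=None, **kwargs):
    if past_key_values is not None:
        # Only the newest token ids are kept after the first step...
        input_ids = input_ids[:, -1:]
    if inputs_embeds is not None and past_key_values is None:
        # ...and inputs_embeds is consumed once, then silently discarded.
        model_inputs = {"inputs_embeds": inputs_embeds}
    else:
        model_inputs = {"input_ids": input_ids}
    model_inputs["past_key_values"] = past_key_values
    return model_inputs
```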

Rocketknight1 and others added 4 commits March 10, 2025 19:15

@Dahlbomii (Contributor, Author) commented

@Rocketknight1 for reasons that escape me, when the test processor tries to call self.get_component for the tokenizer, it breaks. Any insight as to why?

Development

Successfully merging this pull request may close these issues:

  • modeling_phi3 errors with AttributeError: 'DynamicCache' object has no attribute 'get_max_length'