Support batch size > 1 image-text inference #36682
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the
Force-pushed from 2d81f59 to 03e338e
Question before reviewing: why do we pass an empty list for a no-image prompt? What if we just do
@zucchini-nlp Assuming the batch size is 2, we expect the length of the image lists to be the same as the batch size.
I see, makes sense. Also cc @yonigozlan since you added these functions: do you see any edge cases if we check any instead?
Otherwise LGTM
Hi @hiyouga! Thanks for flagging this issue. I agree we should support such inputs. The issue I see is that we wouldn't catch an error now if we have
@yonigozlan agreed, I think we can expect users to use a consistent format within one input. @hiyouga there's a failing test which I think is caused by this PR, can you take a look?
Force-pushed from ac56330 to 0b9acfc
Hi @zucchini-nlp, I have made the necessary changes to
images = [self.image1]
with self.assertRaises(ValueError):
    processor(text=text, images=images, padding=True)
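The ambiguity this test guards against can be sketched with a toy validator (the function name and check here are illustrative, not the transformers API): with batched text, a flat image list gives no way to tell which image belongs to which prompt, so raising is safer than guessing.

```python
def validate_images(text, images):
    """Toy check: with batched text, images must be nested per sample
    (one sub-list per prompt, possibly empty), not a flat list."""
    if isinstance(text, list) and len(text) > 1:
        if not all(isinstance(sample, list) for sample in images):
            raise ValueError(
                "Ambiguous flat image list for batched text; "
                "pass one list of images per prompt instead."
            )

validate_images(["prompt 1", "prompt 2"], [["img_a"], []])  # ok: nested
try:
    validate_images(["prompt 1", "prompt 2"], ["img_a"])    # flat: ambiguous
except ValueError:
    print("raised as expected")
```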
I didn't get why this doesn't throw an error anymore. IMO passing flat images is ambiguous, and we should throw an error instead of trying to infer which text corresponds to which image.
@hiyouga great, thanks for handling the tests!
I see why we need to flatten images with the new changes, but I don't like calling it every time one image is needed. I'd suggest saving one image in a variable at the beginning and adding a small comment on why we do that, so future us don't delete it :)
Force-pushed from e2c82a4 to 5d4a4fb
Thanks for working on this @hiyouga, here are a few more comments 🤗
@@ -357,16 +358,19 @@ def preprocess(

        # All transformations expect numpy arrays.
        images_list = [[to_numpy_array(image) for image in images] for images in images_list]
+       # Search for the first image in the image list.
+       # NOTE: we can't slice the first image with images_list[0][0] if the first batch contains no images. See #36682
+       first_image_in_list = make_flat_list_of_images(images_list)[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit like overkill to use this function, with lots of validation under the hood, just to get the first element. Maybe try this one-liner. What do you think?
first_image_in_list = [images for images in images_list if images][0][0]
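As a quick sanity check (plain Python, no transformers imports, with placeholder strings standing in for image arrays), the comprehension drops empty per-sample lists before indexing, so it still works when the first sample has no images:

```python
# Nested per-sample image lists; the first sample has no images.
images_list = [[], ["img_a", "img_b"], ["img_c"]]

# The reviewer's one-liner: filter out empty sub-lists, then take the
# first image of the first non-empty one.
first_image_in_list = [images for images in images_list if images][0][0]
print(first_image_in_list)  # "img_a"
```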
src/transformers/image_utils.py
Outdated
@@ -243,7 +243,7 @@ def make_flat_list_of_images(
     if (
         isinstance(images, (list, tuple))
         and all(isinstance(images_i, (list, tuple)) for images_i in images)
-        and all(is_valid_list_of_images(images_i) for images_i in images)
+        and any(is_valid_list_of_images(images_i) for images_i in images)
Hmm, I would rather extend it to support an empty list; otherwise we might have something irrelevant in one of the lists, e.g.

-        and any(is_valid_list_of_images(images_i) for images_i in images)
+        and all(is_valid_list_of_images(images_i) or not images_i for images_i in images)
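The difference can be illustrated in plain Python (with a stand-in validity check, not the real is_valid_list_of_images): any accepts a batch as long as one sub-list is valid, even if another sub-list holds garbage, while the suggested all(... or not images_i) only tolerates sub-lists that are valid or empty.

```python
def is_valid(images_i):
    # Stand-in for is_valid_list_of_images: non-empty list of strings.
    return bool(images_i) and all(isinstance(x, str) for x in images_i)

junk_batch = [["img_a"], [42]]  # second sub-list contains garbage
ok_batch = [["img_a"], []]      # second sample simply has no images

print(any(is_valid(i) for i in junk_batch))           # True  -- garbage slips through
print(all(is_valid(i) or not i for i in junk_batch))  # False -- garbage rejected
print(all(is_valid(i) or not i for i in ok_batch))    # True  -- empty list allowed
```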
src/transformers/image_utils.py
Outdated
@@ -277,7 +277,7 @@ def make_nested_list_of_images(
     if (
         isinstance(images, (list, tuple))
         and all(isinstance(images_i, (list, tuple)) for images_i in images)
-        and all(is_valid_list_of_images(images_i) for images_i in images)
+        and any(is_valid_list_of_images(images_i) for images_i in images)
same here
Hi @qubvel, thank you for the review. I have updated the PR according to your suggestions. I also modified a key of the
We should keep
@qubvel got it, how about the latest one?
@qubvel Hi Pavel, is there anything that still needs to be resolved in this PR?
@@ -561,11 +561,12 @@ def pad(
             else input_data_format
         )
         data_format = input_data_format if data_format is None else data_format
+        first_image_in_list = [images_ for images_ in images if images_][0][0]
Just a nit, LGTM!

-        first_image_in_list = [images_ for images_ in images if images_][0][0]
+        # filter out empty image lists, then take first image of the first sample
+        first_image_in_list = [sample_images for sample_images in images if sample_images][0][0]
thanks! done
Waiting for the CI to be green to merge 😄
@qubvel It seems that the llama4 integration breaks all the processor unit tests: https://github.com/huggingface/transformers/commits/main/
Can you document what this enables? Like in the pipeline md?
@ArthurZucker This PR mainly enables the
Sorry, you are right, it's kind of obvious that it should support batch > 1 images; what I mean is to have a small doc example somewhere for people to play with! Let's fix the conflicts and get this merged 🔥
What does this PR do?
This PR follows #35558
Consider a batch of image lists, where the first example has 1 image and the second example has 0 images, e.g.,
With the latest code, this raises a ValueError: "Invalid input type. Must be a single image, a list of images, or a list of batches of images."
In this PR, we use any instead of all to judge whether the input is a valid nested list of images. Note that this behavior is the same as in transformers 4.48.0:
https://github.com/huggingface/transformers/blob/v4.48.0/src/transformers/models/mllama/image_processing_mllama.py#L535-L541
transformers/src/transformers/models/mllama/image_processing_mllama.py
Lines 535 to 541 in 6bc0fbc
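The any-vs-all change can be sketched in isolation (this is a simplified stand-in, with bool(images_i) replacing the real is_valid_list_of_images check), showing why a batch whose second sample has no images now passes validation:

```python
def is_valid_nested_list_of_images(images):
    # Simplified sketch of the check after this PR; bool(images_i)
    # stands in for the real is_valid_list_of_images.
    return (
        isinstance(images, (list, tuple))
        and all(isinstance(images_i, (list, tuple)) for images_i in images)
        and any(bool(images_i) for images_i in images)  # `any` instead of `all`
    )

# First sample has 1 image, second sample has 0 images.
batch = [["image_1"], []]
print(is_valid_nested_list_of_images(batch))  # True; with `all` this was rejected
```

An all-empty batch still fails the check, since no sub-list contains a valid image.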
Who can review?
@zucchini-nlp