Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logic Errors in Image_processing_gemma3_fast.py #36806

Open
2 of 4 tasks
javierchacon262 opened this issue Mar 19, 2025 · 3 comments
Open
2 of 4 tasks

Logic Errors in Image_processing_gemma3_fast.py #36806

javierchacon262 opened this issue Mar 19, 2025 · 3 comments

Comments

@javierchacon262
Copy link

javierchacon262 commented Mar 19, 2025

System Info

  • transformers version: 4.50.0.dev0
  • Platform: macOS-15.3.2-arm64-arm-64bit
  • Python version: 3.12.9
  • Huggingface_hub version: 0.29.3
  • Safetensors version: 0.5.3
  • Accelerate version: 1.5.2
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0 (False)
  • Tensorflow version (GPU?): 2.19.0 (False)
  • Flax version (CPU?/GPU?/TPU?): 0.10.4 (cpu)
  • Jax version: 0.5.2
  • JaxLib version: 0.5.1
  • Using distributed or parallel set-up in script?:

Who can help?

@amyeroberts
@qubvel

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Steps to reproduce:

  1. Load the Gemma 3 model locally using a pipeline with an image as input.
  2. Ensure the do_pan_and_scan option is set to False.
  3. Run the script — the error appears when the model tries to process the image input.

Expected behavior

It tries to process the image but encounters some logic errors, they are not major errors but little yet errors:

image_processing_gemma3_fast.py
Line 357: The code references images_list, but this variable is defined only inside the if do_pan_and_scan: condition. When do_pan_and_scan == False, images_list is never initialized, resulting in an UnboundLocalError.

image_text_to_text.py
Line 84: Inside the retrieve_images_in_messages() function, the variable idx_images must be incremented even when the first if condition is met. Otherwise, the final check at line 105 throws an IndexError due to a mismatch in the expected number of images.

I implemented the following changes, which resolved the issues:

In image_processing_gemma3_fast.py, replace:

num_crops = [[0] for images in images_list]
With:

num_crops = [[0] for _ in image_list]

In the same file, replace all references to images_list with image_list after the if do_pan_and_scan: condition to ensure consistency.

In image_text_to_text.py, modify line 84 to increment idx_images inside the first if block:
if key in content:
retrieved_images.append(content[key])
idx_images += 1 # Fix to ensure alignment in the list of images

@qubvel
Copy link
Member

qubvel commented Mar 19, 2025

cc @yonigozlan @zucchini-nlp

@zucchini-nlp
Copy link
Member

Hey @javierchacon262 ! Thanks for reporting the issue. We already fixed the typo with images_listin #36776

The other issue is related not to Gemma3, from what I see, but rather to the pipeline as a whole, so I'll leave it to Yoni to address :)

@yonigozlan
Copy link
Member

Hi @javierchacon262 ! Could you give a snippet to reproduce the error with the image_text_to_text pipeline?
Also I have this PR pending which might solve the issue: https://github.com/huggingface/transformers/pull/35616/files#diff-6902c688132722500bd569c3fd58d9de175551d2622d0639f9ec5aaf6c2b839cR76-R99

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants