Description
System Info
transformers
version: 4.50.0.dev0- Platform: macOS-15.3.2-arm64-arm-64bit
- Python version: 3.12.9
- Huggingface_hub version: 0.29.3
- Safetensors version: 0.5.3
- Accelerate version: 1.5.2
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0 (False)
- Tensorflow version (GPU?): 2.19.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.10.4 (cpu)
- Jax version: 0.5.2
- JaxLib version: 0.5.1
- Using distributed or parallel set-up in script?:
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the
examples
folder (such as GLUE/SQuAD, ...) - My own task or dataset (give details below)
Reproduction
Steps to reproduce:
- Load the Gemma 3 model locally using a pipeline with an image as input.
- Ensure the do_pan_and_scan option is set to False.
- Run the script — the error appears when the model tries to process the image input.
Expected behavior
It tries to process the image but encounters some logic errors, they are not major errors but little yet errors:
image_processing_gemma3_fast.py
Line 357: The code references images_list, but this variable is defined only inside the if do_pan_and_scan: condition. When do_pan_and_scan == False, images_list is never initialized, resulting in an UnboundLocalError.
image_text_to_text.py
Line 84: Inside the retrieve_images_in_messages() function, the variable idx_images must be incremented even when the first if condition is met. Otherwise, the final check at line 105 throws an IndexError due to a mismatch in the expected number of images.
I implemented the following changes, which resolved the issues:
In image_processing_gemma3_fast.py, replace:
num_crops = [[0] for images in images_list]
With:
num_crops = [[0] for _ in image_list]
In the same file, replace all references to images_list with image_list after the if do_pan_and_scan: condition to ensure consistency.
In image_text_to_text.py, modify line 84 to increment idx_images inside the first if block:
if key in content:
retrieved_images.append(content[key])
idx_images += 1 # Fix to ensure alignment in the list of images