Description
Community contributions: Add Fast Image Processors
Fast image processors have been rolling out progressively for a while. Now that the BaseImageProcessorFast, from which all fast image processors inherit, is in a more stable state, I'm opening this issue to encourage contributors to add fast image processors for models that still only have a "slow" image processor.
How to implement a Fast Image Processor
The core principle of fast image processors is to use torch and torchvision functions for image transformations instead of PIL or numpy. Among other performance benefits, this enables processing images on GPU, significantly improving inference speed.
Another key difference compared to slow image processors is that, unlike BaseImageProcessor, which provides only a minimal skeleton, BaseImageProcessorFast includes all the fundamental functionalities needed for a basic image processor. This allows optimizations made in BaseImageProcessorFast to propagate to the classes that inherit from it. Additionally, most repetitive logic for image loading and argument handling is managed within BaseImageProcessorFast. Except in rare cases, inheriting classes do not need to handle image loading, conversion, or retrieving arguments from class attributes in the call/preprocess function; this is all handled in BaseImageProcessorFast.
Getting Started
Run the following command:
transformers-cli add-fast-image-processor --model-name model_name
where model_name is the name of the model (as found in its folder under transformers/src/transformers/models) for which you're adding the fast image processor.
This command will handle all necessary imports and generate a basic fast image processor, which will look similar to this example for Beit:
# coding=utf-8
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fast Image processor class for Beit."""

from ...image_processing_utils_fast import BASE_IMAGE_PROCESSOR_FAST_DOCSTRING, BaseImageProcessorFast
from ...image_utils import IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_STD, PILImageResampling
from ...utils import add_start_docstrings


@add_start_docstrings(
    "Constructs a fast Beit image processor.",
    BASE_IMAGE_PROCESSOR_FAST_DOCSTRING,
)
class BeitImageProcessorFast(BaseImageProcessorFast):
    # This generated class can be used as a starting point for the fast image processor.
    # if the image processor is only used for simple augmentations, such as resizing, center cropping, rescaling, or normalizing,
    # only the default values should be set in the class.
    # If the image processor requires more complex augmentations, methods from BaseImageProcessorFast can be overridden.
    # In most cases, only the `_preprocess` method should be overridden.

    # For an example of a fast image processor requiring more complex augmentations, see `LlavaNextImageProcessorFast`.

    # Default values should be checked against the slow image processor
    # None values left after checking can be removed
    resample = PILImageResampling.BICUBIC
    image_mean = IMAGENET_STANDARD_MEAN
    image_std = IMAGENET_STANDARD_STD
    size = {"height": 256, "width": 256}
    default_to_square = None
    crop_size = {"height": 224, "width": 224}
    do_resize = True
    do_center_crop = True
    do_rescale = True
    do_normalize = True
    do_convert_rgb = None


__all__ = ["BeitImageProcessorFast"]
As explained in the generated file, if the image processor only performs basic augmentations such as resizing, center cropping, rescaling, and normalizing, the generated file might be sufficient for a working fast image processor. The class attributes, such as resample and image_mean, are automatically parsed from the slow image processor when running the script above. However, you should verify their correctness and check for any missing or incorrectly assigned values.
Customizing the Image Processor
If the image processor requires additional functionalities beyond the basic augmentations, you will need to override the _preprocess function in BaseImageProcessorFast. Check the _preprocess implementation in BaseImageProcessorFast for reference. Notably, it leverages group_images_by_shape and reorder_images to enable batch processing, significantly increasing processing speed, particularly on GPUs. If you create new image processing functions, ensure they support batch processing by utilizing group_images_by_shape and reorder_images where possible.
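The grouping idea behind batch processing can be sketched in plain Python. The snippet below is a simplified stand-in, not the library's implementation: the helper names group_by_shape_sketch and reorder_sketch are hypothetical, and images are represented as plain nested lists (rows of pixels) so that it runs without torch. The real helpers work on tensors, which lets each same-shape group be processed as one batch.

```python
from collections import defaultdict


def shape_of(image):
    """Return (height, width) of an image stored as a list of rows."""
    return (len(image), len(image[0]) if image else 0)


def group_by_shape_sketch(images):
    """Group images by shape and remember where each one came from.

    Hypothetical helper, loosely modeled on the idea of group_images_by_shape.
    """
    grouped = defaultdict(list)
    index_map = {}  # original position -> (shape, position inside its group)
    for i, image in enumerate(images):
        shape = shape_of(image)
        index_map[i] = (shape, len(grouped[shape]))
        grouped[shape].append(image)
    return dict(grouped), index_map


def reorder_sketch(processed_groups, index_map):
    """Restore the original image order after per-group processing."""
    ordered = []
    for i in range(len(index_map)):
        shape, position = index_map[i]
        ordered.append(processed_groups[shape][position])
    return ordered


def double_pixels(batch):
    # Stand-in for a batched transform applied once per same-shape group
    return [[[2 * px for px in row] for row in image] for image in batch]


images = [[[1, 2]], [[3], [4]], [[5, 6]]]  # shapes: (1, 2), (2, 1), (1, 2)
groups, index_map = group_by_shape_sketch(images)
processed = {shape: double_pixels(batch) for shape, batch in groups.items()}
result = reorder_sketch(processed, index_map)
```

Here the transform runs twice (once per distinct shape) instead of three times, and result keeps the images in their original order.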
If your image processor requires additional kwargs not present in DefaultFastImageProcessorKwargs, you must create a ModelNameFastImageProcessorKwargs class that inherits from DefaultFastImageProcessorKwargs and defines the new kwargs. Additionally, you should document the added kwargs in the class and the preprocess function using add_start_docstrings. (This documentation process may be simplified soon, but is necessary for now to get correct documentation.)
For an example of handling custom kwargs and documentation, refer to LlavaNextImageProcessorFast.
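As a rough illustration of the kwargs pattern, the sketch below uses a local stand-in for DefaultFastImageProcessorKwargs so that it runs without transformers installed. It assumes the real class is a TypedDict whose keys are all optional; DefaultKwargsStandIn and the do_pad and pad_size kwargs are hypothetical and only serve the example.

```python
from typing import Optional, TypedDict


class DefaultKwargsStandIn(TypedDict, total=False):
    # Local stand-in for DefaultFastImageProcessorKwargs (assumption: the
    # real class is a TypedDict whose keys are all optional).
    do_resize: Optional[bool]
    do_normalize: Optional[bool]


class ModelNameFastImageProcessorKwargs(DefaultKwargsStandIn, total=False):
    # Hypothetical model-specific kwargs, shown for illustration only.
    do_pad: Optional[bool]
    pad_size: Optional[dict]


# The subclass accepts both the inherited and the newly defined keys
kwargs: ModelNameFastImageProcessorKwargs = {"do_resize": True, "do_pad": True}
```

In an actual fast image processor, the new class would inherit from the real DefaultFastImageProcessorKwargs, and the added kwargs would still need to be documented as described above.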
Important Notes
- In nearly all cases, _preprocess is the only function in BaseImageProcessorFast that needs to be overridden.
- The _preprocess function does not require default values for its arguments, as they are automatically derived from class attributes if not explicitly provided.
- Even if PIL images or numpy arrays are passed to the image processor, the images argument in _preprocess will always be a list of tensors, with the channel dimension first.
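The note about default values can be illustrated with a toy base class. This is a minimal sketch of the idea, not the code in BaseImageProcessorFast: ToyBaseProcessor, ToyModelProcessor, and valid_kwargs are hypothetical names, and images are plain lists of floats rather than tensors.

```python
class ToyBaseProcessor:
    # Names of the arguments that may be given per call or as class attributes
    valid_kwargs = ("do_rescale", "rescale_factor")
    do_rescale = None
    rescale_factor = None

    def preprocess(self, images, **kwargs):
        # Fill every argument that was not passed explicitly from the class attribute
        for name in self.valid_kwargs:
            kwargs.setdefault(name, getattr(self, name))
        return self._preprocess(images, **kwargs)

    def _preprocess(self, images, do_rescale, rescale_factor):
        raise NotImplementedError


class ToyModelProcessor(ToyBaseProcessor):
    do_rescale = True
    rescale_factor = 0.5

    # No default values needed: preprocess() always supplies both arguments
    def _preprocess(self, images, do_rescale, rescale_factor):
        if not do_rescale:
            return images
        return [[px * rescale_factor for px in image] for image in images]


processor = ToyModelProcessor()
from_defaults = processor.preprocess([[2.0, 4.0]])
overridden = processor.preprocess([[2.0, 4.0]], do_rescale=False)
```

The first call falls back to the class attributes, while the second one overrides do_rescale for that call only.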
Handling Edge Cases
- Nested Images: If images are provided as nested lists (e.g., [[image1, image2], [image3]]), they will be flattened to [image1, image2, image3] by default before being passed to _preprocess. This behavior can be modified by overriding _prepare_images_structure, though flattening is generally recommended.
- Formatting Custom Kwargs: If any custom kwargs require formatting before _preprocess, override _further_process_kwargs.
- Validating Custom Kwargs: If additional validation is needed for custom kwargs or existing ones, override _validate_preprocess_kwargs.
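The default flattening of nested images can be pictured with the short sketch below. It only illustrates the behavior described above and is not the body of _prepare_images_structure; flatten_nested_images is a hypothetical helper, and strings stand in for images.

```python
def flatten_nested_images(images):
    """Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]."""
    flat = []
    for item in images:
        if isinstance(item, (list, tuple)):
            # A nested group of images: add its members one by one
            flat.extend(item)
        else:
            # Already a single image: keep it as-is
            flat.append(item)
    return flat


nested = [["image1", "image2"], ["image3"]]
flat = flatten_nested_images(nested)
```

A processor that needs to keep the grouping (for example, to know which images belong to the same sample) would be a candidate for overriding the default structure handling.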
Testing
If the model already has a test_image_processing_model_name.py file under transformers/tests/models/model_name, the script run earlier should have imported the fast image processor into the file and added it as a fast_image_processing_class class attribute to the ModelNameImageProcessingTest class.
However, this is not enough to get all the tests to run on the fast image processor. For all the test functions under ModelNameImageProcessingTest, you need to replace image_processing = self.image_processing_class(**self.image_processor_dict) with a loop over self.image_processor_list.
For example, the test_image_processor_properties test in test_image_processing_beit.py, which looks like this:
def test_image_processor_properties(self):
    image_processing = self.image_processing_class(**self.image_processor_dict)
    self.assertTrue(hasattr(image_processing, "do_resize"))
    self.assertTrue(hasattr(image_processing, "size"))
    self.assertTrue(hasattr(image_processing, "do_center_crop"))
    self.assertTrue(hasattr(image_processing, "center_crop"))
    self.assertTrue(hasattr(image_processing, "do_normalize"))
    self.assertTrue(hasattr(image_processing, "image_mean"))
    self.assertTrue(hasattr(image_processing, "image_std"))
    self.assertTrue(hasattr(image_processing, "do_reduce_labels"))
should be changed to this:
def test_image_processor_properties(self):
    for image_processing_class in self.image_processor_list:
        image_processing = image_processing_class(**self.image_processor_dict)
        self.assertTrue(hasattr(image_processing, "do_resize"))
        self.assertTrue(hasattr(image_processing, "size"))
        self.assertTrue(hasattr(image_processing, "do_center_crop"))
        self.assertTrue(hasattr(image_processing, "center_crop"))
        self.assertTrue(hasattr(image_processing, "do_normalize"))
        self.assertTrue(hasattr(image_processing, "image_mean"))
        self.assertTrue(hasattr(image_processing, "image_std"))
        self.assertTrue(hasattr(image_processing, "do_reduce_labels"))
In the case where no image processing test file is present, now is a great time to add one! You can have a look at the CLIP image processing test file to use as a simple starting point.
Don't hesitate to add model-specific tests if you feel like there are some non-standard image processing techniques in the processor :).
To run the tests, use this command:
RUN_SLOW=1 python -m pytest tests/models/model_name/test_image_processing_model_name.py
Choosing an Image Processor to Implement
The difficulty of implementing a fast image processor varies by model. If this is your first issue, consider starting with an easier one!
Happy coding!
Here is the list of fast image processors left to implement:
- BEiT -> [Fast Processor] BEiT #37005
- BiT -> Add ImageProcessorFast to BiT processor #37180
- Blip
- BridgeTower -> Bridgetower fast image processor #37373
- Chameleon -> Add Fast Image Processor for Chameleon #37140
- Chinese-CLIP -> Add Fast Chinese-CLIP Processor #37012
- CLIP
- Conditional-DETR -> Add Fast Conditional-DETR Processor #37071
- ConvNext
- Deformable-DETR
- Deit
- DepthPro
- Deta (deprecated)
- DETR
- Donut -> Add Fast Image Processor for Donut #37081
- DPT -> 36978 | Fast image processor for DPT model #37481
- EfficientFormer (deprecated)
- EfficientNet -> Add EfficientNet Image PreProcessor #37055
- Flava -> Add Fast Image Processor for Flava #37135
- Fuyu -> Add fuyu Fast Image Processor #37410
- Gemma3
- GLPN -> Add glpn fast processor #38461
- GotOcr2
- Grounding Dino -> Add Fast Grounding-Dino Processor #37108
- Idefics 2 -> Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors #38157
- Idefics3 -> Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors #38157
- ImageGPT -> Added fast image processing for ImageGPT - initial commit #37320
- LayoutLMv2 -> Add Fast Image Processor for LayoutLMv2 #37203
- LayoutLMv3 -> Add Fast Image Processor for LayoutLMv3 #37201
- LeViT -> Add Fast LeViT Processor #37154
- LLava
- LLaVa-NeXT
- LLaVa-NeXT-Video -> Added fast processor for llava-next-video model #37297
- LLaVa-Onevision
- Mask2Former -> Mask2former & Maskformer Fast Image Processor #35685
- MaskFormer -> Mask2former & Maskformer Fast Image Processor #35685
- MLlama -> Mllama fast image processor #37539
- MobileNetV1 -> Add Fast Image Processor for MobileNetV1 #37111
- MobileNetV2 -> Add Fast Mobilenet-V2 Processor #37113
- MobileViT -> Add Fast Image Processor for mobileViT #37143
- Nougat -> add fast image processor nougat #37661
- OneFormer -> [WIP] Add OneformerFastImageProcessor #38343
- OWLv2 -> [Fast Processor] OWLv2 #37289 / Add owlv2 fast processor #39041
- OwlViT -> Add Fast owlvit Processor #37164
- Perceiver -> Add Fast Image Processor for Perceiver #37176
- Pix2Struct -> add fast image processor for pix2struct #37210
- Pixtral
- PoolFormer -> Add Fast Image Processor for PoolFormer #37182
- Pvt -> Add Fast PVT Processor #37204
- Qwen2-VL (Not standard as it also handles videos, don't use it as an example :) )
- RT-DETR
- SAM -> Add Fast SamImageProcessor #36999
- Segformer -> Add Fast Segformer Processor #37024
- SigLIP
- SigLIP2
- SmolVLM -> Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors #38157
- SuperPoint -> Superpoint fast image processor #37804
- Swin2SR -> Add Swin2SR ImageProcessorFast #37169
- TVLT (deprecated)
- TVP
- Video-LLaVA -> Add Fast Image Processor for Video-LLaVA #37023
- VideoMAE -> Add Fast Image Processor for VideoMAE #37191
- Vilt -> Add Fast Image Processor for vilt #37304
- ViT
- ViT hybrid (deprecated)
- ViTMatte -> Fast image processor for VitMatte added and bug in slow version fixed #37616
- VitPose -> Add fast imageprocessor vitpose #38502
- Vivit
- YOLOS -> Add Fast Yolos Processor #37292
- ZoeDepth -> added fast image processor for ZoeDepth and expanded tests accordingly #38515