[Contributions Welcome] Add Fast Image Processors #36978

Open
@yonigozlan

Community contributions: Add Fast Image Processors

Fast image processors have been rolling out progressively for a while. Now that the BaseImageProcessorFast, from which all fast image processors inherit, is in a more stable state, I'm opening this issue to encourage contributors to add fast image processors for models that still only have a "slow" image processor.

How to implement a Fast Image Processor

The core principle of fast image processors is to use torch and torchvision functions for image transformations instead of PIL or numpy. Among other performance benefits, this enables processing images on GPU, significantly improving inference speed.

Another key difference from slow image processors is that, whereas BaseImageProcessor provides only a minimal skeleton, BaseImageProcessorFast includes all the fundamental functionality needed for a basic image processor. This allows optimizations made in BaseImageProcessorFast to propagate to its subclasses. Additionally, most repetitive logic for image loading and argument handling is managed within BaseImageProcessorFast. Except in rare cases, subclasses do not need to handle image loading, conversion, or retrieving arguments from class attributes in the call/preprocess function; this is all handled in BaseImageProcessorFast.

Getting Started

Run the following command:

transformers-cli add-fast-image-processor --model-name model_name

where model_name is the name of the model (as found in its folder under transformers/src/transformers/models) for which you're adding the fast image processor.

This command will handle all necessary imports and generate a basic fast image processor, which will look similar to this example for Beit:

# coding=utf-8
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fast Image processor class for Beit."""

from ...image_processing_utils_fast import BASE_IMAGE_PROCESSOR_FAST_DOCSTRING, BaseImageProcessorFast
from ...image_utils import IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_STD, PILImageResampling
from ...utils import add_start_docstrings


@add_start_docstrings(
    "Constructs a fast Beit image processor.",
    BASE_IMAGE_PROCESSOR_FAST_DOCSTRING,
)
class BeitImageProcessorFast(BaseImageProcessorFast):
    # This generated class can be used as a starting point for the fast image processor.
    # if the image processor is only used for simple augmentations, such as resizing, center cropping, rescaling, or normalizing,
    # only the default values should be set in the class.
    # If the image processor requires more complex augmentations, methods from BaseImageProcessorFast can be overridden.
    # In most cases, only the `_preprocess` method should be overridden.

    # For an example of a fast image processor requiring more complex augmentations, see `LlavaNextImageProcessorFast`.

    # Default values should be checked against the slow image processor
    # None values left after checking can be removed
    resample = PILImageResampling.BICUBIC
    image_mean = IMAGENET_STANDARD_MEAN
    image_std = IMAGENET_STANDARD_STD
    size = {"height": 256, "width": 256}
    default_to_square = None
    crop_size = {"height": 224, "width": 224}
    do_resize = True
    do_center_crop = True
    do_rescale = True
    do_normalize = True
    do_convert_rgb = None


__all__ = ["BeitImageProcessorFast"]

As explained in the generated file, if the image processor only performs basic augmentations such as resizing, center cropping, rescaling, and normalizing, the generated file might be sufficient for a working fast image processor. The class attributes, such as resample and image_mean, are automatically parsed from the slow image processor when running the script above. However, you should verify their correctness and check for any missing or incorrectly assigned values.
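
One possible way to check the generated defaults is to compare them attribute by attribute against the slow image processor. The snippet below is only a sketch: diff_defaults is a hypothetical helper (not part of transformers), and the two stand-in classes replace the real slow and fast processors so that the example runs on its own.

```python
def diff_defaults(slow, fast, names):
    """Return {name: (slow_value, fast_value)} for every attribute that differs.

    Hypothetical helper for illustration; it is not a transformers function.
    """
    mismatches = {}
    for name in names:
        slow_value = getattr(slow, name, None)
        fast_value = getattr(fast, name, None)
        if slow_value != fast_value:
            mismatches[name] = (slow_value, fast_value)
    return mismatches


class SlowStandIn:
    # Stand-in for an instantiated slow image processor.
    size = {"height": 256, "width": 256}
    crop_size = {"height": 224, "width": 224}
    do_center_crop = True


class FastStandIn:
    # Stand-in for the generated fast image processor, with one wrong default.
    size = {"height": 256, "width": 256}
    crop_size = {"height": 224, "width": 224}
    do_center_crop = False


mismatches = diff_defaults(SlowStandIn(), FastStandIn(), ["size", "crop_size", "do_center_crop"])
print(mismatches)
```

With real processors, the same helper could presumably be called on instances of the slow and fast classes, passing the attribute names listed in the generated file.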

Customizing the Image Processor

If the image processor requires additional functionalities beyond the basic augmentations, you will need to override the _preprocess function in BaseImageProcessorFast. Check the _preprocess implementation in BaseImageProcessorFast for reference. Notably, it leverages group_images_by_shape and reorder_images to enable batch processing, significantly increasing processing speed, particularly on GPUs. If you create new image processing functions, ensure they support batch processing by utilizing group_images_by_shape and reorder_images where possible.
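
To make the batching idea concrete, here is a simplified, pure-Python sketch of the pattern: group images by shape, process each group as one batch, then restore the original order. The functions group_by_shape and reorder are illustrative stand-ins written for this example; they are not the library's group_images_by_shape and reorder_images, whose exact signatures should be checked in BaseImageProcessorFast. FakeImage and process_batch are likewise hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeImage:
    """Stand-in for a channels-first image tensor; only the shape matters here."""
    shape: tuple
    name: str


def group_by_shape(images):
    """Group images by shape and remember where each one came from."""
    grouped = defaultdict(list)
    index = {}  # original position -> (shape, position inside that shape's group)
    for i, image in enumerate(images):
        index[i] = (image.shape, len(grouped[image.shape]))
        grouped[image.shape].append(image)
    return dict(grouped), index


def reorder(processed, index):
    """Rebuild a flat list in the original input order."""
    return [processed[index[i][0]][index[i][1]] for i in range(len(index))]


def process_batch(batch):
    # Placeholder for a batched transform (for example, one resize call per shape).
    return [f"{image.name}:processed" for image in batch]


images = [
    FakeImage((3, 224, 224), "a"),
    FakeImage((3, 512, 512), "b"),
    FakeImage((3, 224, 224), "c"),
]
grouped, index = group_by_shape(images)
processed = {shape: process_batch(batch) for shape, batch in grouped.items()}
result = reorder(processed, index)
print(result)
```

Only two batched calls are made for three images here, which is where the speedup comes from when many images share a shape.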

If your image processor requires additional kwargs not present in DefaultFastImageProcessorKwargs, you must create a ModelNameFastImageProcessorKwargs class that inherits from DefaultFastImageProcessorKwargs and defines the new kwargs. Additionally, you should document the added kwargs in the class and the preprocess function using add_start_docstrings. (This documentation process may be simplified soon, but it is necessary for now to generate correct documentation.)

For an example of handling custom kwargs and documentation, refer to LlavaNextImageProcessorFast.
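
As a rough sketch of the shape such a kwargs class might take: the example below assumes the base class is a TypedDict and redefines a small stand-in for it so the snippet runs without transformers. The do_pad field is a hypothetical extra kwarg chosen for illustration, and the documentation step with add_start_docstrings is left out.

```python
from typing import Optional, TypedDict, get_type_hints


class DefaultFastImageProcessorKwargs(TypedDict, total=False):
    # Stand-in for the real base class; only a few of its fields are reproduced.
    do_resize: Optional[bool]
    size: Optional[dict]
    do_normalize: Optional[bool]


class ModelNameFastImageProcessorKwargs(DefaultFastImageProcessorKwargs):
    # Hypothetical model-specific kwarg added on top of the defaults.
    do_pad: Optional[bool]


# The subclass carries both the inherited and the newly added kwargs.
hints = get_type_hints(ModelNameFastImageProcessorKwargs)
print(sorted(hints))
```

In a real image processor, the import of DefaultFastImageProcessorKwargs would replace the stand-in definition; LlavaNextImageProcessorFast remains the reference for the full pattern.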

Important Notes

  • In nearly all cases, _preprocess is the only function in BaseImageProcessorFast that needs to be overridden.
  • The _preprocess function does not require default values for its arguments, as they are automatically derived from class attributes if not explicitly provided.
  • Even if PIL images or numpy arrays are passed to the image processor, the images argument in _preprocess will always be a list of tensors, with the channel dimension first.
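
The second note can be illustrated with a toy class. This is a minimal sketch of the idea, not the actual BaseImageProcessorFast code: SketchProcessor and its _kwarg_names attribute are hypothetical, and treating None as "not provided" is an assumption made for the example.

```python
class SketchProcessor:
    # Class-level defaults, as in the generated fast image processor.
    do_resize = True
    size = {"height": 256, "width": 256}
    do_normalize = True

    # Hypothetical list of kwargs to fill in from class attributes.
    _kwarg_names = ("do_resize", "size", "do_normalize")

    def preprocess(self, images, **kwargs):
        # Fill every kwarg the caller left unset from the class attributes.
        for name in self._kwarg_names:
            if kwargs.get(name) is None:
                kwargs[name] = getattr(self, name)
        return self._preprocess(images, **kwargs)

    def _preprocess(self, images, do_resize, size, do_normalize):
        # No default values needed here: preprocess always supplies every argument.
        return {
            "n_images": len(images),
            "do_resize": do_resize,
            "size": size,
            "do_normalize": do_normalize,
        }


processor = SketchProcessor()
out = processor.preprocess(["img"], do_normalize=False)
print(out)
```

The explicit do_normalize=False wins over the class default, while do_resize and size fall back to the class attributes.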

Handling Edge Cases

  • Nested Images: If images are provided as nested lists (e.g., [[image1, image2], [image3]]), they will be flattened to [image1, image2, image3] by default before being passed to _preprocess. This behavior can be modified by overriding _prepare_images_structure, though flattening is generally recommended.
  • Formatting Custom Kwargs: If any custom kwargs require formatting before _preprocess, override _further_process_kwargs.
  • Validating Custom Kwargs: If additional validation is needed for custom kwargs or existing ones, override _validate_preprocess_kwargs.
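
The default flattening and a custom validation check can be sketched in plain Python as follows. Both functions are illustrative stand-ins rather than the library's hooks, and do_pad and pad_size are hypothetical custom kwargs; in a real processor this logic would live in overrides of _prepare_images_structure and _validate_preprocess_kwargs.

```python
def flatten_nested_images(images):
    """Flatten one level of nesting: [[a, b], [c]] becomes [a, b, c]."""
    if images and isinstance(images[0], (list, tuple)):
        return [image for group in images for image in group]
    return list(images)


def validate_pad_kwargs(do_pad=None, pad_size=None):
    """Example check for two hypothetical custom kwargs."""
    if do_pad and pad_size is None:
        raise ValueError("pad_size must be set when do_pad is True")


nested = [["image1", "image2"], ["image3"]]
flat = flatten_nested_images(nested)
print(flat)

validate_pad_kwargs(do_pad=True, pad_size={"height": 32, "width": 32})
```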

Testing

If the model already has a test_image_processing_model_name.py file under transformers/tests/models/model_name, the script you ran earlier should have imported the fast image processor into that file and added it as a fast_image_processing_class class attribute of the ModelNameImageProcessingTest class.
However, this is not enough to get all the tests to run on the fast image processor. In every test function under ModelNameImageProcessingTest, you need to replace image_processing = self.image_processing_class(**self.image_processor_dict) with a loop over self.image_processor_list.

For example, the test_image_processor_properties test in test_image_processing_beit.py which looks like this:

    def test_image_processor_properties(self):
        image_processing = self.image_processing_class(**self.image_processor_dict)
        self.assertTrue(hasattr(image_processing, "do_resize"))
        self.assertTrue(hasattr(image_processing, "size"))
        self.assertTrue(hasattr(image_processing, "do_center_crop"))
        self.assertTrue(hasattr(image_processing, "center_crop"))
        self.assertTrue(hasattr(image_processing, "do_normalize"))
        self.assertTrue(hasattr(image_processing, "image_mean"))
        self.assertTrue(hasattr(image_processing, "image_std"))
        self.assertTrue(hasattr(image_processing, "do_reduce_labels"))

should be changed to this:

    def test_image_processor_properties(self):
        for image_processing_class in self.image_processor_list:
            image_processing = image_processing_class(**self.image_processor_dict)
            self.assertTrue(hasattr(image_processing, "do_resize"))
            self.assertTrue(hasattr(image_processing, "size"))
            self.assertTrue(hasattr(image_processing, "do_center_crop"))
            self.assertTrue(hasattr(image_processing, "center_crop"))
            self.assertTrue(hasattr(image_processing, "do_normalize"))
            self.assertTrue(hasattr(image_processing, "image_mean"))
            self.assertTrue(hasattr(image_processing, "image_std"))
            self.assertTrue(hasattr(image_processing, "do_reduce_labels"))

In the case where no image processing test file is present, now is a great time to add one! You can have a look at the CLIP image processing test file to use as a simple starting point.

Don't hesitate to add model-specific tests if you feel like there are some non-standard image processing techniques in the processor :).

To run the tests, use this command:

RUN_SLOW=1 python -m pytest tests/models/model_name/test_image_processing_model_name.py

Choosing an Image Processor to Implement

The difficulty of implementing a fast image processor varies by model. If this is your first issue, consider starting with an easier one!

Happy coding!

Here is the list of fast image processors left to implement:
