-
Notifications
You must be signed in to change notification settings - Fork 29.7k
Add MobileViT fast image processor #38859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Implement MobileViTImageProcessorFast class inheriting from BaseImageProcessorFast - Add support for RGB to BGR channel flipping specific to MobileViT models - Override _preprocess method to handle channel order transformation using torchvision ops - Update test infrastructure to test both slow and fast processors - Add fast processor to auto image processing registry - Update documentation to include fast processor Fixes huggingface#36978
- Implement MobileViTImageProcessorFast using BaseImageProcessorFast - Add GPU-accelerated processing for mobile deployment scenarios - Support channel flipping (RGB to BGR) via custom _preprocess method - Update tests to support both slow and fast processors - Verified functional equivalence and 1.35x average performance improvement - Achieves 1.8x speedup for optimal batch sizes (16-32 images)
- Apply black formatting to meet CI requirements - Fix line length issues and add missing blank lines - Ensure compliance with transformers code style
- Apply black formatting to resolve all CI linter issues - Format both image_processing_mobilevit_fast.py and test_image_processing_mobilevit.py - Resolve conflicts between black and ruff formatters - Ensure compliance with transformers code style standards - All functionality preserved after formatting changes
- Use ruff format as primary formatter per transformers repository standards - Format both image_processing_mobilevit_fast.py and test_image_processing_mobilevit.py - Resolve all CI formatting compliance issues - All functionality preserved after formatting changes
cc @yonigozlan |
…sorFast - Implements missing method to fix CI error about undocumented public method - Method handles semantic segmentation output post-processing with optional target size resizing - Follows same pattern as slow processor implementation - Includes proper error handling for missing PyTorch dependency
- Fix line length issues to comply with black formatting standards - Break long lines in method signatures and function calls - Ensure code meets both ruff and black quality standards
- Ensure all code meets HuggingFace quality standards - Fix formatting conflicts between black and ruff - Ready for final commit and push
- Reformatted src/transformers/models/mobilevit/image_processing_mobilevit_fast.py - Reformatted tests/models/mobilevit/test_image_processing_mobilevit.py - Fixed line length issues and consistent spacing - Ensures CI ruff checks pass
@yonigozlan checks are passing on both PRs for fast image transformers let me know if it’s good to merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @leonchlon ,
There's already a PR close to be merged on MobileViT here #37143
@yonigozlan no worries, i'll keep this open until the other is closed just in case anything comes up |
Summary
This PR adds a fast image processor for MobileViT models, providing significant performance improvements while maintaining full functional equivalence with the existing slow processor.
Changes
MobileViTImageProcessorFast
class insrc/transformers/models/mobilevit/image_processing_mobilevit_fast.py
do_flip_channel_order
parameterPerformance Improvements
Technical Implementation
_preprocess
method handles RGB→BGR conversion (required for MobileViT)shortest_edge
format consistency with slow processor viadefault_to_square=False
do_normalize=None
)Testing
Backward Compatibility
Implementation Notes
The custom
_preprocess
method was necessary becauseBaseImageProcessorFast
does not support thedo_flip_channel_order
parameter required by MobileViT models. This follows the same pattern used by other fast processors (LayoutLMv2, DepthPro) that require specialized preprocessing steps.