Simplify keep_in_fp32_modules logic #36722

Cyrilvallez · 2025-03-14T13:12:55Z

What does this PR do?

As per the title: it's simpler to use a regex from the beginning than to start with a list and then switch to a regex near the end of the loading logic.
Also, make sure that the regex can only match full layer names. Otherwise, some old, poorly chosen layer names such as wo in T5 could be matched against layers such as word_embedding, which contains wo.
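A minimal sketch of the full-name matching idea, assuming parameter names are dot-separated paths like those returned by named_parameters(). The helper build_keep_in_fp32_regex is hypothetical and not necessarily what this PR adds. Each entry must be bounded by the start of the name or a dot on the left, and by the end of the name or a dot on the right, so wo matches the T5 feed-forward weight but not word_embedding.

```python
import re


def build_keep_in_fp32_regex(keep_in_fp32_modules):
    """Hypothetical helper: compile one regex that only matches whole
    dot-separated components (or dotted sub-paths) of a parameter name."""
    if not keep_in_fp32_modules:
        return None
    # (^|\.) and ($|\.) anchor each entry to component boundaries, so a short
    # name such as "wo" cannot match inside a longer one like "word_embedding".
    alternatives = [rf"(^|\.){re.escape(module)}($|\.)" for module in keep_in_fp32_modules]
    return re.compile("|".join(alternatives))


regex = build_keep_in_fp32_regex(["wo"])
print(bool(regex.search("encoder.block.0.layer.1.DenseReluDense.wo.weight")))  # True
print(bool(regex.search("shared.word_embedding.weight")))  # False
```

re.escape keeps entries such as dotted sub-paths literal, so only the boundary groups carry regex meaning.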

Also, only use the flag when loading in fp16 or when using a quantizer that expects it, as _keep_in_fp32_modules was introduced solely to avoid issues when casting bf16 -> fp16 (see #20287 for details). In other words, despite the flag's name, the layers should not always be kept in fp32. I added a detailed comment about this because it is not at all clear otherwise.
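A sketch of the gating described above, under stated assumptions: dtypes are plain strings rather than torch.dtype objects so the example runs without PyTorch, and quantizer_wants_fp32 is a hypothetical stand-in for however a quantizer signals that it expects the flag. resolve_param_dtype is likewise hypothetical, not the library's API.

```python
import re


def resolve_param_dtype(param_name, requested_dtype, keep_in_fp32_modules, quantizer_wants_fp32=False):
    """Hypothetical helper: return the dtype a parameter should be loaded in."""
    # The flag exists to avoid issues when casting bf16 -> fp16 (see #20287), so it
    # is only honoured when loading in fp16 or when a quantizer explicitly asks for it.
    use_flag = bool(keep_in_fp32_modules) and (requested_dtype == "float16" or quantizer_wants_fp32)
    if not use_flag:
        return requested_dtype
    # Match whole dot-separated components only, so "wo" never matches "word_embedding".
    pattern = "|".join(rf"(^|\.){re.escape(m)}($|\.)" for m in keep_in_fp32_modules)
    return "float32" if re.search(pattern, param_name) else requested_dtype


name = "decoder.block.0.layer.2.DenseReluDense.wo.weight"
print(resolve_param_dtype(name, "float16", ["wo"]))   # float32
print(resolve_param_dtype(name, "bfloat16", ["wo"]))  # bfloat16
```

Loading in bf16 or fp32 leaves every layer in the requested dtype, which is the point of the change: the name of the flag notwithstanding, fp32 is only forced where the fp16 cast would otherwise cause trouble.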

github-actions · 2025-03-14T13:13:09Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

HuggingFaceDocBuilderDev · 2025-03-14T13:39:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Nice! One thing that is missing: a check at init time (like for the TP plan) that enforces that the layers actually exist in the model?
I think there might be some weirdness for Blip2 and T5, but at least show a warning!

ArthurZucker

Also, some documentation is missing. And maybe deprecate non-regex patterns, no?

Cyrilvallez · 2025-03-14T15:21:14Z

I just modified Blip2 so that no model in the library has the flag set to a non-existent layer (😮‍💨), so I don't think we actually need a check at init time. IMO it would clutter the code for something that should never happen. I can add it to be super super safe though!

ArthurZucker

LGTM. IMO from_pretrained should check the layer names, for example to catch typos.
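A minimal sketch of such a check, assuming module_names holds the names yielded by model.named_modules() (a plain list here so the example runs without PyTorch). check_keep_in_fp32_modules is hypothetical; the check actually added in the later "add check" commit may look different.

```python
import re
import warnings


def check_keep_in_fp32_modules(module_names, keep_in_fp32_modules):
    """Hypothetical helper: warn about entries that match no module in the model
    (e.g. typos) and return them, so a caller could raise instead."""
    missing = []
    for entry in keep_in_fp32_modules or []:
        # Same full-name rule as the loading regex: whole dot-separated components only.
        pattern = re.compile(rf"(^|\.){re.escape(entry)}($|\.)")
        if not any(pattern.search(name) for name in module_names):
            missing.append(entry)
    if missing:
        warnings.warn(f"_keep_in_fp32_modules entries not found in the model: {missing}")
    return missing


names = ["encoder.block.0.layer.1.DenseReluDense.wo", "shared"]
print(check_keep_in_fp32_modules(names, ["wo", "w0"]))  # warns, then prints ['w0']
```

Warning rather than raising matches the "at least show a warning" suggestion, while returning the unmatched entries leaves room for a stricter mode.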

github-actions bot marked this pull request as draft March 14, 2025 13:13

Cyrilvallez marked this pull request as ready for review March 14, 2025 13:14

github-actions bot requested review from ArthurZucker and Rocketknight1 March 14, 2025 13:14

ArthurZucker reviewed Mar 14, 2025


ArthurZucker approved these changes Mar 18, 2025


Cyrilvallez added 12 commits March 21, 2025 14:31

better regex everywhere

bf311e3

fix

8d30d72

Update test_modeling_instructblip.py

874c8fc

BC with explanations this time otherwise it makes no sense at all

0a8cea7

Update test_modeling_instructblip.py

5f66a84

style

c495830

CIs

cfe0371

update _keep_in_fp32_modules in blip2

1be96b4

Update modeling_utils.py

397c2f7

Update modeling_utils.py

5c1ce37

style

9cc7b67

CIs

4db38b0

Cyrilvallez force-pushed the fix-keep-fp32 branch from ba157c2 to 4db38b0 March 21, 2025 13:31

Cyrilvallez added 4 commits March 21, 2025 15:05

add check

a1915cc

trigger CIs

0276d95

Update modeling_utils.py

b8bb5a1

trigger CIs

364946f

Cyrilvallez merged commit dd3933d into main Mar 21, 2025
24 checks passed

Cyrilvallez deleted the fix-keep-fp32 branch March 21, 2025 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify keep_in_fp32_modules logic #36722

Simplify keep_in_fp32_modules logic #36722

Cyrilvallez commented Mar 14, 2025 •

edited

Loading

github-actions bot commented Mar 14, 2025

HuggingFaceDocBuilderDev commented Mar 14, 2025

ArthurZucker left a comment

ArthurZucker left a comment

Cyrilvallez commented Mar 14, 2025

ArthurZucker left a comment

Simplify keep_in_fp32_modules logic #36722

Simplify keep_in_fp32_modules logic #36722

Conversation

Cyrilvallez commented Mar 14, 2025 • edited Loading

What does this PR do?

github-actions bot commented Mar 14, 2025

HuggingFaceDocBuilderDev commented Mar 14, 2025

ArthurZucker left a comment

Choose a reason for hiding this comment

ArthurZucker left a comment

Choose a reason for hiding this comment

Cyrilvallez commented Mar 14, 2025

ArthurZucker left a comment

Choose a reason for hiding this comment

Cyrilvallez commented Mar 14, 2025 •

edited

Loading