Improve information about group offloading and layerwise casting #11101
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -235,6 +246,13 @@ In the above example, layerwise casting is enabled on the transformer component

However, you gain more control and flexibility by directly utilizing the [`~hooks.layerwise_casting.apply_layerwise_casting`] function instead of [`~ModelMixin.enable_layerwise_casting`].

<Tip>

- Layerwise casting may not work with all models out of the box. Sometimes, a model's forward implementation contains weight-dependent typecasting of inputs. Such implementations are not supported because the current, deliberately simple implementation of layerwise casting assumes that the forward pass is independent of the weight precision and that input dtypes are always in `compute_dtype`. An example of an incompatible implementation can be found [here](https://github.com/huggingface/transformers/blob/7f5077e53682ca855afc826162b204ebf809f1f9/src/transformers/models/t5/modeling_t5.py#L294-L299).
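For context, a minimal sketch of the direct-usage path the diff refers to, assuming the `apply_layerwise_casting` signature from `diffusers.hooks` (the CogVideoX checkpoint and dtype choices below are illustrative, not part of this diff):

```python
import torch
from diffusers import CogVideoXTransformer3DModel
from diffusers.hooks import apply_layerwise_casting

# Illustrative model choice; any ModelMixin-based diffusers model should work similarly.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Store weights in FP8 to cut memory, upcasting them to bfloat16 on the fly
# for each forward pass. The forward pass must not itself depend on the
# (FP8) storage precision of the weights, per the tip above.
apply_layerwise_casting(
    transformer,
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)
```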
Should also mention that it can be disabled on modules with the skip patterns.
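As a sketch of what the reviewer is suggesting, layers matching a skip pattern keep their original dtype; the pattern names here are assumptions for illustration:

```python
# Sketch: modules whose qualified names match these regex patterns are left
# untouched, e.g. numerically sensitive embedding and normalization layers.
apply_layerwise_casting(
    transformer,
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
    skip_modules_pattern=("patch_embed", "norm"),
)
```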
@@ -198,6 +198,17 @@ export_to_video(video, "output.mp4", fps=8)

Group offloading (for CUDA devices with support for asynchronous data transfer streams) overlaps data transfer and computation to reduce overall execution time compared to sequential offloading. It is enabled using layer prefetching with CUDA streams: the next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which slightly increases memory requirements. Group offloading also supports leaf-level offloading (equivalent to sequential CPU offloading), which can be made much faster when using streams.

<Tip>

- Group offloading may not work with all models out of the box. If a model's forward implementation contains weight-dependent device-casting of inputs, it may clash with the offloading mechanism's handling of device-casting.
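As a rough sketch of the behavior being documented, assuming the `enable_group_offload` method on `ModelMixin` (model and device choices are illustrative):

```python
import torch
from diffusers import CogVideoXTransformer3DModel

transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Leaf-level offloading keeps parameters on the CPU and moves each leaf
# module's weights to the GPU just before it runs. use_stream=True prefetches
# the next module on a separate CUDA stream, overlapping transfer with compute.
transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)
```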
Should also mention that it can be disabled on modules with the skip patterns.
We don't support skipping in group offloading. Will mention for layerwise casting though
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Failing test looks unrelated