[Inductor] Support scaled mm on inductor #2411

shiyang-weng · 2025-06-19T01:48:15Z

Fuse the following pattern into scaled_mm:

    #   + - - - - | - - - - - -  | - - - -  +
    #   |    dq_per_tensor  dq_per_tensor   |
    #   |         |              |          |
    #   |    OPT(to_bf16)    OPT(to_bf16)   |
    #   |         |             |           |
    #   |    OPT(reshape)     permute       |
    #   |          \           /            |
    #   |             addmm/mm              |
    #   |                |                  |
    #   |      OPT(quant_per_tensor)        |
    #   |                |                  |
    #   |          OPT(reshape)             |
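As a reading aid for the diagram above, here is a minimal pure-Python sketch of the unfused dataflow (dequantize both operands, transpose the weight, mm/addmm, optional requantize). It is not the code in this PR. All function names are hypothetical, plain Python floats stand in for fp8 tensors so fp8 rounding is not modeled, the optional to_bf16 and reshape steps are omitted, and the clamp bound assumes the float8_e4m3fn format. The second function only illustrates the algebraic fact that per-tensor scales factor out of the matmul; it makes no claim about how the fused kernel is implemented.

```python
# Illustrative reference only; every name below is hypothetical.
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value (assumed fp8 format)


def dq_per_tensor(q, scale):
    """Per-tensor dequantize: one scale for the whole matrix."""
    return [[v * scale for v in row] for row in q]


def quant_per_tensor(x, scale):
    """Per-tensor quantize: divide by scale, clamp to the assumed fp8 range."""
    return [
        [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]
        for row in x
    ]


def permute(w):
    """Transpose a 2-D weight from (out, in) to (in, out)."""
    return [list(col) for col in zip(*w)]


def mm(a, b):
    """Plain matrix multiply of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(y, bias):
    """Add a per-output-channel bias to every row."""
    return [[v + b for v, b in zip(row, bias)] for row in y]


def unfused_pattern(x_q, x_scale, w_q, w_scale, bias=None, out_scale=None):
    """Follow the diagram node by node: dq -> permute -> mm/addmm -> OPT(quant)."""
    x = dq_per_tensor(x_q, x_scale)
    w_t = permute(dq_per_tensor(w_q, w_scale))
    y = mm(x, w_t)
    if bias is not None:  # addmm branch
        y = add_bias(y, bias)
    if out_scale is not None:  # OPT(quant_per_tensor)
        y = quant_per_tensor(y, out_scale)
    return y


def scales_factored_out(x_q, x_scale, w_q, w_scale, bias=None, out_scale=None):
    """Same math with both per-tensor scales applied once, after the matmul."""
    y = mm(x_q, permute(w_q))
    y = [[v * (x_scale * w_scale) for v in row] for row in y]
    if bias is not None:
        y = add_bias(y, bias)
    if out_scale is not None:
        y = quant_per_tensor(y, out_scale)
    return y


# Small example: power-of-two scales keep the float arithmetic exact.
x_q = [[1.0, 2.0], [3.0, 4.0]]              # activation, shape (2, 2)
w_q = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # weight, shape (out=3, in=2)
ref = unfused_pattern(x_q, 0.5, w_q, 0.25, bias=[1.0, 1.0, 1.0])
alt = scales_factored_out(x_q, 0.5, w_q, 0.25, bias=[1.0, 1.0, 1.0])
```

With exactly representable inputs like these, `ref` and `alt` agree element for element; with arbitrary scales the two orderings can differ by floating-point rounding.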

pytorch-bot · 2025-06-19T01:48:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2411

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0c7f8ea with merge base 8b57afe:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 · 2025-06-19T02:40:24Z

test/float8/test_compile.py

@@ -392,5 +392,59 @@ def test_dynamic_scale_numeric_parity(
    assert torch.equal(float8_eager._data, float8_compile._data)


+@pytest.mark.parametrize(


I believe this is the training float8 test file, float8 inference is using https://github.com/pytorch/ao/blob/main/test/dtypes/test_affine_quantized_float.py

shiyang-weng · Jun 19, 2025

OK. I changed the unit test path in the previous PR, #2379.

Xia-Weiwen

LGTM. nit: This PR adds a fusion pass for fp8 q-dq-linear, not scaled_mm. scaled_mm is the fusion result. Please update the PR title.

Xia-Weiwen · 2025-06-25T09:02:05Z

test/quantization/pt2e/test_x86inductor_fusion.py

+    @parametrize("dtype", [torch.float32, torch.bfloat16])
+    @parametrize("input_dim_exceeds_two", [True, False])
+    @parametrize("check_reuse_input", [True, False])
+    def test_scaled_mm(self, has_bias, dtype, input_dim_exceeds_two, check_reuse_input):


It would be better to call it test_fp8_qlinear

Xia-Weiwen · 2025-06-25T09:13:06Z

torchao/quantization/pt2e/inductor_passes/x86.py

+    return dequant_fp8_linear_bias_pattern, dequant_fp8_linear_no_bias_pattern
+
+
+def _is_valid_scaled_mm_pattern(dtype, input_dim_exceeds_two):


The pattern is fp8 qlinear, not scaled_mm; scaled_mm is the fusion result. So it would be better to call it fp8_qlinear_pattern.

Xia-Weiwen · 2025-06-25T09:14:12Z

torchao/quantization/pt2e/inductor_passes/x86.py

+    return _inner
+
+
+def _register_scaled_mm_pass(pattern, dtype, input_dim_exceeds_two):


Same here. scaled_mm -> fp8_qlinear.

Xia-Weiwen · 2025-06-25T09:14:24Z

torchao/quantization/pt2e/inductor_passes/x86.py

+            counters["inductor"]["scaled_mm_matcher_nodes"] += len(match.nodes)
+
+
+def _register_scaled_mm():


Same here. scaled_mm -> fp8_qlinear.

shiyang-weng added 6 commits June 18, 2025 15:22

quantize_affine_float8/dequantize_affine_float8 not decomposed on inductor

a840ef5

remove redundant unittest.skipIf

02d045b

fix rebase issue

9860c56

change dispatch key to a flag decomposed

ca662f3

support scaled_mm on inductor

f51a5be

fix rebase issue

719793c

shiyang-weng marked this pull request as draft June 19, 2025 01:48

facebook-github-bot added the CLA Signed label Jun 19, 2025

jerryzh168 reviewed Jun 19, 2025

Xia-Weiwen requested a review from leslie-fang-intel June 25, 2025 08:56

Xia-Weiwen reviewed Jun 25, 2025

Xia-Weiwen added the topic: not user facing label Jun 25, 2025

shiyang-weng added 8 commits June 25, 2025 10:06

support dequant promtion for fp8

48a3d99

add ut

1921b2f

remove redundant codes

0335415

Merge pull request #2 from shiyang-weng/wengshiy/dequant_promotion

955fa6e

Add fp8 dequant promotion

Merge remote-tracking branch 'origin/main' into wengshiy/scaled_mm

a70e094

fix lint

a5bb4d0

Merge branch 'main' into wengshiy/scaled_mm

1c1f890

resolve conflict

0c7f8ea
