Tags: pytorch/ao
Refactor `is_ROCm_mx_supported` function for improved readability - Reformatted the return statement to enhance clarity and maintainability of the code.
Uses torch.version.cuda to compile CUDA extensions (#2193)
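For illustration, a minimal sketch of what keying the build off torch.version.cuda can look like. torch.version.cuda is the toolkit version PyTorch was built with (or None on CPU/ROCm builds); the helper name and the flag choices below are assumptions, not torchao's actual setup.py logic.

```python
# Hypothetical sketch: derive CUDA-extension build flags from
# torch.version.cuda instead of probing a locally installed nvcc.
import torch

def cuda_nvcc_flags():
    cuda_version = torch.version.cuda  # e.g. "12.4"; None on CPU/ROCm builds
    if cuda_version is None:
        return None  # skip building CUDA extensions entirely
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    flags = ["-O3", "--use_fast_math"]
    if (major, minor) >= (12, 0):
        # Newer toolkits can target newer architectures; illustrative only.
        flags.append("-gencode=arch=compute_90a,code=sm_90a")
    return flags
```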
Move moe quant to better prototype dir (#2192)
Summary: The old quantization/prototype dir is being deprecated, so moe_quant is moved into the correct one.
Test Plan: see CI
Enabling MOE Quantization using linear decomposition (#2043)
Summary: This PR is a first step toward optimizing MoE inference with torchAO. The goal for this step is to enable existing quantization kernels and workflows to work for MoE quantization by decomposing the grouped gemm into a sequence of unbalanced linear ops that can use the existing quantized kernels. To enable this we had to add support for quantizing these 3D tensors as well as slicing and indexing them. Two methods of achieving this were implemented. For int8wo, int8dq, int4wo, fp8wo, and fp8dq, the underlying quantized tensor subclass was adapted to support 3D tensors, indexing, and slicing, along with an updated transformation function that can handle the ConditionalFeedForwardAOQuantizable modules when the filter function in quantize_ is used to target them. For some complex kernels which use packed data and couldn't easily be made to work in 3D, we also added FakeExtraDimTensor, which can wrap any quantized tensor subclass to support the slice and index operations needed for MoE quantization; this option is enabled by using MoeQuantConfig. This can be applied to Hugging Face Llama 4, for instance, as shown in the llama4_quant.py example. Since the HF MoE module is implemented in a way that's not conducive to quantization, it first requires a module swap to MOEFeedForwardAOQuantizable.
TODO: final benchmark numbers from run.sh; consolidate the three implementations of MOEFeedForwardAOQuantizable and ConditionalFeedForwardAOQuantizable; verify hqq.
Test Plan:
python test/quantization/test_moe_quant.py
python test/torchao/experimental/tests/test_int8_dynamic_activation_intx_weight.py -k "test_moe_quant_intx"
sh torchao/_models/mixtral-moe/run.sh
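A minimal, hypothetical sketch of the workflow this entry describes: target only the decomposed expert module with the filter function of quantize_. ToyExpert and Int8WeightOnlyConfig are illustrative stand-ins, not the PR's API; the actual classes are MOEFeedForwardAOQuantizable / ConditionalFeedForwardAOQuantizable and MoeQuantConfig, and import paths may differ across torchao versions.

```python
# Hypothetical sketch; ToyExpert stands in for the PR's decomposed MoE
# expert module, and Int8WeightOnlyConfig for whichever base config the
# real MoeQuantConfig would wrap.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8WeightOnlyConfig

class ToyExpert(nn.Module):
    """Stand-in for an MoE expert decomposed into plain linear ops."""
    def __init__(self, dim=64):
        super().__init__()
        self.w1 = nn.Linear(dim, 4 * dim, bias=False)
        self.w2 = nn.Linear(4 * dim, dim, bias=False)

    def forward(self, x):
        return self.w2(torch.nn.functional.silu(self.w1(x)))

model = nn.Sequential(ToyExpert(), nn.Linear(64, 10))

# Quantize only the expert's linears, leaving the final projection untouched,
# by passing a filter function to quantize_ (the PR uses the same mechanism
# to target its MoE modules).
quantize_(
    model,
    Int8WeightOnlyConfig(),
    filter_fn=lambda mod, fqn: isinstance(mod, nn.Linear) and fqn.startswith("0."),
)

out = model(torch.randn(2, 64))
```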
Add quantized q @ k test for intended use in quantized attention. Differential Revision: D71370604. Pull Request resolved: #2006
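For context, an illustrative version of the kind of check such a test makes: quantize q and k to int8 with symmetric per-tensor scales, emulate the integer matmul, and compare against the float reference. This does not call the torchao kernels the actual test exercises; helper names and tolerances are assumptions.

```python
# Illustrative quantized q @ k check, not the PR's test.
import torch

def quantize_symmetric_int8(x):
    # Per-tensor symmetric quantization to int8.
    scale = x.abs().amax() / 127
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

q = torch.randn(8, 16, 64)   # (heads, seq_len, head_dim)
k = torch.randn(8, 16, 64)

q_int, q_scale = quantize_symmetric_int8(q)
k_int, k_scale = quantize_symmetric_int8(k)

# Emulate integer accumulation in float (int8*int8 products summed over 64
# elements fit exactly in float32), then rescale back to the float domain.
scores_q = (q_int.float() @ k_int.float().transpose(-1, -2)) * (q_scale * k_scale)
scores_ref = q @ k.transpose(-1, -2)

torch.testing.assert_close(scores_q, scores_ref, atol=1.0, rtol=0.1)
```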