
Tags: pytorch/ao

ciflow/rocm/2364

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Update regression_test_rocm.yml

ciflow/rocm/2066

Refactor `is_ROCm_mx_supported` function for improved readability

- Reformatted the return statement to enhance clarity and maintainability of the code.
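
The refactor is formatting only, but the pattern is worth showing. Below is a minimal sketch of the one-condition-per-line return style; the signature and conditions are entirely invented for illustration, and the real checks live in torchao:

```python
def is_rocm_mx_supported(is_rocm_build: bool, gfx_arch: str, has_mx_kernels: bool) -> bool:
    # Hypothetical capability check: one condition per line reads more
    # clearly than a single long boolean expression.
    return (
        is_rocm_build
        and gfx_arch.startswith("gfx9")
        and has_mx_kernels
    )
```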

ciflow/benchmark/2260

updates

v0.11.0

Uses torch.version.cuda to compile CUDA extensions (#2193)

* Uses torch.version.cuda to compile CUDA extensions

* lint
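
For context, `torch.version.cuda` reports the CUDA toolkit version the installed PyTorch was built against (e.g. `"12.1"`, or `None` on CPU-only builds), and this commit uses it when compiling the CUDA extensions. A small sketch of parsing that string, with a hypothetical helper name rather than torchao's actual build code:

```python
def parse_cuda_version(version):
    """Parse a string of the form torch.version.cuda returns ("12.1",
    or None on CPU-only builds) into a (major, minor) tuple.
    Hypothetical helper for illustration only."""
    if version is None:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)
```

At build time such a tuple could be compared against the toolkit the extension is being compiled with, so that the extension matches the installed torch rather than whatever CUDA happens to be on PATH.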

v0.11.0-rc4

Uses torch.version.cuda to compile CUDA extensions (#2193)

* Uses torch.version.cuda to compile CUDA extensions

* lint

v0.11.0-rc3

Move moe quant to better prototype dir (#2192)

* Move moe quant to better prototype dir

Summary:

The old quantization/prototype dir is being deprecated, so we're moving
moe_quant into the correct one.

Test Plan: see CI

* actually adding new folder

* ruff format

v0.11.0-rc2

Enabling MOE Quantization using linear decomposition (#2043)

* Enabling MOE Quantization using linear decomposition

Summary: This PR is a first step toward optimizing MOE inference using
torchao. The goal of this step is to enable existing quantization
kernels and workflows to work for MOE quantization by decomposing the
grouped gemm into a sequence of unbalanced linear ops that can use the
existing quantized kernels. To enable this, we had to add support for
quantizing these 3D tensors, as well as for slicing and indexing them.
Two methods of achieving this were implemented. For int8wo, int8dq,
int4wo, fp8wo, and fp8dq, the underlying quantized tensor subclass was
adapted to support 3D tensors, indexing, and slicing, along with an
updated transformation function that can handle the
ConditionalFeedForwardAOQuantizable modules when the filter function in
quantize_ is used to target them. For some complex kernels that use
packed data and couldn't easily be made to work in 3D, we also added
FakeExtraDimTensor, which can give any quantized tensor subclass the
slice and index operations needed for MOE quantization. This option is
enabled by using MoeQuantConfig.
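
The decomposition described above can be sketched in plain Python with invented names and no actual quantization: tokens routed to each expert form unbalanced groups, and each group goes through an ordinary 2D linear op, which is exactly where an existing quantized kernel could be substituted.

```python
def matmul(a, b):
    # plain 2D matmul on nested lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def moe_forward_decomposed(tokens, routing, expert_weights):
    """Sketch: a grouped gemm decomposed into one unbalanced linear per expert."""
    outputs = [None] * len(tokens)
    for e, w in enumerate(expert_weights):
        idx = [i for i, r in enumerate(routing) if r == e]  # tokens routed to expert e
        if not idx:
            continue
        # ordinary 2D linear over an unbalanced group; a quantized kernel could go here
        group_out = matmul([tokens[i] for i in idx], w)
        for j, i in enumerate(idx):
            outputs[i] = group_out[j]
    return outputs
```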

This can be applied to the Hugging Face Llama 4 model, for instance, as
shown in the llama4_quant.py example. Since the HF MOE module is
implemented in a way that's not conducive to quantization, it first
requires a module swap to MOEFeedForwardAOQuantizable.
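
The module swap mentioned above follows the usual recursive-replace pattern. Below is a schematic sketch with invented stand-in classes, not the torchao or Hugging Face implementations:

```python
class Module:
    """Minimal stand-in for nn.Module, for illustration only."""
    def children_named(self):
        return [(n, v) for n, v in vars(self).items() if isinstance(v, Module)]

class HFMoEBlock(Module):
    pass  # stand-in for the Hugging Face MOE module

class MOEFeedForwardAOQuantizable(Module):
    def __init__(self, source):
        self.source = source  # weights would be copied over from the source module

def swap_moe_modules(root):
    # recursively replace HF MOE blocks with the quantizable equivalent
    for name, child in root.children_named():
        if isinstance(child, HFMoEBlock):
            setattr(root, name, MOEFeedForwardAOQuantizable(child))
        else:
            swap_moe_modules(child)
```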

TODO: final benchmark numbers from run.sh; consolidate the 3x
implementations of MOEFeedForwardAOQuantizable and
ConditionalFeedForwardAOQuantizable; verify hqq.
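
The FakeExtraDimTensor idea described above can be sketched as a thin wrapper that presents a list of 2D objects as one 3D-like object, where dim-0 slicing keeps the wrapper and integer indexing drops to the underlying 2D object. The class below is an invented illustration, not the torchao implementation:

```python
class FakeExtraDimSketch:
    """Present a stack of per-expert 2D 'tensors' as one 3D-like object."""
    def __init__(self, slabs):
        self.slabs = list(slabs)

    @property
    def shape(self):
        # (num_experts, rows, cols) of the stacked 2D slabs
        return (len(self.slabs), len(self.slabs[0]), len(self.slabs[0][0]))

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            # slicing along dim 0 keeps the wrapper
            return FakeExtraDimSketch(self.slabs[idx])
        # integer indexing drops to the underlying 2D object
        return self.slabs[idx]
```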

Test Plan:
python test/quantization/test_moe_quant.py

python test/torchao/experimental/tests/test_int8_dynamic_activation_intx_weight.py -k "test_moe_quant_intx"

sh torchao/_models/mixtral-moe/run.sh

* fixing CI

* fixing CI

* fixing CI

* lint

* remove test code

* fixing exp test

* fixing experimental test

* fixing experimental CI

* fixing generate.py device stuff

* fixing tests that aren't skipping

* ruff format

* removing test code

* fixing CI

* update API and remove branching on quant_api.py transform functions

* ruff format

* fix weird ci error

* remove change to test_integration.py

v0.11.0-rc1

Fix linux cpu builds. Resolves nightly build for mac stops on 0422 (#2170)

* Revert "[reland][ROCm] preshuffled weight mm (#2044)"

This reverts commit 2266451.

* Revert "Re-land "Add INT8 SDPA path for CPU" (#2093)"

This reverts commit 137b079.

v0.10.0

fix test infra branch

v0.10.0-rc1

Add quantized q @ k test for intended use in quantized attention

Differential Revision: D71370604

Pull Request resolved: #2006
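
As background on what such a test exercises: quantized q @ k typically means computing attention scores from int8-quantized q and k with per-row scales. The helpers below are an invented sketch of that idea, not the code from this PR:

```python
def quantize_int8(row):
    # symmetric per-row int8 quantization: one scale per row
    scale = max(abs(v) for v in row) / 127 or 1.0
    return [round(v / scale) for v in row], scale

def quantized_qk(q_rows, k_rows):
    """Attention scores q @ k^T computed on int8 values, rescaled back to float."""
    q_quant = [quantize_int8(r) for r in q_rows]
    k_quant = [quantize_int8(r) for r in k_rows]
    return [[sum(a * b for a, b in zip(qr, kr)) * qs * ks
             for kr, ks in k_quant]
            for qr, qs in q_quant]
```

A test of this shape would compare the rescaled integer scores against the float reference within a quantization-error tolerance.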