[WIP] Make AWQ more general #2400

jerryzh168 · 2025-06-18T04:26:00Z

Summary:

Added AWQConfig that takes a base config and made corresponding changes in other parts of the flow
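A minimal sketch of the "AWQ config wraps a base config" pattern, assuming a registry that maps each config type to a module handler. `AOBaseConfig` here is a hypothetical stand-in, and `register_handler`, `apply_config`, `ToyWeightOnlyConfig` and the `step` field are all hypothetical names chosen for illustration. torchao's real config and handler machinery may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Type


class AOBaseConfig:
    """Hypothetical stand-in for torchao's AOBaseConfig (not the real class)."""


# Hypothetical registry: config type -> function that applies that config to a module.
_HANDLERS: Dict[Type[AOBaseConfig], Callable[[Any, AOBaseConfig], Any]] = {}


def register_handler(config_type: Type[AOBaseConfig]):
    def decorator(fn):
        _HANDLERS[config_type] = fn
        return fn
    return decorator


def apply_config(module: Any, config: AOBaseConfig) -> Any:
    # Dispatch on the concrete config class, config-based-quantize_ style.
    return _HANDLERS[type(config)](module, config)


@dataclass
class ToyWeightOnlyConfig(AOBaseConfig):
    """Hypothetical base config; stands in for any existing quantization config."""
    group_size: int = 128


@dataclass
class AWQConfig(AOBaseConfig):
    """Sketch: AWQ wraps an arbitrary base config instead of fixing one quant type."""
    base_config: AOBaseConfig
    step: str = "convert"  # assumption: "prepare" inserts observers, "convert" quantizes


@register_handler(ToyWeightOnlyConfig)
def _toy_handler(module: dict, config: ToyWeightOnlyConfig) -> dict:
    module["quantized_with"] = f"toy-int4/g{config.group_size}"
    return module


@register_handler(AWQConfig)
def _awq_handler(module: dict, config: AWQConfig) -> dict:
    if config.step == "prepare":
        module["observer"] = "awq"
        return module
    # Placeholder for the AWQ-specific part (applying the searched equalization scale),
    # after which quantization is delegated to whatever base_config was supplied.
    module["awq_scaled"] = True
    return apply_config(module, config.base_config)


layer = {"name": "linear"}
print(apply_config(layer, AWQConfig(base_config=ToyWeightOnlyConfig(group_size=64))))
# {'name': 'linear', 'awq_scaled': True, 'quantized_with': 'toy-int4/g64'}
```

The point of the design is that AWQ only owns the scale search; any quantization type that already has a handler can be reused as `base_config` without a new AWQ variant per type.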

Test Plan:

# Produce model
# make sure to change the model save path (--model_save_hf_hub_path) to a repo you can push to
python torchao/prototype/awq/example2.py --repo "Qwen/Qwen3-4B" --quant awq-8da4w-128 --tasks bhh --model_save_hf_hub_path jerryzh168/Qwen3-4B-8da4w-awq


# eval
lm_eval --model hf --model_args pretrained=jerryzh168/Qwen3-4B-8da4w-awq --tasks bhh --device cuda:0 --batch_size auto


pytorch-bot · 2025-06-18T04:26:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2400

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures

As of commit 8b1fca1 with merge base c561d26:

NEW FAILURES - The following jobs have failed:

Run Float8 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio --index-url https://download.pytor... / linux-job (gh)
test/integration/test_integration.py::TestExport::test_export_17
Run Float8 Tests / test (SM-89, linux.g6.4xlarge.experimental.nvidia.gpu, --pre torch --index-url https://download.p... / linux-job (gh)
test/integration/test_integration.py::TestExport::test_export_17
Run Regression Tests / test (CPU 2.5.1, linux.4xlarge, torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu, cp... / linux-job (gh)
test/integration/test_integration.py::TestSaveLoadMeta::test_save_load_int4woqtensors_2_cpu
Run Regression Tests / test (CPU 2.6, linux.4xlarge, torch==2.6.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
test/integration/test_integration.py::TestSaveLoadMeta::test_save_load_int4woqtensors_2_cpu
Run Regression Tests / test (CPU 2.7, linux.4xlarge, torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
test/integration/test_integration.py::TestSaveLoadMeta::test_save_load_int4woqtensors_2_cpu
Run Regression Tests / test (CUDA 2.5.1, linux.g5.12xlarge.nvidia.gpu, torch==2.5.1 --index-url https://download.pytorch... / linux-job (gh)
test/sparsity/test_sparse_api.py::TestBlockSparseWeight::test_sparse_compile_True_input_shape_1024
Run Regression Tests / test (CUDA 2.6, linux.g5.12xlarge.nvidia.gpu, torch==2.6.0, cuda, 12.6) / linux-job (gh)
test/sparsity/test_sparse_api.py::TestBlockSparseWeight::test_sparse_compile_True_input_shape_1024
Run Regression Tests / test (CUDA 2.7, linux.g5.12xlarge.nvidia.gpu, torch==2.7.0, cuda, 12.6) / linux-job (gh)
test/sparsity/test_sparse_api.py::TestBlockSparseWeight::test_sparse_compile_True_input_shape_1024
Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
test/integration/test_integration.py::TestSaveLoadMeta::test_save_load_int4woqtensors_2_cpu
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
RuntimeError: Command docker exec -t 39cae02a0213cdebf04023c67030dd84972aaaf7d1b08b50ca21830656437773 /exec failed with exit code 139
Run TorchAO Experimental Tests / test-mps-ops (macos-m1-stable) (gh)
test_accuracy_3

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kimishpatel · 2025-06-18T19:36:17Z

torchao/prototype/awq/api.py

+                eps=eps,
+            )
+        else:
+            observer = AWQObserver2(


Can't you add kwargs to AWQObserver and just check whether 'base_config' is in kwargs?

jerryzh168 · 2025-06-18

Yes, this is temporary; I think we can deprecate the old one in the end.
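A minimal sketch of the reviewer's suggestion, assuming a single observer class that takes extra options through `**kwargs` and switches behavior on whether `base_config` was passed. `AWQObserverSketch` and its `mode` attribute are hypothetical; only `eps` comes from the diff above.

```python
from typing import Any, List


class AWQObserverSketch:
    """Hypothetical single observer serving both AWQ flows: new options arrive
    through **kwargs and the constructor checks for 'base_config' instead of
    needing a separate AWQObserver2 class."""

    def __init__(self, weight: List[List[float]], eps: float = 1e-5, **kwargs: Any):
        self.weight = weight
        self.eps = eps
        if "base_config" in kwargs:
            # New, general path: quantize with whatever config the caller supplies.
            self.base_config = kwargs.pop("base_config")
            self.mode = "base_config"
        else:
            # Legacy path: keep the original, hard-coded behavior.
            self.base_config = None
            self.mode = "legacy"
        if kwargs:
            # Reject typos instead of silently ignoring them.
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")


legacy = AWQObserverSketch([[1.0]], eps=1e-6)
general = AWQObserverSketch([[1.0]], base_config="int4-weight-only")
print(legacy.mode, general.mode)  # legacy base_config
```

One class keeps a single entry point, at the cost of a less self-documenting signature; keeping a separate class temporarily and deprecating the old one later, as proposed in the reply, avoids touching existing call sites.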

kimishpatel · 2025-06-18T19:36:58Z

torchao/prototype/awq/api.py

+
+
+@dataclass
+class AWQConfig(AOBaseConfig):


OK, so this consolidates AWQ with the quantize_ API's config-based design?

kimishpatel · 2025-06-18T20:03:35Z

torchao/prototype/awq/api.py

+    dummy_mod = DummyModule(observed_linear.weight * equalization_scale)
+    quant_mod = base_config_handler(dummy_mod, config.base_config)


I'm not sure what's happening here. Isn't module already an nn.Module?

jerryzh168 · 2025-06-19

This is just trying to quantize the weight with the quantization type specified by config.base_config.
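Since `base_config_handler` is called with a module, the scaled weight `observed_linear.weight * equalization_scale` has to be wrapped in something module-like before it can be quantized. The sketch below assumes plain nested lists for tensors; `DummyModule` mirrors the name in the diff, while `scale_input_channels`, `matvec` and `toy_quant_handler` are hypothetical helpers. It also shows the standard AWQ identity that motivates the scaling: multiplying weight input channels by s and dividing activations by s leaves the float output unchanged. How the real code applies 1/s to activations is not shown here.

```python
from typing import List

Matrix = List[List[float]]


class DummyModule:
    """Hypothetical minimal module that only carries a weight, so a handler written
    for modules can be reused on a weight that has been rescaled outside the module."""

    def __init__(self, weight: Matrix):
        self.weight = weight


def scale_input_channels(weight: Matrix, s: List[float]) -> Matrix:
    # Like `weight * equalization_scale` with weight of shape [out, in] and s of shape [in]:
    # the scale broadcasts over input channels (columns), W'[i][j] = W[i][j] * s[j].
    return [[w * s[j] for j, w in enumerate(row)] for row in weight]


def matvec(weight: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xj for w, xj in zip(row, x)) for row in weight]


def toy_quant_handler(module: DummyModule, step: float) -> DummyModule:
    # Hypothetical stand-in for base_config_handler: round weights to a uniform grid.
    module.weight = [[round(w / step) * step for w in row] for row in module.weight]
    return module


W = [[1.0, 2.0], [3.0, -1.0]]
s = [2.0, 0.5]  # per-input-channel equalization scale (made-up values)
x = [4.0, 1.0]

# Scaling weights by s and activations by 1/s leaves the float output unchanged.
scaled = scale_input_channels(W, s)
x_div = [xj / sj for xj, sj in zip(x, s)]
assert matvec(scaled, x_div) == matvec(W, x)  # both [6.0, 11.0]

quant_mod = toy_quant_handler(DummyModule(scaled), step=0.5)
print(quant_mod.weight)  # [[2.0, 1.0], [6.0, -0.5]]
```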

kimishpatel · 2025-06-18T20:04:40Z

torchao/prototype/awq/api.py

+    if config.set_inductor_config:
+        torchao.quantization.utils.recommended_inductor_config_setter()
+
+    observed_linear = module


If this is for linear only, shouldn't you assert that this is an nn.Linear? Also, how do you make sure this function is called only on nn.Linear?

jerryzh168 · 2025-06-19

Yeah, that's true; I'll add an assert. We rely on the user to call quantize_ correctly, i.e. by passing the filter_fn argument of the quantize_ API:

ao/torchao/quantization/quant_api.py

Line 578 in 4e3d019

filter_fn: Optional[Callable[[torch.nn.Module, str], bool]] = None,
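A minimal sketch of how a `filter_fn` with the signature above (module, fully qualified name) can gate which submodules get transformed, combined with the assert discussed in this thread. It is a pure-Python mimic, not torchao code: `Module`, `Linear`, `LayerNorm`, `awq_transform` and `quantize_sketch` are hypothetical, and the "Linear only" default is this sketch's choice.

```python
from typing import Callable, Dict, Optional


class Module:
    """Hypothetical stand-in for torch.nn.Module: just holds named children."""

    def __init__(self, **children: "Module"):
        self.children: Dict[str, "Module"] = dict(children)


class Linear(Module):
    pass


class LayerNorm(Module):
    pass


FilterFn = Callable[[Module, str], bool]  # same shape as quantize_'s filter_fn


def awq_transform(module: Module) -> str:
    # The assert discussed in review: fail loudly if applied to a non-Linear module.
    assert isinstance(module, Linear), f"AWQ expects Linear, got {type(module).__name__}"
    return "awq-quantized"


def quantize_sketch(model: Module, transform: Callable[[Module], str],
                    filter_fn: Optional[FilterFn] = None, prefix: str = "") -> Dict[str, str]:
    """Walk children by fully qualified name; transform those that filter_fn selects."""
    if filter_fn is None:
        filter_fn = lambda m, fqn: isinstance(m, Linear)  # sketch default: Linear only
    applied: Dict[str, str] = {}
    for name, child in model.children.items():
        fqn = f"{prefix}.{name}" if prefix else name
        if filter_fn(child, fqn):
            applied[fqn] = transform(child)
        else:
            applied.update(quantize_sketch(child, transform, filter_fn, fqn))
    return applied


model = Module(block=Module(proj=Linear(), norm=LayerNorm()), head=Linear())
print(quantize_sketch(model, awq_transform))
# {'block.proj': 'awq-quantized', 'head': 'awq-quantized'}
```

With both in place, a well-formed filter_fn does the selection and the assert catches a filter_fn that accidentally matches a non-linear module.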


facebook-github-bot added the CLA Signed label Jun 18, 2025

jerryzh168 mentioned this pull request Jun 18, 2025

[WIP] Add AWQ quantization with QDQLayout support for ExecuTorch #2399

Open

kimishpatel reviewed Jun 18, 2025


[WIP] Make AWQ more general

8b1fca1

Summary: * Added AWQConfig that takes a base config and made corresponding changes in other parts of the flow Test Plan: TODO Reviewers: Subscribers: Tasks: Tags:

jerryzh168 force-pushed the refactor-awq branch from d682cb5 to 8b1fca1 June 24, 2025 22:42

jerryzh168 added the topic: improvement label Jun 24, 2025
