Insights: pytorch/ao
Overview
19 Pull requests merged by 13 people
- [float8 moe training] FSDP support (#2413) merged Jun 21, 2025
- Build mxfp4 kernel for sm120a (#2285) merged Jun 21, 2025
- Remove more Galore bits (#2417) merged Jun 21, 2025
- Enable cpp kernel building (#2402) merged Jun 19, 2025
- Replace debug handle with `from_node` to trace operator transformation (#2339) merged Jun 18, 2025
- [float8 moe training] make using triton kernels for per-group scaling configurable (#2405) merged Jun 18, 2025
- Add part 2 of end-to-end tutorial: fine-tuning (#2394) merged Jun 18, 2025
- Fix ruff broken on main (#2404) merged Jun 18, 2025
- fix torchao quantized model in fbcode (#2396) merged Jun 18, 2025
- [BE] Convert quant_primitives methods private (#2350) merged Jun 18, 2025
- Delete Galore (#2397) merged Jun 18, 2025
- Add inplace quantizer examples (#2345) merged Jun 18, 2025
- Update index.rst (#2395) merged Jun 17, 2025
- Add pt2e tutorials to torchao doc page (#2384) merged Jun 17, 2025
- deduplicate torch ao debugger tests between pytorch/ao and ExecuTorch (#2390) merged Jun 17, 2025
- [float8 training] update torchtitan benchmark script args (#2392) merged Jun 17, 2025
- turn off building tests with cpuinfo (#2324) merged Jun 17, 2025
- remove torchao dependency from torchao build script (#2383) merged Jun 17, 2025
- remove rocm source files when not building for rocm (#2385) merged Jun 16, 2025
21 Pull requests opened by 13 people
- Add support for resharding and int4 preshuffle kernel (#2387) opened Jun 16, 2025
- Enables the per_tensor lowering patterns for weight per_packing (#2391) opened Jun 17, 2025
- Unskip tests (#2398) opened Jun 18, 2025
- [WIP] Add AWQ quantization with QDQLayout support for ExecuTorch (#2399) opened Jun 18, 2025
- [WIP] Make AWQ more general (#2400) opened Jun 18, 2025
- Align scale dtype with model precision in GPTQ (#2403) opened Jun 18, 2025
- Improve tiling params to speed up prefill (#2406) opened Jun 18, 2025
- Groupwise low bit LUT based model quantization (#2407) opened Jun 18, 2025
- NVfp4 (#2408) opened Jun 18, 2025
- [float8] add auto_filter_for_recipe to float8 (#2410) opened Jun 18, 2025
- [Inductor] Support scaled mm on inductor (#2411) opened Jun 19, 2025
- fix float8 training TP+SP integration tests (#2414) opened Jun 20, 2025
- rename `torchao.testing.float8` to `torchao.testing.training` (#2415) opened Jun 20, 2025
- make dtensor shared test util more generic (#2416) opened Jun 20, 2025
- Fixes issue #156414: Fixes bug in implementation of _combine_histogram (Follow up) (#2418) opened Jun 21, 2025
- enable to_mxfp8 cast for DTensor (#2420) opened Jun 21, 2025
- Add support for Int4GroupwisePreshuffleTensor for fbgemm (#2421) opened Jun 22, 2025
- Remove `transpose_input` from fbgemm configs (#2422) opened Jun 22, 2025
- enabling xpu in UT test (#2424) opened Jun 23, 2025
- [float8 moe training] Add TP support (#2425) opened Jun 23, 2025
- mitigate the numeric test issue (#2426) opened Jun 23, 2025
3 Issues closed by 3 people
- Question about the choice of use_fast_accum in FP8 Training (#2377) closed Jun 22, 2025
- [Question] Combining QAT and Sparsity Training (#2310) closed Jun 20, 2025
- [Windows][build] two Build failure on Windows on latest main branch (#2297) closed Jun 16, 2025
5 Issues opened by 4 people
- Training hangs after 1 epoch when using QAT (#2423) opened Jun 22, 2025
- Benefits of Using QAT Before GGUF Quantization? (#2419) opened Jun 21, 2025
- TorchAO Paper (#2412) opened Jun 19, 2025
- TP + FSDP + MXFP8 fails during compile (#2393) opened Jun 17, 2025
- Implement an AWQ algorithm with dynamic activation quantization for ExecuTorch (#2388) opened Jun 16, 2025
14 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Inference tutorial - Part 3 of e2e series [WIP] (#2343) commented on Jun 23, 2025 • 21 new comments
- Add Claude MD file (#2311) commented on Jun 18, 2025 • 5 new comments
- skip quant/dequant decomposed (#2299) commented on Jun 19, 2025 • 3 new comments
- [float8] Prevent quantize_affine_float8/dequantize_affine_float8 decomposed on inductor (#2379) commented on Jun 20, 2025 • 3 new comments
- [CPU] Enable DA8W4 on CPU (#2128) commented on Jun 20, 2025 • 1 new comment
- [roadmap/tracker] Low precision training for MoEs (#2147) commented on Jun 16, 2025 • 0 new comments
- BF16 stochastic rounding does not work distributed (FSDP) (#2296) commented on Jun 18, 2025 • 0 new comments
- [QAT] Low-bit FSDP all-gather for QAT (#1224) commented on Jun 19, 2025 • 0 new comments
- Eval hf models using lm_eval (#2179) commented on Jun 23, 2025 • 0 new comments
- Add round_scales_to_power_of_2 option for float quantization (#2323) commented on Jun 18, 2025 • 0 new comments
- moe quant with dedicated kernels [wip] (#2325) commented on Jun 20, 2025 • 0 new comments
- Update to new PT Theme (#2361) commented on Jun 17, 2025 • 0 new comments
- [CPU INT8 SDPA] use manual transpose and pack (#2380) commented on Jun 20, 2025 • 0 new comments
- [not for land] float8 blockwise scaling training prototype using deep_gemm (#2386) commented on Jun 18, 2025 • 0 new comments