Enables the per_tensor lowering patterns for weight per_packing #2391
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2391. Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit e51e9ec with merge base 11ce634. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
(force-pushed from c698531 to 67d4a79)
Hi @jerryzh168, @fadara01, @Xia-Weiwen, could you please review this PR?

Thanks! Can you add some tests in https://github.com/pytorch/ao/tree/main/test/quantization/pt2e?
(force-pushed from 67d4a79 to d863085)
Hi @jerryzh168,
(force-pushed from d863085 to 2caf61d)
(force-pushed from 2caf61d to e51e9ec)
Thanks for your PR!
Hi @fadara01, thanks for the response. To recreate the experiment:

Quant script:

Current setup:
Ah, that's amazing! I remember doing a PoC for this exact thing back in the day, and I had to tweak qlinear/qconv, hence my question.

Hi @jerryzh168, @fadara01, could you please approve and merge this change?
This PR is an extension of PR #2139.

Major changes:
1) Introduced a lowering pattern for "per_tensor" quantized weights.
2) Modified the original API `get_default_arm_inductor_quantization_config` to let users choose between "per_tensor" and "per_channel" granularity for weight quantization.

Supported shapes:
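For background on the two granularities named above: "per_tensor" uses a single scale for the whole weight tensor, while "per_channel" uses one scale per output channel, which usually preserves more accuracy at the same bit width. A minimal illustrative PyTorch sketch of the difference (this is not the torchao implementation, just a symmetric int8 example):

```python
import torch

def quantize_weight(w: torch.Tensor, granularity: str = "per_tensor"):
    """Symmetric int8 weight quantization sketch.

    per_tensor: one scale for the whole tensor.
    per_channel: one scale per output channel (dim 0).
    """
    if granularity == "per_tensor":
        scale = w.abs().max() / 127.0
    else:  # "per_channel"
        scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

w = torch.randn(4, 8)  # (out_channels, in_features)
q_t, s_t = quantize_weight(w, "per_tensor")
q_c, s_c = quantize_weight(w, "per_channel")
print(s_t.numel(), s_c.numel())  # 1 4  -> one scale vs. one per channel
```

The per-channel variant tracks each channel's dynamic range separately, which is why supporting only one of the two granularities (the situation this PR fixes for "per_tensor") limits the lowering patterns the backend can match.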
Tested and verified for different models:

Example script for reference:

Results

All times are in seconds, measured on an AWS Graviton3E 32-core instance.

Pip list
cc: @jerryzh168, @fadara01, @Xia-Weiwen