Support k_quant quantization in Olive #1818
base: main
Conversation
```python
        match = re.search(r"release (\d+)\.", output)
        if match:
            return int(match.group(1))  # 11, 12, etc.
    except Exception:
```
Check notice — Code scanning / CodeQL: Empty except (Note)

Copilot Autofix (AI) commented about 2 months ago:
To fix the issue, we should handle the exception in a meaningful way. The best approach is to log the exception so that developers can understand why the function failed to detect the CUDA version. This can be done using Python's built-in `logging` module. Additionally, we can add a comment to clarify that returning `None` is the intended fallback behavior when CUDA is not available or the detection fails.
```diff
@@ -39,4 +39,6 @@
             return int(match.group(1))  # 11, 12, etc.
-    except Exception:
-        pass
+    except Exception as e:
+        # Log the exception and return None as the fallback when CUDA detection fails.
+        import logging
+        logging.warning(f"Failed to detect CUDA version: {e}")
     return None
```
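For context, here is a minimal sketch of how the fixed helper might look end to end. The function name `get_cuda_version` and the use of `nvcc --version` to produce `output` are assumptions, since the excerpt above only shows the regex parsing and the exception handling:

```python
import logging
import re
import subprocess

logger = logging.getLogger(__name__)


def get_cuda_version():
    """Return the major CUDA version (e.g. 11 or 12), or None if detection fails.

    Hypothetical reconstruction: only the regex and the except block appear in
    the diff above; the nvcc call and function name are assumptions.
    """
    try:
        output = subprocess.check_output(["nvcc", "--version"], text=True)
        match = re.search(r"release (\d+)\.", output)
        if match:
            return int(match.group(1))  # 11, 12, etc.
    except Exception as e:
        # Log the exception; None is the intended fallback when CUDA is
        # not available or detection fails.
        logger.warning("Failed to detect CUDA version: %s", e)
    return None
```

A module-level logger is used here instead of the autofix's inline `import logging`, which is the more idiomatic form of the same fix.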
@jiafatom Our path is to migrate or implement quantization algorithms in our own passes, reducing reliance on external dependencies. For example, we just added the OnnxHqqQuantization pass and no longer call the onnxruntime MatMulNBits quantizer for HQQ: #1809. For k_quant we should do the same. Can you add a k_quant pass separately?
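For readers unfamiliar with k_quant: it is a blockwise scheme (popularized by llama.cpp) in which weights are reconstructed from a low-bit code plus a per-block scale and minimum. Below is a hedged numpy sketch of 4-bit asymmetric blockwise quantization in that spirit; the block size and function names are illustrative, and the real k_quant format additionally quantizes the per-block scales inside super-blocks, which is omitted here:

```python
import numpy as np

BLOCK_SIZE = 32  # illustrative sub-block size; a simplification of the
                 # real k_quant super-block layout


def quantize_blockwise_4bit(weights):
    """Asymmetric 4-bit blockwise quantization: w ~= scale * q + w_min."""
    w = weights.reshape(-1, BLOCK_SIZE)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 4 bits -> levels 0..15
    scale = np.where(scale == 0, 1.0, scale)  # avoid div-by-zero on flat blocks
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min


def dequantize_blockwise_4bit(q, scale, w_min, shape):
    return (q.astype(np.float32) * scale + w_min).reshape(shape)


# Round-trip example
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s, m = quantize_blockwise_4bit(w)
w_hat = dequantize_blockwise_4bit(q, s, m, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```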
## Describe your changes

Initial implementation of the selective mixed precision class.

- More algorithms and heuristics will be added later. Experiments for data-driven selection are underway.
- Two algorithms are introduced, `k_quant_last` and `k_quant_mixed`, similar to those in #1818. However, qkv_proj is not added since it is a large matrix and expensive. We need more testing to see if it's required to keep qkv in higher precision and/or if we should split qkv. (A sketch of one possible selection heuristic follows this description.)
- Downstream passes (both PyTorch and ONNX) should consume this information to do mixed quantization. Some details regarding how to deal with fused qkv need to be sorted out. This pass is agnostic to such downstream changes.

## Checklist before requesting a review

- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to update [example documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md) in a follow-up PR.

## (Optional) Issue link
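As a rough illustration of what a `k_quant_mixed`-style selection might compute, here is a hedged sketch that keeps `lm_head` and the `down_proj` of the first and last transformer blocks at higher precision while quantizing everything else to 4 bits. The module naming convention and the exact layer choices are assumptions, not the PR's actual heuristic:

```python
# Hypothetical illustration of a k_quant_mixed-style precision map; the real
# selection logic in the PR may differ.
def build_precision_map(layer_names, num_layers, default_bits=4, high_bits=8):
    """Map each quantizable module name to a bit width.

    Assumed naming convention: "model.layers.{i}.mlp.down_proj", "lm_head", etc.
    """
    sensitive = {"lm_head"}
    # Keep the first and last blocks' down_proj at higher precision; qkv_proj
    # is deliberately excluded (large matrix, expensive), as noted above.
    for i in (0, num_layers - 1):
        sensitive.add(f"model.layers.{i}.mlp.down_proj")
    return {
        name: (high_bits if name in sensitive else default_bits)
        for name in layer_names
    }


# Usage example with a toy 4-layer model
names = ["lm_head"] + [f"model.layers.{i}.mlp.down_proj" for i in range(4)]
print(build_precision_map(names, num_layers=4))
```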
## Describe your changes

As titled.
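Purely as an illustration of where such a pass would plug into an Olive workflow, a hedged config sketch in Python dict form follows. The pass type name `OnnxKQuantQuantization` is hypothetical (the excerpt does not show how this PR registers the pass), and the surrounding config is abbreviated:

```python
# Hypothetical Olive workflow config fragment (Python dict form).
# "OnnxKQuantQuantization" is an assumed pass name; the PR excerpt does not
# show the actual registration name.
workflow_config = {
    "input_model": {"type": "OnnxModel", "model_path": "model.onnx"},
    "passes": {
        "k_quant": {"type": "OnnxKQuantQuantization"},  # assumed name
    },
    "output_dir": "models/kquant",
}
```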