Support k_quant quantization in Olive #1818

Open · wants to merge 1 commit into main
Conversation

@jiafatom (Contributor) commented May 5, 2025

Describe your changes

As titled.

```
        match = re.search(r"release (\d+)\.", output)
        if match:
            return int(match.group(1))  # 11, 12, etc.
    except Exception:
```

Check notice

Code scanning / CodeQL: Empty except (Note)

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix (AI, about 2 months ago)

To fix the issue, we should handle the exception in a meaningful way. The best approach is to log the exception so that developers can understand why the function failed to detect the CUDA version. This can be done using Python's built-in logging module. Additionally, we can add a comment to clarify that returning None is the intended fallback behavior when CUDA is not available or the detection fails.


Suggested changeset 1: setup.py

Autofix patch. Run the following command in your local git repository to apply this patch:
```
cat << 'EOF' | git apply
diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -39,4 +39,6 @@
             return int(match.group(1))  # 11, 12, etc.
-    except Exception:
-        pass
+    except Exception as e:
+        # Log the exception and return None as the fallback when CUDA detection fails.
+        import logging
+        logging.warning(f"Failed to detect CUDA version: {e}")
     return None
EOF
```
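For context, the full helper after the patch would look roughly like the sketch below. The function name `get_cuda_version` and the `nvcc --version` invocation are assumptions reconstructed from the snippet and diff hunk, not necessarily Olive's actual code:

```python
import logging
import re
import subprocess


def get_cuda_version():
    """Return the major CUDA version reported by nvcc, or None if unavailable."""
    try:
        output = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
        # nvcc prints e.g. "Cuda compilation tools, release 12.4, V12.4.131"
        match = re.search(r"release (\d+)\.", output)
        if match:
            return int(match.group(1))  # 11, 12, etc.
    except Exception as e:
        # Log the exception and return None as the fallback when CUDA detection fails.
        logging.warning("Failed to detect CUDA version: %s", e)
    return None
```

With the logging in place, a missing `nvcc` binary no longer fails silently; the function still returns `None` so callers can fall back to a CPU-only build.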
@xiaoyu-work (Contributor) commented:

@jiafatom Our plan is to migrate or implement quantization algorithms as our own passes, reducing reliance on external dependencies. For example, we just added the OnnxHqqQuantization pass and no longer call the onnxruntime MatMulNBits quantizer for HQQ: #1809. We should do the same for k_quant. Can you add a k_quant pass separately?
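For readers unfamiliar with the algorithm family under discussion, the core of k-quant-style weight quantization is block-wise scaling: weights are split into small blocks, and each block gets its own scale so outliers in one block do not degrade the rest. A minimal sketch, assuming a symmetric int4 scheme; the function names and parameters are illustrative, not the actual Olive or llama.cpp implementation:

```python
import numpy as np


def quantize_blockwise_int4(weights: np.ndarray, block_size: int = 32):
    """Quantize a 1-D weight array to 4-bit integers with one scale per block."""
    padded = np.pad(weights, (0, -len(weights) % block_size))
    blocks = padded.reshape(-1, block_size)
    # Per-block scale maps the max-magnitude value into the int4 range [-8, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales


def dequantize_blockwise(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate float weights from int4 codes and per-block scales."""
    return (q.astype(np.float32) * scales).ravel()
```

Implementing this directly as an Olive pass, rather than delegating to an external quantizer, is what makes mixed-precision choices (which layers get which bit width) easy to control.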

jambayk added a commit that referenced this pull request Jun 6, 2025
## Describe your changes
Initial implementation of the selective mixed precision class.
- More algorithms and heuristics will be added later. Experiments on
data-driven selection are underway.
- Two algorithms are introduced: `k_quant_last` and `k_quant_mixed`, similar
to those in #1818. However, qkv_proj is not included since it is a large
matrix and expensive. We need more testing to see whether it is necessary to
keep qkv in higher precision and/or whether we should split qkv.
- Downstream passes (both PyTorch and ONNX) should consume this
information to perform mixed quantization. Some details regarding how to
handle fused qkv still need to be sorted out. This pass is agnostic to such
downstream changes.
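The selective mixed-precision idea above can be sketched as a simple precision map consumed by downstream passes. The pattern names and bit widths here are illustrative assumptions, not the actual pass API:

```python
import re


def build_precision_map(layer_names, sensitive_patterns=(r"lm_head", r"down_proj")):
    """Map each layer name to a bit width: 8 for sensitive layers, 4 otherwise.

    Layers matched by a "sensitive" pattern are kept at higher precision,
    similar in spirit to the k_quant_mixed heuristic described above.
    """
    regex = re.compile("|".join(sensitive_patterns))
    return {name: (8 if regex.search(name) else 4) for name in layer_names}
```

A downstream quantization pass can then look up each weight's bit width in this map instead of hard-coding a single precision, which keeps the selection logic independent of how fused qkv is eventually handled.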

## Checklist before requesting a review
- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link