Support k_quant quantization in Olive #1818
base: main
Conversation
```python
        match = re.search(r"release (\d+)\.", output)
        if match:
            return int(match.group(1))  # 11, 12, etc.
    except Exception:
```
Check notice — Code scanning / CodeQL: Empty except (Note)

Copilot Autofix (AI) commented about 2 months ago:
To fix the issue, we should handle the exception in a meaningful way. The best approach is to log the exception so that developers can understand why the function failed to detect the CUDA version. This can be done using Python's built-in `logging` module. Additionally, we can add a comment to clarify that returning `None` is the intended fallback behavior when CUDA is not available or the detection fails.
```diff
@@ -39,4 +39,6 @@
             return int(match.group(1))  # 11, 12, etc.
-    except Exception:
-        pass
+    except Exception as e:
+        # Log the exception and return None as the fallback when CUDA detection fails.
+        import logging
+        logging.warning(f"Failed to detect CUDA version: {e}")
     return None
```
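For context, here is a minimal sketch of how the fixed helper might look end to end. The function name `get_cuda_version` and the use of `nvcc --version` to produce `output` are assumptions, since the excerpt above only shows the regex parsing and the exception handling:

```python
import logging
import re
import subprocess

logger = logging.getLogger(__name__)


def get_cuda_version():
    """Return the major CUDA version (e.g. 11 or 12), or None if detection fails.

    Hypothetical reconstruction: only the regex and the except block appear in
    the diff above; the nvcc call and function name are assumptions.
    """
    try:
        output = subprocess.check_output(["nvcc", "--version"], text=True)
        match = re.search(r"release (\d+)\.", output)
        if match:
            return int(match.group(1))  # 11, 12, etc.
    except Exception as e:
        # Log the exception; None is the intended fallback when CUDA is
        # not available or detection fails.
        logger.warning("Failed to detect CUDA version: %s", e)
    return None
```

A module-level logger is used here instead of the autofix's inline `import logging`, which is the more idiomatic form of the same fix.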
@jiafatom Our path is to migrate or implement quantization algorithms in our own passes, reducing reliance on external dependencies. For example, we just added the OnnxHqqQuantization pass and no longer call the onnxruntime MatMulNBits quantizer for HQQ: #1809. For k_quant we should do the same. Can you add a k_quant pass separately?
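For readers unfamiliar with k_quant: it is a blockwise scheme (popularized by llama.cpp) in which weights are reconstructed from a low-bit code plus a per-block scale and minimum. Below is a hedged numpy sketch of 4-bit asymmetric blockwise quantization in that spirit; the block size and function names are illustrative, and the real k_quant format additionally quantizes the per-block scales inside super-blocks, which is omitted here:

```python
import numpy as np

BLOCK_SIZE = 32  # illustrative sub-block size; a simplification of the
                 # real k_quant super-block layout


def quantize_blockwise_4bit(weights):
    """Asymmetric 4-bit blockwise quantization: w ~= scale * q + w_min."""
    w = weights.reshape(-1, BLOCK_SIZE)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 4 bits -> levels 0..15
    scale = np.where(scale == 0, 1.0, scale)  # avoid div-by-zero on flat blocks
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min


def dequantize_blockwise_4bit(q, scale, w_min, shape):
    return (q.astype(np.float32) * scale + w_min).reshape(shape)


# Round-trip example
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s, m = quantize_blockwise_4bit(w)
w_hat = dequantize_blockwise_4bit(q, s, m, w.shape)
print("max abs error:", np.abs(w - w_hat).max())
```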
## Describe your changes

Initial implementation of the selective mixed precision class.

- More algorithms and heuristics will be added later. Experiments for data-driven selection are underway.
- Two algorithms are introduced, `k_quant_last` and `k_quant_mixed`, similar to those in #1818. However, qkv_proj is not added since it is a large matrix and expensive. We need more testing to see if it's required to keep qkv in higher precision and/or if we should split qkv. (A sketch of one possible selection heuristic follows this description.)
- Downstream passes (both PyTorch and ONNX) should consume this information to do mixed quantization. Some details regarding how to deal with fused qkv need to be sorted out. This pass is agnostic to such downstream changes.

## Checklist before requesting a review

- [x] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to update [example documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md) in a follow-up PR.

## (Optional) Issue link
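As a rough illustration of what a `k_quant_mixed`-style selection might compute, here is a hedged sketch that keeps `lm_head` and the `down_proj` of the first and last transformer blocks at higher precision while quantizing everything else to 4 bits. The module naming convention and the exact layer choices are assumptions, not the PR's actual heuristic:

```python
# Hypothetical illustration of a k_quant_mixed-style precision map; the real
# selection logic in the PR may differ.
def build_precision_map(layer_names, num_layers, default_bits=4, high_bits=8):
    """Map each quantizable module name to a bit width.

    Assumed naming convention: "model.layers.{i}.mlp.down_proj", "lm_head", etc.
    """
    sensitive = {"lm_head"}
    # Keep the first and last blocks' down_proj at higher precision; qkv_proj
    # is deliberately excluded (large matrix, expensive), as noted above.
    for i in (0, num_layers - 1):
        sensitive.add(f"model.layers.{i}.mlp.down_proj")
    return {
        name: (high_bits if name in sensitive else default_bits)
        for name in layer_names
    }


# Usage example with a toy 4-layer model
names = ["lm_head"] + [f"model.layers.{i}.mlp.down_proj" for i in range(4)]
print(build_precision_map(names, num_layers=4))
```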
## Describe your changes

As titled.
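Purely as an illustration of where such a pass would plug into an Olive workflow, a hedged config sketch in Python dict form follows. The pass type name `OnnxKQuantQuantization` is hypothetical (the excerpt does not show how this PR registers the pass), and the surrounding config is abbreviated:

```python
# Hypothetical Olive workflow config fragment (Python dict form).
# "OnnxKQuantQuantization" is an assumed pass name; the PR excerpt does not
# show the actual registration name.
workflow_config = {
    "input_model": {"type": "OnnxModel", "model_path": "model.onnx"},
    "passes": {
        "k_quant": {"type": "OnnxKQuantQuantization"},  # assumed name
    },
    "output_dir": "models/kquant",
}
```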