ORT-GenAI should ship ARM64EC binaries for AMD64 python running on ARM64 Windows #1417
Comments
Do you mean AMD64 python here? The guidance from the onnxruntime team is to use
We do not have any plans at the moment to create a new package similar to
Yes, thanks. Edited my issue.
The onnxruntime package also provides arm64ec wheels, not just onnxruntime-qnn. We're in an unfortunate situation: users need to compile their LLM with Olive in an AMD64 environment, then switch to a native ARM64 environment to use GenAI extensions. This is not a great user experience; shipping an arm64ec wheel would allow users to use the same environment end to end.
Could you show me where you found this wheel? From looking at the packaging pipelines, I couldn't find a reference to arm64ec builds for onnxruntime. If it does not require a new PyPI package, I could potentially add support.
@jywu-msft could help with that query (how ORT packages the ARM64EC binary).
For more context on where I got the wheel, I can pip install |
For some reason, I don't see the same thing as you do:
What wheel file did you download on your machine? Can you point me to it? George is out of office.
Here are the available wheels for onnxruntime. The one that I tried was onnxruntime-1.21.1-cp312-cp312-win_amd64.whl.
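As an aside, the platform tag at the end of a wheel filename tells you which binaries a given wheel carries. A minimal helper (hypothetical, not part of any of these packages) that extracts it, based on the standard wheel naming scheme:

```python
def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag from a wheel filename.

    Wheel names follow {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the platform tag is the last dash-separated field before ".whl".
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    return filename[: -len(".whl")].rsplit("-", 1)[-1]


print(wheel_platform_tag("onnxruntime-1.21.1-cp312-cp312-win_amd64.whl"))  # -> win_amd64
```

An arm64ec wheel would carry a different tag in this last field, which is how you can tell the variants apart before installing.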
Doing the same on onnxruntime-qnn:
So, I can tell that
OK, I tried this again. What I didn't try before was just installing onnxruntime on its own; I only looked at the binary after also installing onnxruntime-qnn. I tried installing the two in separate commands. If I just

I'm not entirely sure how this works, as I don't see pip installing

Since the release of onnxruntime-genai isn't pegged to onnxruntime, I suppose we can't take the same approach here. Perhaps we could point users to a different distribution of onnxruntime-genai? Would it be possible to build the package with EC (or ARM64X) binaries and host them with each release? Then we could at least point people to install ort-genai from there if they can't use pip.
On the side, I'm happy to try to build the binary + wheel myself and self-host it to make the Microsoft BUILD story clear (I'm making the naive assumption that this will be a straightforward use of your build scripts; please correct me if I'm wrong). But I would love to have a permanent "official" solution we can point users to longer term.
Only onnxruntime-qnn has arm64ec wheels; onnxruntime does not.
When you pip installed onnxruntime and then onnxruntime-qnn, the DLLs from onnxruntime-qnn overwrote the installation from onnxruntime.
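The overwrite explanation above can be checked by reading the COFF header of the installed DLL. A minimal sketch, with one important caveat: a plain ARM64EC DLL reports the same machine value (0x8664) as pure x64 in this header, while ARM64X binaries report 0xAA64, so header inspection alone can separate native ARM64/ARM64X binaries from x64 ones but cannot distinguish ARM64EC from AMD64 (that requires looking at the load-config CHPE metadata):

```python
import struct

# COFF machine types from the PE/COFF spec. Note: plain ARM64EC DLLs
# report IMAGE_FILE_MACHINE_AMD64 (0x8664) here, same as pure x64;
# ARM64X binaries report 0xAA64.
MACHINE_NAMES = {0x8664: "AMD64 (or ARM64EC)", 0xAA64: "ARM64/ARM64X", 0x014C: "x86"}


def pe_machine(path: str) -> int:
    """Read the COFF machine field from a PE file (e.g. a .dll)."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file")
        f.seek(0x3C)  # e_lfanew: offset of the PE signature
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\x00\x00":
            raise ValueError("PE signature not found")
        (machine,) = struct.unpack("<H", f.read(2))
        return machine
```

Pointing this at the onnxruntime DLL inside your environment's site-packages (path depends on your install) would show which package's binaries actually survived the second install.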
I think you can test this out: build onnxruntime-genai with the --arm64ec option. @baijumeswani, should ort_home just point to a QNN-enabled build? I'm not sure of the exact build options @kory should use.
The build step should be simple. You shouldn't need to provide ort_home. @kory, could you try this build command? python build.py --arm64ec --parallel --config Release
Describe the bug
Windows on ARM users commonly use AMD64 python to execute models with ONNX Runtime. This is needed because several python packages (e.g. torch, h5py) do not yet ship ARM64 builds for Windows.
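One wrinkle in this setup is that an emulated AMD64 python process typically reports AMD64 from platform.machine(), hiding the real host architecture. A sketch of detecting the true native machine via the documented kernel32 IsWow64Process2 API (Windows-only; the function below simply returns False on other platforms):

```python
import ctypes
import sys

IMAGE_FILE_MACHINE_ARM64 = 0xAA64


def host_is_arm64() -> bool:
    """Return True if the native machine is ARM64, even when this python
    process is an emulated AMD64 one. Uses kernel32!IsWow64Process2
    (Windows 10 1511+); returns False on non-Windows platforms."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    process_machine = ctypes.c_ushort(0)
    native_machine = ctypes.c_ushort(0)
    ok = kernel32.IsWow64Process2(
        kernel32.GetCurrentProcess(),
        ctypes.byref(process_machine),
        ctypes.byref(native_machine),
    )
    return bool(ok) and native_machine.value == IMAGE_FILE_MACHINE_ARM64
```

A package installer (or a support script) could use a check like this to warn users that they are running AMD64 python on an ARM64 host and would benefit from ARM64EC binaries.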
When installed in this environment, ONNX Runtime ships ARM64EC binaries, which allow models to execute (on CPU or NPU) without emulation in an AMD64 python process. Thanks to the bundled ARM64EC binary, models run with the same inference time in either environment.
Onnxruntime-GenAI, however, ships AMD64 binaries for this environment. This is a large bottleneck for running GenAI models on Windows on ARM: the ORT-GenAI package runs under emulation, so it's very slow.
One example is running DeepSeek on Snapdragon X Elite, where we can achieve 17 TPS with native ARM64 python but only 1 TPS with AMD64 python. Because the performance difference appears only with large language models running in GenAI extensions, we believe it is caused by the lack of an ARM64EC binary for GenAI extensions.
Expected behavior
ORT-GenAI binaries that are distributed for AMD64 python on Windows on ARM systems should be built as ARM64EC.
Additional Information
A common workflow is to use Olive to compile a model, then execute that model on the same device. Because Olive requires torch, and torch does not have a distribution for Windows on ARM, users must use AMD64 python.
The path to good performance out of GenAI extensions is then to switch to an ARM64 python environment. This is not a great user experience; if ARM64EC binaries were distributed with GenAI extensions for AMD64 python, users could use one python environment for the end-to-end flow.