ORT-GenAI should ship ARM64EC binaries for AMD64 python running on ARM64 Windows #1417

Open
kory opened this issue Apr 23, 2025 · 14 comments
@kory

kory commented Apr 23, 2025

Describe the bug
Windows on ARM users commonly use AMD64 Python to execute models with ONNX Runtime. This is needed because several Python packages (e.g., Torch, h5py) do not yet ship ARM64 builds for Windows.

When installed in this environment, ONNX runtime ships ARM64EC binaries, which allow models to execute (on CPU or NPU) without the need for emulation in an AMD64 python process. Models in either environment will run with the same inference time thanks to the ARM64-EC binary bundled with onnxruntime.

Onnxruntime-GenAI ships AMD64 binaries for this environment. This is a large bottleneck for running GenAI models on Windows on ARM--the ORT-GenAI package is emulated, so it's very slow.

One example is running DeepSeek on Snapdragon X Elite, where we achieve 17 tokens per second (TPS) on native ARM Python but only 1 TPS on AMD64 Python. Because the performance difference can be replicated only with large language models running in GenAI extensions, we believe it is caused by the lack of an ARM64EC binary for GenAI extensions.
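
For anyone who wants to reproduce a comparison like this, a minimal timing sketch follows. It is not how the numbers above were measured. `generate_token` is a hypothetical zero-argument callable that produces one token with whatever generation loop is in use, and the result is simply the token count divided by wall-clock time.

```python
import time


def tokens_per_second(generate_token, num_tokens):
    """Call generate_token() num_tokens times and return tokens per second.

    generate_token is a hypothetical callable supplied by the caller; this
    helper only measures wall-clock time around it.
    """
    start = time.perf_counter()
    for _ in range(num_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        # Guard against a clock that did not advance for trivial workloads.
        return float("inf")
    return num_tokens / elapsed
```

Running the same helper in both the AMD64 and the native ARM64 Python environments, with the same model and prompt, would give directly comparable figures.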

Expected behavior
ORT-GenAI binaries that are distributed with AMD64 Python on Windows on ARM systems should be built for ARM64-EC.

Additional Information
A common workflow is to compile a model with Olive, then execute that model on the same device. Because Olive requires Torch, and Torch does not have a distribution for Windows on ARM, users must use AMD64 Python.

The path to good performance with GenAI extensions is then to switch to an ARM64 Python environment. This is not a great user experience; if ARM64EC binaries were distributed with GenAI extensions for AMD64 Python, users could use one Python environment for the end-to-end flow.

@baijumeswani
Collaborator

but only get 1TPS on ARM64 python.

Do you mean AMD64 python here?

The guidance from onnxruntime team is to use onnxruntime-qnn package on windows arm64 for running models on NPU. onnxruntime-qnn provides two wheels:

  • native arm64 wheels
  • arm64ec wheels for amd64

onnxruntime-genai Python package provides:

  • amd64 wheel (not built with arm64ec) - needed for non winarm64 envs
  • native arm64 wheel

We do not have any plans at the moment to create a new package similar to onnxruntime-qnn.

@kory
Author

kory commented Apr 23, 2025

but only get 1TPS on ARM64 python.

Do you mean AMD64 python here?

Yes, thanks. Edited my issue.

onnxruntime-genai Python package provides:

  • amd64 wheel (not built with arm64ec) - needed for non winarm64 envs
  • native arm64 wheel

We do not have any plans at the moment to create a new package similar to onnxruntime-qnn.

The onnxruntime package also provides arm64ec wheels, not just onnxruntime-qnn.

We're in the unfortunate situation that users need to compile their LLM with Olive in an AMD64 environment, then switch to a native ARM64 environment to use GenAI extensions. This is not a great user experience; shipping an arm64ec wheel would allow users to use the same environment end to end.

@baijumeswani
Collaborator

The onnxruntime package also provides arm64ec wheels,

Could you show me where you found this wheel? From looking at the packaging pipelines, I couldn't find a reference to arm64ec builds for onnxruntime. How were you able to determine that the wheel was an arm64ec wheel?

If it does not require a new PyPI package, I could potentially add support.

@kory
Author

kory commented Apr 24, 2025

@jywu-msft could help assist with that query (how ORT packages the ARM64EC binary).

@kory
Author

kory commented Apr 24, 2025

For more context on where I got the wheel, I can pip install onnxruntime in an AMD64 python environment on my Windows on ARM machine, and the included onnxruntime.dll is ARM64-EC (actually, .a64xrm, which is ARM64X, a superset of ARM64EC) according to dumpbin.

@baijumeswani
Collaborator

For some reason, I don't see the same thing as you do:

C:\Users\bmeswani\Downloads>dumpbin /headers C:\Users\bmeswani\Downloads\onnxruntime-1.21.1-cp312-cp312-win_amd64\onnxruntime\capi\onnxruntime.dll
Microsoft (R) COFF/PE Dumper Version 14.42.34433.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file C:\Users\bmeswani\Downloads\onnxruntime-1.21.1-cp312-cp312-win_amd64\onnxruntime\capi\onnxruntime.dll

PE signature found

File Type: DLL

FILE HEADER VALUES
            **8664 machine (x64)**
               7 number of sections
        67F712F1 time date stamp Wed Apr  9 17:38:09 2025
               0 file pointer to symbol table
               0 number of symbols
              F0 size of optional header
            2022 characteristics
                   Executable
                   Application can handle large (>2GB) addresses
                   DLL

Which wheel file did you download on your machine? Can you point me to it?

George is out of office.

@baijumeswani
Collaborator

Here are the available wheels for onnxruntime: https://pypi.org/project/onnxruntime/#files

The one that I tried was the onnxruntime-1.21.1-cp312-cp312-win_amd64.whl

@baijumeswani
Collaborator

Doing the same on onnxruntime-qnn, I get this:

C:\Users\bmeswani\Downloads>dumpbin /headers C:\Users\bmeswani\Downloads\onnxruntime_qnn-1.21.1-cp312-cp312-win_amd64\onnxruntime\capi\onnxruntime.dll
Microsoft (R) COFF/PE Dumper Version 14.42.34433.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file C:\Users\bmeswani\Downloads\onnxruntime_qnn-1.21.1-cp312-cp312-win_amd64\onnxruntime\capi\onnxruntime.dll

PE signature found

File Type: DLL

FILE HEADER VALUES
            **8664 machine (x64) (ARM64X)**
               8 number of sections
        67F713B1 time date stamp Wed Apr  9 17:41:21 2025
               0 file pointer to symbol table
               0 number of symbols
              F0 size of optional header
            2022 characteristics
                   Executable
                   Application can handle large (>2GB) addresses
                   DLL

So, I can tell that the onnxruntime-qnn package certainly has the arm64ec binaries available. But the onnxruntime one does not, as far as I can tell.
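
The comparison above comes down to reading the `machine` line of the `dumpbin /headers` output. As a rough sketch, that line can be pulled out of captured output with a small Python helper. The function name `parse_machine` is made up for illustration, and the pattern is based only on the two outputs shown in this thread, so other dumpbin versions may format the line differently.

```python
import re

# Matches lines such as "8664 machine (x64)" or "8664 machine (x64) (ARM64X)".
# The pattern is an assumption derived from the two dumps shown in this thread.
_MACHINE_LINE = re.compile(
    r"([0-9A-Fa-f]{3,4}) machine \(([^)]+)\)(?:\s*\(([^)]+)\))?"
)


def parse_machine(dumpbin_output):
    """Return the machine code, name, and optional qualifier, or None."""
    match = _MACHINE_LINE.search(dumpbin_output)
    if match is None:
        return None
    code, name, qualifier = match.groups()
    return {"code": code.upper(), "name": name, "qualifier": qualifier}
```

With the outputs above, the onnxruntime wheel would yield a qualifier of `None`, while the onnxruntime-qnn wheel would yield `"ARM64X"`.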

@kory
Author

kory commented Apr 24, 2025

OK--I tried this again.

What I didn't try is just installing onnxruntime on its own. I only looked at the binary after also installing onnxruntime-qnn.

I tried installing the two in separate commands. If I just pip install onnxruntime, I get an x64 binary. If I pip install onnxruntime-qnn after, the x64 binary (in the same path; site-packages\onnxruntime\capi\onnxruntime.dll) is replaced with an ARM64X binary.

I'm not entirely sure how this works, as I don't see pip installing onnxruntime again when I install onnxruntime-qnn. Again, it would be good for George to weigh in here when he returns (or perhaps @HectorSVC could help).

Since the release of onnxruntime-genai isn't pegged to onnxruntime, I suppose we can't take the same approach here.

Perhaps we could point users to a different distribution of onnxruntime-genai? Would it be possible to build the package with EC (or ARM64X) binaries and host with each release?

Then we could at least point people to install ort-genai from there, if they can't use pip.

@kory
Author

kory commented Apr 24, 2025

On the side, I'm happy to try and build the binary + wheel myself and self-host it to make the Microsoft BUILD story clear--(I'm making the naive assumption this is going to be a straightforward use of your build scripts, please correct if I'm wrong).

But would love to have a permanent "official" solution we can point users to longer-term.

@jywu-msft
Member

Only onnxruntime-qnn has arm64ec wheels; onnxruntime does not.
You need to uninstall any onnxruntime-* wheels before installing one of a different flavor (e.g., onnxruntime-qnn), since they would conflict with each other (they both install to the same onnxruntime location).

@jywu-msft
Member

When you pip installed onnxruntime and then onnxruntime-qnn, the DLLs from onnxruntime-qnn overwrote the ones installed by onnxruntime, which is why you saw onnxruntime.dll as ARM64EC.
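
One way to spot this kind of overlap in an environment is to ask the standard library's `importlib.metadata` which installed distributions list the same file. The helper below is only a sketch: `distributions_claiming` is a made-up name, and it assumes each wheel's recorded file list is present and accurate.

```python
from importlib import metadata


def distributions_claiming(suffix, dists=None):
    """Return sorted names of distributions that list a file ending in suffix.

    dists defaults to every distribution installed in the current
    environment; passing it explicitly is only a convenience for testing.
    """
    suffix = suffix.replace("\\", "/")
    if dists is None:
        dists = metadata.distributions()
    owners = []
    for dist in dists:
        files = dist.files or []  # files is None when no record is available
        if any(str(f).replace("\\", "/").endswith(suffix) for f in files):
            owners.append(dist.metadata["Name"])
    return sorted(owners)
```

If `distributions_claiming("onnxruntime/capi/onnxruntime.dll")` returns more than one name, two packages have written to the same path, and whichever was installed last supplied the DLL on disk.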

@jywu-msft
Member

jywu-msft commented Apr 26, 2025

On the side, I'm happy to try and build the binary + wheel myself and self-host it to make the Microsoft BUILD story clear--(I'm making the naive assumption this is going to be a straightforward use of your build scripts, please correct if I'm wrong).

But would love to have a permanent "official" solution we can point users to longer-term.

I think you can test this out. build onnxruntime-genai with --arm64ec option. @baijumeswani should ort_home just point to a qnn enabled build? I'm not sure the exact build options @kory should use.

@baijumeswani
Collaborator

@baijumeswani should ort_home just point to a qnn enabled build? I'm not sure the exact build options @kory should use.

The build step should be simple. You shouldn't need to provide ort_home. @kory could you try this build command?

python build.py --arm64ec --parallel --config Release
