ORT-GenAI should ship ARM64EC binaries for AMD64 python running on ARM64 Windows #1417
Comments
Do you mean AMD64 python here? The guidance from the onnxruntime team is to use
We do not have any plans at the moment to create a new package similar to
Yes, thanks. Edited my issue.
The onnxruntime package also provides arm64ec wheels, not just onnxruntime-qnn. We're in an unfortunate situation: users need to compile their LLM with Olive in an AMD64 environment, then switch to a native ARM64 environment to use GenAI extensions. This is not a great user experience; shipping an arm64ec wheel would allow users to use the same environment end to end.
Could you show me where you found this wheel? From looking at the packaging pipelines, I couldn't find a reference to arm64ec builds for onnxruntime. If it does not require a new PyPI package, I could potentially add support.
@jywu-msft could help with that query (how ORT packages the ARM64EC binary).
For more context on where I got the wheel, I can pip install |
For some reason, I don't see the same thing as you do:
What wheel file did you download on your machine? Can you point me to it? George is out of office.
Here are the available wheels for onnxruntime. The one that I tried was onnxruntime-1.21.1-cp312-cp312-win_amd64.whl.
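As an aside, the platform tag at the end of a wheel filename tells you which binaries a given wheel carries. A minimal helper (hypothetical, not part of any of these packages) that extracts it, based on the standard wheel naming scheme:

```python
def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag from a wheel filename.

    Wheel names follow {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the platform tag is the last dash-separated field before ".whl".
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    return filename[: -len(".whl")].rsplit("-", 1)[-1]


print(wheel_platform_tag("onnxruntime-1.21.1-cp312-cp312-win_amd64.whl"))  # -> win_amd64
```

An arm64ec wheel would carry a different tag in this last field, which is how you can tell the variants apart before installing.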
Doing the same on onnxruntime-qnn:
So, I can tell that
OK, I tried this again. What I didn't try before was just installing onnxruntime on its own; I only looked at the binary after also installing onnxruntime-qnn. I tried installing the two in separate commands. If I just

I'm not entirely sure how this works, as I don't see pip installing

Since the release of onnxruntime-genai isn't pegged to onnxruntime, I suppose we can't take the same approach here. Perhaps we could point users to a different distribution of onnxruntime-genai? Would it be possible to build the package with EC (or ARM64X) binaries and host them with each release? Then we could at least point people to install ort-genai from there if they can't use pip.
On the side, I'm happy to try to build the binary + wheel myself and self-host it to make the Microsoft BUILD story clear (I'm making the naive assumption that this will be a straightforward use of your build scripts; please correct me if I'm wrong). But I would love to have a permanent "official" solution we can point users to longer term.
Only onnxruntime-qnn has arm64ec wheels; onnxruntime does not.
When you pip installed onnxruntime and then onnxruntime-qnn, the DLLs from onnxruntime-qnn overwrote the installation from onnxruntime.
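The overwrite explanation above can be checked by reading the COFF header of the installed DLL. A minimal sketch, with one important caveat: a plain ARM64EC DLL reports the same machine value (0x8664) as pure x64 in this header, while ARM64X binaries report 0xAA64, so header inspection alone can separate native ARM64/ARM64X binaries from x64 ones but cannot distinguish ARM64EC from AMD64 (that requires looking at the load-config CHPE metadata):

```python
import struct

# COFF machine types from the PE/COFF spec. Note: plain ARM64EC DLLs
# report IMAGE_FILE_MACHINE_AMD64 (0x8664) here, same as pure x64;
# ARM64X binaries report 0xAA64.
MACHINE_NAMES = {0x8664: "AMD64 (or ARM64EC)", 0xAA64: "ARM64/ARM64X", 0x014C: "x86"}


def pe_machine(path: str) -> int:
    """Read the COFF machine field from a PE file (e.g. a .dll)."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file")
        f.seek(0x3C)  # e_lfanew: offset of the PE signature
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\x00\x00":
            raise ValueError("PE signature not found")
        (machine,) = struct.unpack("<H", f.read(2))
        return machine
```

Pointing this at the onnxruntime DLL inside your environment's site-packages (path depends on your install) would show which package's binaries actually survived the second install.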
I think you can test this out: build onnxruntime-genai with the --arm64ec option. @baijumeswani, should ort_home just point to a QNN-enabled build? I'm not sure of the exact build options @kory should use.
The build step should be simple. You shouldn't need to provide ort_home. @kory, could you try this build command? python build.py --arm64ec --parallel --config Release
Describe the bug
Windows on ARM users commonly use AMD64 python to execute models with ONNX Runtime. This is needed because several python packages (e.g. torch, h5py) do not yet ship ARM64 builds for Windows.
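One wrinkle in this setup is that an emulated AMD64 python process typically reports AMD64 from platform.machine(), hiding the real host architecture. A sketch of detecting the true native machine via the documented kernel32 IsWow64Process2 API (Windows-only; the function below simply returns False on other platforms):

```python
import ctypes
import sys

IMAGE_FILE_MACHINE_ARM64 = 0xAA64


def host_is_arm64() -> bool:
    """Return True if the native machine is ARM64, even when this python
    process is an emulated AMD64 one. Uses kernel32!IsWow64Process2
    (Windows 10 1511+); returns False on non-Windows platforms."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    process_machine = ctypes.c_ushort(0)
    native_machine = ctypes.c_ushort(0)
    ok = kernel32.IsWow64Process2(
        kernel32.GetCurrentProcess(),
        ctypes.byref(process_machine),
        ctypes.byref(native_machine),
    )
    return bool(ok) and native_machine.value == IMAGE_FILE_MACHINE_ARM64
```

A package installer (or a support script) could use a check like this to warn users that they are running AMD64 python on an ARM64 host and would benefit from ARM64EC binaries.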
When installed in this environment, ONNX Runtime ships ARM64EC binaries, which allow models to execute (on CPU or NPU) without emulation in an AMD64 python process. Thanks to the bundled ARM64EC binary, models run with the same inference time in either environment.
Onnxruntime-GenAI, however, ships AMD64 binaries for this environment. This is a large bottleneck for running GenAI models on Windows on ARM: the ORT-GenAI package runs under emulation, so it's very slow.
One example is running DeepSeek on Snapdragon X Elite, where we can achieve 17 TPS with native ARM64 python but only 1 TPS with AMD64 python. Because the performance difference appears only with large language models running in GenAI extensions, we believe it is caused by the lack of an ARM64EC binary for GenAI extensions.
Expected behavior
ORT-GenAI binaries that are distributed for AMD64 python on Windows on ARM systems should be built as ARM64EC.
Additional Information
A common workflow is to use Olive to compile a model, then execute that model on the same device. Because Olive requires torch, and torch does not have a distribution for Windows on ARM, users must use AMD64 python.
The path to good performance out of GenAI extensions is then to switch to an ARM64 python environment. This is not a great user experience; if ARM64EC binaries were distributed with GenAI extensions for AMD64 python, users could use one python environment for the end-to-end flow.