What's new?
This release adds an optimized CPU/MLAS implementation of DequantizeLinear for 8-bit inputs and introduces the build option --client_package_build, which enables defaults better suited to client/on-device workloads (e.g., thread spinning is disabled by default).
Build System & Packages
- Add --client_package_build option (#25351) - @jywu-msft
- Remove the python installation steps from win-qnn-arm64-ci-pipeline.yml (#25552) - @snnn
CPU EP
- Add multithreaded/vectorized implementation of DequantizeLinear for int8 and uint8 inputs (SSE2, NEON) (#24818) - @adrianlizarraga
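For readers unfamiliar with the operator, here is a minimal pure-Python sketch of the per-tensor arithmetic that DequantizeLinear performs as defined by ONNX, y = (x - zero_point) * scale. It is a scalar reference for illustration only, not the MLAS kernel; the function name `dequantize_linear` and the sample values are made up for this example.

```python
def dequantize_linear(x, scale, zero_point=0):
    """Scalar reference for per-tensor dequantization.

    Computes y = (x - zero_point) * scale for each quantized element.
    Illustrative only: the optimized kernel processes many elements
    per instruction and splits the work across threads.
    """
    return [(int(q) - zero_point) * scale for q in x]


# uint8 input with a non-zero zero point
uint8_out = dequantize_linear([0, 128, 255], scale=0.5, zero_point=128)

# int8 input with a zero point of 0
int8_out = dequantize_linear([-128, 0, 127], scale=0.25, zero_point=0)

print(uint8_out)
print(int8_out)
```

Because every output element depends only on its own input element, the computation is straightforward to vectorize and to partition across threads.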
QNN EP
- Add support for the Upsample, Einsum, LSTM, and CumSum operators (#24265, #24616, #24646, #24820) - @quic-zhaoxul, @1duo, @chenweng-quic, @Akupadhye
- Fuse scale into Softmax (#24809) - @qti-yuduo
- Enable DSP queue polling when performance is set to “burst” mode (#25361) - @quic-calvnguy
- Update QNN SDK to version 2.36.1 (#25388) - @qti-jkilpatrick
- Include the license file from QNN SDK in the Microsoft.ML.OnnxRuntime.QNN NuGet package (#25158) - @HectorSVC
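The Softmax scale fusion listed above can be illustrated with a small numerical sketch. It assumes the fusion folds a scalar multiply that feeds Softmax into the Softmax computation itself; the release note does not spell out the exact graph pattern, and the `beta` parameter and function names below are hypothetical, chosen only for this example.

```python
import math


def softmax(xs, beta=1.0):
    """Numerically stable softmax of beta * x (beta is a hypothetical scale parameter)."""
    scaled = [beta * x for x in xs]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mul_then_softmax(xs, scale):
    """Unfused form: a separate multiply followed by a plain softmax."""
    return softmax([scale * x for x in xs])


def fused_softmax(xs, scale):
    """Fused form: the scale is applied inside the softmax."""
    return softmax(xs, beta=scale)


logits = [1.0, 2.0, 3.0]
unfused = mul_then_softmax(logits, 0.125)
fused = fused_softmax(logits, 0.125)
print(all(math.isclose(a, b) for a, b in zip(unfused, fused)))
```

Both forms produce the same probabilities; the fused form simply avoids running a separate elementwise multiply.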