
[WebNN] Better int64 integration #23831

Open · wants to merge 3 commits into main
Conversation

@Honry (Contributor) commented Feb 27, 2025

This PR adds workarounds to enable int64 support for WebNN backends that don't support the int64 data type.

  • Do not fall back ops that are unsupported solely due to the int64 limitation.
  • Convert all int64 initializer and input values to int32 and handle potential overflow errors (see the sketch after this description).
  • Register all int64 model inputs and outputs as int32 ml-tensors.
  • Handle ONNX ops whose inputs or outputs need conversion between int64 and int32, e.g. ArgMax, ArgMin, Cast, etc.
  • Convert int64 output data back from int32.
  • Disallow int64 outputs as 'ml-tensor' preferredOutputLocation.

Fixed #21401
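
A minimal sketch of the int64 → int32 narrowing with overflow handling mentioned in the second bullet above (the helper name and error message are illustrative, not necessarily the PR's actual code):

```ts
// Hypothetical helper: narrow a BigInt64Array to an Int32Array, throwing when a
// value cannot be represented in 32 bits instead of silently truncating it.
const convertInt64ToInt32 = (data: BigInt64Array): Int32Array => {
  const result = new Int32Array(data.length);
  for (let i = 0; i < data.length; i++) {
    const value = data[i];
    if (value > 2147483647n || value < -2147483648n) {
      throw new Error(`Can not convert int64 value ${value} to int32: out of int32 range.`);
    }
    result[i] = Number(value);
  }
  return result;
};
```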

@Honry (Contributor Author) commented Feb 27, 2025

@fdwr, @guschmue, @egalli, PTAL, thanks!

@guschmue guschmue added the ep:WebNN WebNN execution provider label Feb 27, 2025
@guschmue (Contributor)

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@guschmue (Contributor)

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue (Contributor)

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).

@guschmue (Contributor)

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

@fdwr (Contributor) left a comment

Thanks Wanming. It's a tough problem to think about, but the change isn't as large as I was expecting.

Comment on lines 92 to 93
public shouldConvertInt64toInt32: boolean = false;
public isInt64ToInt32Converted: boolean = false;
@fdwr (Contributor)

I recall hearing from Austin Sullivan some limitations in CoreML about what data types you could bind to the graph inputs/outputs that differed from what operators could internally use (IIRC, the model input/output types were a subset of the operator types), meaning we might need upcasts/downcasts for other data types too. So would it be cleaner to just store a "dataType" and an "originalDataType"? I think both of these booleans could be trivially derived from that (where dataType == 64 and originalDataType == 32, or vice versa).
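
A rough sketch of what that could look like (purely illustrative; the class and property names are assumptions, and the data-type alias mirrors WebNN's MLOperandDataType enum):

```ts
// Mirrors the WebNN MLOperandDataType string enum, inlined so the sketch is self-contained.
type MLOperandDataType = 'float32' | 'float16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'int8' | 'uint8';

class TensorWrapperSketch {
  constructor(
    public dataType: MLOperandDataType,          // type the WebNN backend actually uses
    public originalDataType: MLOperandDataType,  // type the ONNX model declares
  ) {}

  // Roughly equivalent to shouldConvertInt64toInt32 / isInt64ToInt32Converted,
  // but derived rather than stored, and generalizable to other coercions.
  get isInt64CoercedToInt32(): boolean {
    return this.originalDataType === 'int64' && this.dataType === 'int32';
  }
}
```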

@fdwr (Contributor) commented Mar 4, 2025

Aah, here is @philloooo's comment in webmachinelearning/webnn#675

CoreML only supports int32, int16, uint16, int8, uint8. And at model boundary, it only supports int32, float32, float16.

So if we need to send a uint8 tensor (a common pixel format) to CoreML graph inputs, will we need to upcast to int32? Of course u/int8 inputs don't need to be part of this change, but I want to check whether this approach of a number of booleans will scale cleanly.

(imo, the WebNN model builder would ideally just handle uint8 in the backend rather than each caller needing to repeat such logic which varies across OS's 🙃, or CoreML would just accept uint8 directly, but handling it in ORT is our option for now)

@Honry (Contributor Author)

Good point!

We still need an additional boolean flag to indicate whether the int64 data has been converted, specifically for cases where the user invokes the download method in this file. If it has been converted, we should convert it back to int64.

Besides, from the perspective of the getCachedTensor method, introducing a boolean flag (shouldConvertInt64toInt32) provides better clarity.
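
For illustration, a minimal sketch of the widening step on download, assuming a flag like isInt64ToInt32Converted on the tensor wrapper (names are hypothetical, not the PR's actual API):

```ts
// Widen int32 values back to int64 so the caller sees the type the ONNX model declares.
const convertInt32ToInt64 = (data: Int32Array): BigInt64Array => {
  const result = new BigInt64Array(data.length);
  for (let i = 0; i < data.length; i++) {
    result[i] = BigInt(data[i]);
  }
  return result;
};

// Hypothetical download path: return the raw buffer unless the tensor's data was narrowed.
const downloadAsDeclaredType = (raw: ArrayBuffer, isInt64ToInt32Converted: boolean): ArrayBufferLike =>
  isInt64ToInt32Converted ? convertInt32ToInt64(new Int32Array(raw)).buffer : raw;
```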

@fdwr (Contributor)

My inquiry was whether we even need a bunch of boolean flags if we instead just had a field for the "coerced data type" or "fallback data type" in the tensor wrapper (and, considering the future, if we add int8 -> int32, how many more boolean flags there will be to test and pass around 🙃).

@Honry (Contributor Author) left a comment

@fdwr, thanks so much for your careful review; I addressed most comments, PTAL again. ❤


@fdwr previously approved these changes Mar 5, 2025
@fdwr (Contributor) left a comment

👀 Would be good for @guschmue or @egalli to look at the JSEP and shared WASM parts. The rest looks alright to me.


@fdwr (Contributor) commented Mar 5, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@fdwr (Contributor) commented Mar 5, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr (Contributor) commented Mar 5, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr (Contributor) commented Mar 5, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

@fdwr (Contributor) commented Mar 6, 2025

/azp run Big Models (Llama2_7B_ONNX Llama2_7B_ONNX)


No pipelines are associated with this pull request.

@fdwr (Contributor) commented Mar 6, 2025

The failure for Big Models (Llama2_7B_ONNX Llama2_7B_ONNX) appears to be a device space issue (link):

/opt/rh/gcc-toolset-11/root/usr/bin/ranlib: libprotocd.a: No space left on device

And there are linter issues:

/mnt/vss/_work/1/s/js/web/lib/wasm/jsep/backend-webnn.ts
  291:5   error  Type boolean trivially inferred from a boolean literal, remove type annotation  @typescript-eslint/no-inferrable-types
  376:41  error  ["input"] is better written in dot notation                                     dot-notation
  376:50  error  ["dataTypes"] is better written in dot notation                                 dot-notation

/mnt/vss/_work/1/s/js/web/lib/wasm/jsep/webnn/tensor-manager.ts
   92:3   error  Type boolean trivially inferred from a boolean literal, remove type annotation  @typescript-eslint/no-inferrable-types
   93:3   error  Type boolean trivially inferred from a boolean literal, remove type annotation  @typescript-eslint/no-inferrable-types
  154:25  error  'convertInt32ToInt64' was used before it was defined                            @typescript-eslint/no-use-before-define
  168:11  error  Redundant use of `await` on a return value                                      no-return-await
  169:11  error  Redundant use of `await` on a return value                                      no-return-await
  221:58  error  ["input"] is better written in dot notation                                     dot-notation
  221:67  error  ["dataTypes"] is better written in dot notation                                 dot-notation
  223:7   error  Assignment to function parameter 'dataType'                                     no-param-reassign
  267:9   error  Assignment to function parameter 'data'                                         no-param-reassign
  267:16  error  'convertInt64ToInt32' was used before it was defined                            @typescript-eslint/no-use-before-define
  290:12  error  'convertInt32ToInt64' was used before it was defined                            @typescript-eslint/no-use-before-define
  431:5   error  Type boolean trivially inferred from a boolean literal, remove type annotation  @typescript-eslint/no-inferrable-types
  469:8   error  Use const or class constructors instead of named functions                      prefer-arrow/prefer-arrow-functions
  498:1   error  Use const or class constructors instead of named functions                      prefer-arrow/prefer-arrow-functions

Silly linter, complaining about explicit boolean. :/

Honry added 3 commits March 7, 2025 11:12
This PR adds some workarounds to enable int64 support for some WebNN
backends which don't support int64 data type.

- Do not fall back ops that are specifically due to the int64 limitation.
- Convert all int64 initializer and input values to int32 and handle
 potential overflow errors.
- Register all int64 model inputs and outputs as int32 ml-tensor.
- Handle ONNX ops that need inputs or outputs conversion between int64
and int32. e.g. ArgMax, ArgMin, Cast, etc.
- Convert int64 output data back from int32.
- Disallow int64 outputs as 'ml-tensor' preferredOutputLocation.

Fixed microsoft#21401
@Honry (Contributor Author) commented Mar 7, 2025

Thanks @fdwr, lint issues fixed, and I also rebased onto the latest main to resolve a conflict.

@fdwr (Contributor) left a comment

👍

@fdwr (Contributor) commented Mar 7, 2025

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline

@fdwr (Contributor) commented Mar 7, 2025

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@fdwr (Contributor) commented Mar 7, 2025

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

@fdwr (Contributor) commented Mar 7, 2025

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 4 pipeline(s).



Azure Pipelines successfully started running 9 pipeline(s).

Labels: ep:WebNN (WebNN execution provider)
Successfully merging this pull request may close these issues.

[WebNN EP] Support int64 output data type for CoreML backend
3 participants