[WebNN] Better int64 integration #23831
base: main
Conversation
/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models
Azure Pipelines successfully started running 2 pipeline(s).
/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
Thanks Wanming. It's a tough problem to think about, but the change isn't as large as I was expecting.
onnxruntime/core/providers/webnn/builders/impl/cast_op_builder.cc
public shouldConvertInt64toInt32: boolean = false;
public isInt64ToInt32Converted: boolean = false;
I recall hearing from Austin Sullivan about some limitations in CoreML on which data types you can bind to the graph inputs/outputs, which differed from what operators could internally use (IIRC, the model input/output types were a subset of the operator types), meaning we might need upcasts/downcasts for other data types too. So would it be cleaner to just store a "dataType" and an "originalDataType"? I think both of these booleans could be trivially derived from that (where dataType == int64 and originalDataType == int32, or vice versa).
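For concreteness, here is a minimal TypeScript sketch of that suggestion (the class and field names are hypothetical, not the actual ORT code): both booleans fall out of a single pair of type fields.

```typescript
// A minimal sketch of the suggested alternative: keep the coerced and
// original data types on the tensor wrapper and derive the conversion
// flags, instead of threading separate booleans around.
type MLDataType = 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'int64' | 'float16' | 'float32';

class TensorWrapper {
  constructor(
    // Type actually bound to the WebNN graph.
    public dataType: MLDataType,
    // Type declared by the ONNX model at the boundary.
    public originalDataType: MLDataType,
  ) {}

  // Equivalent of `shouldConvertInt64toInt32`, derived rather than stored.
  get shouldConvertInt64toInt32(): boolean {
    return this.originalDataType === 'int64' && this.dataType === 'int32';
  }

  // Scales to future coercions (e.g. int8 -> int32) without new flags.
  get isConverted(): boolean {
    return this.dataType !== this.originalDataType;
  }
}
```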
Aah, here is @philloooo's comment in webmachinelearning/webnn#675:
"CoreML only supports int32, int16, uint16, int8, uint8. And at the model boundary, it only supports int32, float32, float16."
So if we need to send a uint8 tensor (a common pixel format) to CoreML graph inputs, will we need to upcast to int32? Of course u/int8 inputs don't need to be part of this change, but I want to check whether this approach of a number of booleans will scale cleanly.
(imo, the WebNN model builder would ideally just handle uint8 in the backend rather than each caller needing to repeat such logic, which varies across OSes 🙃, or CoreML would just accept uint8 directly, but handling it in ORT is our option for now)
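For illustration, a hedged sketch of what such a boundary upcast could look like (the function names are hypothetical; this is not code from the PR). Every uint8 value fits in int32, so the round trip is lossless as long as the graph keeps values in range:

```typescript
// Hypothetical boundary upcast for a uint8 input: CoreML operators can
// consume uint8, but the model boundary reportedly only accepts
// int32/float32/float16, so the data would be widened on the way in
// and narrowed on the way out.
function upcastUint8ToInt32(input: Uint8Array): Int32Array {
  return Int32Array.from(input); // 0..255 are all representable in int32
}

function downcastInt32ToUint8(output: Int32Array): Uint8Array {
  // Assumes the graph kept values within 0..255; out-of-range values
  // would wrap modulo 256 here.
  return Uint8Array.from(output);
}
```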
Good point!
We still need an additional boolean flag to indicate whether the int64 data has been converted, specifically for cases where the user invokes the `download` method in this file. If it has been converted, we should convert it back to int64.
Besides, from the perspective of the `getCachedTensor` method, introducing a boolean flag (`shouldConvertInt64toInt32`) would provide better clarity.
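A rough sketch of the download path being described, using a hypothetical helper (not the PR's actual code) that widens the downloaded int32 data back to int64 when the conversion flag is set:

```typescript
// Hypothetical download-side restore: if the tensor's int64 data was
// stored as int32, widen it back to int64 before returning it.
function maybeRestoreInt64(buffer: ArrayBuffer, isInt64ToInt32Converted: boolean): ArrayBuffer {
  if (!isInt64ToInt32Converted) {
    return buffer;
  }
  const int32Data = new Int32Array(buffer);
  const int64Data = new BigInt64Array(int32Data.length);
  for (let i = 0; i < int32Data.length; i++) {
    int64Data[i] = BigInt(int32Data[i]); // lossless widening
  }
  return int64Data.buffer;
}
```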
My inquiry was whether we even need a bunch of boolean flags (consider the future: if we add int8 -> int32, how many more boolean flags will there be to test and pass around 🙃) if we instead just had a field for the "coerced data type" or "fallback data type" in the tensor wrapper.
@fdwr, thanks much for your careful review, addressed most comments, PTAL again. ❤
/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
/azp run Big Models (Llama2_7B_ONNX Llama2_7B_ONNX)
No pipelines are associated with this pull request.
The failure for Big Models (Llama2_7B_ONNX Llama2_7B_ONNX) appears to be a device space issue (link). And there are linter issues.
Silly linter, complaining about explicit boolean. :/
This PR adds some workarounds to enable int64 support for WebNN backends that don't support the int64 data type:
- Do not fall back ops that fail solely because of the int64 limitation.
- Convert all int64 initializer and input values to int32, and handle potential overflow errors (see the sketch below).
- Register all int64 model inputs and outputs as int32 ml-tensors.
- Handle ONNX ops whose inputs or outputs need conversion between int64 and int32, e.g. ArgMax, ArgMin, Cast, etc.
- Convert int32 output data back to int64.
- Disallow int64 outputs as 'ml-tensor' preferredOutputLocation.

Fixed microsoft#21401
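As a rough illustration of the overflow handling mentioned in the second bullet, a hypothetical conversion helper (not the PR's exact code) might look like this:

```typescript
// Hypothetical int64 -> int32 narrowing with overflow detection. Values
// outside the int32 range cannot be represented, so surface an error
// instead of silently truncating.
const INT32_MIN = -2147483648n;
const INT32_MAX = 2147483647n;

function convertInt64ToInt32(input: BigInt64Array): Int32Array {
  const output = new Int32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const value = input[i];
    if (value < INT32_MIN || value > INT32_MAX) {
      throw new Error(`Cannot convert int64 tensor to int32: value ${value} at index ${i} is out of int32 range.`);
    }
    output[i] = Number(value); // safe: value fits in 32 bits
  }
  return output;
}
```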
Thanks @fdwr, lint issues fixed, and I also rebased the code onto the latest main to resolve a conflict.
👍
/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline
/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI
/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models
Azure Pipelines successfully started running 2 pipeline(s).
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 4 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).