
Bugfix: Add missing 64-bit integers support for some reduction operators #695

Open · wants to merge 1 commit into main
Conversation

huningxin
Contributor

@huningxin huningxin commented May 27, 2024

`reduceL1`, `reduceProduct`, `reduceSum` and `reduceSumSquare` already support 32-bit integers. 64-bit integers should also be supported.

Fixes #283 and #694



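The intended semantics can be illustrated with a minimal, pure-Python emulation of `reduceSum` over an int64 tensor (this is not the WebNN API; the function names and the range check are illustrative only):

```python
# Minimal emulation of reduceSum semantics for a 2-D int64 tensor, reducing
# along axis 1. Python ints are arbitrary precision, so they stand in for
# int64 here; the explicit range check models the int64 constraint.

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def reduce_sum_axis1(tensor):
    """Sum each row of a 2-D list of ints, i.e. reduce along axis 1."""
    return [sum(row) for row in tensor]

def fits_int64(values):
    """Check that every result is representable as a 64-bit signed int."""
    return all(INT64_MIN <= v <= INT64_MAX for v in values)

if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]      # shape [2, 3]
    out = reduce_sum_axis1(x)        # shape [2]
    print(out)                       # [6, 15]
    print(fits_int64(out))           # True
```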
@Honry
Contributor

Honry commented May 27, 2024

@fdwr, @huningxin, starting from ONNX opset 18, all Reduce* ops support int64 and uint64 in the DML EP; see

https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/dml/DmlExecutionProvider/src/Operators/OperatorRegistration.cpp#L937-L979

Meanwhile, https://learn.microsoft.com/en-us/windows/ai/directml/dml-feature-level-history#dml_feature_level_5_0 mentions the expanded data type support for:

  • DML_REDUCE_FUNCTION_L1
  • DML_REDUCE_FUNCTION_MAX
  • DML_REDUCE_FUNCTION_MIN
  • DML_REDUCE_FUNCTION_MULTIPLY
  • DML_REDUCE_FUNCTION_SUM
  • DML_REDUCE_FUNCTION_SUM_SQUARE

But it doesn't mention reduceMean and the others. Is it a documentation issue?

Contributor

@inexorabletash inexorabletash left a comment


LGTM

@a-sully
Contributor

a-sully commented May 28, 2024

#654 proposes removing 64-bit int support for some operators. What's the rationale for adding 64-bit int support for these operators?

This may be a discussion for another issue, but does WebNN need to support 64-bit integer types at all, given the trend toward smaller data types (e.g. int4 and int8) for on-device ML, and given that some backends have little or no support for 64-bit values in the first place? (e.g. CoreML has ~no support, and most GPUs emulate support for int64)
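On the emulation point: a device with only 32-bit integer ALUs can emulate 64-bit arithmetic using two 32-bit limbs and explicit carry handling, which is roughly what "most GPUs emulate support for int64" means in practice. A minimal sketch of a 64-bit unsigned add (illustrative pure Python, not any backend's actual shader code):

```python
# Emulate a 64-bit unsigned add using two 32-bit limbs plus a carry,
# the way a GPU without native int64 typically does it.

MASK32 = 0xFFFFFFFF

def split_u64(v):
    """Split a uint64 value into (low, high) 32-bit limbs."""
    return v & MASK32, (v >> 32) & MASK32

def add_u64_emulated(a, b):
    a_lo, a_hi = split_u64(a)
    b_lo, b_hi = split_u64(b)
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0          # unsigned overflow detection
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | lo

# Carry propagates from the low limb into the high limb:
assert add_u64_emulated(2**32 - 1, 1) == 2**32
# Wraps around at 2**64, like native uint64 arithmetic:
assert add_u64_emulated(2**64 - 1, 1) == 0
```

Doing this for every arithmetic op (and for multiply, which needs four partial products) is why emulated int64 is considerably slower than native support.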

@huningxin
Contributor Author

@a-sully

#654 proposes removing 64-bit int support for some operators.

IIUC, #654 mentioned that lower feature level DirectML (before FL 4.1) doesn't support 64-bit integers for some operators. Higher feature level DirectML doesn't have that issue. As @inexorabletash mentioned, we may assume the browser always carries a copy of the library that ensures the highest feature level. If that's the case, we may close that issue.

What's the rationale for adding 64-bit int support for these operators?

As #694 mentioned, the safety checker model for the stable diffusion turbo demo uses reduceSum with int64 input, and DirectML's DML_REDUCE_FUNCTION_SUM supports 64-bit integers.

some backends have little/no support for 64-bit values in the first place? (e.g. CoreML has ~no support, and most GPUs emulate support for int64)

We may want to track the backend differences through #463.

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request May 31, 2024
This CL adds 64-bit integer support for reduceL1, reduceProduct,
reduceSum and reduceSumSquare. It's based on the spec change being proposed by webmachinelearning/webnn#695.

Bug: 328567884
Change-Id: Ia858b47082f81a9eb6ab3b9403e3773a752eb608
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 3, 2024
This CL adds 64-bit integer support for reduceL1, reduceProduct,
reduceSum and reduceSumSquare. It's based on the spec change being proposed by webmachinelearning/webnn#695.

Bug: 328567884
Change-Id: Ia858b47082f81a9eb6ab3b9403e3773a752eb608
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5569544
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Reviewed-by: Austin Sullivan <asully@chromium.org>
Commit-Queue: Lisha Guo <lisha.guo@intel.com>
Cr-Commit-Position: refs/heads/main@{#1309157}
@a-sully
Contributor

a-sully commented Jun 3, 2024

What's the rationale for adding 64-bit int support for these operators?

As #694 mentioned, the safety checker model for the stable diffusion turbo demo uses reduceSum with int64 input, and DirectML's DML_REDUCE_FUNCTION_SUM supports 64-bit integers.

I know this is already implemented in Chromium, so I don't mean to quibble too much over this, but I think it's worth questioning whether there should be a process for changing supported data types, similar to the existing process for adding new operators.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jun 5, 2024
…rt for some reduce operators, a=testonly

Automatic update from web-platform-tests
WebNN: Add missing 64-bit integers support for some reduce operators

This CL adds 64-bit integer support for reduceL1, reduceProduct,
reduceSum and reduceSumSquare. It's based on the spec change being proposed by webmachinelearning/webnn#695.

Bug: 328567884
Change-Id: Ia858b47082f81a9eb6ab3b9403e3773a752eb608
Cq-Include-Trybots: luci.chromium.try​:win11-blink-rel,mac14-blink-rel,mac14.arm64-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5569544
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Reviewed-by: Austin Sullivan <asully@chromium.org>
Commit-Queue: Lisha Guo <lisha.guo@intel.com>
Cr-Commit-Position: refs/heads/main@{#1309157}

--

wpt-commits: 50c7448895efb9f6cfe0b37d77d018ea549d4b99
wpt-pr: 46568
@huningxin huningxin changed the title Add missing 64-bit integers support for some reduction operators Bugfix: Add missing 64-bit integers support for some reduction operators Jun 11, 2024
@huningxin
Contributor Author

@a-sully

but I think it's worth questioning whether there should be a process for changing supported data types similar to the existing process for adding new operators

I'd support adding a process for updating the operators. A PR is proposed: #705. PTAL.

This PR is a follow-up bug fix for #283. In that issue, @fdwr confirmed that the L1, SUM_SQUARE, MULTIPLY, and SUM reduce functions support 32-bit and 64-bit integers. However, I forgot to add 64-bit integers to the table in #283, which caused @inexorabletash's PR #646 to miss the 64-bit integer support for those reduce operators.

As mentioned, this issue causes the safety checker model for the stable diffusion turbo demo to fail because it uses reduceSum with int64 input. Chromium has already fixed it, and I have also updated the table in #283.

@fdwr
Collaborator

fdwr commented Jun 12, 2024

But it doesn't mention reduceMean and the others. Is it a documentation issue?

@Honry: The doc is correct. DML_OPERATOR_REDUCE with REDUCE_FUNCTION_AVERAGE (https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_reduce_operator_desc) only supports float. We didn't support int64 for mean/average reduction because:

  • (1) the averaging division would most likely produce a floating-point result anyway
  • (2) it saves shader space if not actually used (and there weren't any clients for int64 average at the time)
  • (3) it raises policy questions for fractional values: whether to truncate, floor, ceil, or round to nearest even...
  • (4) it's trivial to implement with an explicit REDUCE_SUM followed by a DIVIDE.

Notice that ReduceL1 was similarly supported, but ReduceL2 (whose square root yields fractional values) has no int64 version (nor do LogSum, LogSumExp, ...).
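Point (4) can be sketched concretely, and the rounding policies from point (3) are exactly the choice a spec or implementation would have to make. An illustrative pure-Python sketch (not DirectML code; the function name and policy strings are made up for this example):

```python
# Decompose an integer reduceMean into a sum followed by a divide, and make
# the rounding policy for fractional results an explicit parameter.

def reduce_mean_int(values, policy="trunc"):
    s, n = sum(values), len(values)
    if policy == "trunc":
        return int(s / n)        # truncate toward zero
    if policy == "floor":
        return s // n            # round toward negative infinity
    if policy == "round":
        return round(s / n)      # Python rounds half to even
    raise ValueError(f"unknown policy: {policy}")

vals = [1, 2, 4]                              # sum = 7, true mean = 2.33...
print(reduce_mean_int(vals, "trunc"))         # 2
# The policies diverge on negative inputs: mean of [-1, -2, -4] is -2.33...
print(reduce_mean_int([-1, -2, -4], "trunc")) # -2
print(reduce_mean_int([-1, -2, -4], "floor")) # -3
```

The divergence on negative inputs is why "just sum and divide" still leaves a real specification question.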

{
  "groupFieldValues": ["ARGMIN", "ARGMAX"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["uint32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "2.1", "InputDataType": ["AllTensorDataTypes8To32"], "OutputDataType": ["uint32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["AllTensorDataTypes8To32"], "OutputDataType": ["AllNonFloatTensorDataTypes32To64"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "4.1", "InputDataType": ["AllTensorDataTypes8To64ExceptFloat64"], "OutputDataType": ["AllNonFloatTensorDataTypes32To64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["AVERAGE", "L2", "LOG_SUM", "LOG_SUM_EXP"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["L1", "SUM_SQUARE"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "5.0", "InputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "OutputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["MIN", "MAX"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "2.1", "InputDataType": ["float16", "float32", "uint32", "int32"], "OutputDataType": ["float16", "float32", "uint32", "int32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["AllTensorDataTypes8To32"], "OutputDataType": ["AllTensorDataTypes8To32"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "5.0", "InputDataType": ["AllTensorDataTypes8To64ExceptFloat64"], "OutputDataType": ["AllTensorDataTypes8To64ExceptFloat64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
},{
  "groupFieldValues": ["MULTIPLY", "SUM"],
  "capabilities": [
    {"featureLevel": "1.0", "InputDataType": ["float16", "float32"], "OutputDataType": ["float16", "float32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "2.1", "InputDataType": ["float16", "float32", "uint32", "int32"], "OutputDataType": ["float16", "float32", "uint32", "int32"], "DefaultRank": {"Min": 4, "Max": 4}},
    {"featureLevel": "3.0", "InputDataType": ["float16", "float32", "uint32", "int32"], "OutputDataType": ["float16", "float32", "uint32", "int32"], "DefaultRank": {"Min": 1, "Max": 8}},
    {"featureLevel": "5.0", "InputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "OutputDataType": ["AllTensorDataTypes32To64ExceptFloat64"], "DefaultRank": {"Min": 1, "Max": 8}}
  ]
}
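The tables above can be queried mechanically. For example, a minimal sketch answering "at what feature level does SUM accept int64 input?" (the SUM group's data types are transcribed and expanded by hand from the table above; `min_feature_level` is a hypothetical helper, not a DirectML API):

```python
# Abbreviated transcription of the MULTIPLY/SUM capability group above.
# "AllTensorDataTypes32To64ExceptFloat64" expands to include int64/uint64.

SUM_CAPS = [
    ("1.0", ["float16", "float32"]),
    ("2.1", ["float16", "float32", "uint32", "int32"]),
    ("3.0", ["float16", "float32", "uint32", "int32"]),
    ("5.0", ["float16", "float32", "uint32", "int32", "uint64", "int64"]),
]

def min_feature_level(caps, dtype):
    """Return the first feature level whose input types include dtype."""
    for level, dtypes in caps:
        if dtype in dtypes:
            return level
    return None

print(min_feature_level(SUM_CAPS, "int64"))   # 5.0
print(min_feature_level(SUM_CAPS, "int32"))   # 2.1
print(min_feature_level(SUM_CAPS, "int8"))    # None
```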

Collaborator

@fdwr fdwr left a comment


👍 Thanks Ningxin.

@philloooo
Contributor

As mentioned, this issue causes the safety checker model for the stable diffusion turbo demo to fail because it uses reduceSum with int64 input.

@huningxin given that int64 is not supported on CoreML, how hard is it for the stable diffusion turbo demo to convert the model from int64 to int32?

@huningxin
Contributor Author

@philloooo

As mentioned, this issue causes the safety checker model for the stable diffusion turbo demo to fail because it uses reduceSum with int64 input.

@huningxin given that int64 is not supported on CoreML, how hard is it for the stable diffusion turbo demo to convert the model from int64 to int32?

The original model casts to int64 before reduceSum. We'll try to change it to int32 and see whether it still works. Will keep this thread posted.

@huningxin
Contributor Author

@philloooo

As mentioned, this issue causes the safety checker model for the stable diffusion turbo demo to fail because it uses reduceSum with int64 input.

@huningxin given that int64 is not supported on CoreML, how hard is it for the stable diffusion turbo demo to convert the model from int64 to int32?

The original model casts to int64 before reduceSum. We'll try to change it to int32 and see whether it still works. Will keep this thread posted.

After the investigation, the two reduceSum operators of safety checker model are able to take int32 input. @Honry helped create an int32 version and host it at https://huggingface.co/lwanming/sd-turbo-ort-web/blob/main/safety_checker_int32_reduceSum.onnx, feel free to check it out.

More details:

  1. The first reduceSum (named "/ReduceSum") takes a 2-D input tensor of shape [1, 3] and reduces along axis 1. Each tensor value is either 0 or 1 because it is the output of the preceding greater operator. It's safe to cast the input tensor to int32 because summing three 1s together cannot overflow.
  2. Similarly, the second reduceSum (named "/ReduceSum_1") takes a 2-D input tensor of shape [1, 17], also the output of a preceding greater operator, and reduces along axis 1. It's safe to cast this input to int32 for the same reason.
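The overflow argument in the two points above can be verified with a small sketch (shapes are taken from the comment; the helper name and `max_element` parameter are illustrative):

```python
# Worst-case check: since the reduceSum inputs are 0/1 outputs of a `greater`
# op, the largest possible sum equals the length of the reduced axis, which
# must fit in a signed 32-bit integer for the int32 cast to be safe.

INT32_MAX = 2**31 - 1

def cast_to_int32_is_safe(shape, axis, max_element=1):
    """True if the worst-case reduceSum result along `axis` fits in int32,
    given every element is bounded by max_element."""
    worst_case = shape[axis] * max_element
    return worst_case <= INT32_MAX

print(cast_to_int32_is_safe([1, 3], axis=1))    # True  ("/ReduceSum")
print(cast_to_int32_is_safe([1, 17], axis=1))   # True  ("/ReduceSum_1")
```

The same check would fail for a reduction axis longer than 2^31 - 1 elements, which is why the safety of the cast is model-specific.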

@fdwr
Collaborator

fdwr commented Jun 13, 2024

@mwyrzykowski Do you know of any plans to update CoreML to support int64? It's oddly inconsistent that all the other Apple ML APIs (BNNS, MPS, MLX) support int64, but CoreML does not 🤔. Is CoreML still the right API for implementing a WebNN backend these days, or has it been left behind by newer ones? Thanks for any information or redirections.

| Type | BNNS | MPS | MLX | CoreML |
|---|---|---|---|---|
| uint8 | BNNSDataTypeUInt8 | MPSDataType.uInt8 | mx.uint8 | x |
| uint16 | BNNSDataTypeUInt16 | MPSDataType.uInt16 | mx.uint16 | x |
| uint32 | BNNSDataTypeUInt32 | MPSDataType.uInt32 | mx.uint32 | x |
| uint64 | BNNSDataTypeUInt64 | MPSDataType.uInt64 | mx.uint64 | x 🤔 |
| int8 | BNNSDataTypeInt8 | MPSDataType.int8 | mx.int8 | x |
| int16 | BNNSDataTypeInt16 | MPSDataType.int16 | mx.int16 | x |
| int32 | BNNSDataTypeInt32 | MPSDataType.int32 | mx.int32 | ArrayFeatureType.ArrayDataType.INT32 |
| int64 | BNNSDataTypeInt64 | MPSDataType.int64 | mx.int64 | x 🤔 |
| float16 (f10e5s1, IEEE) | BNNSDataTypeFloat16 | MPSDataType.float16 | mx.float16 | ArrayFeatureType.ArrayDataType.FLOAT16 |
| bfloat16 (f7e8s1, brain) | BNNSDataTypeBFloat16 | MPSDataType.bFloat16 | x | x |
| float32 (f23e8s1, IEEE) | BNNSDataTypeFloat32 | MPSDataType.float32 | mx.float32 | ArrayFeatureType.ArrayDataType.FLOAT32 |
| float64 (f52e11s1, IEEE) | x | x | x | ArrayFeatureType.ArrayDataType.DOUBLE |
| complex float16 (float16 × 2) | x | MPSDataType.complexFloat16 | x | x |
| complex float32 (float32 × 2) | x | MPSDataType.complexFloat32 | x | x |
| complex float64 (float64 × 2) | x | x | x | x |
| bool8 | BNNSDataTypeBoolean | MPSDataType.bool | mx.bool_ | x |

@mwyrzykowski

@fdwr BNNS only runs on the CPU, and MLX, being open source, cannot use the ANE/NPU. MPS, being backed by Metal, runs only on the GPU.

If running on the ANE is a goal, CoreML is necessary.

@a-sully
Contributor

a-sully commented Jun 13, 2024

If running on the ANE is a goal, CoreML is necessary.

Can confirm this is an important goal :)

@fdwr
Collaborator

fdwr commented Jun 13, 2024

If running on the ANE is a goal, CoreML is necessary.

Can confirm this is an important goal :)

Concur, this is important (at least for models that can run fully on the ANE, or for the parts of a model that are viable to run on it).

guschmue pushed a commit to microsoft/onnxruntime that referenced this pull request Jun 17, 2024
WebNN Spec adds missing 64-bit integers support for `reduceL1`,
`reduceSum`, `reduceSumSquare` and `reduceProduct` ops at this
[PR](webmachinelearning/webnn#695), which has
already been implemented in Chromium. Update corresponding data type
constraints in WebNN EP.

Additionally, the WebNN CPU backend currently doesn't support `uint64` and
`uint32` for these ops.
Successfully merging this pull request may close these issues.

Specify the operand data type constraints of operation
7 participants