Add MLTensor explainer #754
Conversation
LGTM with minor comments, thanks!
@bbernhar or @RafaelCintron hopefully none of this comes as a surprise? Any feedback?
LGTM!
Well written, @a-sully. Minor nits + some comments.
mltensor-explainer.md (Outdated)

> For example [an `MLContext` may be created with a `GPUDevice`](https://www.w3.org/TR/webnn/#dom-ml-createcontext-gpudevice), and creating an `MLTensor` from this context with the `MLTensorUsage.WEBGPU_INTEROP` flag expresses a clear intention to share the tensor with the given `GPUDevice`. However, there is no guarantee that sharing this tensor with WebGPU will be zero-copy.
>
> The `MLTensorUsage.READ_FROM` and `MLTensorUsage.WRITE_TO` flags likewise are hints to the user agent indicating that the underlying data will be read and written to, respectively, by script.
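For illustration, a minimal sketch of creating an `MLTensor` with these usage flags, based on the tentative IDL under discussion in this PR (descriptor field names are illustrative, and the usages were later renamed and ultimately replaced by boolean flags):

```js
// Sketch only: names follow the tentative IDL discussed in this PR.
const mlContext = await navigator.ml.createContext(gpuDevice);
const mlTensor = await mlContext.createTensor({
  dataType: 'float32',
  shape: [1, 3, 224, 224],
  // Signals to the user agent about how this tensor will be used.
  usage: MLTensorUsage.WRITE_TO |        // script will write input data
         MLTensorUsage.WEBGPU_INTEROP,   // will be shared with gpuDevice
});
```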
Currently, specifying the correct `MLTensorUsage` is a requirement (and not a hint). Perhaps in the future we can relax that.
Ah, you're correct that usages are required from the JS API's perspective (we'll throw a `TypeError` if attempting to misuse an `MLTensor`), but this text is speaking to user agents, not web developers. From the perspective of the user agent, it can allocate the buffer wherever it wants, but we're recommending it use `MLTensorUsage` as a hint in that decision.

I updated the first paragraph of this section to hopefully make this more clear. WDYT?
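For example — a hypothetical sketch using this thread's tentative names (`writeTensor`/`readTensor` per the explainer) — misuse would surface to script as an error:

```js
// Sketch: a tensor created without READ_FROM cannot be read back by script.
const tensor = await mlContext.createTensor({
  dataType: 'float32',
  shape: [4],
  usage: MLTensorUsage.WRITE_TO, // note: no READ_FROM usage
});
await mlContext.writeTensor(tensor, new Float32Array([1, 2, 3, 4])); // OK
await mlContext.readTensor(tensor); // rejects with a TypeError
```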
The user agent still needs to figure out which device the buffer needs to be allocated on, where omitting any `MLTensorUsage` flag could result in the buffer being left inaccessible (aka only used for output). Are you saying the user agent needs to either predict this and/or move/copy it around as needed?
I believe the problem you're describing relates to this question? https://github.com/webmachinelearning/webnn/pull/754/files#diff-0ee55cff3e7c8d2280b48fa37962505a398c7a18df2eb78e0497c76042054cc8R286
Sounds like the same problem. Also, if we eliminate `deviceType` then we're back to using hints here, so I suppose we need to think about how this will work.
Correct, for the user agent it would stay "required" (for now).

> just like how WebGPU can't force the system to use a "real" GPU (e.g. WARP exists)

AFAIK WebGPU doesn't allow a real GPU buffer to be created then used with a WARP device: a buffer must be created from the same device type, even if it happens to be CPU.

> this explainer punts on finding a solution for this use case for now ¯\_(ツ)_/¯

SGTM, but currently the explainer suggests this is possible as "hints" (i.e. omitting flags == stays on `deviceType`), which I think is too speculative.
I don't want to haggle over this too much because this is an explainer and not a spec. As I mentioned above, the specification cannot dictate where/how the user agent allocates memory. When we specify `MLTensor`, I expect we'll add non-normative guidance.

I also wouldn't bank on `deviceType` being around for much longer...
> As I mentioned above, the specification cannot dictate where/how the user agent allocates memory.

We could specify new hints which describe how frequently the tensor's data will change, which could help (but not dictate) the user agent's decision about where best to allocate it. Another option is to specify which device the allocation will start from. Giving web developers no control over these allocations doesn't seem ideal, and we'll probably need to help them there, beyond what this explainer offers.
If you still prefer to address this later, I'm OK with that too.
> We could specify new hints which describe how frequently the tensor's data will change which could help (but not dictate) the user agent's decision where best to allocate it

Right, if it were useful to signal "this buffer will be used for WebGPU interop repeatedly and just read back to script once" or vice versa, then the user agent may be able to make a more informed decision between the choices presented in #754 (comment):

- Allocate the `MLTensor` as a "default" buffer, requiring an additional staging buffer for `readBuffer()`, or
- Allocate the `MLTensor` as a "readback" buffer and whip up a new GPU buffer for interop
> then the user agent may be able to make a more informed decision between the choices presented in #754 (comment)

The decision is only as good as its last access, and since inputs could be re-dispatched as outputs, I worry hints specified at creation could lead to bad choices being made repeatedly.

I think it's fine that we have hints to determine what type of CPU allocation we start with (e.g. readback or upload buffer), but we'll also need to move it over to the GPU (e.g. default buffer) and then release the CPU allocation. If the web developer needs to `WRITE` or `READ` again, then it'll move back to the CPU. If specified as such, web developers might have enough predictability to avoid staging buffers (and going OOM).
mltensor-explainer.md (Outdated)

> ### Timelines
>
> WebNN uses a programming model similar to [WebGPU's](https://www.w3.org/TR/webgpu/#programming-model), in that compute tasks are posted to a timeline - which I've referred to as an "ML context timeline" throughout this document - separate from the content timeline (i.e. "script"). See [the WebGPU documentation of timelines](https://gpuweb.github.io/gpuweb/#programming-model-timelines) for more details.
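To make the separation concrete, a minimal sketch using this explainer's tentative `dispatch()` and `readTensor()` methods (`graph`, `inputTensor`, and `outputTensor` are assumed to already exist):

```js
// Work is queued on the ML context timeline without blocking script.
mlContext.dispatch(graph, { input: inputTensor }, { output: outputTensor });

// Script (the content timeline) only synchronizes when reading results
// back; the returned promise resolves once the queued work completes.
const result = await mlContext.readTensor(outputTensor);
```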
nit: WebGPU folks interested in WebNN might want to know that the "ML context timeline" here is equivalent to WebGPU's queue AND device timelines.
Hmm, we haven't actually specified WebNN's timelines yet (tracked in #529), so I'm hesitant to make such assertive statements for now.
I would say the example code already implies this and could be more clear:

```js
// Rent out the MLTensor to WebGPU.
const tensorizedGpuBuffer = gpuDevice.importExternalBuffer(mlTensor1);
```

The `MLTensor` being created by the `MLContext` must be made available to the `gpuDevice`, which can be the same device also specified to `createContext()`, so they must be on an equivalent timeline.
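A sketch of that setup, per the explainer's examples (`importExternalBuffer()` is the tentative name under discussion, and `descriptor` is assumed):

```js
// The MLContext is created from the same GPUDevice that later imports
// the tensor, tying the two to the same underlying device.
const gpuDevice = await adapter.requestDevice();
const mlContext = await navigator.ml.createContext(gpuDevice);
const mlTensor = await mlContext.createTensor(descriptor);
const tensorizedGpuBuffer = gpuDevice.importExternalBuffer(mlTensor);
```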
FYI #754 (comment) proposes making `importExternalBuffer()` async, in part to decouple these timelines in the eyes of the spec.
The explainer now contains this note: https://github.com/webmachinelearning/webnn/pull/754/files#diff-0ee55cff3e7c8d2280b48fa37962505a398c7a18df2eb78e0497c76042054cc8R277
Is there more to discuss here or can we resolve this thread?
> Importing and returning the `MLTensor` are each points of synchronization between the respective WebNN and WebGPU timelines.

I have a slight preference to call out the device-queue timeline (separate from the content/script timeline), since that is where we must control the execution of queue commands between WebNN and WebGPU. I'll leave it up to you whether to comment on it here or punt that to the real spec.
FYI @huningxin I'll be OOO next week, so please feel free to merge this PR once @bbernhar is happy with it (please resolve comments which have been adequately addressed). I'm also happy to address some nits when I get back. Thank you all for the thorough reviews!
@a-sully thanks for producing this explainer, and please enjoy your time off this week! We discussed this explainer a little on our call https://www.w3.org/2024/08/22-webmachinelearning-minutes.html#t04 and folks liked it. Reviewers should focus on the overall design and open questions. Spec changes, including IDL, will come in separate PRs. As @a-sully suggests, any IDL is tentative and subject to change. WebGPU interop issues addressed by the explainer can be marked for closure. @a-sully, MLTensor (and broadly WebGPU interop) was proposed as one of the TPAC topics webmachinelearning/meetings#25 and I'd like you to introduce the proposal there. Features such as buffer-sharing between WebNN and WebGPU have cross-group interest.
Resolving addressed comments.
@a-sully thank you very much for putting this together.
mltensor-explainer.md (Outdated)

> ### Open Questions
>
> - How will errors be surfaced? Do we need a concept similar to [WebGPU's error scopes](https://www.w3.org/TR/webgpu/#error-scopes), or is [returning errors via a promise for select operations sufficient](https://github.com/webmachinelearning/webnn/issues/697#issuecomment-2195656878)? See [#477](https://github.com/webmachinelearning/webnn/issues/477)
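For context, the promise-based alternative would look roughly like this — a sketch assuming the explainer's `readTensor()`; how `dispatch()` errors surface is exactly the open question:

```js
// Sketch: dispatch() itself throws only on synchronous validation errors.
try {
  mlContext.dispatch(graph, inputs, outputs);
} catch (e) {
  // e.g. TypeError for mismatched tensor usages or shapes
}

// Errors occurring later, on the ML context timeline, could surface via
// the promise of a subsequent operation such as a readback.
try {
  const result = await mlContext.readTensor(outputTensor);
} catch (e) {
  // e.g. the context was lost, or the dispatch failed
}
```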
WebGPU has error scopes because the API is very stateful and it's non-trivial to replicate all of the state checking in both the JS process and the GPU process. Error scopes can be challenging for web developers to use.

For WebNN, if implementations can efficiently check state in the JS process and throw exceptions on errors, we shouldn't need error scopes. For security, of course, the same state checking will also need to happen in the GPU process, since a compromised JS process can bypass error checking and send the GPU process whatever it wants.

In summary, I think we can leave error scopes out of the spec for now and revisit if needed.
I agree it would be nice to avoid the complexity of error scopes until there's a demonstrated need. I've left it out of this proposal because there are bigger challenges to solve, and I think we can mostly get away without a cohesive error-reporting mechanism, at least for now.

My (longer-term) concern is that we don't have a reliable way to surface errors from `dispatch()`. I don't think we can assume every failed `dispatch()` results in a lost `MLContext`, especially considering platforms where an `MLContext` is not so closely tied to a single GPU.
mltensor-explainer.md (Outdated)

```webidl
// For WebGPU Interop
interface GPUExternalBuffer {};
GPUExternalBuffer includes GPUObjectBase;
```
Depending on how opaque the internal data is for WGSL shaders (see other comment), this wouldn't be an external buffer, but more like a `GPUImportedMLTensor` maybe? "Buffer" makes it sound like the layout would be transparent and the buffer could be seen as an `array<u32>` directly.
See my response on the other comment. If this object is indeed addressable like a generic `GPUBuffer`, then is `GPUExternalBuffer` reasonable, or would you prefer the name indicate that it came from WebNN (by using "ML" or "tensor")?
Updated to use `array<T>` so we can just import as a `GPUBuffer`. Longer-term, it would be nice to have some sort of `GPUImportedTensor` type to abstract away the layout of the buffer, but we'll start with this.
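For instance, a sketch of what addressing an imported float32 tensor as a plain `array<f32>` might look like (binding indices and the surrounding setup are illustrative; WGSL is embedded as a JS string, as is typical for WebGPU):

```js
const customOpShader = /* wgsl */ `
  @group(0) @binding(0) var<storage, read_write> tensor : array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) id : vec3u) {
    if (id.x < arrayLength(&tensor)) {
      tensor[id.x] = max(tensor[id.x], 0.0); // e.g. an elementwise ReLU
    }
  }
`;
```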
```js
// Rent out the MLTensor to WebGPU.
const tensorizedGpuBuffer = gpuDevice.importExternalBuffer(mlTensor1);

// Create a bind group for `gpuVideoTexture`, create a command encoder, etc.
```
Can you describe a bit more how WebGPU would use `tensorizedGpuBuffer`? Is it like a regular `GPUBuffer` that can be addressed linearly, where the application has some formula to know where to find individual tensor elements (this is much preferred)? Or would the tensor be an opaque type in WGSL that an application would have to address with some builtin function (workable, but likely to make it harder to write efficient algorithms)?
Good question. Barring any unforeseen complications, I'm expecting the former: just a regular, linearly addressable `GPUBuffer`. In the short term, we can work around any complications by copying the tensor contents, which will be necessary in many cases regardless.
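In that case, the "formula" is just standard row-major indexing, assuming a tightly packed layout (which the explainer does not yet guarantee):

```js
// Sketch: flat index of element (i, j, k) in a tensor of shape [d0, d1, d2],
// assuming a tightly packed, row-major (C-contiguous) layout.
const flatIndex = (i * d1 + j) * d2 + k;
// In WGSL the element would then be read as tensor[flatIndex].
```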
@Kangz much thanks for your review and advice. We'll have a high-bandwidth discussion about this proposal at our TPAC F2F webmachinelearning/meetings#25 and your expertise would be of great use to the group in that session. You can join us in person or remotely (export your invite here). To narrow down on the timing, we'd get to this on 23 Sep 2024, 3-4pm PDT.
Some comments, but looks good. It's pretty clear with all the examples and scenarios. The variable names are pleasantly clear (e.g. imageAsMlTensor, imageAsArrayBuffer). Thank you for writing this.
mltensor-explainer.md (Outdated)

```webidl
// For WebGPU Interop
interface GPUExternalBuffer {};
```
What’s the benefit of having a new `GPUExternalBuffer` object? It doesn’t seem like `GPUExternalBuffer` represents something significantly different from a normal `GPUBuffer`. I think we should consider whether a new type is really necessary, because if we could get away without a new type and just overload/restrict `GPUBuffer`, it would save us the WebGPU and WGSL implementation work.
@Kangz do you have any opinions on this? (related to #754 (comment) and #754 (comment))
1758c2f to f381d21:

webnn: Rename READ[_FROM] and WRITE[_TO] MLTensorUsageFlags

As agreed upon in discussions on the MLTensor explainer: webmachinelearning/webnn#754

Bug: 361372446
Cq-Include-Trybots: luci.chromium.try:win11-blink-rel
Change-Id: I8bb993c1855c12eac68dc0f2ea359b5f1ae61932
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5858162
Commit-Queue: Austin Sullivan <asully@chromium.org>
Reviewed-by: Reilly Grant <reillyg@chromium.org>
Reviewed-by: Alex Gough <ajgo@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1355445}
mltensor-explainer.md (Outdated)

> Any `MLTensor` created with the `MLTensorDescriptor.importableToWebGPU` flag may be imported into any `GPUDevice`. In the best case, this requires no data copies. If the underlying buffer backing the `MLTensor` is not accessible to the `GPUDevice`, this will require copying the contents of the `MLTensor` to a new buffer, then copying the contents of this buffer back to the `MLTensor` once WebGPU releases its handle to the buffer.
>
> While an `MLTensor` is rented to a `GPUDevice`, the `GPUDevice` has exclusive, read/write access to the imported buffer, which is created as a `GPUExternalBuffer` with `GPUBufferUsageFlags.STORAGE`. All WebNN work depending - directly or indirectly - on the imported `MLTensor` is blocked until the `GPUDevice` returns the tensor.
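Putting the rental model together, a sketch of the full round trip (the import method name is the tentative one from this proposal; the return mechanism is left unspecified here):

```js
// Import: WebGPU gains exclusive read/write access; dependent WebNN
// work is blocked until the tensor is returned.
const tensorizedGpuBuffer = gpuDevice.importExternalBuffer(mlTensor);

// ... encode passes using tensorizedGpuBuffer as a STORAGE buffer ...
gpuDevice.queue.submit([commandEncoder.finish()]);

// Return: WebNN regains access once the queued GPU work completes.
// (The exact return mechanism is tentative in this proposal.)
```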
Maybe add `COPY_SRC`/`COPY_DST` for convenience as well? It should always be available without constraints.
Sorry if this has been answered earlier: is the size and layout of the buffer defined somewhere such that the WGSL code manipulating the buffer knows where to find each element?
Please correct me if I'm misunderstanding the question, but yes, I believe so. The size of the buffer is determined by its data type and shape. The layout is an implementation detail from the perspective of WebNN, but in practice the buffer will always be a type that GPU code (in the Chromium implementation, at least) understands. For example, on Mac we'll import the buffer as an `IOSurface` (which can be imported as a shared image).
OK, I think we need more precision here. How would a user write a WGSL shader that manipulates a tensor when trying to, for example, implement custom operators? WebGPU sees storage buffers as a byte-addressed linear allocation, so the application needs to be aware of the layout of the tensor to address the allocation and find a specific element.
Added a snippet showing the WGSL declaration of the imported buffer. See the diff in the latest patch: 4af9354
mltensor-explainer.md (Outdated)

> ### Importing an `MLTensor` to WebGPU
>
> Any `MLTensor` created with the `MLTensorDescriptor.importableToWebGPU` flag may be imported into any `GPUDevice`. In the best case, this requires no data copies. If the underlying buffer backing the `MLTensor` is not accessible to the `GPUDevice`, this will require copying the contents of the `MLTensor` to a new buffer, then copying the contents of this buffer back to the `MLTensor` once WebGPU releases its handle to the buffer.
How would this handle cases where the device buffer strides are not aligned with the tensor dimensions? Or where the buffer storage data type is not the same as the tensor data type, i.e. is packed with quantized values (4 int8 within an int32)?
> How would this handle cases where the device buffer strides are not aligned with the tensor dimensions?

Hmm, I would assume WGSL only cares about buffer strides, so the tensor dimensions would be irrelevant to WebGPU. Is there a scenario you have in mind?

> Or the buffer storage data type is not the same as the tensor data type,

Good question. Ideally the `importableToWebGPU` flag can be used as a hint to allocate the tensor in an import-friendly fashion, but at the end of the day I expect we'll have to make a data copy in many scenarios, and possibly also accept that some `MLTensor`s won't be importable to WebGPU at all. CoreML only supports creating an `MLMultiArray` with a handful of data types, for example, and only float16 buffers can be imported to WebGPU without copies. If it's not float16, a copy will be required on Mac.
FWIW WebNN does not currently support packing, since no operators operate on packed data. That being said, folks at Intel are working on adding a `dequantizeLinear` op which aims to support `"uint4"`, which will presumably require packing since there's no `Int4Array` in JS.
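If packed data did need to be consumed from WGSL, it would have to be unpacked manually — a hypothetical sketch (embedded as a JS string), not part of this proposal:

```js
const unpackShader = /* wgsl */ `
  // Sketch: extract the lane-th int8 (lane in 0..3) from a u32 word,
  // with sign extension, as for 4 int8 values packed in an int32.
  fn unpack_i8(word : u32, lane : u32) -> i32 {
    let bits = (word >> (lane * 8u)) & 0xFFu;
    return (i32(bits) << 24u) >> 24u; // arithmetic shift sign-extends
  }
`;
```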
webnn: Deprecate MLTensorUsage in favor of boolean flags

As per the feedback on this thread on the MLTensor explainer PR: webmachinelearning/webnn#754 (comment)

This CL includes logic to still support specifying the deprecated MLTensorUsage flags for now, though this logic will only exist for about a milestone to give callers the opportunity to migrate their existing code

Bug: 343638938
Change-Id: I56209e68fde3920b8d6c781c8f804ac6fcd35c9a
Cq-Include-Trybots: luci.chromium.try:mac14.arm64-blink-rel,win11-blink-rel
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5933323
Reviewed-by: ningxin hu <ningxin.hu@intel.com>
Auto-Submit: Austin Sullivan <asully@chromium.org>
Commit-Queue: ningxin hu <ningxin.hu@intel.com>
Cr-Commit-Position: refs/heads/main@{#1370419}
WebGPU interop seems workable and LGTM
Thanks for the review! @huningxin or @fdwr would you mind merging this PR? If other reviewers have further feedback, I'm happy to address it in follow-up changes!
LGTM. Ningxin?
LGTM!
This explainer builds off the discussions in #482, #541, and many others.

Note that #753 proposes renaming `MLBuffer` to `MLTensor`. I agree, and so this explainer reflects that.

Feedback is welcome!