
Don't transfer input ArrayBuffers #566

Open
reillyeon opened this issue Feb 15, 2024 · 8 comments · May be fixed by #700

@reillyeon (Contributor)

Right now the input and output ArrayBuffers passed to MLContext's compute() method are transferred so that their contents cannot be modified by script during the computation. It seems appropriate that an ArrayBuffer passed to MLGraphBuilder's constant() method should be treated the same way, though it would be better if the transfer were delayed until build() is called so that multiple ArrayBufferViews using the same backing ArrayBuffer can be passed as constants. build() would construct a list of unique backing ArrayBuffers to transfer rather than the naive approach of simply iterating over the ArrayBufferViews themselves.

I'll note that transferring ArrayBuffer parameters makes an API harder to call from WebAssembly, but I think that any argument for why compute inputs and outputs are transferred should also apply to constants.
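The deduplication step described above can be sketched in plain JavaScript (an illustrative sketch, not spec text — the variable names are made up):

```javascript
// Several ArrayBufferViews passed as constants may share one backing
// ArrayBuffer, so build() should transfer unique backing buffers, not
// one buffer per view.
const backing = new ArrayBuffer(16);
const weights = new Float32Array(backing, 0, 2); // bytes 0..7
const bias = new Float32Array(backing, 8, 2);    // bytes 8..15
const other = new Float32Array(4);               // separate buffer

const views = [weights, bias, other];

// Naively transferring per view would try to detach `backing` twice;
// deduplicating on the backing buffer avoids that.
const uniqueBuffers = new Set(views.map((v) => v.buffer));
console.log(uniqueBuffers.size); // 2
```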

@huningxin (Contributor)

@reillyeon

> but I think that any argument for why compute inputs and outputs are transferred should also apply to constants.

Agreed.

> I'll note that transferring ArrayBuffer parameters makes an API harder to call from WebAssembly

This is true.

Should we consider copying the contents of input (and constant) ArrayBuffers instead? Note that the Chromium implementation already copies the input and constant ArrayBuffers for the XNNPACK and DirectML (mojo) backends. This means that on the WebAssembly path each input ArrayBuffer may be copied twice: once by JS, which copies from the Wasm heap into an ArrayBuffer that can be transferred, and once by the Chromium implementation, which copies from that ArrayBuffer into internal storage.
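The two copies on the WebAssembly path can be illustrated in plain JavaScript, with an ordinary ArrayBuffer standing in for the Wasm heap (a sketch only; names and sizes are illustrative):

```javascript
// Stand-in for the Wasm linear memory, pre-filled as if Wasm wrote input data.
const wasmHeap = new ArrayBuffer(1024);
new Uint8Array(wasmHeap).fill(7);

// Copy 1 (JS): the Wasm heap itself cannot be transferred, so script
// must copy the input region into a fresh, transferable ArrayBuffer.
const input = new Uint8Array(wasmHeap, 0, 256).slice();

// Copy 2 (implementation): compute() transfers `input.buffer`, and the
// browser then copies it again into backend-internal storage. After the
// transfer the original buffer is detached.
const transferred = structuredClone(input.buffer, { transfer: [input.buffer] });
console.log(input.buffer.byteLength); // 0 (detached)
console.log(transferred.byteLength);  // 256
```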

@reillyeon (Contributor, Author)

Given the constraints on implementations based on the underlying platform APIs I think it's reasonable to define constant and input handling such that MLGraphBuilder and compute() copy any provided ArrayBuffers. If developers want to avoid a copy they can use MLBuffers instead.

In theory Chromium's XNNPACK backend could avoid copies, but I'm also planning to delete it in favor of a more flexible TFLite backend. The zero-copy option only works for CPU backends anyway (without MLBuffer).

@reillyeon reillyeon changed the title Transferring constant ArrayBuffers Don't transfer input ArrayBuffers Feb 29, 2024
@reillyeon (Contributor, Author)

Renamed this issue given my current position on how to resolve the inconsistency between handling of constant and input ArrayBuffers.

inexorabletash added a commit to inexorabletash/webnn that referenced this issue Mar 13, 2024
- Adds validation that passed outputs are neither inputs nor
  constants, matching the Chromium implementation.

- Traverses the graph, starting with outputs, to visit all connected
  operands.

- Previously the outputs were iterated over to process inputs and
  constants, which didn't make any sense. Inputs are hooked up to
  [[inputsDescriptors]]. Nothing is specifically mentioned about
  constants, but an issue about transferring is referenced. (webmachinelearning#566)

- The impact of MLGraphBuilder re-use (webmachinelearning#567) is called out, since it
  could allow for removing the graph traversal.

- Populates graph's [[outputDescriptors]], which was previously just
  missing. (webmachinelearning#448)

- Makes most of the validation behavior of build() happen
  synchronously rather than "in parallel". A promise is still
  returned, of course.

- Converting the graph into an implementation-defined format is called
  out, which remains in "in parallel" steps.

Fixes webmachinelearning#448, fixes webmachinelearning#457, fixes webmachinelearning#552.
fdwr pushed a commit that referenced this issue Mar 16, 2024
* Content: Define build() steps more rigorously

(Same change list as the commit above.)

* Update index.bs

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>

* Build collection of input operands, and simplify steps

* Populate operand and operator sets too

---------

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>
@a-sully a-sully linked a pull request Jun 3, 2024 that will close this issue
@a-sully
Copy link
Contributor

a-sully commented Jun 3, 2024

#700 naively addresses this issue by copying (rather than transferring) inputs. This is a bit awkward, though:

  1. MLComputeResult contains the copied inputs, which is completely superfluous.
  2. It's a bit odd that the inputs, but not the outputs, are transferred.
  3. Should we bother passing outputs to compute() in the first place? As @reillyeon mentioned above, developers who want zero- or minimal-copy behavior should use MLBuffer, and passing outputs doesn't actually reduce copies in the Chromium implementation anyway.

Thoughts on these changes?

```diff
 dictionary MLComputeResult {
-  MLNamedArrayBufferViews inputs;
   MLNamedArrayBufferViews outputs;
 };

-Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
+Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs);
```

Note that we'd like to deprecate compute() in favor of MLBuffer + dispatch() at some point, so perhaps this is not worth much discussion... though MLBuffer isn't specified at all yet, so that may be a ways off :)

@zolkis (Collaborator) commented Jun 3, 2024

(Moved the comment here from the PR.)
Copying is fine by me, but a question for @inexorabletash: in general, for the web platform, should we express the underlying use case (policy) more explicitly, i.e. the expected immutability of inputs (a transfer of ownership), with copying as just one mechanism to ensure it? We'd rather have something like "lock input" steps (choose a better name), in most cases implemented (or even specified) as copying the input. I understand it will always end up being a copy in this case, but (coming from RTOS times) something hurts inside :).

@inexorabletash (Member)

The closest I've found (thanks to Webdex) are these annotated invocations:

The one in Encoding is probably the closest to what you're looking for:

  • Normative spec text just says copy, i.e. implementations can do anything, but it must be indistinguishable from a copy
  • Informative note encouraging implementations to optimize if possible
  • Warning about security hazards of the data mutating

I think the equivalent here would be to add an informative note to the call sites (in #700, "Let copiedInputs be the result of copying MLNamedArrayBufferViews inputs ...") describing why a copy is made. I think this is helpful for reviewers, future editors, and implementers to understand why it is specified this way. This could be written as just NOTE: A copy is performed because... Implementations are encouraged to... or perhaps <details class=note><summary>Why a copy?</summary>A copy is performed because...</details>

@a-sully (Contributor) commented Jun 4, 2024

Thanks for the input, @inexorabletash. Adding a NOTE to #700 SGTM

Thoughts on these changes?

```diff
 dictionary MLComputeResult {
-  MLNamedArrayBufferViews inputs;
   MLNamedArrayBufferViews outputs;
 };

-Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
+Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs);
```

Any objections to expanding the scope of this issue to the above diff? i.e. Don't transfer input ArrayBuffers, and return output ArrayBuffers rather than passing them to compute()

@reillyeon (Contributor, Author)

Accepting a typed array to write the result into ("bring your own buffer", or BYOB) is a valuable pattern for reducing memory-allocation overhead (and thus garbage-collection overhead).

I don't think it's worth changing the interface here but we should consider providing a BYOB version of readBuffer() which accepts a typed array to copy the contents of the MLBuffer into. By specifying that the typed array is written to on the ML task source I think we can avoid the need to transfer the buffer since script won't have concurrent access to its contents (anything written to the buffer before the copy is performed will be overwritten).
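A minimal sketch of the BYOB pattern described above. Since MLBuffer and a BYOB readBuffer() are not specified yet, the types and function names here are stand-ins, not the real API:

```javascript
// Stand-in for an MLBuffer holding a computation result.
class FakeMLBuffer {
  constructor(bytes) { this.bytes = Uint8Array.from(bytes); }
}

// BYOB read: copy the buffer's contents into a caller-provided typed
// array and return the number of bytes written. The caller allocates
// `dest` once and reuses it across reads, avoiding per-read allocation
// (and the resulting GC pressure).
function readBufferInto(mlBuffer, dest) {
  const n = Math.min(mlBuffer.bytes.length, dest.length);
  dest.set(mlBuffer.bytes.subarray(0, n));
  return n;
}

const result = new FakeMLBuffer([1, 2, 3, 4]);
const dest = new Uint8Array(8); // allocated once, reused thereafter
const written = readBufferInto(result, dest);
console.log(written, dest[0]); // 4 1
```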
