
Can an MLGraphBuilder be reused? #567

Open
reillyeon opened this issue Feb 15, 2024 · 8 comments

@reillyeon (Contributor)

While it doesn't seem impossible for implementations to support building multiple MLGraph instances out of a single MLGraphBuilder instance (potentially reusing subsets of the graph by choosing different inputs and outputs) it seems like it adds complexity to implementations to support.

Are there any known use cases for MLGraphBuilder reuse?

@zolkis (Collaborator) commented Feb 15, 2024

Side info: the discussion in #303 on the relationship between context, builder and graph may be relevant, notably several of the comments there.

When getting familiar with this spec, I also ran a few circles around use cases vs. the relationships above, and some of my doubts were answered (e.g. in #302). While the current API shape does have its rationale, not all discussions have been closed, and we might want to record the relevant design decisions as well. I am following this up (on reusing a builder for multiple graphs).

@inexorabletash (Contributor)

Relevant tidbit:

  • If MLGraphBuilder can't be re-used, then input() can synchronously validate that input names are unique.
  • If MLGraphBuilder can be re-used, then at build() time the graph needs to be traversed to identify inputs connected to the outputs, and uniqueness validated then.
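The second bullet's build()-time traversal can be sketched as follows. This is illustrative JavaScript, not spec text; the operand representation (plain objects with `kind`, `name`, and `deps` fields) is invented for the sketch and is not the WebNN API.

```javascript
// Walk the operand DAG backwards from the requested outputs, collecting
// every reachable input() operand. Only reachable inputs matter, which is
// why a re-usable builder cannot reject duplicate names synchronously.
function collectInputs(outputs) {
  const seen = new Set();
  const inputs = [];
  const stack = [...outputs];
  while (stack.length > 0) {
    const operand = stack.pop();
    if (seen.has(operand)) continue;
    seen.add(operand);
    if (operand.kind === "input") inputs.push(operand);
    for (const dep of operand.deps ?? []) stack.push(dep);
  }
  return inputs;
}

// Validate, at build() time, that the reachable inputs have unique names.
function validateUniqueInputNames(outputs) {
  const names = new Set();
  for (const input of collectInputs(outputs)) {
    if (names.has(input.name)) {
      throw new TypeError(`duplicate input name: ${input.name}`);
    }
    names.add(input.name);
  }
  return names;
}
```

With a single-use builder the same check could run inside input() itself, avoiding the traversal entirely.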

@inexorabletash (Contributor)

Further observations:

  • Currently, since MLGraphBuilder can be re-used, creating an input() operand that is not connected to any output is permitted: two disconnected graphs could be under construction in parallel. If we change that, there's a choice: is creating a disconnected input an error at build() time or not?
    • If it's an error, then we help developers realize they've created unused operands.
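Detecting that situation is cheap once the builder records every input() it creates. A hedged sketch in illustrative JavaScript (the function and operand shapes are invented, not spec API):

```javascript
// Given every input operand the builder ever created and the outputs passed
// to build(), return the inputs that no output can reach. An implementation
// could either report these as an error or silently ignore them.
function findDisconnectedInputs(allInputs, outputs) {
  const reachable = new Set();
  const stack = [...outputs];
  while (stack.length > 0) {
    const operand = stack.pop();
    if (reachable.has(operand)) continue;
    reachable.add(operand);
    for (const dep of operand.deps ?? []) stack.push(dep);
  }
  return allInputs.filter((input) => !reachable.has(input));
}
```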

inexorabletash added a commit to inexorabletash/webnn that referenced this issue Mar 13, 2024
- Adds validation that passed outputs are neither inputs nor
  constants, matching the Chromium implementation.

- Traverses the graph, starting with outputs, to visit all connected
  operands.

- Previously the outputs were iterated over to process inputs and
  constants, which didn't make any sense. Inputs are hooked up to
  [[inputsDescriptors]]. Nothing is specifically mentioned about
  constants, but an issue about transferring is referenced. (webmachinelearning#566)

- The impact of MLGraphBuilder re-use (webmachinelearning#567) is called out, since it
  could allow for removing the graph traversal.

- Populates graph's [[outputDescriptors]], which was previously just
  missing. (webmachinelearning#448)

- Makes most of the validation behavior of build() happen
  synchronously rather than "in parallel". A promise is still
  returned, of course.

- Converting the graph into an implementation-defined format is called
  out, which remains in "in parallel" steps.

Fixes webmachinelearning#448, fixes webmachinelearning#457, fixes webmachinelearning#552.
@fdwr (Collaborator) commented Mar 14, 2024

> Are there any known use cases for MLGraphBuilder reuse?

I'm not sure. Certainly most of the time you have one model and one unique set of weights for that model, but there may be interesting preprocessing cases that could benefit from a graph builder's constant reuse. One could load several sets of model weights, eagerly preprocess part of the model (like folding LoRA weights into Stable Diffusion via WebNN), then construct the full graph reusing the unchanged weights alongside the new ones, rather than resupplying them (which could re-upload the old weights to the device). I'm just hypothesizing though.

> If MLGraphBuilder can't be re-used, then input() can synchronously validate that input names are unique.

That is a nice property to have.

> creating an input() operand that is not connected to any output is permitted ... If it's an error, then we help developers realize they've created unused operands.

🤔 I can foresee cases where graphs are generated programmatically, and requiring all inputs to always be consumed could be an annoyance. I've definitely seen unused weight inputs in models before (arguably they should have been cleaned out to save space). If an app straightforwardly maps such a model's contents to WebNN calls, it would have to add a lot of extra book-keeping just to avoid an error that doesn't give it much actionable information anyway: the app isn't going to modify the model and email the model's author about the unused content 😉, and if the app only realizes after building all the nodes that none of them consumed a given input, it can't rewind and un-input it.

Such diagnostic information would be more useful for people calling the API directly, but that audience is pretty slim. It's kinda like WebGPU: when displaying a 3D model you aren't expected to call WebGPU APIs directly; rather, WebGPU is the implementation layer for displaying existing model assets, be they FBX, COLLADA DAE, Wavefront OBJ...

fdwr pushed a commit that referenced this issue Mar 16, 2024
* Content: Define build() steps more rigorously


* Update index.bs

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>

* Build collection of input operands, and simplify steps

* Populate operand and operator sets too

---------

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>
@huningxin (Contributor)

> Are there any known use cases for MLGraphBuilder reuse?

There is another one I mentioned in #614.

Today, when the ONNX Runtime WebNN EP runs inference on transformer decoders such as Whisper, two WebNN sub-graphs are built because WebNN doesn't support the If operator: one with a past key/value (KV) cache ("with_past") and one without ("no_past"). The inference code runs the "no_past" sub-graph for the first iteration and the "with_past" sub-graph for the following iterations. The two sub-graphs actually share some common weights. It would be useful if the same weights built via builder.constant() could be consumed by the operators of both sub-graphs.
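A toy model of that sharing (not the WebNN API; `ToyBuilder` and its method names are invented for illustration): a builder that holds each weight buffer once lets both sub-graphs reference the same underlying constant, so the weights need not be supplied or uploaded twice.

```javascript
// Minimal sketch: a builder caching constants by name, so two successive
// build() calls can consume the same constant operand object.
class ToyBuilder {
  constructor() {
    this.constants = new Map(); // name -> shared constant operand
  }
  constant(name, data) {
    if (!this.constants.has(name)) {
      this.constants.set(name, { kind: "constant", name, data });
    }
    return this.constants.get(name);
  }
  build(outputs) {
    return { outputs }; // a real implementation would compile here
  }
}

const builder = new ToyBuilder();
const weights = new Float32Array([1, 2, 3]);
const shared = builder.constant("decoder.weight", weights);
const noPast = builder.build([{ kind: "op", deps: [shared] }]);
const withPast = builder.build([
  { kind: "op", deps: [builder.constant("decoder.weight", weights)] },
]);
```

Here `noPast` and `withPast` end up referencing the identical constant object, which is the property the "with_past" / "no_past" scenario wants implementations to exploit.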

@reillyeon (Contributor, Author)

This use case makes sense, but it assumes that implementations are able to optimize the case where multiple graphs share operands (or at least constants). The Chromium implementation so far is not capable of this. I can see ways in which it could be, though it also depends somewhat on the underlying platform framework. This will have to be an area of future design exploration, but given the potential it seems reasonable to at least continue to allow builder reuse, even if implementations aren't yet taking advantage of the optimization opportunity.

One thing to consider, however, is whether an implementation could require all sub-graphs to be built at the same time rather than piecemeal. For implementations that have to convert to another format (e.g. a TFLite or Core ML model), it seems optimal to do that conversion once rather than having to support building additional graphs after the fact.

@reillyeon (Contributor, Author)

An additional thought I had while discussing this with @a-sully: if an implementation did try to let constants be efficiently reused by multiple graphs, but didn't require those graphs to be built at the same time, then the builder would need to keep its own copy of the constant data until the builder is freed, instead of passing ownership of that memory to the underlying platform. This attempt at memory optimization could therefore backfire.
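The ownership trade-off can be sketched in two lines of toy code (illustrative JavaScript, not WebNN; both function names and the builder shape are invented): if build() hands the constant buffer to the platform by detaching it from the builder, a second build() from the same builder has nothing left to reuse, so supporting piecemeal builds forces the builder to retain its copy.

```javascript
// Transfer semantics: the platform takes ownership and the builder's
// copy is gone, so a later build() cannot reuse the constant.
function buildByTransfer(builder) {
  const graph = { weights: builder.weights };
  builder.weights = null;
  return graph;
}

// Copy semantics: reuse keeps working, but the builder must keep holding
// the data (the memory cost the comment above is worried about).
function buildByCopy(builder) {
  return { weights: builder.weights?.slice() };
}
```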

@huningxin (Contributor)

@reillyeon

> if an implementation did try to allow constants to be efficiently reused by multiple graphs but didn't require those graphs to be built at the same time then the builder would need to continue to own a copy of the constant data until it is freed instead of being able to pass ownership of that memory to the underlying platform.

If a builder allowed binding an MLBuffer to a constant (see the constant MLBuffer proposal), would that solve this issue?
