
Can an MLGraphBuilder be reused? #567

Open
reillyeon opened this issue Feb 15, 2024 · 8 comments

@reillyeon (Contributor)

While it doesn't seem impossible for implementations to support building multiple MLGraph instances out of a single MLGraphBuilder instance (potentially reusing subsets of the graph by choosing different inputs and outputs) it seems like it adds complexity to implementations to support.

Are there any known use cases for MLGraphBuilder reuse?

@zolkis (Collaborator) commented Feb 15, 2024

Side info: the discussion in #303 on the relationship between context, builder and graph may be relevant, notably several of the comments there.

When getting familiar with this spec, I also ran a few circles around use cases vs. the relationships above, and some of my doubts were answered (e.g. in #302). While the current API shape does have its rationale, not all discussions have been closed, and we might want to record the relevant design decisions as well. I am following this up (on reusing a builder for multiple graphs).

@inexorabletash (Contributor)

Relevant tidbit:

  • If MLGraphBuilder can't be re-used, then input() can synchronously validate that input names are unique.
  • If MLGraphBuilder can be re-used, then at build() time the graph needs to be traversed to identify inputs connected to the outputs, and uniqueness validated then.
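The second bullet's build()-time traversal can be sketched as follows. This is illustrative JavaScript, not spec text; the operand representation (plain objects with `kind`, `name`, and `deps` fields) is invented for the sketch and is not the WebNN API.

```javascript
// Walk the operand DAG backwards from the requested outputs, collecting
// every reachable input() operand. Only reachable inputs matter, which is
// why a re-usable builder cannot reject duplicate names synchronously.
function collectInputs(outputs) {
  const seen = new Set();
  const inputs = [];
  const stack = [...outputs];
  while (stack.length > 0) {
    const operand = stack.pop();
    if (seen.has(operand)) continue;
    seen.add(operand);
    if (operand.kind === "input") inputs.push(operand);
    for (const dep of operand.deps ?? []) stack.push(dep);
  }
  return inputs;
}

// Validate, at build() time, that the reachable inputs have unique names.
function validateUniqueInputNames(outputs) {
  const names = new Set();
  for (const input of collectInputs(outputs)) {
    if (names.has(input.name)) {
      throw new TypeError(`duplicate input name: ${input.name}`);
    }
    names.add(input.name);
  }
  return names;
}
```

With a single-use builder the same check could run inside input() itself, avoiding the traversal entirely.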

@inexorabletash (Contributor)

Further observations:

  • Currently, since MLGraphBuilder can be re-used, creating an input() operand that is not connected to any output is permitted: two disconnected graphs could be under construction in parallel. If we change that, there's a choice: is creating a disconnected input an error at build() time or not?
    • If it's an error, then we help developers realize they've created unused operands.
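Detecting that situation is cheap once the builder records every input() it creates. A hedged sketch in illustrative JavaScript (the function and operand shapes are invented, not spec API):

```javascript
// Given every input operand the builder ever created and the outputs passed
// to build(), return the inputs that no output can reach. An implementation
// could either report these as an error or silently ignore them.
function findDisconnectedInputs(allInputs, outputs) {
  const reachable = new Set();
  const stack = [...outputs];
  while (stack.length > 0) {
    const operand = stack.pop();
    if (reachable.has(operand)) continue;
    reachable.add(operand);
    for (const dep of operand.deps ?? []) stack.push(dep);
  }
  return allInputs.filter((input) => !reachable.has(input));
}
```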

inexorabletash added a commit to inexorabletash/webnn that referenced this issue Mar 13, 2024
- Adds validation that passed outputs are neither inputs nor
  constants, matching the Chromium implementation.

- Traverses the graph, starting with outputs, to visit all connected
  operands.

- Previously the outputs were iterated over to process inputs and
  constants, which didn't make any sense. Inputs are hooked up to
  [[inputsDescriptors]]. Nothing is specifically mentioned about
  constants, but an issue about transferring is referenced. (webmachinelearning#566)

- The impact of MLGraphBuilder re-use (webmachinelearning#567) is called out, since it
  could allow for removing the graph traversal.

- Populates graph's [[outputDescriptors]], which was previously just
  missing. (webmachinelearning#448)

- Makes most of the validation behavior of build() happen
  synchronously rather than "in parallel". A promise is still
  returned, of course.

- Converting the graph into an implementation-defined format is called
  out, which remains in "in parallel" steps.

Fixes webmachinelearning#448, fixes webmachinelearning#457, fixes webmachinelearning#552.
@fdwr (Collaborator) commented Mar 14, 2024

> Are there any known use cases for MLGraphBuilder reuse?

I'm not sure. Certainly most of the time you have one model and one unique set of weights for that model, but there may be interesting preprocessing cases that could benefit from a graph builder's constant reuse. One could load several sets of model weights, eagerly preprocess part of the model (like folding LoRA weights into Stable Diffusion via WebNN), then construct the full graph reusing the unchanged weights alongside the new ones, rather than resupplying them (which could re-upload the old weights to the device). I'm just hypothesizing though.

> If MLGraphBuilder can't be re-used, then input() can synchronously validate that input names are unique.

That is a nice property to have.

> creating an input() operand that is not connected to any output is permitted ... If it's an error, then we help developers realize they've created unused operands.

🤔 I can foresee cases where graphs are generated programmatically, and requiring all inputs to always be consumed could be an annoyance. I've definitely seen unused weight inputs in models before (arguably they should have been cleaned out to save space). If an app straightforwardly maps such a model's contents to WebNN calls, it would have to add a lot of extra book-keeping just to avoid an error that doesn't give it much actionable information anyway: the app isn't going to modify the model and email the model's author about the unused content 😉, and if the app only realizes after building all the nodes that none of them consumed a given input, it can't rewind and un-input it.

Such diagnostic information would be more useful for people calling the API directly, but that audience is pretty slim. It's kinda like WebGPU: when displaying a 3D model you aren't expected to call WebGPU APIs directly; rather, WebGPU is the implementation layer for displaying existing model assets, be they FBX, COLLADA DAE, Wavefront OBJ...

fdwr pushed a commit that referenced this issue Mar 16, 2024
* Content: Define build() steps more rigorously


* Update index.bs

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>

* Build collection of input operands, and simplify steps

* Populate operand and operator sets too

---------

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>
@huningxin (Contributor)

> Are there any known use cases for MLGraphBuilder reuse?

There is another one I mentioned in #614.

Today, when the ONNX Runtime WebNN EP runs inference on transformer decoders such as Whisper, two WebNN sub-graphs are built because WebNN doesn't support the If operator: one with a past key/value (KV) cache ("with_past") and one without ("no_past"). The inference code runs the "no_past" sub-graph for the first iteration and the "with_past" sub-graph for the following iterations. The two sub-graphs actually share some common weights. It would be useful if the same weights built via builder.constant() could be consumed by the operators of both sub-graphs.
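A toy model of that sharing (not the WebNN API; `ToyBuilder` and its method names are invented for illustration): a builder that holds each weight buffer once lets both sub-graphs reference the same underlying constant, so the weights need not be supplied or uploaded twice.

```javascript
// Minimal sketch: a builder caching constants by name, so two successive
// build() calls can consume the same constant operand object.
class ToyBuilder {
  constructor() {
    this.constants = new Map(); // name -> shared constant operand
  }
  constant(name, data) {
    if (!this.constants.has(name)) {
      this.constants.set(name, { kind: "constant", name, data });
    }
    return this.constants.get(name);
  }
  build(outputs) {
    return { outputs }; // a real implementation would compile here
  }
}

const builder = new ToyBuilder();
const weights = new Float32Array([1, 2, 3]);
const shared = builder.constant("decoder.weight", weights);
const noPast = builder.build([{ kind: "op", deps: [shared] }]);
const withPast = builder.build([
  { kind: "op", deps: [builder.constant("decoder.weight", weights)] },
]);
```

Here `noPast` and `withPast` end up referencing the identical constant object, which is the property the "with_past" / "no_past" scenario wants implementations to exploit.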

@reillyeon (Contributor, Author)

This use case makes sense, but it assumes that implementations are able to optimize the case where multiple graphs share operands (or at least constants). The Chromium implementation so far is not capable of this. I can see ways in which it could be, though it also depends somewhat on the underlying platform framework. This will have to be an area of future design exploration, but given the potential it seems reasonable to at least continue to allow builder reuse, even if implementations aren't yet taking advantage of the optimization opportunity.

One thing to consider, however, is whether an implementation could require all sub-graphs to be built at the same time rather than piecemeal. For implementations that have to convert to another format (e.g. a TFLite or Core ML model), it seems optimal to do that conversion once rather than having to support building additional graphs after the fact.

@reillyeon (Contributor, Author)

An additional thought I had while discussing this with @a-sully: if an implementation did try to let constants be efficiently reused by multiple graphs, but didn't require those graphs to be built at the same time, then the builder would need to keep its own copy of the constant data until the builder is freed, instead of passing ownership of that memory to the underlying platform. This attempt at memory optimization could therefore backfire.
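The ownership trade-off can be sketched in two lines of toy code (illustrative JavaScript, not WebNN; both function names and the builder shape are invented): if build() hands the constant buffer to the platform by detaching it from the builder, a second build() from the same builder has nothing left to reuse, so supporting piecemeal builds forces the builder to retain its copy.

```javascript
// Transfer semantics: the platform takes ownership and the builder's
// copy is gone, so a later build() cannot reuse the constant.
function buildByTransfer(builder) {
  const graph = { weights: builder.weights };
  builder.weights = null;
  return graph;
}

// Copy semantics: reuse keeps working, but the builder must keep holding
// the data (the memory cost the comment above is worried about).
function buildByCopy(builder) {
  return { weights: builder.weights?.slice() };
}
```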

@huningxin (Contributor)

@reillyeon

> if an implementation did try to allow constants to be efficiently reused by multiple graphs but didn't require those graphs to be built at the same time then the builder would need to continue to own a copy of the constant data until it is freed instead of being able to pass ownership of that memory to the underlying platform.

If a builder allowed binding an MLBuffer to a constant (see the constant MLBuffer proposal), would that solve this issue?
