
Define graph execution methods used in different threading models. #255

Closed

Conversation


@wchao1115 wchao1115 commented Feb 26, 2022

Based on the discussion in issue #230, we define three methods of execution for the compiled graph as follows:

Immediate Execution through the MLExecution interface:

The MLExecution interface represents a method of execution that synchronously carries out the computational workload of a compiled graph MLGraph on the calling thread, which must be a worker thread, to produce results as defined by the operations in the graph. This method of execution requires an MLContext created with MLContextOptions, with the MLDevicePreference option set to "cpu", or to "default" when it resolves to a CPU context.

Async Execution through the MLAwaitedExecution interface:

The MLAwaitedExecution interface represents a method of execution that asynchronously carries out the computational workload of a compiled graph MLGraph on a worker thread to avoid blocking the calling thread while producing results as defined by the operations in the graph. This method of execution requires an MLContext created with MLContextOptions or WebGLRenderingContext.

Queued Execution through the MLCommandEncoder interface:

The MLCommandEncoder interface represents a method of execution that synchronously records the computational workload of a compiled graph MLGraph to a GPU command buffer GPUCommandBuffer on the calling thread. Since the workload is not immediately executed, just recorded, this method allows more flexibility for the caller to determine how and when the recorded commands will be submitted for execution on the GPU relative to other GPU workload on the same queue. This method of execution requires an MLContext created with GPUDevice.
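
For illustration, here is a rough usage sketch of the three methods as defined above (the interface, method, and option names follow this PR's text where they appear; anything else, such as the exact shape of MLContextOptions, is illustrative):

// Immediate execution: synchronous compute on a CPU context, worker thread only
const cpuContext = navigator.ml.createContext({devicePreference: 'cpu'}); // option name illustrative
const execution = new MLExecution(cpuContext);
execution.compute(graph, inputs, outputs); // blocks the worker until results land in outputs

// Async execution: same call shape, but the calling thread is not blocked
const awaited = new MLAwaitedExecution(cpuContext);
await awaited.compute(graph, inputs, outputs);

// Queued execution: record the workload for later submission via WebGPU
const gpuContext = navigator.ml.createContext(gpuDevice);
const encoder = new MLCommandEncoder(gpuContext);
// ... record the graph workload, then submit it on gpuDevice.queue when appropriate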



@dontcallmedom
Contributor

I may be confused, but how would a developer know that their MLContext defaults have resulted in a CPU vs GPU, and in a WebGL vs WebGPU outcome? That's key to them picking the right executor.

Generally speaking, branching on the type of the processing unit and support for WebGPU significantly raises the complexity of coding against the API; maybe that's acceptable given our targets (framework developers), but it's probably worth confirming it.

Bikeshedding - to align with the existing pattern in the file API of FileReader (async) and FileReaderSync (sync, available in worker only), I think we should use MLExecutionSync and MLExecution (or more explicitly MLExecutionAsync?); and maybe Executor rather than Execution?

@wchao1115
Collaborator Author

@dontcallmedom The assumption here is that the caller choosing the graph execution method is also responsible for how the MLContext is created. The discussion in #230 exposes a new requirement: WebNN must support multiple threading models for its graph execution, depending on the circumstances around how the graph should be processed in relation to the rest of the activities in the framework deploying it.

For instance, if a context is created specifically for the CPU, the graph can be executed either synchronously (via MLExecution) or asynchronously (via MLAwaitedExecution), but the caller must choose which one to go with. If the context is created from a WebGPU device, the graph can't be executed directly, either synchronously or asynchronously; instead, its workload must be recorded into a command buffer and yielded to the app. Only this way can the ML workload be truly managed alongside the rest of the graphics payload going to the GPU.

index.bs (outdated)
The {{MLAwaitedExecution}} interface represents a method of execution that asynchronously carries out the computational workload of a compiled graph {{MLGraph}} on a worker thread to avoid blocking the calling thread while producing results as defined by the operations in the graph. This method of execution requires an {{MLContext}} created with {{MLContextOptions}} or {{WebGLRenderingContext}}.

<script type=idl>
typedef (MLBufferView or WebGLTexture or GPUTexture) MLResource;
Contributor

I suppose MLResource should not support WebGPU resources, e.g., GPUBuffer (of MLBufferView) and GPUTexture.

Contributor

@huningxin huningxin Mar 1, 2022

Should MLResource be limited to just ArrayBufferView? It could be just an async version of MLExecution.

Collaborator Author

Are you suggesting that async execution of a GPU context would always automatically upload/download the result to/from the CPU memory? I can see that. How will async execution be used by TF.js in the readback scenario?

Contributor

Are you suggesting that async execution of a GPU context would always automatically upload/download the result to/from the CPU memory?

Yes.

How will async execution be used by TF.js in the readback scenario?

TF.js has a Tensor::data() method which "asynchronously downloads the values".
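
For reference, the TF.js readback pattern looks roughly like this (the model and tensor names are made up):

// data() resolves once the values have been downloaded from the backend
const outputTensor = model.predict(inputTensor);
const values = await outputTensor.data(); // a typed array, e.g. Float32Array
outputTensor.dispose();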

The {{MLExecution}} interface represents a way the execution of the graph is carried out immediately on the
calling thread, which must also be a worker thread. The execution is carried out by the {{MLExecution/compute()}}
method, which produces the results of the computation from all the inputs bound to the graph on the bound outputs.
This type of execution is limited to only when the computational device bound to the ML context is a CPU.
Contributor

I suppose it should support GPU as well. It would allow callers who use the sync API to also access the GPU.

Collaborator Author

That was my original thought, but @RafaelCintron believes that we should not allow sync execution on a GPU context, because it means you need to block the calling thread until the GPU executes and produces the results in the output tensors, which could create a jarring user experience.

Contributor

If this is restricted to DedicatedWorker, I'm not sure I understand where the risk of blocking on a GPU execution would be. Having it applicable to GPU contexts would help with the concern I raised.

Contributor

which could create jarring user experience.

Because it is limited in the worker, I suppose the UI thread (main thread) won't be blocked in this way.

interface MLCommandEncoder {
constructor(MLContext context);

undefined initializeGraph(MLGraph graph, MLNamedGPUInputs inputs);
Contributor

Should this method be named initialize? MLGraph is the first argument.

Contributor

Should MLExecution and MLAwaitedExecution expose an initialize method as well? With that, we could remove MLGraphBuilder::constant and move all constant buffer binding to the specialized execution interfaces.

Collaborator Author

@wchao1115 wchao1115 Mar 3, 2022

Normally, an initialize method on an interface implies that the operation initializes the host interface object, which makes the host object stateful. I explicitly named this method initializeGraph because the operation would initialize the graph input param to the method, and not the host object itself, i.e. it implies that the graph is stateful, not the encoder.

Collaborator Author

Should MLExecution and MLAwaitedExecution expose initialize

MLExecution is CPU only, and so there isn't a need to initialize and pre-process the weights, i.e. graph constants and inputs are treated the same way.

MLAwaitedExecution, when used for a GPU context, will need to initialize and pre-process the weights before the first inference call. But that can be done without defining the initializeGraph method, because weights are simply constant operands bound to the graph at construction time. The implementation can already distinguish the weight constants from the rest of the named inputs, so initializing them could technically be done before the first inference call (i.e. compute).

The real reason that only MLCommandEncoder will need a separate initializeGraph method is that unlike MLExecution and MLAwaitedExecution, the command encoder does not actually carry out the execution of the compute workload. It merely records the workload onto the command buffer on behalf of the caller. This means that it is the caller's responsibility to track the weight constant tensors, and make sure that constant initialization workload is recorded separately by the initializeGraph call.

Another way to think about it is that the command encoder is a lower-level API than the execution API. And, while the execution API can "cache" the constants' initialization result before making the first inference call to compute the output result, the encoder API cannot; it can only record the act of initialization as a workload separate from the inference workload captured by the subsequent dispatch call.
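
A sketch of that separation as I read it (only initializeGraph appears in this PR's IDL; dispatch() and finish() are illustrative names):

const encoder = new MLCommandEncoder(gpuContext);

// Recorded as its own workload: one-time weight/constant pre-processing
encoder.initializeGraph(graph, constantInputs);

// Recorded separately: the inference workload itself
encoder.dispatch(graph, inputs, outputs); // illustrative name

// The caller decides when, and relative to what, these commands run on the GPU
gpuDevice.queue.submit([encoder.finish()]); // illustrative name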

Contributor

A CPU backend may also need to pre-process the weights, e.g., conversion to a SIMD-optimized blocked memory layout. And per the discussion above, I would suggest MLExecution support both CPU and GPU.

The graph initialization, e.g., constant pre-processing, may be time consuming. Doing it in the sync MLGraphBuilder::build may block the main thread. Hosting a sync version in MLExecution and an async version in MLAwaitedExecution may be a solution.

Regarding the resource type, the initialize of MLExecution and MLAwaitedExecution could accept only ArrayBufferView, while initializeGraph of MLCommandEncoder accepts only GPUBuffer. This would align with the respective execution methods (with my suggestion in the comment above) and simplify the usage.

In this way, the graph building code could be shared across the different execution methods. Otherwise, callers need to call MLGraphBuilder::constant to bind constants to a graph for MLExecution and MLAwaitedExecution, but call MLGraphBuilder::input and bind the constants in MLCommandEncoder::initializeGraph. That leads to different graph building code.
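
Roughly the shape of that suggestion, for illustration (the initialize signatures below are not in the spec; the operand names are made up):

// Shared graph-building code: weights declared as named inputs rather than baked-in constants
const weights = builder.input('conv1_weights', {type: 'float32', dimensions: [32, 3, 3, 3]});

// CPU path: sync initialize accepting ArrayBufferView-backed constants
execution.initialize(graph, {'conv1_weights': weightsArrayBufferView});

// Async path
await awaitedExecution.initialize(graph, {'conv1_weights': weightsArrayBufferView});

// GPU path: GPUBuffer-backed constants via this PR's initializeGraph
commandEncoder.initializeGraph(graph, {'conv1_weights': weightsGPUBuffer});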

index.bs
interface MLAwaitedExecution {
constructor(MLContext context);

Promise<MLNamedOutputs> compute(MLGraph graph, MLNamedInputs inputs, MLNamedOutputs outputs);
Contributor

Should it just return Promise<void>?

Collaborator Author

My understanding is that Promise<Return-Type> is a semantic used for a deferred async result on the return value. In this case, the result will be placed on the outputs param, which is therefore also what the promise should resolve to. Let me know if I got it wrong.

Contributor

Promise<Return-Type> means that a developer would write code à la

const outputs = await executor.compute(graph, inputs, namedoutputs);

I agree with @huningxin that this feels wrong, since the expectation is instead that namedoutputs will be modified in place by the compute method - having the same object also returned via the promise is at best confusing.
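
With a compute() that returns Promise<void> instead, the expected pattern would simply be:

await executor.compute(graph, inputs, namedoutputs); // namedoutputs is filled in place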

@dontcallmedom
Contributor

dontcallmedom commented Mar 2, 2022

@wchao1115 I understood the model you propose that way. My question is - imagine you create a default MLContext with:

// Default context created
const context = navigator.ml.createContext();

That means the developer leaves entire control of where the model will be computed to the user agent; the developer has no way to know whether the resulting context implies the sync approach (MLExecution), the async one (MLAwaitedExecution), or the WebGPU one (MLCommandEncoder), so there is no way to write code that triggers the right execution.

Presumably, this could be surfaced somehow on the MLContext object, but isn't at this stage.

Even if it gets surfaced, it still means that WebNN users would have to maintain 3 code pathways to deal with the general case of executing a graph (i.e. one where the developer wants to leave the choice of the context to the user agent); 3 pathways means that many opportunities for additional bugs, that much additional need for testing, etc - that may very well be OK, but I think we should check whether this is an acceptable trade-off.

@wchao1115
Collaborator Author

That means the developer leaves entire control of where the model will be computed to the user agent; the developer has no way to know whether the resulting context implies the sync approach (MLExecution), the async one (MLAwaitedExecution), or the WebGPU one (MLCommandEncoder)

In the default case, the caller can use either MLExecution or MLAwaitedExecution. This is already spelled out in the definition of each of these two execution methods. You can't use MLCommandEncoder because in its definition, it says:

This method of execution requires an MLContext created with GPUDevice.

@dontcallmedom
Contributor

In the default case, the caller can use either MLExecution or MLAwaitedExecution. This is already spelled out in the definition of each of these two execution methods. You can't use MLCommandEncoder because in its definition, it says:

This method of execution requires an MLContext created with GPUDevice.

OK, I'm starting to understand more clearly the intent here:

  • createContext(GPUDevice) ⇒ MLCommandEncoder
  • createContext(!GPUDevice) && window ⇒ MLAwaitedExecution
  • createContext(!GPUDevice) && worker && cpu ⇒ (MLAwaitedExecution | MLExecution)
  • createContext(!GPUDevice) && worker && !cpu ⇒ MLAwaitedExecution

If that's indeed the model, I think it needs to be surfaced in createContext, not just under the various execution interfaces; the execution interfaces also need to trigger errors when they're called with the wrong context.

And I still don't understand with the default context type how a developer would know whether they can or cannot use MLExecution in a worker, since I don't know how they can determine whether « "default" [was] resolved to a CPU context ».

Generally speaking, I am worried that the generic interface MLContext can be used as a parameter to this or that constructor depending on a state that isn't directly exposed to the developer. Imagine a scenario where the developer is delegating creation of the MLContext to a 3rd party library; the information about how the MLContext has been created will also need to be passed back, which feels awkward.
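
To illustrate the concern, here is roughly the branching a developer handling an arbitrary MLContext would need (the checks used here are hypothetical; nothing like them is exposed today):

// context might have come from createContext(), createContext(options), or createContext(gpuDevice)
if (contextWasCreatedFromGPUDevice(context)) { // hypothetical check
  const encoder = new MLCommandEncoder(context);
  // ... record, then submit on the GPUDevice queue
} else if (inDedicatedWorker && contextResolvedToCpu(context)) { // hypothetical checks
  new MLExecution(context).compute(graph, inputs, outputs); // sync path
} else {
  await new MLAwaitedExecution(context).compute(graph, inputs, outputs); // async path
}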

This type of execution is limited to only when the computational device bound to the ML context is a CPU.

The {{MLAwaitedExecution}} interface represents a way the execution of the graph is performed asynchronously
on a separate worker thread. The call to the {{MLAwaitedExecution/compute()}} method returns instantly without
Contributor

Suggested change
on a separate worker thread. The call to the {{MLAwaitedExecution/compute()}} method returns instantly without
in the background. The call to the {{MLAwaitedExecution/compute()}} method returns instantly without

or maybe just say "separate thread" (but "worker thread" makes it feel that it will be run in a developer-exposed worker)

on a separate worker thread. The call to the {{MLAwaitedExecution/compute()}} method returns instantly without
blocking the calling thread. This execution method is appropriate when the responsiveness of the calling thread
is critical to good user experience. The computation results will be placed at the bound outputs at the time
the operation is completed on a worker thread at which time the calling thread is signaled. This type of
Contributor

Suggested change
the operation is completed on a worker thread at which time the calling thread is signaled. This type of
the operation is completed in the background at which time the calling thread is signaled. This type of

@wchao1115
Collaborator Author

wchao1115 commented Mar 3, 2022

If that's indeed the model, I think it needs to be surfaced in createContext, not just under the various execution interfaces;

Thanks @dontcallmedom. There is an alternative design discussed in the issue #230, which I think is what you alluded to here. Let me try to implement that in a PR, and see which one is more palatable. The entire matrix of required use cases is complex and so there may not be a "perfect" design in this case. But maybe we can try to pick the least bad one.

I must point out though that this scenario:

Imagine a scenario where the developer is delegating creation of the MLContext to a 3rd party library

is not something we aim to support, i.e. callers that use WebNN to create and execute the compiled ML graph also have the responsibility for context creation. The act of executing or recording the execution of the models is highly dependent on the type of context and the underlying device used to create and execute them.

@wchao1115
Collaborator Author

There is an alternative design discussed in the issue #230,

@dontcallmedom Please take a look at an alternative design at PR #257

@bbernhar

Is the intent of MLCommandEncoder to also be interoperable with WebGPU? If so, how will MLCommandEncoder work on graphics APIs (like Vulkan) that do not natively support ML?

WebNN does not necessarily need to introduce WebGPU's device execution model (deferred vs immediate) in order to execute ML work on the GPU asynchronously. Before "command encoders", the context itself would encode and schedule GPU work by generating a single (internal) command buffer, each on a separate thread/worker, to effectively allow simultaneous encoding. The control/flexibility of explicit ML command encoders (deferred) does not seem useful if the WebNN developer can't broadly share it with WebGPU.

@wchao1115
Collaborator Author

@bbernhar Maybe I misunderstood your previous comment. If the WebNN context were to record its ML workloads onto an internal command buffer, who is responsible for executing that buffer? If that's also the responsibility of the context, then is the execution a blocking operation? The purpose of the MLCommandEncoder is to produce the command buffer with the ML workloads in it. Its caller is free to execute the command buffer whenever appropriate using the WebGPU API, in whatever way it wants.

@bbernhar

@bbernhar Maybe I misunderstood your previous comment. If the WebNN context were to record its ML workloads onto an internal command buffer, who is responsible for executing that buffer?

Either MLContext enqueues an (explicit or implicit) encoded ML command-buffer into WebGPU's shared queue and GPUQueue.Submit executes it indirectly, OR MLContext directly executes the encoded ML command-buffer using WebGPU's shared queue. Both submit work to the GPU asynchronously, as it rarely makes sense to do otherwise (i.e. the CPU stalls the GPU).

Notice the use of an implicit ML command-buffer allows for a simple interop model of "queue sharing" (vs sharing command-lists), avoiding the need for WebGPU to broadly support non-native concepts like MLCommandEncoder.

Hope this helps.

@wchao1115
Collaborator Author

@bbernhar @RafaelCintron Following our discussion, should we go ahead and close the discussion around the MLCommandEncoder topic in this PR (and the related #257)?

@huningxin
Contributor

huningxin commented Mar 24, 2022

@bbernhar

Either MLContext enqueues an (explicit or implicit) encoded ML command-buffer into WebGPU's shared queue and GPUQueue.Submit executes it indirectly, OR MLContext directly executes the encoded ML command-buffer using WebGPU's shared queue. Both submit work to the GPU asynchronously, as it rarely makes sense to do otherwise (i.e. the CPU stalls the GPU).

I'd like to understand how resource sharing and execution ordering happen in this design.

As in the WebGPU/WebNN background blur sample for the full-GPU video processing use case, it involves dispatching compute shaders and the ML graph, and sharing GPUBuffers between them. The processing steps include:

  1. The first compute shader converts the input texture into input tensor GPUBuffer.
  2. The ML graph consumes the input tensor GPUBuffer, executes the semantic segmentation and produces the results into the output tensor GPUBuffer (segmap).
  3. The second compute shader takes the input texture, the blurred texture (produced by another compute shader) and the segmap GPUBuffer as inputs, produces the background blurred image into the output texture.

The pseudo code looks like:

const commandEncoder = device.createCommandEncoder();

// Convert inputTexture to inputTensor
let computePass = commandEncoder.beginComputePass();
computePass.setPipeline(preprocessingPipeline);
computePass.setBindGroup();
computePass.dispatch();
computePass.end();

// ML graph segments background from inputTensor to outputTensor
// Should it record the command into commandEncoder? like
mlGraph.record(commandEncoder, inputBuffer, outputBuffer); 

// Blend inputTexture to outputTexture based on outputTensor
computePass = commandEncoder.beginComputePass();
computePass.setPipeline(postprocessingPipeline);
computePass.setBindGroup();
computePass.dispatch();
computePass.end();

// Render the outputTexture onto canvas
const passEncoder = commandEncoder.beginRenderPass();
passEncoder.setPipeline(fullscreenQuadPipeline);
passEncoder.setBindGroup();
passEncoder.draw(6, 1, 0, 0);
passEncoder.end();

device.queue.submit([commandEncoder.finish()]);

@bbernhar

bbernhar commented Mar 24, 2022

@huningxin

If we do the "execute WebGPU indirect" method:

const commandEncoder1 = device.createCommandEncoder();
const commandEncoder2 = device.createCommandEncoder();

let computePass1 = commandEncoder1.beginComputePass();
...
computePass1.dispatch();
computePass1.end();

mlcontext = navigator.ml.createContext(device)
let graph = /* Build graph */
mlcommandBuffer = mlcontext.createCommandBuffer(graph, inputBuffer, outputBuffer)

let computePass2 = commandEncoder2.beginComputePass();
...
computePass2.dispatch();
computePass2.end();

const renderPass1 = commandEncoder2.beginRenderPass();
...
renderPass1.draw(...);
renderPass1.end();

mlcontext.Submit(queue, [commandEncoder1.Finish(), mlcommandBuffer, commandEncoder2.Finish()])

MLContext would generate its own (internal) GPU command buffer, encoded with ML work, then effectively call queue.Submit() implicitly (or internally) with the (optional) WebGPU command buffer(s) AND the generated ML command buffer (this is the execution order). Resource sharing would work just like WebGPU interop, i.e. external buffers/textures would be imported/exported.

WDYT?

@huningxin
Contributor

@bbernhar

mlcommandBuffer= mlcontext.getCommandBuffer(graph, inputBuffer, outputBuffer)

Getting (creating) a GPUCommandBuffer from an MLGraph is a brilliant idea. I like it.

My question is whether we can leverage GPUQueue to submit it, like

device.queue.submit([commandEncoder1.Finish(), mlcommandBuffer, commandEncoder2.Finish()])

@bbernhar

bbernhar commented Mar 25, 2022

@huningxin

My question is whether we can leverage GPUQueue to submit it, like

Thanks! Yes, it's possible, but there are no plans to support WebGPU interop through shared command-buffers, and it may not be allowed unless it's explicitly defined in the WebGPU specification (e.g. Dawn forbids this). Of course, going the other way is permitted.

@huningxin
Contributor

GPUQueue::submit is defined to accept a sequence of GPUCommandBuffer:

interface GPUQueue {
    undefined submit(sequence<GPUCommandBuffer> commandBuffers);
};

So if we define MLContext::getCommandBuffer to return a GPUCommandBuffer, it seems to me the WebGPU spec would not need to change.

Did I miss anything?

@wchao1115
Collaborator Author

wchao1115 commented Mar 27, 2022

So if we define MLContext::getCommandBuffer returns a GPUCommandBuffer, it seems to me WebGPU spec would not be required to change.

@bbernhar, @RafaelCintron and I did discuss this exact design at great length and agreed not to pursue it, mainly because it creates a strong requirement that a WebNN implementation must produce a WebGPU primitive (i.e. GPUCommandBuffer) that any implementation of WebGPU must understand how to handle.

Although I think it is conceptually sound, and am also a proponent of it, I agreed that it could make implementing WebNN much harder, especially when it's implemented using an OS API that has no design provision for interoperating with the GPU stack used to implement WebGPU in the same environment, e.g. it isn't at all clear how it would be implemented with a Khronos API, Apple CoreML, or even hardware-vendor APIs like OpenVINO.

The new consensus now is to leave the ML workloads generation and the command buffer submission as an implementation detail behind a public interface that simply accepts the WebGPU device's default queue. Please take a look at my latest commit on #257.

@huningxin
Contributor

huningxin commented Mar 28, 2022

because it creates a strong requirement that a WebNN implementation must produce a WebGPU primitive (i.e. GPUCommandBuffer) that any implementation of WebGPU must understand how to handle.

I suppose this might not be an issue, because, as you mentioned, the creation of an MLContext from a GPUDevice would only be allowed when WebNN is implemented using an OS API that supports interop with the GPU stack used to implement WebGPU in the same environment. Otherwise, the creation of an MLContext from a GPUDevice would throw an error. So there is no case where a WebNN implementation produces a WebGPU primitive that is not supported by the WebGPU implementation in the same environment / browser.

@bbernhar

Did I miss anything?

GPUCommandBuffer is also expected to contain only "valid commands". This is a hard requirement for WebGPU implementations. Either WebGPU needs to know what these commands are (and where to find them) or be told how to validate/handle them; otherwise Queue.submit will always fail. =(

@wchao1115
Collaborator Author

wchao1115 commented Mar 29, 2022

only when WebNN is implemented by using an OS API that supports interop with the GPU stack used to implement WebGPU on the same environment, the creation of MLContext from a GPUDevice would be allowed.

The WebNN implementation also needs to be able to produce an implementation of GPUCommandBuffer (from your proposed MLContext.getCommandBuffer method) that the WebGPU implementation would accept. This is harder to do in practice than simply submitting the ML workload directly into the internal device's queue (from a GPUDevice), as it all happens behind the scenes without a public commitment to producing a WebGPU-compatible command buffer.

Additionally, if the MLContext is created with the GPU device type, but without a GPUDevice given, the WebNN implementation is free to use its own internal device and reuse the exact same code as it would on a given GPUDevice to submit to the internal device's queue. It's a less publicly demanding contract that will still allow efficient interop with the WebGPU device without tying our hands in the future.

@huningxin
Contributor

@bbernhar

GPUCommandBuffer is also expected to contain only "valid commands".

It makes sense. If the MLGPUCommandBuffer extends GPUCommandBuffer, its implementation should implement the validate / handle methods of the interface.

@wchao1115

This is harder to do in practice than simply submitting the ML workload directly into the internal device's queue (from a GPUDevice) as it all happens behind the scene without a public commitment to producing WebGPU-compatible command buffer.

It's true. However, with the proposal of "execute WebGPU indirect" as

mlcontext.Submit(queue, [commandEncoder1.Finish(), mlcommandBuffer, commandEncoder2.Finish()])

it requires the WebNN implementation to validate / handle other types of GPUCommandBuffer, say rendering and compute, and submit them into the device queue. This part of the work is out of scope for WebNN and duplicates the WebGPU implementation.

Another concern is developer experience: with MLContext::submit, developers now have two code paths for submitting GPU command buffers. Ideally, developers should be able to just drop the WebNN graph compute, as a command buffer, into the existing submit path.

@wchao1115
Collaborator Author

Another concern is developer experience: with MLContext::submit

I'm sorry, I'm not following. My latest commit on PR #257 doesn't call for a new submit method; instead it just requires that the implementation of computeAsync internally submits the workload onto the GPUDevice device queue. Please take a look at the latest commit and let me know if you have any concern.
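
If I read that direction correctly, the caller-visible shape would be roughly the following (computeAsync is the #257 method name mentioned above; whether it hangs off the context or a separate execution object isn't shown here, so treat this as illustrative):

// Context created from a WebGPU device; queue submission happens inside the implementation
const context = navigator.ml.createContext(gpuDevice);
await context.computeAsync(graph, inputs, outputs); // internally submits onto gpuDevice.queue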

@huningxin
Contributor

I was talking about @bbernhar 's proposal. Sorry for the confusion. I'll move to #257 and comment there.

@wchao1115
Collaborator Author

I was talking about @bbernhar 's proposal. Sorry for the confusion. I'll move to #257 and comment there.

Yeah, we already moved past that proposal.

@wchao1115
Collaborator Author

Hi all, I would like to pause the discussion in this PR for now to avoid confusion, since all the recent revisions of the proposal on this topic have been made on #257. Please put additional comments on #257 for simplicity.

@bbernhar

bbernhar commented Mar 29, 2022

it requires the WebNN implementation to validate / handle other types of GPUCommandBuffer, say rendering and compute, and submit them into the device queue. This part of the work is out of scope for WebNN and duplicates the WebGPU implementation.

Nothing is duplicated. WebNN would have its own (internal) queue implementation that re-uses the WebGPU native API so it may co-execute GPU and ML commands in a single call. Going the other way requires WebGPU implementers to do the same.

@wchao1115
Collaborator Author

Closing this one. See #257; the evolution of this PR continues there.
