Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

op metadata that helps avoid implementation mistakes #243

Closed
quidity opened this issue Dec 22, 2021 · 6 comments
Closed

op metadata that helps avoid implementation mistakes #243

quidity opened this issue Dec 22, 2021 · 6 comments
Labels
security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response.

Comments

@quidity
Copy link

quidity commented Dec 22, 2021

Some way to help implementors validate that data is being passed correctly between nodes in the computation graph, has the expected sizes, etc.

Perhaps a little language which describes the required sizes/shapes and can be used to create checks that can be enforced as a graph is constructed, and as a graph is executed.

e.g. fields for ops which indicate:

Does it change input/output tensor contents?
Does it change input/output tensor sizes?
Can the input tensor be the output tensor?
Shape of the input, shape of the output?
Constraints on other input parameters that can be turned into automatic checks.

@anssiko anssiko added the security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response. label Dec 22, 2021
@anssiko
Copy link
Member

anssiko commented Jan 31, 2022

The group had this issue on its agenda, but we did not record anything in the minutes.

All - please share your feedback for the proposal on this issue.

@anssiko
Copy link
Member

anssiko commented Feb 17, 2022

We discussed this issue again on our call. It was noted it may not be possible to extract very useful metadata for all ops given the complexity of operations.

@quidity, we'd like to hear which of these options would work for you:

  1. Would you prefer us to add a new "Metadata" section for each op to accompany "Arguments" and "Returns"? See the relu example below.
  2. Or could we perhaps provide all this information combined in an appendix? It could allow us to present this information in a more concise form, for example as a table. We could then link to that from applicable op definitions. See the example table below.
  3. Something else?

Option 1: add metadata into each op definition (using relu as an example):

7.7.21. relu

Compute the rectified linear function of the input tensor.

partial interface MLGraphBuilder {
  MLOperand relu(MLOperand x);
  MLOperator relu();
};

Arguments:
x: an MLOperand. The input tensor.

Returns:

an MLOperand. The output tensor of the same shape as x.
an MLOperator. The operator representing the relu operation.

Metadata:
Does it change input/output tensor contents?
Does it change input/output tensor sizes?
Can the input tensor be the output tensor?
Shape of the input, shape of the output?
Constraints on other input parameters that can be turned into automatic checks.


Option 2: add a single table combining metadata across all ops:

Does it change input/output tensor contents? Does it change input/output tensor sizes? Can the input tensor be the output tensor? Shape of the input, shape of the output? Constraints on other input parameters that can be turned into automatic checks.
batchNormalization
clamp
concat

@quidity
Copy link
Author

quidity commented Feb 17, 2022

Having something machine-readable (an appendix, or directly as metadata on the ops) will be more useful when constructing fuzzers or running a hardened debug implementation designed to detect implementation errors. A table is useful to people, but less so to the machines.

@wchao1115
Copy link
Collaborator

wchao1115 commented Feb 23, 2022

I would like to point out that the MLGraphBuilder API isn't a data execution API, but rather a data definition API. Every single defined operation that can act as a "node" in this computational graph has a clearly defined mapping rule between its input and output shape of the data it would consume and produce. This means that at any random node one can always infer the node's input shape by reversing the graph all the way back to the root node where the shapes of the input nodes are fully defined. This process is generally known as shape inferencing, and it makes any metadata about input shape to be added to the operation definition redundant. Note that shape inferencing only concerns the definition of the data (e.g. its shape) and not the actual content of the data.

Once the graph is fully constructed and compiled, the input shapes into each of the operations in the graph are inferred and finalized. It is part of what the implementation of the MLGraph interface would store. And finally, when the graph is actually executed against the actual data that are bound to it at execution time, bound checking would then occur. Keep in mind that no actual data has been bound to the compiled graph until at this stage, right before the graph is executed against it. This is happening at the compute method, which could fail. It is the implementation's responsibility to make sure proper bound checking occurs against the shapes of the data already inferred by that time.

@huningxin
Copy link
Contributor

Can the input tensor be the output tensor?

As @wchao1115 mentioned, MLGraphBuilder is a graph definition API, the output MLOperand is returned by the function call of an operation, like y = builder.relu(x). The input can not be used as the output.

The MLGraph::compute method takes both input buffers and output buffers. Do you mean whether a buffer can be used for both input and output in the same call? I think this is not defined in the current spec.

@anssiko
Copy link
Member

anssiko commented Mar 24, 2022

In #251 we landed an update to https://www.w3.org/TR/webnn/#security that describes how the chosen API design (graph definition API) minimizes the attack surface for the compiled computational graph. We believe this update makes the benefits of this design choice clearer and addresses this issue.

@anssiko anssiko closed this as completed Mar 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response.
Projects
None yet
Development

No branches or pull requests

4 participants