
Handling unsupported OperandType #36

Closed
huningxin opened this issue Jan 15, 2020 · 14 comments

Comments

@huningxin
Contributor

The current spec defines the OperandType enum:

enum OperandType {
  "float32",
  "int32",
  "uint32",
  "tensor-float32",
  "tensor-int32",
  "tensor-quant8-asymm"
};

The float16 and tensor-float16 types are being added in #35.

However, as mentioned by @wchao1115 in #26 (comment), there are situations where the selected device doesn't natively support an OperandType. For example, some CPUs may not support tensor-float16, some GPUs may not support tensor-quant8-asymm, and some AI accelerators may not support tensor-float32. To allow the app to handle these situations gracefully, e.g. by selecting a different device or using a different model with a supported operand type, the API should report an unsupported-OperandType error.

I'm opening this issue to explore the definition of an unsupported-OperandType error and the API behavior for returning it.
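To make the fallback idea concrete, here is a minimal sketch of the "select a different device" recovery path. Everything here is hypothetical: the device names, the per-device capability table, and the selectDevice helper are illustrations, not part of the spec.

```typescript
// Sketch only: device names and the capability table below are made-up
// stand-ins for whatever a real implementation would report per device.
type OperandType =
  | "float32" | "int32" | "uint32"
  | "tensor-float32" | "tensor-int32" | "tensor-quant8-asymm";

// Hypothetical capability table illustrating the examples in the text:
// the accelerator lacks tensor-float32, the GPU lacks tensor-quant8-asymm.
const supportedTypes: Record<string, Set<OperandType>> = {
  accelerator: new Set<OperandType>(["tensor-quant8-asymm", "int32"]),
  gpu: new Set<OperandType>(["tensor-float32", "float32", "int32"]),
  cpu: new Set<OperandType>([
    "float32", "int32", "uint32",
    "tensor-float32", "tensor-int32", "tensor-quant8-asymm",
  ]),
};

// Pick the first device (in preference order) that supports every operand
// type the model uses; returns null when no device can run the model.
function selectDevice(
  modelTypes: OperandType[],
  preference: string[],
): string | null {
  for (const device of preference) {
    const caps = supportedTypes[device];
    if (caps && modelTypes.every((t) => caps.has(t))) return device;
  }
  return null;
}
```

With such a check, an app using a float32 model would skip the accelerator and land on the GPU, while a model mixing quantized and float32 tensors would fall all the way back to the CPU.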

Thoughts?

@wchao1115
Collaborator

@huningxin Should "tensor-quant8-asymm" simply be "tensor-int8"? Does the current naming imply that the content of the tensor should be coupled with quantization data, e.g. zero-point? It might be cleaner to define the data type here independently of the operation (e.g. quantization) that originated it.

@huningxin
Contributor Author

Does the current naming imply that the content in the tensor should be coupled with quantization data e.g. zero-point?

Yes. It is supposed to be used together with the scale and zeroPoint fields of OperandDescriptor.
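For context, the asymmetric scheme behind tensor-quant8-asymm (as in NNAPI) maps a real value to a uint8 via real_value = scale * (quantized_value - zeroPoint). A small sketch of the round trip; the function names are illustrative, not part of the spec.

```typescript
// Asymmetric 8-bit quantization, NNAPI-style:
//   real_value = scale * (quantized_value - zeroPoint)
// quantize() rounds and clamps into the uint8 range [0, 255].
function quantize(real: number, scale: number, zeroPoint: number): number {
  const q = Math.round(real / scale) + zeroPoint;
  return Math.min(255, Math.max(0, q));
}

function dequantize(q: number, scale: number, zeroPoint: number): number {
  return scale * (q - zeroPoint);
}
```

This is why the scale and zeroPoint parameters must travel with the tensor: without them the raw uint8 values cannot be interpreted.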

It might be cleaner to just define the data type here independent of the operation (e.g. quantization) that have originated it.

I notice that native APIs support quantization in different ways: e.g. ONNX defines quantization operations, such as QLinearConv, while NNAPI defines quantized tensor types.

We may need to create a new issue for a compatibility investigation among them. WDYT?

@wchao1115
Collaborator

wchao1115 commented Feb 3, 2020

@huningxin My feedback is not about compatibility. If this enum is meant to describe the data type of an operand, then it should not be mixed with the notion of a process applied to that operand; that only makes it more confusing and harder to evolve over time. If an API requires or produces quantized data blocks, it will naturally also require or produce quantization parameters, e.g. zero point and scale (for linear quantization). The fact that the quantized data type is int8 or int4 is independent of the quantization process the data is involved in. Therefore, an enum that mixes the two concepts together is not a good design and will be hard to extend in the future.

@huningxin
Contributor Author

huningxin commented Feb 3, 2020

@wchao1115, thanks for your feedback. This enum is used by OperandDescriptor to describe the type of an operand, and in this design a quantized tensor is one type of operand. For example, user code can describe an 8-bit asymmetric quantized tensor operand by setting OperandDescriptor.type to tensor-quant8-asymm and specifying the OperandDescriptor.scale and OperandDescriptor.zeroPoint values accordingly. It can be extended to support other quantization schemes by adding a new type to the OperandType enum and extending the OperandDescriptor interface with additional parameters.

This design maps to NNAPI's ANeuralNetworksOperandType, which supports multiple quantization schemes, e.g. the 8-bit per-channel symmetric quantized tensor used for weights. However, it may not map well to other native APIs, so I propose to create a dedicated issue about that (for a compatibility investigation or a new design proposal).
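A sketch of the design described above, with one descriptor covering both regular and quantized tensors. The field names follow the draft's OperandDescriptor; the isValid helper is hypothetical, added only to show the coupling being discussed.

```typescript
// Sketch: in this design, quantization parameters live directly on the
// descriptor, keyed off a quantization-specific OperandType value.
interface OperandDescriptor {
  type: string;          // e.g. "tensor-float32" or "tensor-quant8-asymm"
  dimensions?: number[];
  scale?: number;        // only meaningful for quantized types
  zeroPoint?: number;    // only meaningful for quantized types
}

// An 8-bit asymmetric quantized tensor operand, per the example in the text.
const quantDesc: OperandDescriptor = {
  type: "tensor-quant8-asymm",
  dimensions: [1, 3, 224, 224],
  scale: 0.0078125,
  zeroPoint: 128,
};

// Hypothetical validation: quantized types must carry their parameters,
// non-quantized types must not — this is the coupling @wchao1115 objects to.
function isValid(d: OperandDescriptor): boolean {
  if (d.type === "tensor-quant8-asymm") {
    return d.scale !== undefined && d.zeroPoint !== undefined;
  }
  return d.scale === undefined && d.zeroPoint === undefined;
}
```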

@wchao1115
Collaborator

wchao1115 commented Feb 5, 2020

@huningxin You can define OperandDescriptor with only the data type and the shape, and instead add a method that accepts the zero point and scale and produces an Operand interface object for a quantized data block. That way you won't need a quantization-specific enum value in OperandType, and OperandDescriptor stays compact and durable over time. The current design makes this data structure too broadly defined, which risks constant changes whenever a new operand type is needed (say, for a non-linear quantization scheme, or a sparse tensor, etc.).
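A sketch of this alternative shape: the descriptor carries only type and shape, and a separate factory attaches the quantization parameters. All names here are illustrative, not a concrete API proposal.

```typescript
// Sketch of the alternative design: the descriptor knows nothing about
// quantization, so OperandType needs no quantization-specific values.
interface PlainDescriptor {
  type: "float32" | "int32" | "tensor-float32" | "tensor-int8";
  dimensions?: number[];
}

interface Operand {
  readonly desc: PlainDescriptor;
  readonly scale?: number;
  readonly zeroPoint?: number;
}

// Hypothetical factory: quantization parameters are supplied where the
// quantized operand is created, not baked into the type enum.
function quantizedOperand(
  desc: PlainDescriptor,
  scale: number,
  zeroPoint: number,
): Operand {
  return { desc, scale, zeroPoint };
}
```

The design choice here is that a future scheme (per-channel, non-linear, sparse) would add a new factory or parameter object rather than a new enum value plus new descriptor fields.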

@huningxin
Contributor Author

You can define OperandDescriptor with only the data type and the shape, and instead add a method that accepts the zero point and scale and produces an Operand interface object for a quantized data block.

This sounds like a good design. @wchao1115, could you please open a new issue for this proposal? Thanks.

@gramalingam
Contributor

I agree. To rephrase the question as I understand it: do the quantization parameters (zeroPoint/scale) have to be part of the type (fixed statically), or can we let them be determined dynamically?

@wchao1115
Collaborator

You can define OperandDescriptor with only the data type and the shape, and instead add a method that accepts the zero point and scale and produces an Operand interface object for a quantized data block.

This sounds like a good design. @wchao1115, could you please open a new issue for this proposal? Thanks.

Created new issue #44.

@huningxin
Contributor Author

@wchao1115, do you think we still need this issue? I suppose #44 is only about redesigning the quantization, am I correct?

@BenjaminPoulain

@wchao1115, do you think we still need this issue? I suppose #44 is only about redesigning the quantization, am I correct?

Yes

@huningxin
Contributor Author

Per the discussion in the Feb 20 CG meeting, I'm reopening this issue. When the hardware doesn't support a data type used in a model, compilation should fail with an exception. The proposal will depend on #46, which changes the compilation interface.

@huningxin huningxin reopened this Feb 26, 2020
@wchao1115
Collaborator

Per the discussion in the Feb 20 CG meeting, I'm reopening this issue. When the hardware doesn't support a data type used in a model, compilation should fail with an exception.

When an API throws an exception, it usually means it has hit a condition it doesn't expect and that its immediate caller can't easily recover from, e.g. a hardware failure in the middle of an I/O operation. This is why an exception is not the same as a returned error code, and why a catch statement at the call site should be rare.

A data type not supported by the underlying hardware is an error code, not an exception, because there is a defined behavior for what the immediate caller of the API should do to recover from that condition, i.e. compile with a fallback option.

To handle this situation, we will need to look at the current CompilationPreference enum, or maybe introduce a new enum that the caller can use to recover from this error condition, e.g. a CPU fallback.
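The error-code-plus-fallback pattern being described could look roughly like this. The compileOn backend and its error signaling are mocked; the real shape would depend on the compilation interface from #46.

```typescript
// Sketch only: a recoverable error code, not an exception. The backend
// here is a mock in which only the CPU supports every operand type.
type CompileResult =
  | { ok: true; device: string }
  | { ok: false; error: "unsupported-operand-type" };

function compileOn(device: string, modelTypes: string[]): CompileResult {
  const broadSupport = device === "cpu";
  const ok = broadSupport || modelTypes.every((t) => t === "tensor-float32");
  return ok ? { ok: true, device } : { ok: false, error: "unsupported-operand-type" };
}

// Walk a preference list (e.g. accelerator -> gpu -> cpu): because the
// failure is a defined error code, the caller can retry on the next device
// instead of catching an exception.
function compileWithFallback(
  modelTypes: string[],
  preference: string[],
): CompileResult {
  let last: CompileResult = { ok: false, error: "unsupported-operand-type" };
  for (const device of preference) {
    last = compileOn(device, modelTypes);
    if (last.ok) return last;
  }
  return last;
}
```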

@dontcallmedom
Contributor

Was this solved by #50?

@anssiko
Member

anssiko commented Mar 3, 2023

Indeed, we can close this per #50 (comment). @wchao1115, please feel free to re-open if you think otherwise.

@anssiko anssiko closed this as completed Mar 3, 2023