[TFLite 16x8] ADD/SUB operators: fixes + tests for versioning #42373
Conversation
…se + tests for versioning. Change-Id: Ib24a2c2b837eaeb4868afdf1445aa332965a41af
- POT int16x8: create a new BroadcastSub16POTSlow function to handle the power-of-two (POT) scaling.
- General int16x8: BroadcastAdd4DSlow should be used instead of BroadcastSubSlow, since the sign of the input2 multiplier is flipped in PrepareGeneralSubOp.

Change-Id: Id8042d089af51f402cba72b1db9bb5d948ba5cbc
@@ -913,6 +913,7 @@ OperatorProperty GetOperatorProperty(const ModelT* model, int subgraph_index,
      property.inputs = {{0, {}}, {1, {}}};
      property.outputs = {{0, {}}};
      property.version = 2;
+     property.restrict_same_input_output_scale = true;
The properties for the ADD and SUB ops are different. Is this intended? What is the reason for that?
Thanks for spotting! Corrected.
Thank you
Change-Id: I0ddc0cd4db9604d3e84f5ce55ecbf8acc55c08d0
Change-Id: Ia7f43fe9d77596e6ba9ff85d9776b94440362b0a
template <typename T>
float GetTolerance(float min, float max) {
  float kQuantizedStep =
      (max - min) / (std::numeric_limits<T>::max() -
                     std::numeric_limits<T>::min());
  return 2.0 * kQuantizedStep;
}
Where does the 2.0 come from?
Hi @fredrec, thanks for the review.
The tolerance should be two quantized steps for the result: each dequantized input is accurate only to within one quantized step (a_float = a_int16 ± quantized_step), so when taking the difference of two such values the errors can accumulate, giving an upper bound of 2 * quantized_step.
In this PR:
- The maximum version of ADD for 16x8 has been changed to 3, so I updated the places where it was mentioned as 4.
- Both inputs of the SUB operator should be quantized to int16, as is done for ADD.
- As suggested, quantize_model.cc was modified so that the option pot_scale_int16 is set to false during quantization.