Conversation

@yair-ehrenwald
Contributor

No description provided.

@google-ml-butler google-ml-butler bot added the size:XL CL Change Size:Extra Large label Dec 11, 2020
@google-ml-butler

Thanks for contributing to TensorFlow Lite Micro.

To keep this process moving along, we'd like to make sure that you have completed the items on this list:

• We would like to have a discussion on the GitHub issue first to determine the best path forward, and then proceed to the PR review.

@yair-ehrenwald
Contributor Author

GitHub issue: #45607

@gbaned gbaned self-assigned this Dec 14, 2020
@gbaned gbaned added the comp:micro Related to TensorFlow Lite Microcontrollers label Dec 14, 2020
@gbaned gbaned requested a review from advaitjain December 14, 2020 04:40
@gbaned gbaned added the awaiting review Pull request awaiting review label Dec 21, 2020
@advaitjain
Member

High-level comments on the first pass. I will likely have more detailed feedback once we address the comments from this round of review.

Additional things to note:

  • You can apply the kokoro:force-run label to trigger the CI. Look for the TF Micro CI check, as that will show any errors that would prevent merging; in the case of this PR, those would be missing licenses, incorrect formatting, etc.

  • My preference is to have a detailed PR description that shows what the cycle improvement from this optimized implementation would be on, say, the keyword_benchmark, as well as exact commands to reproduce it (even if the toolchain is proprietary). This makes looking back at the history much more meaningful.

    • See #46411 as an example. In that PR I do not have a before/after comparison since I already have continuous builds set up. For the current PR, I would recommend detailing the cycles for both the reference and optimized implementations.
  • We can still break this PR into smaller chunks for some of the headers, and that would be preferable. We will be refactoring some of the reference kernels as we go along.

To your installation location. For example:
TARGET_TOOLCHAIN_ROOT := /home/myuser/work/CEVA-ToolBox/V18/BX
4. Generate the Makefile for the project:
/tensorflow$ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=ceva TARGET_ARCH=bx1
Member

I'd prefer going with TARGET=ceva and TARGET_ARCH=bx1 and having a single kernels/ceva, tools/make/ceva_makefile.inc, tools/make/ext_libs/ceva.inc.

All the logic to support the different architectures would then be contained within these directories.

Comment on lines +34 to +49
struct OpData {
  // The scaling factor from input to output (aka the 'real multiplier') can
  // be represented as a fixed point multiplier plus a left shift.
  int32_t output_multiplier;
  int output_shift;
  // The range of the fused activation layer. For example for kNone and
  // uint8_t these would be 0 and 255.
  int32_t output_activation_min;
  int32_t output_activation_max;
  // The index of the temporary tensor where the quantized inputs are cached.
  int input_quantized_index;
  // Cached zero point values of tensors.
  int32_t input_zero_point;
  int32_t filter_zero_point;
  int32_t output_zero_point;
};
Member

Let's use the shared struct:

struct OpDataFullyConnected {
  // The scaling factor from input to output (aka the 'real multiplier') can
  // be represented as a fixed point multiplier plus a left shift.
  int32_t output_multiplier;
  int output_shift;
  // The range of the fused activation layer. For example for kNone and
  // uint8_t these would be 0 and 255.
  int32_t output_activation_min;
  int32_t output_activation_max;
  // The index of the temporary tensor where the quantized inputs are cached.
  int input_quantized_index;
  // Cached zero point values of tensors.
  int32_t input_zero_point;
  int32_t filter_zero_point;
  int32_t output_zero_point;
};

and also shared code (as much as possible):
https://github.com/tensorflow/tensorflow/blob/9d6681202e062fbf6df8bcb39d70b129333f590c/tensorflow/lite/micro/kernels/fully_connected_common.cc

See #46242 for an example of the direction that we would like to move towards in terms of sharing code between the reference and optimized implementations.
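For illustration, here is a minimal sketch of what a Prepare that leans on the shared pieces might look like. The names OpDataFullyConnected and CalculateOpDataFullyConnected come from the refactored reference kernel linked above; their exact signatures may have drifted, so treat this as a sketch rather than a drop-in implementation:

#include "tensorflow/lite/c/builtin_op_data.h"
#include "tensorflow/lite/kernels/kernel_util.h"
#include "tensorflow/lite/micro/kernels/fully_connected.h"

namespace tflite {
namespace {

// Prepare for the optimized kernel: all of the quantization bookkeeping is
// delegated to the shared helper instead of a kernel-local CalculateOpData.
TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  auto* data = static_cast<OpDataFullyConnected*>(node->user_data);
  const auto* params =
      static_cast<const TfLiteFullyConnectedParams*>(node->builtin_data);

  const TfLiteTensor* input = GetInput(context, node, 0);
  const TfLiteTensor* filter = GetInput(context, node, 1);
  const TfLiteTensor* bias = GetOptionalInputTensor(context, node, 2);
  TfLiteTensor* output = GetOutput(context, node, 0);

  // Shared helper defined in fully_connected_common.cc.
  return CalculateOpDataFullyConnected(context, params->activation,
                                       input->type, input, filter, bias,
                                       output, data);
}

}  // namespace
}  // namespace tflite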

Comment on lines +51 to +80
constexpr int kInputTensor = 0;
constexpr int kWeightsTensor = 1;
constexpr int kBiasTensor = 2;
constexpr int kOutputTensor = 0;

TfLiteStatus CalculateOpData(TfLiteContext* context,
                             TfLiteFusedActivation activation,
                             TfLiteType data_type, const TfLiteTensor* input,
                             const TfLiteTensor* filter,
                             const TfLiteTensor* bias, TfLiteTensor* output,
                             OpData* data) {
  TfLiteStatus status = kTfLiteOk;
  if (data_type != kTfLiteFloat32) {
    double real_multiplier = 0.0;
    TF_LITE_ENSURE_STATUS(GetQuantizedConvolutionMultipler(
        context, input, filter, bias, output, &real_multiplier));
    int exponent;
    QuantizeMultiplier(real_multiplier, &data->output_multiplier, &exponent);
    data->output_shift = -exponent;
    TF_LITE_ENSURE_STATUS(CalculateActivationRangeQuantized(
        context, activation, output, &data->output_activation_min,
        &data->output_activation_max));

    data->input_zero_point = input->params.zero_point;
    data->filter_zero_point = filter->params.zero_point;
    data->output_zero_point = output->params.zero_point;
  }
  return status;
}

Member

These can all be shared from the refactored code.
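For example, if the refactored header also exports the tensor-index constants (the names below are my reading of fully_connected.h at the time and should be verified), the kernel-local kInputTensor/kWeightsTensor/kBiasTensor/kOutputTensor copies disappear as well:

// Assumed exports from tensorflow/lite/micro/kernels/fully_connected.h.
const TfLiteTensor* input =
    GetInput(context, node, kFullyConnectedInputTensor);
TfLiteTensor* output =
    GetOutput(context, node, kFullyConnectedOutputTensor);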

Comment on lines +113 to +121
tflite::FullyConnectedParams op_params;
op_params.input_offset = -data.input_zero_point;
op_params.weights_offset = -data.filter_zero_point;
op_params.output_offset = data.output_zero_point;
op_params.output_multiplier = data.output_multiplier;
// TODO(b/138810107): Figure out whether output shift should be inverted
op_params.output_shift = -data.output_shift;
op_params.quantized_activation_min = data.output_activation_min;
op_params.quantized_activation_max = data.output_activation_max;
Member

There is now a function that does this conversion, which reduces this boilerplate code to a single line.
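Assuming the function meant here is FullyConnectedParamsQuantized() from the refactored fully_connected.h (worth double-checking the name against the header), the nine assignments above reduce to:

// Builds the offsets, multiplier, shift, and activation range directly
// from the shared op data.
const tflite::FullyConnectedParams op_params =
    tflite::FullyConnectedParamsQuantized(data);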

Comment on lines +148 to +152
void* params = (void*)&op_params;
int8_t* inputp = (int8_t*)tflite::micro::GetTensorData<int8_t>(input);
int8_t* filterp = (int8_t*)tflite::micro::GetTensorData<int8_t>(filter);
int32_t* biasp = (int32_t*)tflite::micro::GetTensorData<int32_t>(bias);
int8_t* outputp = (int8_t*)tflite::micro::GetTensorData<int8_t>(output);
Member

Prefer C++-style casts, and static_cast whenever possible.

https://google.github.io/styleguide/cppguide.html#Casting
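Applied to the snippet above, this could look like the following sketch. Since GetTensorData<T>() already returns a typed pointer, keeping const on the read-only tensors removes the need for most of the casts; if the CEVA library call requires non-const pointers, an explicit const_cast at the call site documents that intent:

const int8_t* inputp = tflite::micro::GetTensorData<int8_t>(input);
const int8_t* filterp = tflite::micro::GetTensorData<int8_t>(filter);
const int32_t* biasp = tflite::micro::GetTensorData<int32_t>(bias);
int8_t* outputp = tflite::micro::GetTensorData<int8_t>(output);

// Where a void* really is needed, make the conversion explicit:
void* params = static_cast<void*>(&op_params);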

ifeq ($(TARGET), ceva)
ifeq ($(TARGET_ARCH), bx1)
#TARGET_ARCH :=
ifeq ($(TARGET), ceva_bx1)
Member

Was this intentional, going to TARGET=ceva_bx1?

TARGET_ARCH := bx
CXXFLAGS := -std=c++11 -DTF_LITE_STATIC_MEMORY
CCFLAGS := -std=c11 -DTF_LITE_STATIC_MEMORY
CCFLAGS := -std=c11 -DTF_LITE_STATIC_MEMORY -I./tensorflow/lite/micro/kernels/ceva_bx1
Member

Preference would be to use full include paths, unless there is a strong reason why that would not work.
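Concretely, instead of relying on the extra -I flag from the CCFLAGS above, the sources would spell out the repo-relative path (the header name here is hypothetical, for illustration only):

// Works only because of -I./tensorflow/lite/micro/kernels/ceva_bx1:
//   #include "ceva_common.h"
// Preferred: full path from the repository root.
#include "tensorflow/lite/micro/kernels/ceva_bx1/ceva_common.h"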

@@ -0,0 +1,611 @@
/*************************************************************************************\
* Copyright (C) CEVA(R) Inc. All rights reserved *
Member

We can only accept code with a single copyright for the TensorFlow authors + Apache license.

@@ -0,0 +1,571 @@

Member

Where is the implementation of this header? Preference would be to break this up into exactly what is needed for a particular PR, since that would make reviewing the code feasible.

@@ -0,0 +1,348 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Member

2021 (the copyright year should be updated).

@advaitjain
Member

Also, I have created a new label (comp:micro:ceva). If you could apply that to all your PRs and GitHub issues, that would make tracking this work easier. And please link the issue from your PR as well.

You should have the permissions needed to be able to apply labels.

You can add me directly as a reviewer, and have yourself be the assignee for the GitHub issues.

@advaitjain
Member

I may have reviewed this PR before it was ready -- sorry about that. If possible, please explicitly add a comment letting me know that the PR is ready for review (similar to #46411 (comment))

@yair-ehrenwald
Contributor Author

> I may have reviewed this PR before it was ready -- sorry about that. If possible, please explicitly add a comment letting me know that the PR is ready for review (similar to #46411 (comment))

This is my fault; I should have closed this one. I created a new PR after our chat, incorporating code sharing with the reference kernel: see #46500, and also #46226 for Quantize.
I'll go over your comments here and apply them to those PRs where applicable.

@advaitjain
Member

Sounds good, I'll go ahead and close this PR then. Let me know when the others are ready for review.

@advaitjain advaitjain closed this Jan 19, 2021