TFLM Added optimized fully connected (float32 and int8) for CEVA-BX1 #45606
Conversation
> Thanks for contributing to TensorFlow Lite Micro. To keep this process moving along, we'd like to make sure that you have completed the items on this list:
> We would like to have a discussion on the Github issue first to determine the best path forward, and then proceed to the PR review.
>
> Github issue: #45607
High level comments on the first pass. I will likely have some more detailed feedback once we address the comments from this round of review.
Additional things to note:
- You can apply the kokoro:force-run label to trigger the CI. Look for the TF Micro CI, as that will show any errors that would prevent merging. In the case of this PR, that would be missing licenses, incorrect formatting, etc.
- My preference is to have a detailed PR description that shows the cycle improvement from this optimized implementation (on, say, the keyword_benchmark), as well as exact commands to reproduce (even if the toolchain is proprietary). This makes looking back at the history much more meaningful. See #46411 as an example; in that PR I do not have a before/after since I already have continuous builds set up. For the current PR, I would recommend detailing the cycles for both the reference and optimized implementations.
- We can still break this PR into smaller chunks for some of the headers, and that would be preferable. We will be refactoring some of the reference kernels as we go along.
```
To your installation location. For example:
TARGET_TOOLCHAIN_ROOT := /home/myuser/work/CEVA-ToolBox/V18/BX
4. Generate the Makefile for the project:
/tensorflow$ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=ceva TARGET_ARCH=bx1
```
I'd prefer going with TARGET=ceva and TARGET_ARCH=bx1 and having a single kernels/ceva, tools/make/ceva_makefile.inc, tools/make/ext_libs/ceva.inc.
And all the logic to support the different architectures be contained within these directories.
```cpp
struct OpData {
  // The scaling factor from input to output (aka the 'real multiplier') can
  // be represented as a fixed point multiplier plus a left shift.
  int32_t output_multiplier;
  int output_shift;
  // The range of the fused activation layer. For example for kNone and
  // uint8_t these would be 0 and 255.
  int32_t output_activation_min;
  int32_t output_activation_max;
  // The index of the temporary tensor where the quantized inputs are cached.
  int input_quantized_index;
  // Cached zero point values of tensors.
  int32_t input_zero_point;
  int32_t filter_zero_point;
  int32_t output_zero_point;
};
```
Let's use the shared struct (tensorflow/lite/micro/kernels/fully_connected.h, lines 26 to 41 at 9d66812):
```cpp
struct OpDataFullyConnected {
  // The scaling factor from input to output (aka the 'real multiplier') can
  // be represented as a fixed point multiplier plus a left shift.
  int32_t output_multiplier;
  int output_shift;
  // The range of the fused activation layer. For example for kNone and
  // uint8_t these would be 0 and 255.
  int32_t output_activation_min;
  int32_t output_activation_max;
  // The index of the temporary tensor where the quantized inputs are cached.
  int input_quantized_index;
  // Cached zero point values of tensors.
  int32_t input_zero_point;
  int32_t filter_zero_point;
  int32_t output_zero_point;
};
```
and also shared code (as much as possible):
https://github.com/tensorflow/tensorflow/blob/9d6681202e062fbf6df8bcb39d70b129333f590c/tensorflow/lite/micro/kernels/fully_connected_common.cc
See #46242 for an example of the direction that we would like to move towards in terms of sharing code between the reference and optimized implementations.
```cpp
constexpr int kInputTensor = 0;
constexpr int kWeightsTensor = 1;
constexpr int kBiasTensor = 2;
constexpr int kOutputTensor = 0;

TfLiteStatus CalculateOpData(TfLiteContext* context,
                             TfLiteFusedActivation activation,
                             TfLiteType data_type, const TfLiteTensor* input,
                             const TfLiteTensor* filter,
                             const TfLiteTensor* bias, TfLiteTensor* output,
                             OpData* data) {
  TfLiteStatus status = kTfLiteOk;
  if (data_type != kTfLiteFloat32) {
    double real_multiplier = 0.0;
    TF_LITE_ENSURE_STATUS(GetQuantizedConvolutionMultipler(
        context, input, filter, bias, output, &real_multiplier));
    int exponent;
    QuantizeMultiplier(real_multiplier, &data->output_multiplier, &exponent);
    data->output_shift = -exponent;
    TF_LITE_ENSURE_STATUS(CalculateActivationRangeQuantized(
        context, activation, output, &data->output_activation_min,
        &data->output_activation_max));

    data->input_zero_point = input->params.zero_point;
    data->filter_zero_point = filter->params.zero_point;
    data->output_zero_point = output->params.zero_point;
  }
  return status;
}
```
These can all be shared from the refactored code.
```cpp
tflite::FullyConnectedParams op_params;
op_params.input_offset = -data.input_zero_point;
op_params.weights_offset = -data.filter_zero_point;
op_params.output_offset = data.output_zero_point;
op_params.output_multiplier = data.output_multiplier;
// TODO(b/138810107): Figure out whether output shift should be inverted
op_params.output_shift = -data.output_shift;
op_params.quantized_activation_min = data.output_activation_min;
op_params.quantized_activation_max = data.output_activation_max;
```
There is now a function that does this conversion, which reduces this boilerplate code to a single line.
```cpp
void* params = (void*)&op_params;
int8_t* inputp = (int8_t*)tflite::micro::GetTensorData<int8_t>(input);
int8_t* filterp = (int8_t*)tflite::micro::GetTensorData<int8_t>(filter);
int32_t* biasp = (int32_t*)tflite::micro::GetTensorData<int32_t>(bias);
int8_t* outputp = (int8_t*)tflite::micro::GetTensorData<int8_t>(output);
```
Prefer C++-style casts, and static_cast whenever possible.
```make
ifeq ($(TARGET), ceva)
ifeq ($(TARGET_ARCH), bx1)
#TARGET_ARCH :=
ifeq ($(TARGET), ceva_bx1)
```
Was this intentional, going to TARGET=ceva_bx1?
```make
TARGET_ARCH := bx
CXXFLAGS := -std=c++11 -DTF_LITE_STATIC_MEMORY
CCFLAGS := -std=c11 -DTF_LITE_STATIC_MEMORY
CCFLAGS := -std=c11 -DTF_LITE_STATIC_MEMORY -I./tensorflow/lite/micro/kernels/ceva_bx1
```
Preference would be to use full include paths, unless there is a strong reason why that would not work.
```
@@ -0,0 +1,611 @@
/*************************************************************************************\
* Copyright (C) CEVA(R) Inc. All rights reserved *
```
We can only accept code with a single copyright for the TensorFlow authors plus the Apache license.
```
@@ -0,0 +1,571 @@
```
Where is the implementation of this header? Preference would be to break this up into exactly what is needed for a particular PR, since that would make reviewing the code feasible.
```
@@ -0,0 +1,348 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
```
2021 (the copyright year should be updated).
Also, I have created a new label (comp:micro:ceva). If you could apply that to all your PRs and GitHub issues, that would make tracking this work easier. Please also link the issue from your PR. You should have the permissions needed to apply labels. You can add me directly as a reviewer and assign yourself to the GitHub issues.
I may have reviewed this PR before it was ready -- sorry about that. If possible, please explicitly add a comment letting me know that the PR is ready for review (similar to #46411 (comment)).
This is my fault, I should have closed this one. I created a new PR after our chat, incorporating code sharing with the reference kernel; see #46500, and also #46226 for Quantize.
Sounds good, I'll go ahead and close this PR then. Let me know when the others are ready for review.