
Use xa_nnlib for conv for Fusion F1 #47378

Merged: 4 commits merged into tensorflow:master on Feb 25, 2021

Conversation

@njeffrie njeffrie (Contributor) commented Feb 24, 2021

The code in this change is the subset of functionality needed for int8 conv on HiFi4, copied from pnikam-cad/tensorflow@a737c1e/tensorflow/lite/micro/kernels/xtensa_hifi/conv.cc.

Note that this change does not pull in the floating point or uint8 implementations, or the Hifi5 implementation.
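In outline, that scope translates into a dispatch like the sketch below: only int8 inputs with unit dilation take the HiFi4 NN Library path, and everything else falls back to the existing reference kernels. EvalHifi4 and EvalConvReference are placeholder names here, not necessarily what the merged change uses.

  // Sketch of the dispatch described above; names and exact argument lists
  // are assumptions, not copied from the merged change.
  if (input->type == kTfLiteInt8 && params.dilation_width_factor == 1 &&
      params.dilation_height_factor == 1) {
#if defined(FUSION_F1)
    // Int8, unit-dilation conv goes to the HiFi4 NN Library kernel.
    return EvalHifi4(context, node, params, data, input, filter, bias, output);
#endif  // defined(FUSION_F1)
  }
  // Everything else (float, uint8, dilated conv, non-F1 targets) stays on the
  // portable reference implementation.
  return EvalConvReference(context, node, params, data, input, filter, bias,
                           output);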

Profiled the person_detection_benchmark with the following command:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade run_person_detection_benchmark -j8
This gives a latency of 73.761M ticks with this change vs. 212.980M ticks without it (roughly a 2.9x speedup).

Per-op latency with this change:

KeywordRunNIerations(1) took 38516 ticks (38 ms)
DEPTHWISE_CONV_2D took 11961939 ticks (11961 ms).
DEPTHWISE_CONV_2D took 12296923 ticks (12296 ms).
CONV_2D took 987358 ticks (987 ms).
DEPTHWISE_CONV_2D took 6138259 ticks (6138 ms).
CONV_2D took 554614 ticks (554 ms).
DEPTHWISE_CONV_2D took 12063331 ticks (12063 ms).
CONV_2D took 665206 ticks (665 ms).
DEPTHWISE_CONV_2D took 3018615 ticks (3018 ms).
CONV_2D took 334246 ticks (334 ms).
DEPTHWISE_CONV_2D took 5837463 ticks (5837 ms).
CONV_2D took 444838 ticks (444 ms).
DEPTHWISE_CONV_2D took 1462009 ticks (1462 ms).
CONV_2D took 225286 ticks (225 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 685980 ticks (685 ms).
CONV_2D took 173254 ticks (173 ms).
DEPTHWISE_CONV_2D took 1197084 ticks (1197 ms).
CONV_2D took 283846 ticks (283 ms).
AVERAGE_POOL_2D took 75604 ticks (75 ms).
CONV_2D took 3382 ticks (3 ms).
RESHAPE took 290 ticks (0 ms).
SOFTMAX took 1933 ticks (1 ms).

Without this change:

WithPersonDataIterations(1) took 212980371 ticks (212980 ms)
DEPTHWISE_CONV_2D took 11961939 ticks (11961 ms).
DEPTHWISE_CONV_2D took 12296923 ticks (12296 ms).
CONV_2D took 13604549 ticks (13604 ms).
DEPTHWISE_CONV_2D took 6138259 ticks (6138 ms).
CONV_2D took 9585893 ticks (9585 ms).
DEPTHWISE_CONV_2D took 12063331 ticks (12063 ms).
CONV_2D took 15189221 ticks (15189 ms).
DEPTHWISE_CONV_2D took 3018615 ticks (3018 ms).
CONV_2D took 7590389 ticks (7590 ms).
DEPTHWISE_CONV_2D took 5837463 ticks (5837 ms).
CONV_2D took 13193717 ticks (13193 ms).
DEPTHWISE_CONV_2D took 1462009 ticks (1462 ms).
CONV_2D took 6596093 ticks (6596 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 685980 ticks (685 ms).
CONV_2D took 6099809 ticks (6099 ms).
DEPTHWISE_CONV_2D took 1197084 ticks (1197 ms).
CONV_2D took 11703137 ticks (11703 ms).
AVERAGE_POOL_2D took 75604 ticks (75 ms).
CONV_2D took 10983 ticks (10 ms).
RESHAPE took 290 ticks (0 ms).
SOFTMAX took 1933 ticks (1 ms).

Confirmed that the kernel_conv_test passes with:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade test_kernel_conv_test -j8

Progress towards http://b/177457688

@google-ml-butler (bot) commented:

Thanks for contributing to TensorFlow Lite Micro.

To keep this process moving along, we'd like to make sure that you have completed the items on this list:

We would like to have a discussion on the GitHub issue first to determine the best path forward, and then proceed to the PR review.

  TF_LITE_ENSURE_OK(
      context, context->RequestScratchBufferInArena(
                   context, required_scratch, &data->scratch_tensor_index));
#endif
Reviewer (Member):

nit: add #endif // defined(FUSION_F1)

@njeffrie njeffrie (Author) replied Feb 24, 2021:

Done.
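With the nit addressed, the guarded scratch request presumably reads roughly as follows. This is a reconstruction from the excerpt above; the xa_nn_conv2d_std_getsize sizing helper, its argument list, and the PREC_ASYM8S precision constant are assumptions, not copied from the merged change.

#if defined(FUSION_F1)
  // Ask the NN Library how much scratch memory its conv kernel needs for
  // these dimensions, then reserve that much space in the arena.
  const int required_scratch = xa_nn_conv2d_std_getsize(
      input_height, input_depth, filter_height, filter_width, stride_height,
      pad_height, output_height, output_depth, /*precision=*/PREC_ASYM8S);
  TF_LITE_ENSURE(context, required_scratch > 0);
  TF_LITE_ENSURE_OK(
      context, context->RequestScratchBufferInArena(
                   context, required_scratch, &data->scratch_tensor_index));
#endif  // defined(FUSION_F1)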

Comment on lines +311 to +332
  /* Dilation is currently not supported on HiFi 4 NN Library */
  if ((params.dilation_width_factor == 1) &&
      (params.dilation_height_factor == 1)) {
    const int32_t input_offset = -data.reference_op_data.input_zero_point;
    const int32_t output_offset = data.reference_op_data.output_zero_point;
    const int stride_width = params.stride_width;
    const int stride_height = params.stride_height;
    const int pad_width = data.reference_op_data.padding.width;
    const int pad_height = data.reference_op_data.padding.height;
    const int32_t output_activation_min =
        data.reference_op_data.output_activation_min;
    const int32_t output_activation_max =
        data.reference_op_data.output_activation_max;

    const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
    const RuntimeShape& filter_shape = tflite::micro::GetTensorShape(filter);
    const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
    const int batches = MatchingDim(input_shape, 0, output_shape, 0);
    const int input_depth = MatchingDim(input_shape, 3, filter_shape, 3);
    const int output_depth = MatchingDim(filter_shape, 0, output_shape, 3);
    const int input_height = input_shape.Dims(1);
    const int input_width = input_shape.Dims(2);
Reviewer (Member):

let's move this code into an EvalHifi4 function, something like

TfLiteStatus EvalHifi4(const OpData* op_data, const TfLiteEvalTensor* input,
                       TfLiteEvalTensor* output, TfLiteContext* context) {
  const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
  const int8_t* input_data = tflite::micro::GetTensorData<int8_t>(input);
  const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
  int16_t* output_data = tflite::micro::GetTensorData<int16_t>(output);
  const int trailing_dim = input_shape.DimensionsCount() - 1;
  const int outer_size =
      MatchingFlatSizeSkipDim(input_shape, trailing_dim, output_shape);
  const int depth =
      MatchingDim(input_shape, trailing_dim, output_shape, trailing_dim);
  void* p_scratch = static_cast<void*>(
      context->GetScratchBuffer(context, op_data->scratch_tensor_index));
  for (int i = 0; i < outer_size; ++i) {
    int err = xa_nn_vec_softmax_asym8s_16(
        &output_data[i * depth], &input_data[i * depth],
        op_data->params.diff_min, op_data->params.input_left_shift,
        op_data->params.input_multiplier, depth, p_scratch);
    TF_LITE_ENSURE(context, err == 0);
  }
  return kTfLiteOk;
}
#endif  // defined(FUSION_F1)
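Applied to conv, the same wrapper shape might look roughly like the sketch below. The xa_nn_conv2d_std_per_chan_sym8sxasym8s entry point, its argument order, and the per_channel_output_multiplier / per_channel_output_shift field names are assumptions based on the library's naming conventions, not copied from the merged change.

TfLiteStatus EvalHifi4(const TfLiteConvParams& params, const OpData& data,
                       const TfLiteEvalTensor* input,
                       const TfLiteEvalTensor* filter,
                       const TfLiteEvalTensor* bias, TfLiteEvalTensor* output,
                       TfLiteContext* context) {
  const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
  const RuntimeShape& filter_shape = tflite::micro::GetTensorShape(filter);
  const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
  const int batches = MatchingDim(input_shape, 0, output_shape, 0);
  const int input_depth = MatchingDim(input_shape, 3, filter_shape, 3);
  const int output_depth = MatchingDim(filter_shape, 0, output_shape, 3);
  const int input_height = input_shape.Dims(1);
  const int input_width = input_shape.Dims(2);
  const int filter_height = filter_shape.Dims(1);
  const int filter_width = filter_shape.Dims(2);
  const int output_height = output_shape.Dims(1);
  const int output_width = output_shape.Dims(2);

  const int8_t* input_data = tflite::micro::GetTensorData<int8_t>(input);
  const int8_t* filter_data = tflite::micro::GetTensorData<int8_t>(filter);
  const int32_t* bias_data = tflite::micro::GetTensorData<int32_t>(bias);
  int8_t* output_data = tflite::micro::GetTensorData<int8_t>(output);
  void* p_scratch =
      context->GetScratchBuffer(context, data.scratch_tensor_index);

  for (int b = 0; b < batches; ++b) {
    // One NN Library call per batch; the argument order here is illustrative.
    int err = xa_nn_conv2d_std_per_chan_sym8sxasym8s(
        &output_data[b * output_height * output_width * output_depth],
        &input_data[b * input_height * input_width * input_depth], filter_data,
        bias_data, input_height, input_width, input_depth, filter_height,
        filter_width, output_depth, params.stride_width, params.stride_height,
        data.reference_op_data.padding.width,
        data.reference_op_data.padding.height, output_height, output_width,
        -data.reference_op_data.input_zero_point,
        data.reference_op_data.per_channel_output_multiplier,
        data.reference_op_data.per_channel_output_shift,
        data.reference_op_data.output_zero_point, p_scratch);
    TF_LITE_ENSURE(context, err == 0);
  }
  return kTfLiteOk;
}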

@njeffrie njeffrie (Author) replied:

Done.

Comment on lines 34 to 38
struct OpData {
  OpDataConv reference_op_data;

  int scratch_tensor_index;
};
Reviewer (Member):

want to use this struct only for FUSION_F1? the scratch_tensor_index isn't really meaningful for hifimini or a reference fallback.

@njeffrie njeffrie (Author) replied:

#ifdef'd the extra member for F1 only.
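The resolved struct then presumably looks something like this (a reconstruction from the reply above):

struct OpData {
  OpDataConv reference_op_data;
#if defined(FUSION_F1)
  // Only the Fusion F1 build requests an arena scratch buffer for the
  // NN Library conv, so the index only exists for that target.
  int scratch_tensor_index;
#endif  // defined(FUSION_F1)
};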

      tflite::micro::GetTensorShape(output),
      tflite::micro::GetTensorData<int8_t>(output));
#elif defined(FUSION_F1)
TfLiteStatus EvalHifi4(TfLiteContext* context, TfLiteNode* node,
Reviewer (Member):

I think this would also have to be within #if defined(FUSION_F1) else the hifimini build would break?
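In other words, each target-specific eval path gets its own guard instead of sharing an #elif chain, so the hifimini branch keeps compiling. Schematically (a sketch only, with EvalHifiMini as a placeholder name):

#if defined(HIFIMINI)
// hifimini-specific eval path stays available to hifimini builds.
TfLiteStatus EvalHifiMini(TfLiteContext* context, TfLiteNode* node);
#endif  // defined(HIFIMINI)

#if defined(FUSION_F1)
// Fusion F1 (HiFi4 NN Library) eval path only compiles for that target.
TfLiteStatus EvalHifi4(TfLiteContext* context, TfLiteNode* node);
#endif  // defined(FUSION_F1)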

@copybara-service copybara-service bot merged commit 8214b55 into tensorflow:master Feb 25, 2021