
Use xa_nnlib for conv for Fusion F1 #47378

Merged: 4 commits merged into tensorflow:master on Feb 25, 2021

Conversation

@njeffrie njeffrie (Contributor) commented Feb 24, 2021

The code in this change is the subset of functionality needed for int8 conv on HiFi4, copied from pnikam-cad/tensorflow@a737c1e/tensorflow/lite/micro/kernels/xtensa_hifi/conv.cc.

Note that this change does not pull in the floating point or uint8 implementations, or the Hifi5 implementation.
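In outline, that scope translates into a dispatch like the sketch below: only int8 inputs with unit dilation take the HiFi4 NN Library path, and everything else falls back to the existing reference kernels. EvalHifi4 and EvalConvReference are placeholder names here, not necessarily what the merged change uses.

  // Sketch of the dispatch described above; names and exact argument lists
  // are assumptions, not copied from the merged change.
  if (input->type == kTfLiteInt8 && params.dilation_width_factor == 1 &&
      params.dilation_height_factor == 1) {
#if defined(FUSION_F1)
    // Int8, unit-dilation conv goes to the HiFi4 NN Library kernel.
    return EvalHifi4(context, node, params, data, input, filter, bias, output);
#endif  // defined(FUSION_F1)
  }
  // Everything else (float, uint8, dilated conv, non-F1 targets) stays on the
  // portable reference implementation.
  return EvalConvReference(context, node, params, data, input, filter, bias,
                           output);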

Profiled the person_detection_benchmark with the following command:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade run_person_detection_benchmark -j8
This gives a latency of 73.761M ticks with this change vs. 212.980M ticks without it (roughly a 2.9x speedup).

Per-op latency with this change:

KeywordRunNIerations(1) took 38516 ticks (38 ms)
DEPTHWISE_CONV_2D took 11961939 ticks (11961 ms).
DEPTHWISE_CONV_2D took 12296923 ticks (12296 ms).
CONV_2D took 987358 ticks (987 ms).
DEPTHWISE_CONV_2D took 6138259 ticks (6138 ms).
CONV_2D took 554614 ticks (554 ms).
DEPTHWISE_CONV_2D took 12063331 ticks (12063 ms).
CONV_2D took 665206 ticks (665 ms).
DEPTHWISE_CONV_2D took 3018615 ticks (3018 ms).
CONV_2D took 334246 ticks (334 ms).
DEPTHWISE_CONV_2D took 5837463 ticks (5837 ms).
CONV_2D took 444838 ticks (444 ms).
DEPTHWISE_CONV_2D took 1462009 ticks (1462 ms).
CONV_2D took 225286 ticks (225 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 335878 ticks (335 ms).
DEPTHWISE_CONV_2D took 685980 ticks (685 ms).
CONV_2D took 173254 ticks (173 ms).
DEPTHWISE_CONV_2D took 1197084 ticks (1197 ms).
CONV_2D took 283846 ticks (283 ms).
AVERAGE_POOL_2D took 75604 ticks (75 ms).
CONV_2D took 3382 ticks (3 ms).
RESHAPE took 290 ticks (0 ms).
SOFTMAX took 1933 ticks (1 ms).

Without this change:

WithPersonDataIterations(1) took 212980371 ticks (212980 ms)
DEPTHWISE_CONV_2D took 11961939 ticks (11961 ms).
DEPTHWISE_CONV_2D took 12296923 ticks (12296 ms).
CONV_2D took 13604549 ticks (13604 ms).
DEPTHWISE_CONV_2D took 6138259 ticks (6138 ms).
CONV_2D took 9585893 ticks (9585 ms).
DEPTHWISE_CONV_2D took 12063331 ticks (12063 ms).
CONV_2D took 15189221 ticks (15189 ms).
DEPTHWISE_CONV_2D took 3018615 ticks (3018 ms).
CONV_2D took 7590389 ticks (7590 ms).
DEPTHWISE_CONV_2D took 5837463 ticks (5837 ms).
CONV_2D took 13193717 ticks (13193 ms).
DEPTHWISE_CONV_2D took 1462009 ticks (1462 ms).
CONV_2D took 6596093 ticks (6596 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 2734009 ticks (2734 ms).
CONV_2D took 12199421 ticks (12199 ms).
DEPTHWISE_CONV_2D took 685980 ticks (685 ms).
CONV_2D took 6099809 ticks (6099 ms).
DEPTHWISE_CONV_2D took 1197084 ticks (1197 ms).
CONV_2D took 11703137 ticks (11703 ms).
AVERAGE_POOL_2D took 75604 ticks (75 ms).
CONV_2D took 10983 ticks (10 ms).
RESHAPE took 290 ticks (0 ms).
SOFTMAX took 1933 ticks (1 ms).

Confirmed that the kernel_conv_test passes with:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade test_kernel_conv_test -j8

Progress towards http://b/177457688

@google-ml-butler (bot) commented:

Thanks for contributing to TensorFlow Lite Micro.

To keep this process moving along, we'd like to make sure that you have completed the items on this list:

We would like to have a discussion on the GitHub issue first to determine the best path forward, and then proceed to the PR review.

  TF_LITE_ENSURE_OK(
      context, context->RequestScratchBufferInArena(
                   context, required_scratch, &data->scratch_tensor_index));
#endif
Reviewer (Member):

nit: add #endif // defined(FUSION_F1)

@njeffrie njeffrie (Author) replied Feb 24, 2021:

Done.
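With the nit addressed, the guarded scratch request presumably reads roughly as follows. This is a reconstruction from the excerpt above; the xa_nn_conv2d_std_getsize sizing helper, its argument list, and the PREC_ASYM8S precision constant are assumptions, not copied from the merged change.

#if defined(FUSION_F1)
  // Ask the NN Library how much scratch memory its conv kernel needs for
  // these dimensions, then reserve that much space in the arena.
  const int required_scratch = xa_nn_conv2d_std_getsize(
      input_height, input_depth, filter_height, filter_width, stride_height,
      pad_height, output_height, output_depth, /*precision=*/PREC_ASYM8S);
  TF_LITE_ENSURE(context, required_scratch > 0);
  TF_LITE_ENSURE_OK(
      context, context->RequestScratchBufferInArena(
                   context, required_scratch, &data->scratch_tensor_index));
#endif  // defined(FUSION_F1)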

Comment on lines +311 to +332
  /* Dilation is currently not supported on HiFi 4 NN Library */
  if ((params.dilation_width_factor == 1) &&
      (params.dilation_height_factor == 1)) {
    const int32_t input_offset = -data.reference_op_data.input_zero_point;
    const int32_t output_offset = data.reference_op_data.output_zero_point;
    const int stride_width = params.stride_width;
    const int stride_height = params.stride_height;
    const int pad_width = data.reference_op_data.padding.width;
    const int pad_height = data.reference_op_data.padding.height;
    const int32_t output_activation_min =
        data.reference_op_data.output_activation_min;
    const int32_t output_activation_max =
        data.reference_op_data.output_activation_max;

    const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
    const RuntimeShape& filter_shape = tflite::micro::GetTensorShape(filter);
    const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
    const int batches = MatchingDim(input_shape, 0, output_shape, 0);
    const int input_depth = MatchingDim(input_shape, 3, filter_shape, 3);
    const int output_depth = MatchingDim(filter_shape, 0, output_shape, 3);
    const int input_height = input_shape.Dims(1);
    const int input_width = input_shape.Dims(2);
Reviewer (Member):

let's move this code into an EvalHifi4 function, something like

TfLiteStatus EvalHifi4(const OpData* op_data, const TfLiteEvalTensor* input,
                       TfLiteEvalTensor* output, TfLiteContext* context) {
  const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
  const int8_t* input_data = tflite::micro::GetTensorData<int8_t>(input);
  const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
  int16_t* output_data = tflite::micro::GetTensorData<int16_t>(output);
  const int trailing_dim = input_shape.DimensionsCount() - 1;
  const int outer_size =
      MatchingFlatSizeSkipDim(input_shape, trailing_dim, output_shape);
  const int depth =
      MatchingDim(input_shape, trailing_dim, output_shape, trailing_dim);
  void* p_scratch = static_cast<void*>(
      context->GetScratchBuffer(context, op_data->scratch_tensor_index));
  for (int i = 0; i < outer_size; ++i) {
    int err = xa_nn_vec_softmax_asym8s_16(
        &output_data[i * depth], &input_data[i * depth],
        op_data->params.diff_min, op_data->params.input_left_shift,
        op_data->params.input_multiplier, depth, p_scratch);
    TF_LITE_ENSURE(context, err == 0);
  }
  return kTfLiteOk;
}
#endif  // defined(FUSION_F1)
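Applied to conv, the same wrapper shape might look roughly like the sketch below. The xa_nn_conv2d_std_per_chan_sym8sxasym8s entry point, its argument order, and the per_channel_output_multiplier / per_channel_output_shift field names are assumptions based on the library's naming conventions, not copied from the merged change.

TfLiteStatus EvalHifi4(const TfLiteConvParams& params, const OpData& data,
                       const TfLiteEvalTensor* input,
                       const TfLiteEvalTensor* filter,
                       const TfLiteEvalTensor* bias, TfLiteEvalTensor* output,
                       TfLiteContext* context) {
  const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
  const RuntimeShape& filter_shape = tflite::micro::GetTensorShape(filter);
  const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
  const int batches = MatchingDim(input_shape, 0, output_shape, 0);
  const int input_depth = MatchingDim(input_shape, 3, filter_shape, 3);
  const int output_depth = MatchingDim(filter_shape, 0, output_shape, 3);
  const int input_height = input_shape.Dims(1);
  const int input_width = input_shape.Dims(2);
  const int filter_height = filter_shape.Dims(1);
  const int filter_width = filter_shape.Dims(2);
  const int output_height = output_shape.Dims(1);
  const int output_width = output_shape.Dims(2);

  const int8_t* input_data = tflite::micro::GetTensorData<int8_t>(input);
  const int8_t* filter_data = tflite::micro::GetTensorData<int8_t>(filter);
  const int32_t* bias_data = tflite::micro::GetTensorData<int32_t>(bias);
  int8_t* output_data = tflite::micro::GetTensorData<int8_t>(output);
  void* p_scratch =
      context->GetScratchBuffer(context, data.scratch_tensor_index);

  for (int b = 0; b < batches; ++b) {
    // One NN Library call per batch; the argument order here is illustrative.
    int err = xa_nn_conv2d_std_per_chan_sym8sxasym8s(
        &output_data[b * output_height * output_width * output_depth],
        &input_data[b * input_height * input_width * input_depth], filter_data,
        bias_data, input_height, input_width, input_depth, filter_height,
        filter_width, output_depth, params.stride_width, params.stride_height,
        data.reference_op_data.padding.width,
        data.reference_op_data.padding.height, output_height, output_width,
        -data.reference_op_data.input_zero_point,
        data.reference_op_data.per_channel_output_multiplier,
        data.reference_op_data.per_channel_output_shift,
        data.reference_op_data.output_zero_point, p_scratch);
    TF_LITE_ENSURE(context, err == 0);
  }
  return kTfLiteOk;
}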

@njeffrie njeffrie (Author) replied:

Done.

Comment on lines 34 to 38
struct OpData {
  OpDataConv reference_op_data;

  int scratch_tensor_index;
};
Reviewer (Member):

want to use this struct only for FUSION_F1? the scratch_tensor_index isn't really meaningful for hifimini or a reference fallback.

@njeffrie njeffrie (Author) replied:

#ifdef'd the extra member for F1 only.
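The resolved struct then presumably looks something like this (a reconstruction from the reply above):

struct OpData {
  OpDataConv reference_op_data;
#if defined(FUSION_F1)
  // Only the Fusion F1 build requests an arena scratch buffer for the
  // NN Library conv, so the index only exists for that target.
  int scratch_tensor_index;
#endif  // defined(FUSION_F1)
};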

      tflite::micro::GetTensorShape(output),
      tflite::micro::GetTensorData<int8_t>(output));
#elif defined(FUSION_F1)
TfLiteStatus EvalHifi4(TfLiteContext* context, TfLiteNode* node,
Reviewer (Member):

I think this would also have to be within #if defined(FUSION_F1) else the hifimini build would break?
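In other words, each target-specific eval path gets its own guard instead of sharing an #elif chain, so the hifimini branch keeps compiling. Schematically (a sketch only, with EvalHifiMini as a placeholder name):

#if defined(HIFIMINI)
// hifimini-specific eval path stays available to hifimini builds.
TfLiteStatus EvalHifiMini(TfLiteContext* context, TfLiteNode* node);
#endif  // defined(HIFIMINI)

#if defined(FUSION_F1)
// Fusion F1 (HiFi4 NN Library) eval path only compiles for that target.
TfLiteStatus EvalHifi4(TfLiteContext* context, TfLiteNode* node);
#endif  // defined(FUSION_F1)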

@copybara-service copybara-service bot merged commit 8214b55 into tensorflow:master Feb 25, 2021