Use xa_nnlib for conv for Fusion F1 #47378
Conversation
Thanks for contributing to TensorFlow Lite Micro. To keep this process moving along, we'd like to make sure that you have completed the items on this list:
We would like to have a discussion on the GitHub issue first to determine the best path forward, and then proceed to the PR review.
Commits updated from 1c61c97 to eb9728c (Compare)
```cpp
  TF_LITE_ENSURE_OK(
      context, context->RequestScratchBufferInArena(
                   context, required_scratch, &data->scratch_tensor_index));
#endif
```
nit: add #endif // defined(FUSION_F1)
Done.
```cpp
  /* Dilation is currently not supported on HiFi 4 NN Library */
  if ((params.dilation_width_factor == 1) &&
      (params.dilation_height_factor == 1)) {
    const int32_t input_offset = -data.reference_op_data.input_zero_point;
    const int32_t output_offset = data.reference_op_data.output_zero_point;
    const int stride_width = params.stride_width;
    const int stride_height = params.stride_height;
    const int pad_width = data.reference_op_data.padding.width;
    const int pad_height = data.reference_op_data.padding.height;
    const int32_t output_activation_min =
        data.reference_op_data.output_activation_min;
    const int32_t output_activation_max =
        data.reference_op_data.output_activation_max;

    const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
    const RuntimeShape& filter_shape = tflite::micro::GetTensorShape(filter);
    const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
    const int batches = MatchingDim(input_shape, 0, output_shape, 0);
    const int input_depth = MatchingDim(input_shape, 3, filter_shape, 3);
    const int output_depth = MatchingDim(filter_shape, 0, output_shape, 3);
    const int input_height = input_shape.Dims(1);
    const int input_width = input_shape.Dims(2);
```
let's move this code into an EvalHifi4 function, something like tensorflow/tensorflow/lite/micro/kernels/xtensa/softmax.cc, lines 212 to 237 in c023a82:
```cpp
TfLiteStatus EvalHifi4(const OpData* op_data, const TfLiteEvalTensor* input,
                       TfLiteEvalTensor* output, TfLiteContext* context) {
  const RuntimeShape& input_shape = tflite::micro::GetTensorShape(input);
  const int8_t* input_data = tflite::micro::GetTensorData<int8_t>(input);
  const RuntimeShape& output_shape = tflite::micro::GetTensorShape(output);
  int16_t* output_data = tflite::micro::GetTensorData<int16_t>(output);
  const int trailing_dim = input_shape.DimensionsCount() - 1;
  const int outer_size =
      MatchingFlatSizeSkipDim(input_shape, trailing_dim, output_shape);
  const int depth =
      MatchingDim(input_shape, trailing_dim, output_shape, trailing_dim);
  void* p_scratch = static_cast<void*>(
      context->GetScratchBuffer(context, op_data->scratch_tensor_index));
  for (int i = 0; i < outer_size; ++i) {
    int err = xa_nn_vec_softmax_asym8s_16(
        &output_data[i * depth], &input_data[i * depth],
        op_data->params.diff_min, op_data->params.input_left_shift,
        op_data->params.input_multiplier, depth, p_scratch);
    TF_LITE_ENSURE(context, err == 0);
  }
  return kTfLiteOk;
}
#endif  // defined(FUSION_F1)
```
Done.
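The softmax example above boils down to a reusable shape: a thin `EvalHifi4` helper that loops over rows, hands each row to a vendor kernel returning an error code, and bails on the first failure. A self-contained sketch of just that control flow, with stand-ins for the TFLM types and the xa_nnlib call (`FakeVendorOp`, `EvalHifi4Sketch`, and the `Status` enum are all hypothetical names, not library APIs):

```cpp
#include <cstdint>

// Hypothetical stand-in for an xa_nnlib kernel: returns 0 on success,
// nonzero on error, like xa_nn_vec_softmax_asym8s_16 above.
static int FakeVendorOp(int8_t* out, const int8_t* in, int depth) {
  if (depth <= 0) return -1;
  for (int i = 0; i < depth; ++i) out[i] = in[i];  // trivial copy "kernel"
  return 0;
}

// Stand-in for kTfLiteOk / kTfLiteError.
enum Status { kOk = 0, kError = 1 };

// Shape of the per-row "call vendor kernel, bail on error" loop that the
// reviewer suggests isolating in an EvalHifi4() helper.
Status EvalHifi4Sketch(const int8_t* input, int8_t* output, int outer_size,
                       int depth) {
  for (int i = 0; i < outer_size; ++i) {
    int err = FakeVendorOp(&output[i * depth], &input[i * depth], depth);
    if (err != 0) return kError;  // analogous to TF_LITE_ENSURE(context, err == 0)
  }
  return kOk;
}
```

Keeping the HiFi4 path behind a function boundary like this also makes the `#if defined(FUSION_F1)` guards shorter and easier to audit.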
```cpp
struct OpData {
  OpDataConv reference_op_data;

  int scratch_tensor_index;
};
```
Do we want to use this struct only for FUSION_F1? The scratch_tensor_index isn't really meaningful for hifimini or a reference fallback.
#ifdef'd the extra member for F1 only.
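The resulting shape of the struct is presumably something like the following (a sketch, not the merged code; `OpDataConvStub` stands in for the real `OpDataConv` from TFLM's conv headers, and `FUSION_F1` is stubbed here rather than set by the Makefile):

```cpp
#include <cstdint>

// Stand-in for OpDataConv (the real one lives in TFLM's conv kernel code).
struct OpDataConvStub {
  int32_t input_zero_point;
  int32_t output_zero_point;
};

#define FUSION_F1 1  // illustration only; the real build defines this

// The scratch index exists only in the Fusion F1 build, so hifimini and the
// reference fallback don't carry a dead member.
struct OpData {
  OpDataConvStub reference_op_data;
#if defined(FUSION_F1)
  int scratch_tensor_index;
#endif  // defined(FUSION_F1)
};
```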
```cpp
        tflite::micro::GetTensorShape(output),
        tflite::micro::GetTensorData<int8_t>(output));
#elif defined(FUSION_F1)
TfLiteStatus EvalHifi4(TfLiteContext* context, TfLiteNode* node,
```
I think this would also have to be within #if defined(FUSION_F1); otherwise the hifimini build would break?
The code in this change is the subset of functionality needed for int8 conv on HiFi4, copied from pnikam-cad/tensorflow@a737c1e/tensorflow/lite/micro/kernels/xtensa_hifi/conv.cc.
Note that the current change has not pulled in the floating-point or uint8 implementations, nor the HiFi5 implementation.
Profiled the person_detection_benchmark with the following command:
```shell
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa OPTIMIZED_KERNEL_DIR=xtensa TARGET_ARCH=fusion_f1 XTENSA_CORE=F1_190305_swupgrade run_person_detection_benchmark -j8
```
This gives a latency of 73.761M ticks with this change vs. 212.980M ticks without it.
Per OP latency with this change:
Without this change:
Confirmed that the kernel_conv_test passes with:
Progress towards http://b/177457688