
Missing TRANSPOSE Op Kernel #43472

Closed
victorromeo opened this issue Sep 23, 2020 · 17 comments · Fixed by dmpiergiacomo/tensorflow#2
Assignees
Labels
comp:micro Related to TensorFlow Lite Microcontrollers stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author type:feature Feature requests

Comments

@victorromeo

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (or github SHA if from source): tf-nightly==2.4.0.dev20200917

I'm attempting to use a TFLite-converted model, which was created and trained using TF2 + Keras. The converter successfully created the TFLite file, and I've loaded it into a microcontroller app as a flatbuffer .cpp + .h file.

I'm unable to share the model at this time due to confidentiality; however, the model contains Conv2D, BatchNormalization, ReLU, MaxPooling2D, Permute, Dropout, Flatten, Dense and Softmax.

After conversion, the model is loaded into an Arduino sketch, but upon loading the model an error is reported.

8 bytes lost due to alignment. To avoid this loss, please make sure the tensor_arena is 16 bytes aligned.
Didn't find op for builtin opcode 'TRANSPOSE' version '2'
Failed to get registration from op code TRANSPOSE
Failed starting model allocation.

Given that this operation was chosen from the builtin operation set, I believe this is a bug. Can you please advise?

Type declarations

const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
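
For context, these declarations feed the usual TFLM setup sequence; a minimal sketch follows, based on the hello_world example style of the time (the g_model_data array name is an assumption):

// Minimal setup sketch; g_model_data is the converted flatbuffer array (assumed name).
static tflite::MicroErrorReporter micro_error_reporter;
model = tflite::GetModel(g_model_data);
static tflite::AllOpsResolver resolver;
static tflite::MicroInterpreter static_interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, &micro_error_reporter);
interpreter = &static_interpreter;

// AllocateTensors() walks the graph and looks up each op's registration; this
// is where the "Didn't find op for builtin opcode 'TRANSPOSE'" error surfaces.
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // "Failed starting model allocation."
}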

AllOpsResolver missing TRANSPOSE

static tflite::AllOpsResolver resolver;  // NO TRANSPOSE kernel registration

MicroMutableOpResolver missing TRANSPOSE registration

tflite::MicroMutableOpResolver<6> resolver;
resolver.AddConv2D();
resolver.AddDepthwiseConv2D();
resolver.AddFullyConnected();
resolver.AddReshape();
resolver.AddSoftmax();
resolver.AddBuiltin(tflite::BuiltinOperator_MAX_POOL_2D,
                    tflite::ops::micro::Register_MAX_POOL_2D());
// resolver.AddBuiltin(tflite::BuiltinOperator_TRANSPOSE,
//                     tflite::ops::micro::Register_TRANSPOSE());
// BuiltinOperator_TRANSPOSE exists, but no Register_TRANSPOSE exists.

Standalone code to reproduce the issue

import logging
import os

import tensorflow as tf

def model_to_tflite(self, features_path=None, tflite_path=None):
    '''Converts a Keras model into a TFLite model.'''
    assert self.model is not None, 'TFLite conversion requires the model be loaded'
    assert self.x_data is not None and self.y_data is not None, 'Sample data must be loaded'

    if os.path.exists(tflite_path):
        logging.warning(f'TFLite file already exists: {tflite_path}')

    logging.info(f'Found {len(self.x_data)} features')

    # Construction of a representative dataset for int8 calibration
    def representative_dataset():
        for i in range(len(self.x_data)):
            yield [self.x_data[i:i + 1, :, :, :]]

    # Construction of a TFLite converter with full int8 quantization
    converter = tf.lite.TFLiteConverter.from_keras_model(self.model)
    converter.representative_dataset = representative_dataset

    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_LATENCY]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open(tflite_path, 'wb') as f:
        bytes_written = f.write(tflite_model)

    return bytes_written

Any other info / logs

/home/ian/Documents/source/acdnet_pipeline/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2289: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
2020-09-23 14:34:59.654363: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
/home/ian/Documents/source/acdnet_pipeline/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1376: UserWarning: `layer.updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`layer.updates` will be removed in a future version. '
2020-09-23 14:35:02.599832: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 14:35:02.600152: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-09-23 14:35:02.600313: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-09-23 14:35:02.600650: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-09-23 14:35:02.600775: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 14:35:02.601273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 5.94GiB deviceMemoryBandwidth: 178.99GiB/s
2020-09-23 14:35:02.601398: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601464: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601493: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-23 14:35:02.601521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-23 14:35:02.601531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-23 14:35:02.601622: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601715: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601745: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-23 14:35:02.894277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 14:35:02.894345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-23 14:35:02.894368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-23 14:35:02.919186: I tensorflow/core/platform/profile_utils/cpu_utils.cc:108] CPU Frequency: 2599990000 Hz
2020-09-23 14:35:02.982735: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:872] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 5.462ms.
  function_optimizer: function_optimizer did nothing. time = 0.003ms.

2020-09-23 14:35:03.203503: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:315] Ignored output_format.
2020-09-23 14:35:03.203549: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:318] Ignored drop_control_dependency.
2020-09-23 14:35:03.469971: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-09-23 14:35:03.470367: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 14:35:03.471405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 5.94GiB deviceMemoryBandwidth: 178.99GiB/s
2020-09-23 14:35:03.471730: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.471928: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.471981: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-23 14:35:03.472028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-23 14:35:03.472067: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-23 14:35:03.472238: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.472414: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.472450: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-23 14:35:03.472494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 14:35:03.472519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-23 14:35:03.472540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 

@victorromeo victorromeo added the comp:lite TF Lite related issues label Sep 23, 2020
@Saduf2019
Contributor

@victorromeo
I ran the code shared on 2.3 and do not face any errors; please find the gist here. Please share a Colab gist with the error reported.
With respect to the error, please refer to this link.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Sep 25, 2020
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 2, 2020
@victorromeo
Author

Bump, as I'm still authoring a TRANSPOSE operation for the micro interpreter.

@google-ml-butler google-ml-butler bot removed the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 5, 2020
@Saduf2019
Contributor

@victorromeo

As informed above, we do not face any errors with the code shared. Please confirm, and if the issue exists, share a Colab gist with the error, or move the issue to closed status if resolved. Thanks!

@victorromeo
Author

This is an issue of a missing C++ micro kernel operation, not a core kernel operation, and as such is not appropriate for a Python Colab gist. As mentioned earlier, I'm working on this as a custom C++ operation. Does Colab support C++11 with Bazel compilation?

A Keras TF2 model, when converted to TFLite Micro, includes an operation called TRANSPOSE which is not yet supported. This already exists as a TFLite operation, but when used on an ARM microcontroller it is simply not available.

@Saduf2019 Saduf2019 added comp:runtime c++ runtime, performance issues (cpu) TF 2.3 Issues related to TF 2.3 type:bug Bug and removed stat:awaiting response Status - Awaiting response from author labels Oct 5, 2020
@Saduf2019 Saduf2019 assigned gowthamkpr and ymodak and unassigned Saduf2019 and gowthamkpr Oct 5, 2020
@ymodak ymodak removed the comp:runtime c++ runtime, performance issues (cpu) label Oct 5, 2020
@ymodak ymodak assigned jdduke and unassigned ymodak Oct 5, 2020
@victorromeo
Author

victorromeo commented Oct 6, 2020

@jdduke May I please get some pointers, if possible, to confirm the approach for an appropriate Transpose operation? There are guides for TFLite custom operations, but not for missing micro operations or custom micro operations.

I'm planning on reusing the TransposeContext from transpose.cc, but dropping the KernelType:

struct TransposeContext {
    TransposeContext(TfLiteContext* context, TfLiteNode* node) {
        input = GetInput(context, node, 0);
        perm = GetInput(context, node, 1);
        output = GetOutput(context, node, 0);
    }
    const TfLiteTensor* input;
    const TfLiteTensor* perm;
    TfLiteTensor* output;
};
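
A micro kernel built around this context would follow the standard TFLM Prepare/Eval skeleton; a minimal sketch, modeled on other micro kernels of the time (the namespace layout and function names are assumptions, not the final merged implementation):

namespace tflite {
namespace ops {
namespace micro {
namespace transpose {

TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  TF_LITE_ENSURE_EQ(context, NumInputs(node), 2);
  TF_LITE_ENSURE_EQ(context, NumOutputs(node), 1);
  TransposeContext op_context(context, node);
  // Micro has no dynamic memory allocation, so the output shape must be static.
  TF_LITE_ENSURE(context, !IsDynamicTensor(op_context.output));
  return kTfLiteOk;
}

TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
  TransposeContext op_context(context, node);
  // Dispatch on input type and call the reference transpose implementation
  // (tensorflow/lite/kernels/internal/reference/reference_ops.h) here.
  return kTfLiteOk;
}

}  // namespace transpose

TfLiteRegistration* Register_TRANSPOSE() {
  // Remaining TfLiteRegistration fields are value-initialized to null/zero.
  static TfLiteRegistration r = {/*init=*/nullptr, /*free=*/nullptr,
                                 /*prepare=*/transpose::Prepare,
                                 /*invoke=*/transpose::Eval};
  return &r;
}

}  // namespace micro
}  // namespace ops
}  // namespace tflite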

I'm thinking of basing the micro implementation on ComputePermutation in tfl_ops.cc:

// Computes the permutation of a constant `input_tensor` according to `perm`.
// The function recursively traverses the dimensions of the output tensor in
// a row-major order and writes the value in the output tensor into
// `new_values`.
void ComputePermutation(ElementsAttr input_tensor, ArrayRef<int32_t> perm,
                        ArrayRef<int64_t> output_shape, int num_dimensions,
                        int output_axis, std::vector<uint64_t> *input_indices,
                        std::vector<Attribute> *new_values) {
  // Refer to the implementation of `Transpose` function in
  // tensorflow/lite/kernels/internal/reference/reference_ops.h
  assert(output_axis < num_dimensions);
  const int input_axis = perm[output_axis];
  for (int i = 0; i < output_shape[output_axis]; ++i) {
    // Update the input indices on `input_axis`.
    input_indices->at(input_axis) = i;
    // Write the value from `input_tensor` if it is the last axis or
    // recurse into the next axis.
    const bool is_last_axis = output_axis == num_dimensions - 1;
    if (is_last_axis) {
      new_values->push_back(input_tensor.getValue(*input_indices));
    } else {
      ComputePermutation(input_tensor, perm, output_shape, num_dimensions,
                         output_axis + 1, input_indices, new_values);
    }
  }
}
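
As a rough illustration of how that recursion maps onto micro constraints (flat arrays and caller-owned buffers instead of MLIR attributes), a hypothetical analog might look like the following; the helper name and stride-based indexing are assumptions:

#include <cstdint>

// Hypothetical micro-friendly analog of ComputePermutation: walks the output
// in row-major order and copies the value at the permuted input index.
// in_strides holds the row-major strides of the input; in_index is a
// caller-owned scratch array of num_dims entries.
void TransposeRecursive(const int8_t* input, const int32_t* perm,
                        const int32_t* in_strides, const int32_t* out_shape,
                        int num_dims, int axis, int32_t* in_index,
                        int8_t** out_ptr) {
  const int input_axis = perm[axis];
  for (int i = 0; i < out_shape[axis]; ++i) {
    // Update the input index on the axis that feeds this output axis.
    in_index[input_axis] = i;
    if (axis == num_dims - 1) {
      // Last axis: compute the flat input offset and emit the value.
      int32_t offset = 0;
      for (int d = 0; d < num_dims; ++d) offset += in_index[d] * in_strides[d];
      *(*out_ptr)++ = input[offset];
    } else {
      TransposeRecursive(input, perm, in_strides, out_shape, num_dims,
                         axis + 1, in_index, out_ptr);
    }
  }
}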

Finally, I'm hoping that the current /lite/transpose_test.cc tests are appropriate for reuse, with generic implementations but specific tests for int8 quantization.

Here's my work in progress.
Forked micro_transpose_op branch

@jdduke jdduke assigned petewarden and advaitjain and unassigned jdduke Oct 6, 2020
@jdduke
Member

jdduke commented Oct 6, 2020

@advaitjain or @petewarden can you advise on the recommended process for porting a Lite kernel to Micro?

@advaitjain
Member

@victorromeo, as you have found out, TFLM supports a subset of the TFLite ops.

We will be adding a guide for porting ops from Lite to Micro, as there are subtleties around not having dynamic memory allocation, as well as other differences between Lite and Micro.

I would recommend starting by sharing more details about the actual model you are trying to use, per our contribution guidelines, to help motivate the need for this additional op in Micro. We can then give you some specific pointers on how to proceed.

Examples of ops (along with the motivation for why they are worth adding to Micro at this time) that are currently being ported via community contributions are PRs #43384 and #43381.

@advaitjain advaitjain added comp:micro Related to TensorFlow Lite Microcontrollers and removed TF 2.3 Issues related to TF 2.3 comp:lite TF Lite related issues labels Oct 6, 2020
@advaitjain advaitjain added type:feature Feature requests and removed type:bug Bug labels Oct 6, 2020
@victorromeo
Author

Thanks @advaitjain

The model is currently being used to prepare an audio classifier which fits onto a microcontroller. The Transpose operation has been shown to be a valuable kernel layer toward the greater goal of switching effectively between the time domain and the frequency domain. As the model is currently being developed for research papers, I have been advised that I am unable to share the entire architecture. The architecture doesn't use FFT preprocessing. The origin of the issue is the Keras Permute operation being converted into a TFLite Transpose. I appreciate it is hard to fully weigh the benefit of adding the operation without evidence, but this will be in the paper.

The model is planned to run on Cortex M4 devices such as the moderately sized Arduino Nano 33 BLE Sense.

Side note: it would be great to see a flag to check the compatibility of builtin micro operations during TFLite conversion, as I had to circle for a while to deduce that Transpose was the issue.
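
As a partial workaround, the flatbuffer schema already allows dumping the builtin opcodes a model uses before handing it to the interpreter; a minimal sketch (the function name is hypothetical, and printf availability on the target is assumed):

#include <cstdio>

#include "tensorflow/lite/schema/schema_generated.h"

// Hypothetical pre-flight check: print every builtin opcode the model uses, so
// a missing kernel such as TRANSPOSE can be spotted before AllocateTensors() fails.
void PrintModelOps(const void* model_data) {
  const tflite::Model* model = tflite::GetModel(model_data);
  for (const tflite::OperatorCode* op_code : *model->operator_codes()) {
    std::printf("Model uses builtin op: %s\n",
                tflite::EnumNameBuiltinOperator(op_code->builtin_code()));
  }
}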

@advaitjain
Member

Thanks for the additional context; I understand that you're not able to share the model architecture pre-publication.

Let's wait for around a week for me to put together a guide for porting ops from Lite to Micro, and I'll link to it from this issue as well.

In the meantime, if you want to send a PR with flatbuffer changes for transpose (similar to #43384), that would be great.

Since the transpose op does not have any BuiltinOptions, the parsing function will be something along the lines of:

// We have this parse function instead of directly returning kTfLiteOk from the
// switch-case in ParseOpData because this function is used as part of the
// selective registration for the OpResolver implementation in micro.
TfLiteStatus ParseGreater(const Operator*, ErrorReporter*,
                          BuiltinDataAllocator*, void**) {
  return kTfLiteOk;
}

and

case BuiltinOperator_GREATER: {
  return ParseGreater(op, error_reporter, allocator, builtin_data);
}
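
Following that pattern, the TRANSPOSE equivalent would presumably be (a sketch; the final merged names may differ):

// TRANSPOSE has no BuiltinOptions either, so its parse function can likewise
// simply return kTfLiteOk.
TfLiteStatus ParseTranspose(const Operator*, ErrorReporter*,
                            BuiltinDataAllocator*, void**) {
  return kTfLiteOk;
}

// And the corresponding switch-case entry in ParseOpData:
case BuiltinOperator_TRANSPOSE: {
  return ParseTranspose(op, error_reporter, allocator, builtin_data);
}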

@victorromeo
Author

Thanks for your additional feedback. I've now implemented all compatible unit tests for Transpose in my fork here, and these are compiling. It's worth noting my implementation is not optimal, but it does achieve 1D-4D transpose support, achieved by porting the existing TFLite core Transpose using micro frameworks and static arrays. Now onto the final stages of testing. Cheers.

@mohaimenz

mohaimenz commented Oct 13, 2020

@advaitjain

Let's wait for around a week for me to put together a guide for porting ops from lite to micro and I'll link to it from this issue as well.

Any luck yet? Could you please update us if you have the guide ready for porting a TF Lite kernel to TF Lite Micro? It is badly needed for real-world practical projects.

@victorromeo
Would you be happy to share your Transpose implementation for micro? It might help others, and you may get some valuable support from others as well.
Thanks

@victorromeo
Author

The most stable implementation of Transpose is https://github.com/victorromeo/tensorflow/tree/v2.3.1_transpose

I'm going to rebase this off master, then create a pull request for consideration by the team.

@dmpiergiacomo
Contributor

I created PR #48192, which should solve this issue.

Thank you @victorromeo for your contribution. I refactored some of your code by cherry-picking it, so that your name appears. With your name associated with the PR, Google now asks for both of our signatures on their Contributor License Agreement. It would be really nice if you could sign it quickly so the PR can be pushed forward :)

Thank you!

@sanatmpa1

@victorromeo,

We can see that the PR you submitted has been merged. Can you kindly confirm whether we can close this issue? Thanks!

@sanatmpa1 sanatmpa1 self-assigned this Aug 10, 2021
@sanatmpa1 sanatmpa1 added the stat:awaiting response Status - Awaiting response from author label Aug 10, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Aug 17, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
