
Missing TRANSPOSE Op Kernel #43472

Closed
victorromeo opened this issue Sep 23, 2020 · 17 comments · Fixed by dmpiergiacomo/tensorflow#2
Assignees
Labels
comp:micro Related to TensorFlow Lite Microcontrollers stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author type:feature Feature requests

Comments

@victorromeo

System information

  • OS Platform and Distribution: Linux Ubuntu 20.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (or github SHA if from source): tf-nightly==2.4.0.dev20200917

I'm attempting to use a TFLite-converted model, which was created and trained using TF2 + Keras. The converter successfully created the TFLite file, and I've loaded it into a microcontroller app as a flatbuffer .cpp + .h file.

I'm unable to share the model at this time due to confidentiality; however, the model contains Conv2D, BatchNormalization, ReLU, MaxPooling2D, Permute, Dropout, Flatten, Dense and Softmax.

After conversion, the model is loaded into an Arduino sketch, but upon loading the model an error is reported.

8 bytes lost due to alignment. To avoid this loss, please make sure the tensor_arena is 16 bytes aligned.
Didn't find op for builtin opcode 'TRANSPOSE' version '2'
Failed to get registration from op code TRANSPOSE
Failed starting model allocation.

Given that this operation was chosen from the builtin operation set, I believe this is a bug. Can you please advise?

Type declarations

const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
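
For context, these declarations feed the usual TFLM setup sequence; a minimal sketch follows, based on the hello_world example style of the time (the g_model_data array name is an assumption):

// Minimal setup sketch; g_model_data is the converted flatbuffer array (assumed name).
static tflite::MicroErrorReporter micro_error_reporter;
model = tflite::GetModel(g_model_data);
static tflite::AllOpsResolver resolver;
static tflite::MicroInterpreter static_interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, &micro_error_reporter);
interpreter = &static_interpreter;

// AllocateTensors() walks the graph and looks up each op's registration; this
// is where the "Didn't find op for builtin opcode 'TRANSPOSE'" error surfaces.
if (interpreter->AllocateTensors() != kTfLiteOk) {
  // "Failed starting model allocation."
}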

AllOpsResolver missing TRANSPOSE

static tflite::AllOpsResolver resolver;  // NO TRANSPOSE kernel registration

MicroMutableOpResolver missing TRANSPOSE registration

tflite::MicroMutableOpResolver<6> resolver;
resolver.AddConv2D();
resolver.AddDepthwiseConv2D();
resolver.AddFullyConnected();
resolver.AddReshape();
resolver.AddSoftmax();
resolver.AddBuiltin(tflite::BuiltinOperator_MAX_POOL_2D,
                    tflite::ops::micro::Register_MAX_POOL_2D());
// resolver.AddBuiltin(tflite::BuiltinOperator_TRANSPOSE,
//                     tflite::ops::micro::Register_TRANSPOSE());
// BuiltinOperator_TRANSPOSE exists, but no Register_TRANSPOSE exists.

Standalone code to reproduce the issue

import logging
import os

import tensorflow as tf

def model_to_tflite(self, features_path=None, tflite_path=None):
    '''Converts a Keras model into a TFLite model.'''
    assert self.model is not None, 'TFLite conversion requires the model be loaded'
    assert self.x_data is not None and self.y_data is not None, 'Sample data must be loaded'

    if os.path.exists(tflite_path):
        logging.warning(f'TFLite file already exists: {tflite_path}')

    logging.info(f'Found {len(self.x_data)} features')

    # Construction of a representative dataset for int8 calibration
    def representative_dataset():
        for i in range(len(self.x_data)):
            yield [self.x_data[i:i + 1, :, :, :]]

    # Construction of a TFLite converter with full int8 quantization
    converter = tf.lite.TFLiteConverter.from_keras_model(self.model)
    converter.representative_dataset = representative_dataset

    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_LATENCY]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open(tflite_path, 'wb') as f:
        bytes_written = f.write(tflite_model)

    return bytes_written

Any other info / logs

/home/ian/Documents/source/acdnet_pipeline/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py:2289: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
2020-09-23 14:34:59.654363: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
/home/ian/Documents/source/acdnet_pipeline/venv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1376: UserWarning: `layer.updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`layer.updates` will be removed in a future version. '
2020-09-23 14:35:02.599832: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 14:35:02.600152: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-09-23 14:35:02.600313: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-09-23 14:35:02.600650: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-09-23 14:35:02.600775: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 14:35:02.601273: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 5.94GiB deviceMemoryBandwidth: 178.99GiB/s
2020-09-23 14:35:02.601398: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601464: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601493: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-23 14:35:02.601521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-23 14:35:02.601531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-23 14:35:02.601622: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601715: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2020-09-23 14:35:02.601745: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-23 14:35:02.894277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 14:35:02.894345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-23 14:35:02.894368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-09-23 14:35:02.919186: I tensorflow/core/platform/profile_utils/cpu_utils.cc:108] CPU Frequency: 2599990000 Hz
2020-09-23 14:35:02.982735: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:872] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 5.462ms.
  function_optimizer: function_optimizer did nothing. time = 0.003ms.

2020-09-23 14:35:03.203503: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:315] Ignored output_format.
2020-09-23 14:35:03.203549: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:318] Ignored drop_control_dependency.
2020-09-23 14:35:03.469971: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-09-23 14:35:03.470367: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-09-23 14:35:03.471405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.6705GHz coreCount: 10 deviceMemorySize: 5.94GiB deviceMemoryBandwidth: 178.99GiB/s
2020-09-23 14:35:03.471730: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.471928: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.471981: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-23 14:35:03.472028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-23 14:35:03.472067: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-23 14:35:03.472238: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.472414: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2020-09-23 14:35:03.472450: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-23 14:35:03.472494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-23 14:35:03.472519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-09-23 14:35:03.472540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 

@victorromeo victorromeo added the comp:lite TF Lite related issues label Sep 23, 2020
@Saduf2019
Contributor

@victorromeo
I ran the code shared on 2.3 and do not face any errors; please find the gist here. Please share a Colab gist with the error reported.
With respect to the error, please refer to this link.

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label Sep 25, 2020
@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 2, 2020
@victorromeo
Author

Bump, as I'm still authoring a TRANSPOSE operation for the micro interpreter.

@google-ml-butler google-ml-butler bot removed the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 5, 2020
@Saduf2019
Contributor

@victorromeo

As informed above, we do not face any errors with the code shared. Please confirm, and if the issue exists, share a Colab gist with the error, or move the issue to closed status if resolved. Thanks!

@victorromeo
Author

This is an issue of a missing C++ micro kernel operation, not a core kernel operation, and as such is not appropriate for a Python Colab gist. As mentioned earlier, I'm working on this as a custom C++ operation. Does Colab support C++11 with Bazel compilation?

A Keras TF2 model, when converted to TFLite Micro, includes an operation called TRANSPOSE which is not yet supported. This already exists as a TFLite operation, but when used on an ARM microcontroller it is simply not available.

@Saduf2019 Saduf2019 added comp:runtime c++ runtime, performance issues (cpu) TF 2.3 Issues related to TF 2.3 type:bug Bug and removed stat:awaiting response Status - Awaiting response from author labels Oct 5, 2020
@Saduf2019 Saduf2019 assigned gowthamkpr and ymodak and unassigned Saduf2019 and gowthamkpr Oct 5, 2020
@ymodak ymodak removed the comp:runtime c++ runtime, performance issues (cpu) label Oct 5, 2020
@ymodak ymodak assigned jdduke and unassigned ymodak Oct 5, 2020
@victorromeo
Author

victorromeo commented Oct 6, 2020

@jdduke May I please get some pointers, if possible, to confirm the approach for an appropriate Transpose operation? There are guides for TFLite custom operations, but not for missing micro operations or custom micro operations.

I'm planning on reusing the TransposeContext from transpose.cc, but dropping the KernelType:

struct TransposeContext {
    TransposeContext(TfLiteContext* context, TfLiteNode* node) {
        input = GetInput(context, node, 0);
        perm = GetInput(context, node, 1);
        output = GetOutput(context, node, 0);
    }
    const TfLiteTensor* input;
    const TfLiteTensor* perm;
    TfLiteTensor* output;
};
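
A micro kernel built around this context would follow the standard TFLM Prepare/Eval skeleton; a minimal sketch, modeled on other micro kernels of the time (the namespace layout and function names are assumptions, not the final merged implementation):

namespace tflite {
namespace ops {
namespace micro {
namespace transpose {

TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  TF_LITE_ENSURE_EQ(context, NumInputs(node), 2);
  TF_LITE_ENSURE_EQ(context, NumOutputs(node), 1);
  TransposeContext op_context(context, node);
  // Micro has no dynamic memory allocation, so the output shape must be static.
  TF_LITE_ENSURE(context, !IsDynamicTensor(op_context.output));
  return kTfLiteOk;
}

TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
  TransposeContext op_context(context, node);
  // Dispatch on input type and call the reference transpose implementation
  // (tensorflow/lite/kernels/internal/reference/reference_ops.h) here.
  return kTfLiteOk;
}

}  // namespace transpose

TfLiteRegistration* Register_TRANSPOSE() {
  // Remaining TfLiteRegistration fields are value-initialized to null/zero.
  static TfLiteRegistration r = {/*init=*/nullptr, /*free=*/nullptr,
                                 /*prepare=*/transpose::Prepare,
                                 /*invoke=*/transpose::Eval};
  return &r;
}

}  // namespace micro
}  // namespace ops
}  // namespace tflite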

I'm thinking of basing the micro implementation on ComputePermutation in tfl_ops.cc:

// Computes the permutation of a constant `input_tensor` according to `perm`.
// The function recursively traverses the dimensions of the output tensor in
// a row-major order and writes the value in the output tensor into
// `new_values`.
void ComputePermutation(ElementsAttr input_tensor, ArrayRef<int32_t> perm,
                        ArrayRef<int64_t> output_shape, int num_dimensions,
                        int output_axis, std::vector<uint64_t> *input_indices,
                        std::vector<Attribute> *new_values) {
  // Refer to the implementation of `Transpose` function in
  // tensorflow/lite/kernels/internal/reference/reference_ops.h
  assert(output_axis < num_dimensions);
  const int input_axis = perm[output_axis];
  for (int i = 0; i < output_shape[output_axis]; ++i) {
    // Update the input indices on `input_axis`.
    input_indices->at(input_axis) = i;
    // Write the value from `input_tensor` if it is the last axis or
    // recurse into the next axis.
    const bool is_last_axis = output_axis == num_dimensions - 1;
    if (is_last_axis) {
      new_values->push_back(input_tensor.getValue(*input_indices));
    } else {
      ComputePermutation(input_tensor, perm, output_shape, num_dimensions,
                         output_axis + 1, input_indices, new_values);
    }
  }
}
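
As a rough illustration of how that recursion maps onto micro constraints (flat arrays and caller-owned buffers instead of MLIR attributes), a hypothetical analog might look like the following; the helper name and stride-based indexing are assumptions:

#include <cstdint>

// Hypothetical micro-friendly analog of ComputePermutation: walks the output
// in row-major order and copies the value at the permuted input index.
// in_strides holds the row-major strides of the input; in_index is a
// caller-owned scratch array of num_dims entries.
void TransposeRecursive(const int8_t* input, const int32_t* perm,
                        const int32_t* in_strides, const int32_t* out_shape,
                        int num_dims, int axis, int32_t* in_index,
                        int8_t** out_ptr) {
  const int input_axis = perm[axis];
  for (int i = 0; i < out_shape[axis]; ++i) {
    // Update the input index on the axis that feeds this output axis.
    in_index[input_axis] = i;
    if (axis == num_dims - 1) {
      // Last axis: compute the flat input offset and emit the value.
      int32_t offset = 0;
      for (int d = 0; d < num_dims; ++d) offset += in_index[d] * in_strides[d];
      *(*out_ptr)++ = input[offset];
    } else {
      TransposeRecursive(input, perm, in_strides, out_shape, num_dims,
                         axis + 1, in_index, out_ptr);
    }
  }
}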

Finally, I'm hoping that the current /lite/transpose_test.cc tests are appropriate for reuse, with generic implementations but specific tests for int8 quantization.

Here's my work in progress.
Forked micro_transpose_op branch

@jdduke jdduke assigned petewarden and advaitjain and unassigned jdduke Oct 6, 2020
@jdduke
Member

jdduke commented Oct 6, 2020

@advaitjain or @petewarden can you advise on the recommended process for porting a Lite kernel to Micro?

@advaitjain
Member

@victorromeo, as you have found out, TFLM supports a subset of the TFLite ops.

We will be adding a guide for porting ops from Lite to Micro, as there are subtleties around not having dynamic memory allocation, as well as other differences between Lite and Micro.

I would recommend starting by sharing more details about the actual model you are trying to use, per our contribution guidelines, to help motivate the need for this additional op in Micro. We can then give you some specific pointers on how to proceed.

Examples of ops (along with the motivation for why they are worth adding to Micro at this time) that are currently being ported via community contributions are PRs #43384 and #43381.

@advaitjain advaitjain added comp:micro Related to TensorFlow Lite Microcontrollers and removed TF 2.3 Issues related to TF 2.3 comp:lite TF Lite related issues labels Oct 6, 2020
@advaitjain advaitjain added type:feature Feature requests and removed type:bug Bug labels Oct 6, 2020
@victorromeo
Author

Thanks @advaitjain

The model is currently being used to prepare an audio classifier which fits onto a microcontroller. The Transpose operation has been shown to be a valuable kernel layer toward the greater goal of switching effectively between the time domain and the frequency domain. As the model is currently being developed for research papers, I have been advised that I am unable to share the entire architecture. The architecture doesn't use FFT preprocessing. The origin of the issue is the Keras Permute operation being converted into a TFLite Transpose. I appreciate it is hard to fully weigh the benefit of adding the operation without evidence, but this will be in the paper.

The model is planned to run on Cortex M4 devices such as the moderately sized Arduino Nano 33 BLE Sense.

Side note: it would be great to see a flag to check the compatibility of builtin micro operations during TFLite conversion, as I had to circle for a while to deduce that Transpose was the issue.
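
As a partial workaround, the flatbuffer schema already allows dumping the builtin opcodes a model uses before handing it to the interpreter; a minimal sketch (the function name is hypothetical, and printf availability on the target is assumed):

#include <cstdio>

#include "tensorflow/lite/schema/schema_generated.h"

// Hypothetical pre-flight check: print every builtin opcode the model uses, so
// a missing kernel such as TRANSPOSE can be spotted before AllocateTensors() fails.
void PrintModelOps(const void* model_data) {
  const tflite::Model* model = tflite::GetModel(model_data);
  for (const tflite::OperatorCode* op_code : *model->operator_codes()) {
    std::printf("Model uses builtin op: %s\n",
                tflite::EnumNameBuiltinOperator(op_code->builtin_code()));
  }
}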

@advaitjain
Member

Thanks for the additional context; I understand that you're not able to share the model architecture pre-publication.

Let's wait for around a week for me to put together a guide for porting ops from Lite to Micro, and I'll link to it from this issue as well.

In the meantime, if you want to send a PR with flatbuffer changes for transpose (similar to #43384), that would be great.

Since the transpose op does not have any BuiltinOptions, the parsing function will be something along the lines of:

// We have this parse function instead of directly returning kTfLiteOk from the
// switch-case in ParseOpData because this function is used as part of the
// selective registration for the OpResolver implementation in micro.
TfLiteStatus ParseGreater(const Operator*, ErrorReporter*,
                          BuiltinDataAllocator*, void**) {
  return kTfLiteOk;
}

and

case BuiltinOperator_GREATER: {
  return ParseGreater(op, error_reporter, allocator, builtin_data);
}
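
Following that pattern, the TRANSPOSE equivalent would presumably be (a sketch; the final merged names may differ):

// TRANSPOSE has no BuiltinOptions either, so its parse function can likewise
// simply return kTfLiteOk.
TfLiteStatus ParseTranspose(const Operator*, ErrorReporter*,
                            BuiltinDataAllocator*, void**) {
  return kTfLiteOk;
}

// And the corresponding switch-case entry in ParseOpData:
case BuiltinOperator_TRANSPOSE: {
  return ParseTranspose(op, error_reporter, allocator, builtin_data);
}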

@victorromeo
Author

Thanks for your additional feedback. I've now implemented all compatible unit tests for Transpose in my fork here, and these are compiling. It's worth noting my implementation is not optimal, but it does achieve 1D-4D transpose support, achieved by porting the existing TFLite core Transpose using micro frameworks and static arrays. Now onto the final stages of testing. Cheers.

@mohaimenz

mohaimenz commented Oct 13, 2020

@advaitjain

Let's wait for around a week for me to put together a guide for porting ops from lite to micro and I'll link to it from this issue as well.

Any luck yet? Could you please update us if you have the guide ready for porting a TF Lite kernel to TF Lite Micro? It is badly needed for real-world practical projects.

@victorromeo
Would you be happy to share your Transpose implementation for micro? It might help others, and you may get some valuable support from others as well.
Thanks

@victorromeo
Author

The most stable implementation of Transpose is https://github.com/victorromeo/tensorflow/tree/v2.3.1_transpose

I'm going to rebase this off master, then create a pull request for consideration by the team.

@dmpiergiacomo
Contributor

I created PR #48192, which should solve this issue.

Thank you @victorromeo for your contribution. I refactored some of your code by cherry-picking it, so that your name appears. With your name associated with the PR, Google now asks for both of our signatures on their Contributor License Agreement. It would be really nice if you could sign it quickly so the PR can be pushed forward :)

Thank you!

@sanatmpa1

@victorromeo,

We can see that the PR you submitted has been merged. Can you kindly confirm whether we can close this issue? Thanks!

@sanatmpa1 sanatmpa1 self-assigned this Aug 10, 2021
@sanatmpa1 sanatmpa1 added the stat:awaiting response Status - Awaiting response from author label Aug 10, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Aug 17, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.
