If you don't find an answer to your question here, please look through our detailed documentation for the topic or file a GitHub issue.
The TensorFlow Lite converter supports the following formats:
- SavedModels: TFLiteConverter.from_saved_model
- Frozen GraphDefs generated by freeze_graph.py: TFLiteConverter.from_frozen_graph
- tf.keras HDF5 models: TFLiteConverter.from_keras_model_file
- tf.Session: TFLiteConverter.from_session
The recommended approach is to integrate the Python converter into your model pipeline in order to detect compatibility issues early on.
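As a rough illustration of wiring the converter into a model pipeline, the sketch below wraps the conversion step so that a failure (for example, an unsupported operator) is reported rather than surfacing late. The helper names `build_converter` and `convert_and_save` are hypothetical and the paths are placeholders. Only `TFLiteConverter.from_saved_model` and `convert()` are taken from the TensorFlow Lite Python API.

```python
import pathlib


def build_converter(saved_model_dir):
    """Create a converter for a SavedModel directory (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so the helper below has no hard dependency

    return tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)


def convert_and_save(converter, output_path):
    """Convert the model and write the result to output_path.

    Returns (True, summary) on success or (False, error text) on failure.
    """
    try:
        tflite_model = converter.convert()
    except Exception as err:  # conversion problems, e.g. unsupported operators
        return False, str(err)
    pathlib.Path(output_path).write_bytes(tflite_model)
    return True, "wrote %d bytes" % len(tflite_model)


# Possible use inside a pipeline step (paths are placeholders):
#   ok, detail = convert_and_save(build_converter("saved_model_dir"), "model.tflite")
#   if not ok:
#       raise SystemExit("TensorFlow Lite conversion failed: " + detail)
```

Running a step like this on every training run is one way to notice a newly introduced incompatible operation soon after it appears.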
Because TensorFlow Lite supports fewer operations than TensorFlow, some inference models cannot be converted. For unimplemented operations, see the question on missing operators. Unsupported operators include embeddings and LSTM/RNNs. For conversion issues not related to missing operations, search our GitHub issues or file a new one.
The easiest way to inspect a graph from a .pb file is to use the summarize_graph tool.
If that approach yields an error, you can visualize the GraphDef with TensorBoard and look for the inputs and outputs in the graph. To visualize a .pb file, use the import_pb_to_tensorboard.py script as follows:
python import_pb_to_tensorboard.py --model_dir <model path> --log_dir <log dir path>
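If neither tool is convenient, the node list of a GraphDef can also be scanned directly. The sketch below is a heuristic and not what summarize_graph does internally: it treats `Placeholder` nodes as inputs and nodes that no other node consumes as candidate outputs, so unused constants or training-only nodes may show up as false positives. The function names are hypothetical, and the loader assumes a TensorFlow version that provides `tf.compat.v1.GraphDef`.

```python
def load_graph_def(pb_path):
    """Parse a frozen GraphDef from a .pb file (requires TensorFlow)."""
    import tensorflow as tf  # lazy import: only needed when reading a real file

    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def guess_inputs_and_outputs(nodes):
    """Heuristic: inputs are Placeholder nodes, outputs are nodes nobody consumes."""
    consumed = set()
    for node in nodes:
        for name in node.input:
            # Strip the control-dependency marker ("^") and the output index (":0").
            consumed.add(name.lstrip("^").split(":")[0])
    inputs = [n.name for n in nodes if n.op == "Placeholder"]
    outputs = [
        n.name for n in nodes if n.name not in consumed and n.op != "Placeholder"
    ]
    return inputs, outputs


# Possible use (the path is a placeholder):
#   inputs, outputs = guess_inputs_and_outputs(load_graph_def("model.pb").node)
```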
TensorFlow Lite models can be visualized using the visualize.py script in our repository.
- Clone the TensorFlow repository
- Run the visualize.py script with bazel:
bazel run //tensorflow/lite/tools:visualize model.tflite visualized_model.html
To keep TensorFlow Lite lightweight, the converter supports only certain operations. The Compatibility Guide provides a list of operations currently supported by TensorFlow Lite.
If you don’t see a specific operation (or an equivalent) listed, it's likely that it has not been prioritized. The team tracks requests for new operations on GitHub issue #21526. Leave a comment if your request hasn’t already been mentioned.
In the meantime, you could try implementing a custom operator or using a different model that only contains supported operators. If binary size is not a constraint, try using TensorFlow Lite with select TensorFlow ops.
The best way to test the behavior of a TensorFlow Lite model is to use our API with test data and compare the outputs to TensorFlow for the same inputs. Take a look at our Python Interpreter example that generates random data to feed to the interpreter.
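A minimal sketch of that comparison follows. It uses the interpreter methods `get_input_details`, `get_output_details`, `set_tensor`, `invoke`, and `get_tensor` from the Python API. The helper names, the file name, and the `tf_out` variable (the original TensorFlow model's output for the same input) are assumptions made for illustration.

```python
def run_tflite(interpreter, input_arrays):
    """Feed one array per model input, run inference, and return all outputs.

    The caller is expected to have called interpreter.allocate_tensors() first.
    """
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    for detail, array in zip(input_details, input_arrays):
        interpreter.set_tensor(detail["index"], array)
    interpreter.invoke()
    return [interpreter.get_tensor(detail["index"]) for detail in output_details]


def max_abs_difference(expected, actual):
    """Largest element-wise gap between two flat sequences of numbers."""
    return max(abs(float(e) - float(a)) for e, a in zip(expected, actual))


# Possible use with random data (requires TensorFlow and NumPy):
#   import numpy as np
#   import tensorflow as tf
#   interpreter = tf.lite.Interpreter(model_path="model.tflite")
#   interpreter.allocate_tensors()
#   shape = interpreter.get_input_details()[0]["shape"]
#   data = np.random.random_sample(shape).astype(np.float32)
#   tflite_out = run_tflite(interpreter, [data])[0]
#   gap = max_abs_difference(tf_out.ravel(), tflite_out.ravel())
```

What counts as an acceptable gap depends on the model and on whether quantization was applied, so any threshold should be chosen per use case.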
Post-training quantization can be used during conversion to TensorFlow Lite to reduce the size of the model. It quantizes weights from floating point to 8 bits of precision and dequantizes them at runtime to perform floating-point computations. Note that this can affect accuracy.
If retraining the model is an option, consider quantization-aware training, although it is only available for a subset of convolutional neural network architectures.
For a deeper understanding of different optimization methods, look at Model optimization.
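To make the size and accuracy trade-off of 8-bit weight quantization concrete, here is a small worked sketch in plain Python. It assumes a simple affine scheme with one scale and zero point per weight list. That scheme is an illustrative assumption and not necessarily the exact one the converter applies.

```python
def quantize_weights(weights):
    """Map floats to integers in [0, 255] using one scale and one zero point."""
    lo = min(min(weights), 0.0)  # widen the range so 0.0 is always representable
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if every weight is zero
    zero_point = round(-lo / scale)
    quantized = [
        min(255, max(0, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize_weights(quantized, scale, zero_point):
    """Recover approximate float values, as would happen at runtime."""
    return [scale * (q - zero_point) for q in quantized]


weights = [-0.8, 0.0, 0.3, 1.2]
quantized, scale, zero_point = quantize_weights(weights)
restored = dequantize_weights(quantized, scale, zero_point)
# Each restored weight differs from the original by at most about scale / 2.
```

Each weight now fits in one byte instead of four, and the rounding step is the source of the small loss in precision.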
The high-level process to optimize TensorFlow Lite performance looks something like this:
- Make sure that you have the right model for the task. For image classification, check out our list of hosted models.
- Tweak the number of threads. Many TensorFlow Lite operators support multi-threaded kernels. You can use SetNumThreads() in the C++ API to do this. However, increasing the number of threads can make performance more variable, depending on the environment.
- Use Hardware Accelerators. TensorFlow Lite supports model acceleration for specific hardware using delegates. For example, to use Android’s Neural Networks API, call UseNNAPI on the interpreter. Or take a look at our GPU delegate tutorial.
- (Advanced) Profile Model. The TensorFlow Lite benchmarking tool has a built-in profiler that can show per-operator statistics. If you know how to optimize an operator’s performance for your specific platform, you can implement a custom operator.
For a more in-depth discussion on how to optimize performance, take a look at Best Practices.
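Because the effect of the thread count varies by environment, measuring it on the target device is usually more reliable than guessing. The sketch below is not the benchmarking tool mentioned above. It is a hypothetical timing helper that accepts any zero-argument callable, such as an interpreter's `invoke` method. The commented usage assumes a TensorFlow version whose Python `Interpreter` accepts a `num_threads` argument.

```python
import time


def average_latency_ms(invoke, runs=50, warmup=5):
    """Average wall-clock time of one invoke() call, in milliseconds."""
    for _ in range(warmup):  # warm-up runs are excluded from the measurement
        invoke()
    start = time.perf_counter()
    for _ in range(runs):
        invoke()
    return (time.perf_counter() - start) * 1000.0 / runs


def pick_thread_count(make_invoke, candidates=(1, 2, 4)):
    """Time each candidate thread count and return (fastest, all timings)."""
    timings = {n: average_latency_ms(make_invoke(n)) for n in candidates}
    return min(timings, key=timings.get), timings


# Possible use (requires TensorFlow; the file name is a placeholder):
#   import tensorflow as tf
#   def make_invoke(n):
#       interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=n)
#       interpreter.allocate_tensors()
#       return interpreter.invoke
#   best, timings = pick_thread_count(make_invoke)
```

Repeating the measurement several times gives a sense of the run-to-run variability before settling on a value.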