diff --git a/community/en/docs/tfmobile/android_build.md b/community/en/docs/tfmobile/android_build.md new file mode 100644 index 00000000000..cc43dc19708 --- /dev/null +++ b/community/en/docs/tfmobile/android_build.md @@ -0,0 +1,196 @@ +# Building TensorFlow on Android + +Warning: TensorFlow Mobile is __deprecated__. + +
+TensorFlow Lite is our main mobile and embedded offering. We are working hard
+to close the feature gap between TensorFlow Mobile and TensorFlow Lite. We
+expect to deprecate TensorFlow Mobile in early 2019. We will give ample notice
+to our users when we get to that point and will provide help and support to
+ensure easy migrations.
+
+In the meantime, please use TensorFlow Lite. If you have a feature request,
+such as a missing op, please post to our GitHub.
+ +To get you started working with TensorFlow on Android, we'll walk through two +ways to build our TensorFlow mobile demos and deploying them on an Android +device. The first is Android Studio, which lets you build and deploy in an +IDE. The second is building with Bazel and deploying with ADB on the command +line. + +Why choose one or the other of these methods? + +The simplest way to use TensorFlow on Android is to use Android Studio. If you +aren't planning to customize your TensorFlow build at all, or if you want to use +Android Studio's editor and other features to build an app and just want to add +TensorFlow to it, we recommend using Android Studio. + +If you are using custom ops, or have some other reason to build TensorFlow from +scratch, scroll down and see our instructions +for building the demo with Bazel. + +## Build the demo using Android Studio + +**Prerequisites** + +If you haven't already, do the following two things: + +- Install [Android Studio](https://developer.android.com/studio/index.html), + following the instructions on their website. + +- Clone the TensorFlow repository from GitHub: + + git clone https://github.com/tensorflow/tensorflow + +**Building** + +1. Open Android Studio, and from the Welcome screen, select **Open an existing + Android Studio project**. + +2. From the **Open File or Project** window that appears, navigate to and select + the `tensorflow/examples/android` directory from wherever you cloned the + TensorFlow GitHub repo. Click OK. + + If it asks you to do a Gradle Sync, click OK. + + You may also need to install various platforms and tools, if you get + errors like "Failed to find target with hash string 'android-23' and similar. + +3. Open the `build.gradle` file (you can go to **1:Project** in the side panel + and find it under the **Gradle Scripts** zippy under **Android**). Look for + the `nativeBuildSystem` variable and set it to `none` if it isn't already: + + // set to 'bazel', 'cmake', 'makefile', 'none' + def nativeBuildSystem = 'none' + +4. Click the *Run* button (the green arrow) or select *Run > Run 'android'* from the + top menu. You may need to rebuild the project using *Build > Rebuild Project*. + + If it asks you to use Instant Run, click **Proceed Without Instant Run**. + + Also, you need to have an Android device plugged in with developer options + enabled at this + point. See [here](https://developer.android.com/studio/run/device.html) for + more details on setting up developer devices. + +This installs three apps on your phone that are all part of the TensorFlow +Demo. See [Android Sample Apps](#android_sample_apps) for more information about +them. + +## Adding TensorFlow to your apps using Android Studio + +To add TensorFlow to your own apps on Android, the simplest way is to add the +following lines to your Gradle build file: + + allprojects { + repositories { + jcenter() + } + } + + dependencies { + implementation 'org.tensorflow:tensorflow-android:+' + } + +This automatically downloads the latest stable version of TensorFlow as an AAR +and installs it in your project. + +## Build the demo using Bazel + +Another way to use TensorFlow on Android is to build an APK +using Bazel and load it onto your device +using [ADB](https://developer.android.com/studio/command-line/adb.html). This +requires some knowledge of build systems and Android developer tools, but we'll +guide you through the basics here. + +- First, follow our instructions for + installing from sources. 
+ This will also guide you through installing Bazel and cloning the + TensorFlow code. + +- Download the Android [SDK](https://developer.android.com/studio/index.html) + and [NDK](https://developer.android.com/ndk/downloads/index.html) if you do + not already have them. You need at least version 12b of the NDK, and 23 of the + SDK. + +- In your copy of the TensorFlow source, update the + [WORKSPACE](https://github.com/tensorflow/tensorflow/blob/master/WORKSPACE) + file with the location of your SDK and NDK, where it says <PATH_TO_NDK> + and <PATH_TO_SDK>. + +- Run Bazel to build the demo APK: + + bazel build -c opt //tensorflow/examples/android:tensorflow_demo + +- Use [ADB](https://developer.android.com/studio/command-line/adb.html#move) to + install the APK onto your device: + + adb install -r bazel-bin/tensorflow/examples/android/tensorflow_demo.apk + +Note: In general when compiling for Android with Bazel you need +`--config=android` on the Bazel command line, though in this case this +particular example is Android-only, so you don't need it here. + +This installs three apps on your phone that are all part of the TensorFlow +Demo. See [Android Sample Apps](#android_sample_apps) for more information about +them. + +## Android Sample Apps + +The +[Android example code](https://www.tensorflow.org/code/tensorflow/examples/android/) is +a single project that builds and installs three sample apps which all use the +same underlying code. The sample apps all take video input from a phone's +camera: + +- **TF Classify** uses the Inception v3 model to label the objects it’s pointed + at with classes from Imagenet. There are only 1,000 categories in Imagenet, + which misses most everyday objects and includes many things you’re unlikely to + encounter often in real life, so the results can often be quite amusing. For + example there’s no ‘person’ category, so instead it will often guess things it + does know that are often associated with pictures of people, like a seat belt + or an oxygen mask. If you do want to customize this example to recognize + objects you care about, you can use + the + [TensorFlow for Poets codelab](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html#0) as + an example for how to train a model based on your own data. + +- **TF Detect** uses a multibox model to try to draw bounding boxes around the + locations of people in the camera. These boxes are annotated with the + confidence for each detection result. Results will not be perfect, as this + kind of object detection is still an active research topic. The demo also + includes optical tracking for when objects move between frames, which runs + more frequently than the TensorFlow inference. This improves the user + experience since the apparent frame rate is faster, but it also gives the + ability to estimate which boxes refer to the same object between frames, which + is important for counting objects over time. + +- **TF Stylize** implements a real-time style transfer algorithm on the camera + feed. You can select which styles to use and mix between them using the + palette at the bottom of the screen, and also switch out the resolution of the + processing to go higher or lower rez. + +When you build and install the demo, you'll see three app icons on your phone, +one for each of the demos. Tapping on them should open up the app and let you +explore what they do. You can enable profiling statistics on-screen by tapping +the volume up button while they’re running. 
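+
+If you'd rather launch one of the demos from a shell instead of tapping its
+icon, you can start it with ADB. The package and activity names below are
+assumptions based on the demo's manifest and may change between releases, so
+treat this as a sketch:
+
+    adb shell am start -n org.tensorflow.demo/org.tensorflow.demo.ClassifierActivity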
+
+### Android Inference Library
+
+Because Android apps need to be written in Java, and core TensorFlow is in C++,
+TensorFlow has a JNI library to interface between the two. Its interface is aimed
+only at inference, so it provides the ability to load a graph, set up inputs,
+and run the model to calculate particular outputs. You can see the full
+documentation for the minimal set of methods in
+[TensorFlowInferenceInterface.java](https://www.tensorflow.org/code/tensorflow/contrib/android/java/org/tensorflow/contrib/android/TensorFlowInferenceInterface.java).
+
+The demo applications use this interface, so they’re a good place to look for
+example usage. You can download prebuilt binary JARs from
+[ci.tensorflow.org](https://ci.tensorflow.org/view/Nightly/job/nightly-android/).
diff --git a/community/en/docs/tfmobile/index.md b/community/en/docs/tfmobile/index.md
new file mode 100644
index 00000000000..a1d80bfe376
--- /dev/null
+++ b/community/en/docs/tfmobile/index.md
@@ -0,0 +1,299 @@
+# Overview
+
+Warning: TensorFlow Mobile is __deprecated__.
+
+TensorFlow Lite is our main mobile and embedded offering. We are working hard
+to close the feature gap between TensorFlow Mobile and TensorFlow Lite. We
+expect to deprecate TensorFlow Mobile in early 2019. We will give ample notice
+to our users when we get to that point and will provide help and support to
+ensure easy migrations.
+
+In the meantime, please use TensorFlow Lite. If you have a feature request,
+such as a missing op, please post to our GitHub.
+ +TensorFlow was designed to be a good deep learning solution for mobile +platforms. Currently we have two solutions for deploying machine learning +applications on mobile and embedded devices: TensorFlow for Mobile and +TensorFlow Lite. + +## TensorFlow Lite versus TensorFlow Mobile + +Here are a few of the differences between the two: + +- TensorFlow Lite is an evolution of TensorFlow Mobile. In most cases, apps + developed with TensorFlow Lite will have a smaller binary size, fewer + dependencies, and better performance. + +- TensorFlow Lite is in developer preview, so not all use cases are covered yet. + We expect you to use TensorFlow Mobile to cover production cases. + +- TensorFlow Lite supports only a limited set of operators, so not all models + will work on it by default. TensorFlow for Mobile has a fuller set of + supported functionality. + +TensorFlow Lite provides better performance and a small binary size on mobile +platforms as well as the ability to leverage hardware acceleration if available +on their platforms. In addition, it has many fewer dependencies so it can be +built and hosted on simpler, more constrained device scenarios. TensorFlow Lite +also allows targeting accelerators through the +[Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks/index.html). + +TensorFlow Lite currently has coverage for a limited set of operators. While +TensorFlow for Mobile supports only a constrained set of ops by default, in +principle if you use an arbitrary operator in TensorFlow, it can be customized +to build that kernel. Thus use cases which are not currently supported by +TensorFlow Lite should continue to use TensorFlow for Mobile. As TensorFlow Lite +evolves, it will gain additional operators, and the decision will be easier to +make. + + +## Introduction to TensorFlow Mobile + +TensorFlow was designed from the ground up to be a good deep learning solution +for mobile platforms like Android and iOS. This mobile guide should help you +understand how machine learning can work on mobile platforms and how to +integrate TensorFlow into your mobile apps effectively and efficiently. + +## About this Guide + +This guide is aimed at developers who have a TensorFlow model that’s +successfully working in a desktop environment, who want to integrate it into +a mobile application, and cannot use TensorFlow Lite. Here are the +main challenges you’ll face during that process: + +- Understanding how to use Tensorflow for mobile. +- Building TensorFlow for your platform. +- Integrating the TensorFlow library into your application. +- Preparing your model file for mobile deployment. +- Optimizing for latency, RAM usage, model file size, and binary size. + +## Common use cases for mobile machine learning + +**Why run TensorFlow on mobile?** + +Traditionally, deep learning has been associated with data centers and giant +clusters of high-powered GPU machines. However, it can be very expensive and +time-consuming to send all of the data a device has access to across a network +connection. Running on mobile makes it possible to deliver very interactive +applications in a way that’s not possible when you have to wait for a network +round trip. + +Here are some common use cases for on-device deep learning: + +### Speech Recognition + +There are a lot of interesting applications that can be built with a +speech-driven interface, and many of these require on-device processing. 
Most of +the time a user isn’t giving commands, and so streaming audio continuously to a +remote server would be a waste of bandwidth, since it would mostly be silence or +background noises. To solve this problem it’s common to have a small neural +network running on-device +[listening out for a particular keyword](https://www.tensorflow.org/tutorials/sequences/audio_recognition). +Once that keyword has been spotted, the rest of the +conversation can be transmitted over to the server for further processing if +more computing power is needed. + +### Image Recognition + +It can be very useful for a mobile app to be able to make sense of a camera +image. If your users are taking photos, recognizing what’s in them can help your +camera apps apply appropriate filters, or label the photos so they’re easily +findable. It’s important for embedded applications too, since you can use image +sensors to detect all sorts of interesting conditions, whether it’s spotting +endangered animals in the wild +or +[reporting how late your train is running](https://svds.com/tensorflow-image-recognition-raspberry-pi/). + +TensorFlow comes with several examples of recognizing the types of objects +inside images along with a variety of different pre-trained models, and they can +all be run on mobile devices. You can try out +our +[Tensorflow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html#0) and +[Tensorflow for Poets 2: Optimize for Mobile](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets-2/index.html#0) codelabs to +see how to take a pretrained model and run some very fast and lightweight +training to teach it to recognize specific objects, and then optimize it to +run on mobile. + +### Object Localization + +Sometimes it’s important to know where objects are in an image as well as what +they are. There are lots of augmented reality use cases that could benefit a +mobile app, such as guiding users to the right component when offering them +help fixing their wireless network or providing informative overlays on top of +landscape features. Embedded applications often need to count objects that are +passing by them, whether it’s pests in a field of crops, or people, cars and +bikes going past a street lamp. + +TensorFlow offers a pretrained model for drawing bounding boxes around people +detected in images, together with tracking code to follow them over time. The +tracking is especially important for applications where you’re trying to count +how many objects are present over time, since it gives you a good idea when a +new object enters or leaves the scene. We have some sample code for this +available for Android [on +GitHub](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android), +and also a [more general object detection +model](https://www.tensorflow.org/code/tensorflow_models/object_detection/README.md) +available as well. + +### Gesture Recognition + +It can be useful to be able to control applications with hand or other +gestures, either recognized from images or through analyzing accelerometer +sensor data. Creating those models is beyond the scope of this guide, but +TensorFlow is an effective way of deploying them. + +### Optical Character Recognition + +Google Translate’s live camera view is a great example of how effective +interactive on-device detection of text can be. + +
+ +There are multiple steps involved in recognizing text in images. You first have +to identify the areas where the text is present, which is a variation on the +object localization problem, and can be solved with similar techniques. Once you +have an area of text, you then need to interpret it as letters, and then use a +language model to help guess what words they represent. The simplest way to +estimate what letters are present is to segment the line of text into individual +letters, and then apply a simple neural network to the bounding box of each. You +can get good results with the kind of models used for MNIST, which you can find +in TensorFlow’s tutorials, though you may want a higher-resolution input. A +more advanced alternative is to use an LSTM model to process a whole line of +text at once, with the model itself handling the segmentation into different +characters. + +### Translation + +Translating from one language to another quickly and accurately, even if you +don’t have a network connection, is an important use case. Deep networks are +very effective at this sort of task, and you can find descriptions of a lot of +different models in the literature. Often these are sequence-to-sequence +recurrent models where you’re able to run a single graph to do the whole +translation, without needing to run separate parsing stages. + +### Text Classification + +If you want to suggest relevant prompts to users based on what they’re typing or +reading, it can be very useful to understand the meaning of the text. This is +where text classification comes in. Text classification is an umbrella term +that covers everything from sentiment analysis to topic discovery. You’re likely +to have your own categories or labels that you want to apply, so the best place +to start is with an example +like +[Skip-Thoughts](https://www.tensorflow.org/code/tensorflow_models/skip_thoughts/), +and then train on your own examples. + +### Voice Synthesis + +A synthesized voice can be a great way of giving users feedback or aiding +accessibility, and recent advances such as +[WaveNet](https://deepmind.com/blog/wavenet-generative-model-raw-audio/) show +that deep learning can offer very natural-sounding speech. + +## Mobile machine learning and the cloud + +These examples of use cases give an idea of how on-device networks can +complement cloud services. Cloud has a great deal of computing power in a +controlled environment, but running on devices can offer higher interactivity. +In situations where the cloud is unavailable, or your cloud capacity is limited, +you can provide an offline experience, or reduce cloud workload by processing +easy cases on device. + +Doing on-device computation can also signal when it's time to switch to working +on the cloud. A good example of this is hotword detection in speech. Since +devices are able to constantly listen out for the keywords, this then triggers a +lot of traffic to cloud-based speech recognition once one is recognized. Without +the on-device component, the whole application wouldn’t be feasible, and this +pattern exists across several other applications as well. Recognizing that some +sensor input is interesting enough for further processing makes a lot of +interesting products possible. + +## What hardware and software should you have? + +TensorFlow runs on Ubuntu Linux, Windows 10, and OS X. For a list of all +supported operating systems and instructions to install TensorFlow, see +Installing Tensorflow. 
+ +Note that some of the sample code we provide for mobile TensorFlow requires you +to compile TensorFlow from source, so you’ll need more than just `pip install` +to work through all the sample code. + +To try out the mobile examples, you’ll need a device set up for development, +using +either [Android Studio](https://developer.android.com/studio/install.html), +or [XCode](https://developer.apple.com/xcode/) if you're developing for iOS. + +## What should you do before you get started? + +Before thinking about how to get your solution on mobile: + +1. Determine whether your problem is solvable by mobile machine learning +2. Create a labelled dataset to define your problem +3. Pick an effective model for the problem + +We'll discuss these in more detail below. + +### Is your problem solvable by mobile machine learning? + +Once you have an idea of the problem you want to solve, you need to make a plan +of how to build your solution. The most important first step is making sure that +your problem is actually solvable, and the best way to do that is to mock it up +using humans in the loop. + +For example, if you want to drive a robot toy car using voice commands, try +recording some audio from the device and listen back to it to see if you can +make sense of what’s being said. Often you’ll find there are problems in the +capture process, such as the motor drowning out speech or not being able to hear +at a distance, and you should tackle these problems before investing in the +modeling process. + +Another example would be giving photos taken from your app to people see if they +can classify what’s in them, in the way you’re looking for. If they can’t do +that (for example, trying to estimate calories in food from photos may be +impossible because all white soups look the same), then you’ll need to redesign +your experience to cope with that. A good rule of thumb is that if a human can’t +handle the task then it will be difficult to train a computer to do better. + +### Create a labelled dataset + +After you’ve solved any fundamental issues with your use case, you need to +create a labeled dataset to define what problem you’re trying to solve. This +step is extremely important, more than picking which model to use. You want it +to be as representative as possible of your actual use case, since the model +will only be effective at the task you teach it. It’s also worth investing in +tools to make labeling the data as efficient and accurate as possible. For +example, if you’re able to switch from having to click a button on a web +interface to simple keyboard shortcuts, you may be able to speed up the +generation process a lot. You should also start by doing the initial labeling +yourself, so you can learn about the difficulties and likely errors, and +possibly change your labeling or data capture process to avoid them. Once you +and your team are able to consistently label examples (that is once you +generally agree on the same labels for most examples), you can then try and +capture your knowledge in a manual and teach external raters how to run the same +process. + +### Pick an effective model + +The next step is to pick an effective model to use. You might be able to avoid +training a model from scratch if someone else has already implemented a model +similar to what you need; we have a repository of models implemented in +TensorFlow [on GitHub](https://github.com/tensorflow/models) that you can look +through. 
Lean towards the simplest model you can find, and try to get started as +soon as you have even a small amount of labelled data, since you’ll get the best +results when you’re able to iterate quickly. The shorter the time it takes to +try training a model and running it in its real application, the better overall +results you’ll see. It’s common for an algorithm to get great training accuracy +numbers but then fail to be useful within a real application because there’s a +mismatch between the dataset and real usage. Prototype end-to-end usage as soon +as possible to create a consistent user experience. diff --git a/community/en/docs/tfmobile/ios_build.md b/community/en/docs/tfmobile/ios_build.md new file mode 100644 index 00000000000..080ba660214 --- /dev/null +++ b/community/en/docs/tfmobile/ios_build.md @@ -0,0 +1,125 @@ +# Building TensorFlow on iOS + +Warning: TensorFlow Mobile is __deprecated__. + +
+TensorFlow Lite is our main mobile and embedded offering. We are working hard
+to close the feature gap between TensorFlow Mobile and TensorFlow Lite. We
+expect to deprecate TensorFlow Mobile in early 2019. We will give ample notice
+to our users when we get to that point and will provide help and support to
+ensure easy migrations.
+
+In the meantime, please use TensorFlow Lite. If you have a feature request,
+such as a missing op, please post to our GitHub.
+ +## Using CocoaPods + +The simplest way to get started with TensorFlow on iOS is using the CocoaPods +package management system. You can add the `TensorFlow-experimental` pod to your +Podfile, which installs a universal binary framework. This makes it easy to get +started but has the disadvantage of being hard to customize, which is important +in case you want to shrink your binary size. If you do need the ability to +customize your libraries, see later sections on how to do that. + +## Creating your own app + +If you'd like to add TensorFlow capabilities to your own app, do the following: + +- Create your own app or load your already-created app in XCode. + +- Add a file named Podfile at the project root directory with the following content: + + target 'YourProjectName' + pod 'TensorFlow-experimental' + +- Run `pod install` to download and install the `TensorFlow-experimental` pod. + +- Open `YourProjectName.xcworkspace` and add your code. + +- In your app's **Build Settings**, make sure to add `$(inherited)` to the + **Other Linker Flags**, and **Header Search Paths** sections. + +## Running the Samples + +You'll need Xcode 7.3 or later to run our iOS samples. + +There are currently three examples: simple, benchmark, and camera. For now, you +can download the sample code by cloning the main tensorflow repository (we are +planning to make the samples available as a separate repository later). + +From the root of the tensorflow folder, download [Inception +v1](https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip), +and extract the label and graph files into the data folders inside both the +simple and camera examples using these steps: + + mkdir -p ~/graphs + curl -o ~/graphs/inception5h.zip \ + https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip \ + && unzip ~/graphs/inception5h.zip -d ~/graphs/inception5h + cp ~/graphs/inception5h/* tensorflow/examples/ios/benchmark/data/ + cp ~/graphs/inception5h/* tensorflow/examples/ios/camera/data/ + cp ~/graphs/inception5h/* tensorflow/examples/ios/simple/data/ + +Change into one of the sample directories, download the +[Tensorflow-experimental](https://cocoapods.org/pods/TensorFlow-experimental) +pod, and open the Xcode workspace. Note that installing the pod can take a long +time since it is big (~450MB). If you want to run the simple example, then: + + cd tensorflow/examples/ios/simple + pod install + open tf_simple_example.xcworkspace # note .xcworkspace, not .xcodeproj + # this is created by pod install + +Run the simple app in the XCode simulator. You should see a single-screen app +with a **Run Model** button. Tap that, and you should see some debug output +appear below indicating that the example Grace Hopper image in directory data +has been analyzed, with a military uniform recognized. + +Run the other samples using the same process. The camera example requires a real +device connected. Once you build and run that, you should get a live camera view +that you can point at objects to get real-time recognition results. + +### iOS Example details + +There are three demo applications for iOS, all defined in Xcode projects inside +[tensorflow/examples/ios](https://www.tensorflow.org/code/tensorflow/examples/ios/). + +- **Simple**: This is a minimal example showing how to load and run a TensorFlow + model in as few lines as possible. It just consists of a single view with a + button that executes the model loading and inference when its pressed. 
+ +- **Camera**: This is very similar to the Android TF Classify demo. It loads + Inception v3 and outputs its best label estimate for what’s in the live camera + view. As with the Android version, you can train your own custom model using + TensorFlow for Poets and drop it into this example with minimal code changes. + +- **Benchmark**: is quite close to Simple, but it runs the graph repeatedly and + outputs similar statistics to the benchmark tool on Android. + + +### Troubleshooting + +- Make sure you use the TensorFlow-experimental pod (and not TensorFlow). + +- The TensorFlow-experimental pod is current about ~450MB. The reason it is so + big is because we are bundling multiple platforms, and the pod includes all + TensorFlow functionality (e.g. operations). The final app size after build is + substantially smaller though (~25MB). Working with the complete pod is + convenient during development, but see below section on how you can build your + own custom TensorFlow library to reduce the size. + +## Building the TensorFlow iOS libraries from source + +While Cocoapods is the quickest and easiest way of getting started, you sometimes +need more flexibility to determine which parts of TensorFlow your app should be +shipped with. For such cases, you can build the iOS libraries from the +sources. [This +guide](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios#building-the-tensorflow-ios-libraries-from-source) +contains detailed instructions on how to do that. + diff --git a/community/en/docs/tfmobile/linking_libs.md b/community/en/docs/tfmobile/linking_libs.md new file mode 100644 index 00000000000..0f40984245f --- /dev/null +++ b/community/en/docs/tfmobile/linking_libs.md @@ -0,0 +1,271 @@ +# Integrating TensorFlow libraries + +Warning: TensorFlow Mobile is __deprecated__. + +
+TensorFlow Lite is our main mobile and embedded offering. We are working hard
+to close the feature gap between TensorFlow Mobile and TensorFlow Lite. We
+expect to deprecate TensorFlow Mobile in early 2019. We will give ample notice
+to our users when we get to that point and will provide help and support to
+ensure easy migrations.
+
+In the meantime, please use TensorFlow Lite. If you have a feature request,
+such as a missing op, please post to our GitHub.
+ +Once you have made some progress on a model that addresses the problem you’re +trying to solve, it’s important to test it out inside your application +immediately. There are often unexpected differences between your training data +and what users actually encounter in the real world, and getting a clear picture +of the gap as soon as possible improves the product experience. + +This page talks about how to integrate the TensorFlow libraries into your own +mobile applications, once you have already successfully built and deployed the +TensorFlow mobile demo apps. + +## Linking the library + +After you've managed to build the examples, you'll probably want to call +TensorFlow from one of your existing applications. The very easiest way to do +this is to use the Pod installation steps described in +Building TensorFlow on iOS, but if you want to build +TensorFlow from source (for example to customize which operators are included) +you'll need to break out TensorFlow as a framework, include the right header +files, and link against the built libraries and dependencies. + +### Android + +For Android, you just need to link in a Java library contained in a JAR file +called `libandroid_tensorflow_inference_java.jar`. There are three ways to +include this functionality in your program: + +1. Include the jcenter AAR which contains it, as in this + [example app](https://github.com/googlecodelabs/tensorflow-for-poets-2/blob/master/android/tfmobile/build.gradle#L59-L65) + +2. Download the nightly precompiled version from +[ci.tensorflow.org](http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/). + +3. Build the JAR file yourself using the instructions [in our Android GitHub repo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/android) + +### iOS + +Pulling in the TensorFlow libraries on iOS is a little more complicated. Here is +a checklist of what you’ll need to do to your iOS app: + +- Link against tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a, usually + by adding `-L/your/path/tensorflow/contrib/makefile/gen/lib/` and + `-ltensorflow-core` to your linker flags. + +- Link against the generated protobuf libraries by adding + `-L/your/path/tensorflow/contrib/makefile/gen/protobuf_ios/lib` and + `-lprotobuf` and `-lprotobuf-lite` to your command line. + +- For the include paths, you need the root of your TensorFlow source folder as + the first entry, followed by + `tensorflow/contrib/makefile/downloads/protobuf/src`, + `tensorflow/contrib/makefile/downloads`, + `tensorflow/contrib/makefile/downloads/eigen`, and + `tensorflow/contrib/makefile/gen/proto`. + +- Make sure your binary is built with `-force_load` (or the equivalent on your + platform), aimed at the TensorFlow library to ensure that it’s linked + correctly. More detail on why this is necessary can be found in the next + section, [Global constructor magic](#global_constructor_magic). On Linux-like + platforms, you’ll need different flags, more like + `-Wl,--allow-multiple-definition -Wl,--whole-archive`. + +You’ll also need to link in the Accelerator framework, since this is used to +speed up some of the operations. + +## Global constructor magic + +One of the subtlest problems you may run up against is the “No session factory +registered for the given session options” error when trying to call TensorFlow +from your own application. To understand why this is happening and how to fix +it, you need to know a bit about the architecture of TensorFlow. 
+
+The framework is designed to be very modular, with a thin core and a large
+number of specific objects that are independent and can be mixed and matched as
+needed. To enable this, the coding pattern in C++ had to let modules easily
+notify the framework about the services they offer, without requiring a central
+list that has to be updated separately from each implementation. It also had to
+allow separate libraries to add their own implementations without needing a
+recompile of the core.
+
+To achieve this capability, TensorFlow uses a registration pattern in a lot of
+places. In the code, it looks like this:
+
+```
+class MulKernel : public OpKernel {
+  Status Compute(OpKernelContext* context) { … }
+};
+REGISTER_KERNEL(MulKernel, "Mul");
+```
+
+This would be in a standalone `.cc` file linked into your application, either
+as part of the main set of kernels or as a separate custom library. The magic
+part is that the `REGISTER_KERNEL()` macro is able to inform the core of
+TensorFlow that it has an implementation of the Mul operation, so that it can be
+called in any graphs that require it.
+
+From a programming point of view, this setup is very convenient. The
+implementation and registration code live in the same file, and adding new
+implementations is as simple as compiling and linking it in. The difficult part
+comes from the way that the `REGISTER_KERNEL()` macro is implemented. C++
+doesn’t offer a good mechanism for doing this sort of registration, so we have
+to resort to some tricky code. Under the hood, the macro is implemented so that
+it produces something like this:
+
+```
+class RegisterMul {
+ public:
+  RegisterMul() {
+    global_kernel_registry()->Register("Mul", []() {
+      return new MulKernel();
+    });
+  }
+};
+RegisterMul g_register_mul;
+```
+
+This sets up a class `RegisterMul` with a constructor that tells the global
+kernel registry what function to call when somebody asks it how to create a
+“Mul” kernel. Then there’s a global object of that class, and so the constructor
+should be called at the start of any program.
+
+While this may sound sensible, the unfortunate part is that the global object
+that’s defined is not used by any other code, so linkers not designed with this
+in mind will decide that it can be deleted. As a result, the constructor is
+never called, and the class is never registered. All sorts of modules use this
+pattern in TensorFlow, and it happens that `Session` implementations are the
+first to be looked for when the code is run, which is why it shows up as the
+characteristic error when this problem occurs.
+
+The solution is to force the linker to not strip any code from the library, even
+if it believes it’s unused. On iOS, this step can be accomplished with the
+`-force_load` flag, specifying a library path, and on Linux you need
+`--whole-archive`. These persuade the linker to not be as aggressive about
+stripping, and should retain the globals.
+
+The actual implementation of the various `REGISTER_*` macros is a bit more
+complicated in practice, but they all suffer the same underlying problem. If
+you’re interested in how they work, [op_kernel.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_kernel.h#L1091)
+is a good place to start investigating.
+
+## Protobuf problems
+
+TensorFlow relies on
+the [Protocol Buffer](https://developers.google.com/protocol-buffers/) library,
+commonly known as protobuf.
This library takes definitions of data structures +and produces serialization and access code for them in a variety of +languages. The tricky part is that this generated code needs to be linked +against shared libraries for the exact same version of the framework that was +used for the generator. This can be an issue when `protoc`, the tool used to +generate the code, is from a different version of protobuf than the libraries in +the standard linking and include paths. For example, you might be using a copy +of `protoc` that was built locally in `~/projects/protobuf-3.0.1.a`, but you have +libraries installed at `/usr/local/lib` and `/usr/local/include` that are from +3.0.0. + +The symptoms of this issue are errors during the compilation or linking phases +with protobufs. Usually, the build tools take care of this, but if you’re using +the makefile, make sure you’re building the protobuf library locally and using +it, as shown in [this Makefile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/Makefile#L18). + +Another situation that can cause problems is when protobuf headers and source +files need to be generated as part of the build process. This process makes +building more complex, since the first phase has to be a pass over the protobuf +definitions to create all the needed code files, and only after that can you go +ahead and do a build of the library code. + +### Multiple versions of protobufs in the same app + +Protobufs generate headers that are needed as part of the C++ interface to the +overall TensorFlow library. This complicates using the library as a standalone +framework. + +If your application is already using version 1 of the protocol buffers library, +you may have trouble integrating TensorFlow because it requires version 2. If +you just try to link both versions into the same binary, you’ll see linking +errors because some of the symbols clash. To solve this particular problem, we +have an experimental script at [rename_protobuf.sh](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/rename_protobuf.sh). + +You need to run this as part of the makefile build, after you’ve downloaded all +the dependencies: + +``` +tensorflow/contrib/makefile/download_dependencies.sh +tensorflow/contrib/makefile/rename_protobuf.sh +``` + +## Calling the TensorFlow API + +Once you have the framework available, you then need to call into it. The usual +pattern is that you first load your model, which represents a preset set of +numeric computations, and then you run inputs through that model (for example, +images from a camera) and receive outputs (for example, predicted labels). + +On Android, we provide the Java Inference Library that is focused on just this +use case, while on iOS and Raspberry Pi you call directly into the C++ API. + +### Android + +Here’s what a typical Inference Library sequence looks like on Android: + +``` +// Load the model from disk. +TensorFlowInferenceInterface inferenceInterface = +new TensorFlowInferenceInterface(assetManager, modelFilename); + +// Copy the input data into TensorFlow. +inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3); + +// Run the inference call. +inferenceInterface.run(outputNames, logStats); + +// Copy the output Tensor back into the output array. 
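+// Note: `outputs` is assumed to be a float[] the caller has already allocated
+// with one element per value in the output tensor (for example, a hypothetical
+// `new float[NUM_CLASSES]` for a classifier).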
+inferenceInterface.fetch(outputName, outputs);
+```
+
+You can find the source of this code in the [Android examples](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/TensorFlowImageClassifier.java#L107).
+
+### iOS and Raspberry Pi
+
+Here’s the equivalent code for iOS and Raspberry Pi:
+
+```
+// Load the model.
+PortableReadFileToProto(file_path, &tensorflow_graph);
+
+// Create a session from the model.
+tensorflow::Status s = session->Create(tensorflow_graph);
+if (!s.ok()) {
+  LOG(FATAL) << "Could not create TensorFlow Graph: " << s;
+}
+
+// Run the model.
+std::string input_layer = "input";
+std::string output_layer = "output";
+std::vector<tensorflow::Tensor> outputs;
+tensorflow::Status run_status = session->Run({{input_layer, image_tensor}},
+                                             {output_layer}, {}, &outputs);
+if (!run_status.ok()) {
+  LOG(FATAL) << "Running model failed: " << run_status;
+}
+
+// Access the output data.
+tensorflow::Tensor* output = &outputs[0];
+```
+
+This is all based on the
+[iOS sample code](https://www.tensorflow.org/code/tensorflow/examples/ios/simple/RunModelViewController.mm),
+but there’s nothing iOS-specific; the same code should be usable on any platform
+that supports C++.
+
+You can also find specific examples for Raspberry Pi
+[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/pi_examples/label_image/label_image.cc).
diff --git a/community/en/docs/tfmobile/optimizing.md b/community/en/docs/tfmobile/optimizing.md
new file mode 100644
index 00000000000..1b3f3e60318
--- /dev/null
+++ b/community/en/docs/tfmobile/optimizing.md
@@ -0,0 +1,519 @@
+# Optimizing for mobile
+
+Warning: TensorFlow Mobile is __deprecated__.
+
+TensorFlow Lite is our main mobile and embedded offering. We are working hard
+to close the feature gap between TensorFlow Mobile and TensorFlow Lite. We
+expect to deprecate TensorFlow Mobile in early 2019. We will give ample notice
+to our users when we get to that point and will provide help and support to
+ensure easy migrations.
+
+In the meantime, please use TensorFlow Lite. If you have a feature request,
+such as a missing op, please post to our GitHub.
+ +There are some special issues that you have to deal with when you’re trying to +ship on mobile or embedded devices, and you’ll need to think about these as +you’re developing your model. + +These issues are: + +- Model and Binary Size +- App speed and model loading speed +- Performance and threading + +We'll discuss a few of these below. + +## What are the minimum device requirements for TensorFlow? + +You need at least one megabyte of program memory and several megabytes of RAM to +run the base TensorFlow runtime, so it’s not suitable for DSPs or +microcontrollers. Other than those, the biggest constraint is usually the +calculation speed of the device, and whether you can run the model you need for +your application with a low enough latency. You can use the benchmarking tools +in [How to Profile your Model](#how_to_profile_your_model) to get an idea of how +many FLOPs are required for a model, and then use that to make rule-of-thumb +estimates of how fast they will run on different devices. For example, a modern +smartphone might be able to run 10 GFLOPs per second, so the best you could hope +for from a 5 GFLOP model is two frames per second, though you may do worse +depending on what the exact computation patterns are. + +This model dependence means that it’s possible to run TensorFlow even on very +old or constrained phones, as long as you optimize your network to fit within +the latency budget and possibly within limited RAM too. For memory usage, you +mostly need to make sure that the intermediate buffers that TensorFlow creates +aren’t too large, which you can examine in the benchmark output too. + +## Speed + +One of the highest priorities of most model deployments is figuring out how to +run the inference fast enough to give a good user experience. The first place to +start is by looking at the total number of floating point operations that are +required to execute the graph. You can get a very rough estimate of this by +using the `benchmark_model` tool: + + bazel build -c opt tensorflow/tools/benchmark:benchmark_model && \ + bazel-bin/tensorflow/tools/benchmark/benchmark_model \ + --graph=/tmp/inception_graph.pb --input_layer="Mul:0" \ + --input_layer_shape="1,299,299,3" --input_layer_type="float" \ + --output_layer="softmax:0" --show_run_order=false --show_time=false \ + --show_memory=false --show_summary=true --show_flops=true --logtostderr + +This should show you an estimate of how many operations are needed to run the +graph. You can then use that information to figure out how feasible your model +is to run on the devices you’re targeting. For an example, a high-end phone from +2016 might be able to do 20 billion FLOPs per second, so the best speed you +could hope for from a model that requires 10 billion FLOPs is around 500ms. On a +device like the Raspberry Pi 3 that can do about 5 billion FLOPs, you may only +get one inference every two seconds. + +Having this estimate helps you plan for what you’ll be able to realistically +achieve on a device. If the model is using too many ops, then there are a lot of +opportunities to optimize the architecture to reduce that number. + +Advanced techniques include [SqueezeNet](https://arxiv.org/abs/1602.07360) +and [MobileNet](https://arxiv.org/abs/1704.04861), which are architectures +designed to produce models for mobile -- lean and fast but with a small accuracy +cost. You can also just look at alternative models, even older ones, which may +be smaller. 
For example, Inception v1 only has around 7 million parameters, +compared to Inception v3’s 24 million, and requires only 3 billion FLOPs rather +than 9 billion for v3. + +## Model Size + +Models that run on a device need to be stored somewhere on the device, and very +large neural networks can be hundreds of megabytes. Most users are reluctant to +download very large app bundles from app stores, so you want to make your model +as small as possible. Furthermore, smaller neural networks can persist in and +out of a mobile device's memory faster. + +To understand how large your network will be on disk, start by looking at the +size on disk of your `GraphDef` file after you’ve run `freeze_graph` and +`strip_unused_nodes` on it (see Preparing models for +more details on these tools), since then it should only contain +inference-related nodes. To double-check that your results are as expected, run +the `summarize_graph` tool to see how many parameters are in constants: + + bazel build tensorflow/tools/graph_transforms:summarize_graph && \ + bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \ + --in_graph=/tmp/tensorflow_inception_graph.pb + +That command should give you output that looks something like this: + + No inputs spotted. + Found 1 possible outputs: (name=softmax, op=Softmax) + Found 23885411 (23.89M) const parameters, 0 (0) variable parameters, + and 99 control_edges + Op types used: 489 Const, 99 CheckNumerics, 99 Identity, 94 + BatchNormWithGlobalNormalization, 94 Conv2D, 94 Relu, 11 Concat, 9 AvgPool, + 5 MaxPool, 1 Sub, 1 Softmax, 1 ResizeBilinear, 1 Reshape, 1 Mul, 1 MatMul, + 1 ExpandDims, 1 DecodeJpeg, 1 Cast, 1 BiasAdd + +The important part for our current purposes is the number of const +parameters. In most models these will be stored as 32-bit floats to start, so if +you multiply the number of const parameters by four, you should get something +that’s close to the size of the file on disk. You can often get away with only +eight-bits per parameter with very little loss of accuracy in the final result, +so if your file size is too large you can try using +quantize_weights +to transform the parameters down. + + bazel build tensorflow/tools/graph_transforms:transform_graph && \ + bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ + --in_graph=/tmp/tensorflow_inception_optimized.pb \ + --out_graph=/tmp/tensorflow_inception_quantized.pb \ + --inputs='Mul:0' --outputs='softmax:0' --transforms='quantize_weights' + +If you look at the resulting file size, you should see that it’s about a quarter +of the original at 23MB. + +Another transform is `round_weights`, which doesn't make the file smaller, but it +makes the file compressible to about the same size as when `quantize_weights` is +used. This is particularly useful for mobile development, taking advantage of +the fact that app bundles are compressed before they’re downloaded by consumers. + +The original file does not compress well with standard algorithms, because the +bit patterns of even very similar numbers can be very different. The +`round_weights` transform keeps the weight parameters stored as floats, but +rounds them to a set number of step values. This means there are a lot more +repeated byte patterns in the stored model, and so compression can often bring +the size down dramatically, in many cases to near the size it would be if they +were stored as eight bit. 
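+
+As a rough sketch, `round_weights` is applied with the same `transform_graph`
+tool shown above for `quantize_weights`; the output path and the `num_steps`
+value here are illustrative assumptions, so check the Graph Transform Tool
+documentation for the exact options:
+
+    bazel build tensorflow/tools/graph_transforms:transform_graph && \
+    bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
+    --in_graph=/tmp/tensorflow_inception_optimized.pb \
+    --out_graph=/tmp/tensorflow_inception_rounded.pb \
+    --inputs='Mul:0' --outputs='softmax:0' --transforms='round_weights(num_steps=256)'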
+
+Another advantage of `round_weights` is that the framework doesn’t have to
+allocate a temporary buffer to unpack the parameters into, as we have to when
+we just use `quantize_weights`. This saves a little bit of latency (though the
+results should be cached so it’s only costly on the first run) and makes it
+possible to use memory mapping, as described later.
+
+## Binary Size
+
+One of the biggest differences between mobile and server development is the
+importance of binary size. On desktop machines it’s not unusual to have
+executables that are hundreds of megabytes on disk, but for mobile and embedded
+apps it’s vital to keep the binary as small as possible so that user downloads
+are easy. As mentioned above, TensorFlow only includes a subset of op
+implementations by default, but this still results in a 12 MB final
+executable. To reduce this, you can set up the library to only include the
+implementations of the ops that you actually need, based on automatically
+analyzing your model. To use it:
+
+- Run `tools/print_required_ops/print_selective_registration_header.py` on your
+  model to produce a header file that only enables the ops it uses.
+
+- Place the `ops_to_register.h` file somewhere that the compiler can find
+  it. This can be in the root of your TensorFlow source folder.
+
+- Build TensorFlow with `SELECTIVE_REGISTRATION` defined, for example by passing
+  in `--copts="-DSELECTIVE_REGISTRATION"` to your Bazel build command.
+
+This process recompiles the library so that only the needed ops and types are
+included, which can dramatically reduce the executable size. For example, with
+Inception v3, the new size is only 1.5MB.
+
+## How to Profile your Model
+
+Once you have an idea of what your device's peak performance range is, it’s
+worth looking at its actual current performance. Using a standalone TensorFlow
+benchmark, rather than running it inside a larger app, helps isolate just the
+TensorFlow contribution to the latency. The
+[tensorflow/tools/benchmark](https://www.tensorflow.org/code/tensorflow/tools/benchmark/) tool
+is designed to help you do this. To run it on Inception v3 on your desktop
+machine, build this benchmark model:
+
+    bazel build -c opt tensorflow/tools/benchmark:benchmark_model && \
+    bazel-bin/tensorflow/tools/benchmark/benchmark_model \
+    --graph=/tmp/tensorflow_inception_graph.pb --input_layer="Mul" \
+    --input_layer_shape="1,299,299,3" --input_layer_type="float" \
+    --output_layer="softmax:0" --show_run_order=false --show_time=false \
+    --show_memory=false --show_summary=true --show_flops=true --logtostderr
+
+You should see output that looks something like this:
+
+============================== Top by Computation Time ==============================
+
[node type]  [start]  [first] [avg ms]     [%]  [cdf%]  [mem KB]  [Name]
+Conv2D   22.859   14.212   13.700  4.972%  4.972%  3871.488  conv_4/Conv2D
+Conv2D    8.116    8.964   11.315  4.106%  9.078%  5531.904  conv_2/Conv2D
+Conv2D   62.066   16.504    7.274  2.640% 11.717%   443.904  mixed_3/conv/Conv2D
+Conv2D    2.530    6.226    4.939  1.792% 13.510%  2765.952  conv_1/Conv2D
+Conv2D   55.585    4.605    4.665  1.693% 15.203%   313.600  mixed_2/tower/conv_1/Conv2D
+Conv2D  127.114    5.469    4.630  1.680% 16.883%    81.920  mixed_10/conv/Conv2D
+Conv2D   47.391    6.994    4.588  1.665% 18.548%   313.600  mixed_1/tower/conv_1/Conv2D
+Conv2D   39.463    7.878    4.336  1.574% 20.122%   313.600  mixed/tower/conv_1/Conv2D
+Conv2D  127.113    4.192    3.894  1.413% 21.535%   114.688  mixed_10/tower_1/conv/Conv2D
+Conv2D   70.188    5.205    3.626  1.316% 22.850%   221.952  mixed_4/conv/Conv2D
+
+============================== Summary by node type ==============================
+[Node type]  [count]  [avg ms]    [avg %]    [cdf %]  [mem KB]
+Conv2D            94   244.899    88.952%    88.952% 35869.953
+BiasAdd           95     9.664     3.510%    92.462% 35873.984
+AvgPool            9     7.990     2.902%    95.364%  7493.504
+Relu              94     5.727     2.080%    97.444% 35869.953
+MaxPool            5     3.485     1.266%    98.710%  3358.848
+Const            192     1.727     0.627%    99.337%     0.000
+Concat            11     1.081     0.393%    99.730%  9892.096
+MatMul             1     0.665     0.242%    99.971%     4.032
+Softmax            1     0.040     0.015%    99.986%     4.032
+<>                 1     0.032     0.012%    99.997%     0.000
+Reshape            1     0.007     0.003%   100.000%     0.000
+
+Timings (microseconds): count=50 first=330849 curr=274803 min=232354 max=415352 avg=275563 std=44193
+Memory (bytes): count=50 curr=128366400(all same)
+514 nodes defined 504 nodes observed
+
+ +This is the summary view, which is enabled by the show_summary flag. To +interpret it, the first table is a list of the nodes that took the most time, in +order by how long they took. From left to right, the columns are: + +- Node type, what kind of operation this was. + +- Start time of the op, showing where it falls in the sequence of operations. + +- First time in milliseconds. This is how long the operation took on the first + run of the benchmark, since by default 20 runs are executed to get more + reliable statistics. The first time is useful to spot which ops are doing + expensive calculations on the first run, and then caching the results. + +- Average time for the operation across all runs, in milliseconds. + +- What percentage of the total time for one run the op took. This is useful to + understand where the hotspots are. + +- The cumulative total time of this and the previous ops in the table. This is + handy for understanding what the distribution of work is across the layers, to + see if just a few of the nodes are taking up most of the time. + +- The amount of memory consumed by outputs of this type of op. + +- Name of the node. + +The second table is similar, but instead of breaking down the timings by +particular named nodes, it groups them by the kind of op. This is very useful to +understand which op implementations you might want to optimize or eliminate from +your graph. The table is arranged with the most costly operations at the start, +and only shows the top ten entries, with a placeholder for other nodes. The +columns from left to right are: + +- Type of the nodes being analyzed. + +- Accumulated average time taken by all nodes of this type, in milliseconds. + +- What percentage of the total time was taken by this type of operation. + +- Cumulative time taken by this and op types higher in the table, so you can + understand the distribution of the workload. + +- How much memory the outputs of this op type took up. + +Both of these tables are set up so that you can easily copy and paste their +results into spreadsheet documents, since they are output with tabs as +separators between the columns. The summary by node type can be the most useful +when looking for optimization opportunities, since it’s a pointer to the code +that’s taking the most time. In this case, you can see that the Conv2D ops are +almost 90% of the execution time. This is a sign that the graph is pretty +optimal, since convolutions and matrix multiplies are expected to be the bulk of +a neural network’s computing workload. + +As a rule of thumb, it’s more worrying if you see a lot of other operations +taking up more than a small fraction of the time. For neural networks, the ops +that don’t involve large matrix multiplications should usually be dwarfed by the +ones that do, so if you see a lot of time going into those it’s a sign that +either your network is non-optimally constructed, or the code implementing those +ops is not as optimized as it could +be. [Performance bugs](https://github.com/tensorflow/tensorflow/issues) or +patches are always welcome if you do encounter this situation, especially if +they include an attached model exhibiting this behavior and the command line +used to run the benchmark tool on it. + +The run above was on your desktop, but the tool also works on Android, which is +where it’s most useful for mobile development. 
+Here’s an example command line to
+run it on a 64-bit ARM device:
+
+    bazel build -c opt --config=android_arm64 \
+      tensorflow/tools/benchmark:benchmark_model
+    adb push bazel-bin/tensorflow/tools/benchmark/benchmark_model /data/local/tmp
+    adb push /tmp/tensorflow_inception_graph.pb /data/local/tmp/
+    adb shell '/data/local/tmp/benchmark_model \
+      --graph=/data/local/tmp/tensorflow_inception_graph.pb --input_layer="Mul" \
+      --input_layer_shape="1,299,299,3" --input_layer_type="float" \
+      --output_layer="softmax:0" --show_run_order=false --show_time=false \
+      --show_memory=false --show_summary=true'
+
+You can interpret the results in exactly the same way as the desktop version
+above. If you have any trouble figuring out what the right input and output
+names and types are, take a look at the
+Preparing models
+page for details about detecting these for your model, and look at the
+`summarize_graph` tool which may give you
+helpful information.
+
+There isn’t good support for command line tools on iOS, so instead there’s a
+separate example
+at
+[tensorflow/examples/ios/benchmark](https://www.tensorflow.org/code/tensorflow/examples/ios/benchmark) that
+packages the same functionality inside a standalone app. This outputs the
+statistics to both the screen of the device and the debug log. If you want
+on-screen statistics for the Android example apps, you can turn them on by
+pressing the volume-up button.
+
+## Profiling within your own app
+
+The output you see from the benchmark tool is generated from modules that are
+included as part of the standard TensorFlow runtime, which means you have access
+to them within your own applications too. You can see an example of how to do
+that [here](https://www.tensorflow.org/code/tensorflow/examples/ios/benchmark/BenchmarkViewController.mm?l=139).
+
+The basic steps are:
+
+1. Create a StatSummarizer object:
+
+        tensorflow::StatSummarizer stat_summarizer(tensorflow_graph);
+
+2. Set up the options:
+
+        tensorflow::RunOptions run_options;
+        run_options.set_trace_level(tensorflow::RunOptions::FULL_TRACE);
+        tensorflow::RunMetadata run_metadata;
+
+3. Run the graph:
+
+        run_status = session->Run(run_options, inputs, output_layer_names, {},
+                                  output_layers, &run_metadata);
+
+4. Calculate the results and print them out:
+
+        assert(run_metadata.has_step_stats());
+        const tensorflow::StepStats& step_stats = run_metadata.step_stats();
+        stat_summarizer.ProcessStepStats(step_stats);
+        stat_summarizer.PrintStepStats();
+
+## Visualizing Models
+
+The most effective way to speed up your code is by altering your model so it
+does less work. To do that, you need to understand what your model is doing, and
+visualizing it is a good first step. To get a high-level overview of your graph,
+use [TensorBoard](https://github.com/tensorflow/tensorboard).
+
+## Threading
+
+The desktop version of TensorFlow has a sophisticated threading model, and will
+try to run multiple operations in parallel if it can. In our terminology this is
+called “inter-op parallelism” (though to avoid confusion with “intra-op”, you
+could think of it as “between-op” instead), and can be set by specifying
+`inter_op_parallelism_threads` in the session options.
+
+By default, mobile devices run operations serially; that is,
+`inter_op_parallelism_threads` is set to 1. Mobile processors usually have few
+cores and a small cache, so running multiple operations accessing disjoint parts
+of memory usually doesn’t help performance.
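+
+If you do want to experiment with these settings, here is a minimal sketch of
+how they can be configured when you create a session in C++. The inter-op value
+restates the mobile default described above; the intra-op value is just an
+illustration, and intra-op parallelism is discussed next:
+
+    tensorflow::SessionOptions options;
+    // Run independent ops one at a time, which is the usual mobile default.
+    options.config.set_inter_op_parallelism_threads(1);
+    // Cap how many threads a single op may use; by default this follows the
+    // number of cores, so read the discussion of intra-op parallelism below
+    // before changing it.
+    options.config.set_intra_op_parallelism_threads(2);
+
+    tensorflow::Session* session_pointer = nullptr;
+    tensorflow::Status session_status =
+        tensorflow::NewSession(options, &session_pointer);
+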
“Intra-op parallelism” (or +“within-op”) can be very helpful though, especially for computation-bound +operations like convolutions where different threads can feed off the same small +set of memory. + +On mobile, how many threads an op will use is set to the number of cores by +default, or 2 when the number of cores can't be determined. You can override the +default number of threads that ops are using by setting +`intra_op_parallelism_threads` in the session options. It’s a good idea to +reduce the default if your app has its own threads doing heavy processing, so +that they don’t interfere with each other. + +To see more details on session options, look at [ConfigProto](https://www.tensorflow.org/code/tensorflow/core/protobuf/config.proto). + +## Retrain with mobile data + +The biggest cause of accuracy problems when running models on mobile apps is +unrepresentative training data. For example, most of the Imagenet photos are +well-framed so that the object is in the center of the picture, well-lit, and +shot with a normal lens. Photos from mobile devices are often poorly framed, +badly lit, and can have fisheye distortions, especially selfies. + +The solution is to expand your training set with data actually captured from +your application. This step can involve extra work, since you’ll have to label +the examples yourself, but even if you just use it to expand your original +training data, it can help the training set dramatically. Improving the training +set by doing this, and by fixing other quality issues like duplicates or badly +labeled examples is the single best way to improve accuracy. It’s usually a +bigger help than altering your model architecture or using different techniques. + +## Reducing model loading time and/or memory footprint + +Most operating systems allow you to load a file using memory mapping, rather +than going through the usual I/O APIs. Instead of allocating an area of memory +on the heap and then copying bytes from disk into it, you simply tell the +operating system to make the entire contents of a file appear directly in +memory. This has several advantages: + +* Speeds loading +* Reduces paging (increases performance) +* Does not count towards RAM budget for your app + +TensorFlow has support for memory mapping the weights that form the bulk of most +model files. Because of limitations in the `ProtoBuf` serialization format, we +have to make a few changes to our model loading and processing code. The +way memory mapping works is that we have a single file where the first part is a +normal `GraphDef` serialized into the protocol buffer wire format, but then the +weights are appended in a form that can be directly mapped. + +To create this file, run the +`tensorflow/contrib/util:convert_graphdef_memmapped_format` tool. This takes in +a `GraphDef` file that’s been run through `freeze_graph` and converts it to the +format that has the weights appended at the end. Since that file’s no longer a +standard `GraphDef` protobuf, you then need to make some changes to the loading +code. You can see an example of this in +the +[iOS Camera demo app](https://www.tensorflow.org/code/tensorflow/examples/ios/camera/tensorflow_utils.mm?l=147), +in the `LoadMemoryMappedModel()` function. + +The same code (with the Objective C calls for getting the filenames substituted) +can be used on other platforms too. 
+Because we’re using memory mapping, we need
+to start by creating a special TensorFlow environment object that’s set up with
+the file we’ll be using:
+
+    std::unique_ptr<tensorflow::MemmappedEnv> memmapped_env;
+    memmapped_env.reset(
+        new tensorflow::MemmappedEnv(tensorflow::Env::Default()));
+    tensorflow::Status mmap_status =
+        memmapped_env->InitializeFromFile(file_path);
+
+You then need to pass in this environment to subsequent calls, like this one for
+loading the graph:
+
+    tensorflow::GraphDef tensorflow_graph;
+    tensorflow::Status load_graph_status = ReadBinaryProto(
+        memmapped_env.get(),
+        tensorflow::MemmappedFileSystem::kMemmappedPackageDefaultGraphDef,
+        &tensorflow_graph);
+
+You also need to create the session with a pointer to the environment you’ve
+created:
+
+    tensorflow::SessionOptions options;
+    options.config.mutable_graph_options()
+        ->mutable_optimizer_options()
+        ->set_opt_level(::tensorflow::OptimizerOptions::L0);
+    options.env = memmapped_env.get();
+
+    tensorflow::Session* session_pointer = nullptr;
+    tensorflow::Status session_status =
+        tensorflow::NewSession(options, &session_pointer);
+
+One thing to notice here is that we’re also disabling automatic optimizations,
+since in some cases these will fold constant sub-trees, and so create copies of
+tensor values that we don’t want and use up more RAM.
+
+Once you’ve gone through these steps, you can use the session and graph as
+normal, and you should see a reduction in loading time and memory usage.
+
+## Protecting model files from easy copying
+
+By default, your models will be stored in the standard serialized protobuf
+format on disk. In theory this means that anybody can copy your model, which you
+may not want. However, in practice, most models are so application-specific and
+obfuscated by optimizations that the risk is similar to that of competitors
+disassembling and reusing your code, but if you do want to make it tougher for
+casual users to access your files it is possible to take some basic steps.
+
+Most of our examples use
+the
+[ReadBinaryProto()](https://www.tensorflow.org/code/tensorflow/core/platform/env.cc?q=core/platform/env.cc&l=409) convenience
+call to load a `GraphDef` from disk. This does require an unencrypted protobuf on
+disk. Luckily though, the implementation of the call is pretty straightforward
+and it should be easy to write an equivalent that can decrypt in memory. Here's
+some code that shows how you can read and decrypt a protobuf using your own
+decryption routine:
+
+    Status ReadEncryptedProto(Env* env, const string& fname,
+                              ::tensorflow::protobuf::MessageLite* proto) {
+      string data;
+      TF_RETURN_IF_ERROR(ReadFileToString(env, fname, &data));
+
+      DecryptData(&data);  // Your own function here.
+
+      if (!proto->ParseFromString(&data)) {
+        return errors::DataLoss("Can't parse ", fname, " as binary proto");
+      }
+      return Status::OK();
+    }
+
+To use this you’d need to define the DecryptData() function yourself. It could
+be as simple as something like:
+
+    void DecryptData(string* data) {
+      for (size_t i = 0; i < data->size(); ++i) {
+        (*data)[i] = (*data)[i] ^ 0x23;
+      }
+    }
+
+You may want something more complex, but exactly what you’ll need is outside the
+current scope here.
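+
+With those two pieces in place, loading an encrypted model is just a matter of
+calling `ReadEncryptedProto()` where you would normally call
+`ReadBinaryProto()`. Here is a rough sketch; the file path is hypothetical, and
+how you handle errors is up to you:
+
+    // Load and decrypt the graph; DecryptData() is the routine defined above.
+    tensorflow::GraphDef tensorflow_graph;
+    tensorflow::Status load_graph_status = ReadEncryptedProto(
+        tensorflow::Env::Default(), "/data/local/tmp/encrypted_graph.pb",
+        &tensorflow_graph);
+    if (!load_graph_status.ok()) {
+      LOG(ERROR) << "Could not load graph: " << load_graph_status;
+    }
+
+The session creation and `Session::Run()` calls are unchanged, since once the
+`GraphDef` is in memory it looks exactly like one loaded from an unencrypted
+file.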
diff --git a/community/en/docs/tfmobile/prepare_models.md b/community/en/docs/tfmobile/prepare_models.md new file mode 100644 index 00000000000..cd82a148b53 --- /dev/null +++ b/community/en/docs/tfmobile/prepare_models.md @@ -0,0 +1,319 @@ +# Preparing models for mobile deployment + +Warning: TensorFlow Mobile is __deprecated__. + +
+

+ TensorFlow Lite is our main + mobile and embedded offering. We are + working hard to close the feature gap between TensorFlow Mobile and + TensorFlow Lite. We expect to deprecate TensorFlow Mobile in early 2019. We + will give ample notice to our users when we get to that point and will + provide help and support to ensure easy migrations. +

+

+ In the meantime, please use TensorFlow Lite. If you have a feature request, + such as a missing op, please post to our GitHub. +

+
+
+The requirements for storing model information during training are very
+different from when you want to release it as part of a mobile app. This section
+covers the tools involved in converting from a training model to something
+releasable in production.
+
+## What is up with all the different saved file formats?
+
+You may find yourself getting very confused by all the different ways that
+TensorFlow can save out graphs. To help, here’s a rundown of some of the
+different components, and what they are used for. The objects are mostly defined
+and serialized as protocol buffers:
+
+- [NodeDef](https://www.tensorflow.org/code/tensorflow/core/framework/node_def.proto):
+  Defines a single operation in a model. It has a unique name, a list of the
+  names of other nodes it pulls inputs from, the operation type it implements
+  (for example `Add` or `Mul`), and any attributes that are needed to control
+  that operation. This is the basic unit of computation for TensorFlow, and all
+  work is done by iterating through a network of these nodes, applying each one
+  in turn. One particular operation type that’s worth knowing about is `Const`,
+  since this holds information about a constant. This may be a single scalar
+  number or string, but it can also hold an entire multi-dimensional tensor
+  array. The values for a `Const` are stored inside the `NodeDef`, and so large
+  constants can take up a lot of room when serialized.
+
+- [Checkpoint](https://www.tensorflow.org/code/tensorflow/core/util/tensor_bundle/tensor_bundle.h). Another
+  way of storing values for a model is by using `Variable` ops. Unlike `Const`
+  ops, these don’t store their content as part of the `NodeDef`, so they take up
+  very little space within the `GraphDef` file. Instead their values are held in
+  RAM while a computation is running, and then saved out to disk as checkpoint
+  files periodically. This typically happens as a neural network is being
+  trained and weights are updated, so it’s a time-critical operation, and it may
+  happen in a distributed fashion across many workers, so the file format has to
+  be both fast and flexible. They are stored as multiple checkpoint files,
+  together with metadata files that describe what’s contained within the
+  checkpoints. When you’re referring to a checkpoint in the API (for example
+  when passing a filename in as a command line argument), you’ll use the common
+  prefix for a set of related files. If you had these files:
+
+        /tmp/model/model-chkpt-1000.data-00000-of-00002
+        /tmp/model/model-chkpt-1000.data-00001-of-00002
+        /tmp/model/model-chkpt-1000.index
+        /tmp/model/model-chkpt-1000.meta
+
+  You would refer to them as `/tmp/model/model-chkpt-1000`.
+
+- [GraphDef](https://www.tensorflow.org/code/tensorflow/core/framework/graph.proto):
+  Has a list of `NodeDefs`, which together define the computational graph to
+  execute. During training, some of these nodes will be `Variables`, and so if
+  you want to have a complete graph you can run, including the weights, you’ll
+  need to call a restore operation to pull those values from
+  checkpoints. Because checkpoint loading has to be flexible to deal with all of
+  the training requirements, this can be tricky to implement on mobile and
+  embedded devices, especially those with no proper file system available like
+  iOS. This is where the
+  [`freeze_graph.py`](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py) script
+  comes in handy.
As mentioned above, `Const` ops store their values as part of + the `NodeDef`, so if all the `Variable` weights are converted to `Const` nodes, + then we only need a single `GraphDef` file to hold the model architecture and + the weights. Freezing the graph handles the process of loading the + checkpoints, and then converts all Variables to Consts. You can then load the + resulting file in a single call, without having to restore variable values + from checkpoints. One thing to watch out for with `GraphDef` files is that + sometimes they’re stored in text format for easy inspection. These versions + usually have a ‘.pbtxt’ filename suffix, whereas the binary files end with + ‘.pb’. + +- [FunctionDefLibrary](https://www.tensorflow.org/code/tensorflow/core/framework/function.proto): + This appears in `GraphDef`, and is effectively a set of sub-graphs, each with + information about their input and output nodes. Each sub-graph can then be + used as an op in the main graph, allowing easy instantiation of different + nodes, in a similar way to how functions encapsulate code in other languages. + +- [MetaGraphDef](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto): + A plain `GraphDef` only has information about the network of computations, but + doesn’t have any extra information about the model or how it can be + used. `MetaGraphDef` contains a `GraphDef` defining the computation part of + the model, but also includes information like ‘signatures’, which are + suggestions about which inputs and outputs you may want to call the model + with, data on how and where any checkpoint files are saved, and convenience + tags for grouping ops together for ease of use. + +- [SavedModel](https://www.tensorflow.org/code/tensorflow/core/protobuf/saved_model.proto): + It’s common to want to have different versions of a graph that rely on a + common set of variable checkpoints. For example, you might need a GPU and a + CPU version of the same graph, but keep the same weights for both. You might + also need some extra files (like label names) as part of your + model. The + [SavedModel](https://www.tensorflow.org/code/tensorflow/python/saved_model/README.md) format + addresses these needs by letting you save multiple versions of the same graph + without duplicating variables, and also storing asset files in the same + bundle. Under the hood, it uses `MetaGraphDef` and checkpoint files, along + with extra metadata files. It’s the format that you’ll want to use if you’re + deploying a web API using TensorFlow Serving, for example. + +## How do you get a model you can use on mobile? + +In most situations, training a model with TensorFlow will give you a folder +containing a `GraphDef` file (usually ending with the `.pb` or `.pbtxt` extension) and +a set of checkpoint files. What you need for mobile or embedded deployment is a +single `GraphDef` file that’s been ‘frozen’, or had its variables converted into +inline constants so everything’s in one file. To handle the conversion, you’ll +need the `freeze_graph.py` script, that’s held in +[`tensorflow/python/tools/freeze_graph.py`](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py). 
+You’ll run it like this:
+
+    bazel build tensorflow/python/tools:freeze_graph
+    bazel-bin/tensorflow/python/tools/freeze_graph \
+      --input_graph=/tmp/model/my_graph.pb \
+      --input_checkpoint=/tmp/model/model.ckpt-1000 \
+      --output_graph=/tmp/frozen_graph.pb \
+      --output_node_names=output_node
+
+The `input_graph` argument should point to the `GraphDef` file that holds your
+model architecture. It’s possible that your `GraphDef` has been stored in a text
+format on disk, in which case it’s likely to end in `.pbtxt` instead of `.pb`,
+and you should add an extra `--input_binary=false` flag to the command.
+
+The `input_checkpoint` should be the most recent saved checkpoint. As mentioned
+in the checkpoint section, you need to give the common prefix to the set of
+checkpoints here, rather than a full filename.
+
+`output_graph` defines where the resulting frozen `GraphDef` will be
+saved. Because it’s likely to contain a lot of weight values that take up a
+large amount of space in text format, it’s always saved as a binary protobuf.
+
+`output_node_names` is a list of the names of the nodes that you want to extract
+the results of your graph from. This is needed because the freezing process
+needs to understand which parts of the graph are actually needed, and which are
+artifacts of the training process, like summarization ops. Only ops that
+contribute to calculating the given output nodes will be kept. If you know how
+your graph is going to be used, these should just be the names of the nodes you
+pass into `Session::Run()` as your fetch targets. The easiest way to find the
+node names is to inspect the Node objects while building your graph in Python.
+Inspecting your graph in TensorBoard is another simple way. You can get some
+suggestions on likely outputs by running the [`summarize_graph` tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/README.md#inspecting-graphs).
+
+Because the output format for TensorFlow has changed over time, there are a
+variety of other less commonly used flags available too, like `input_saver`, but
+hopefully you shouldn’t need these on graphs trained with modern versions of the
+framework.
+
+## Using the Graph Transform Tool
+
+A lot of the things you need to do to efficiently run a model on device are
+available through the [Graph Transform
+Tool](https://www.tensorflow.org/code/tensorflow/tools/graph_transforms/README.md). This
+command-line tool takes an input `GraphDef` file, applies the set of rewriting
+rules you request, and then writes out the result as a `GraphDef`. See the
+documentation for more information on how to build and run this tool.
+
+### Removing training-only nodes
+
+TensorFlow `GraphDefs` produced by the training code contain all of the
+computation that’s needed for back-propagation and updates of weights, as well
+as the queuing and decoding of inputs, and the saving out of checkpoints. All of
+these nodes are no longer needed during inference, and some of the operations
+like checkpoint saving aren’t even supported on mobile platforms. To create a
+model file that you can load on devices you need to delete those unneeded
+operations by running the `strip_unused_nodes` rule in the Graph Transform Tool.
+
+The trickiest part of this process is figuring out the names of the nodes you
+want to use as inputs and outputs during inference.
+You’ll need these anyway
+once you start to run inference, but you also need them here so that the
+transform can calculate which nodes are not needed on the inference-only
+path. These may not be obvious from the training code. The easiest way to
+determine the node name is to explore the graph with TensorBoard.
+
+Remember that mobile applications typically gather their data from sensors and
+have it as arrays in memory, whereas training typically involves loading and
+decoding representations of the data stored on disk. In the case of Inception v3
+for example, there’s a `DecodeJpeg` op at the start of the graph that’s designed
+to take JPEG-encoded data from a file retrieved from disk and turn it into an
+arbitrary-sized image. After that there’s a `ResizeBilinear` op to scale it to
+the expected size, followed by a couple of other ops that convert the byte data
+into float and scale the value magnitudes in the way the rest of the graph
+expects. A typical mobile app will skip most of these steps because it’s getting
+its input directly from a live camera, so the input node you will actually
+supply will be the output of the `Mul` node in this case.
+
+You’ll need to do a similar process of inspection to figure out the correct
+output nodes.
+
+If you’ve just been given a frozen `GraphDef` file, and are not sure about the
+contents, try using the `summarize_graph` tool to print out information
+about the inputs and outputs it finds from the graph structure. Here’s an
+example with the original Inception v3 file:
+
+    bazel run tensorflow/tools/graph_transforms:summarize_graph -- \
+      --in_graph=tensorflow_inception_graph.pb
+
+Once you have an idea of what the input and output nodes are, you can feed them
+into the graph transform tool as the `--inputs` and `--outputs`
+arguments, and call the `strip_unused_nodes` transform, like this:
+
+    bazel run tensorflow/tools/graph_transforms:transform_graph -- \
+      --in_graph=tensorflow_inception_graph.pb \
+      --out_graph=optimized_inception_graph.pb --inputs='Mul' --outputs='softmax' \
+      --transforms='
+        strip_unused_nodes(type=float, shape="1,299,299,3")
+        fold_constants(ignore_errors=true)
+        fold_batch_norms
+        fold_old_batch_norms'
+
+One thing to look out for here is that you need to specify the size and type
+that you want your inputs to be. This is because any values that you’re going to
+be passing in as inputs to inference need to be fed to special `Placeholder` op
+nodes, and the transform may need to create them if they don’t already exist. In
+the case of Inception v3 for example, a `Placeholder` node replaces the old
+`Mul` node that used to output the resized and rescaled image array, since we’re
+going to be doing that processing ourselves before we call TensorFlow. It keeps
+the original name though, which is why we always feed in inputs to `Mul` when we
+run a session with our modified Inception graph.
+
+After you’ve run this process, you’ll have a graph that only contains the actual
+nodes you need to run your prediction process. This is the point where it
+becomes useful to run metrics on the graph, so it’s worth running
+`summarize_graph` again to understand what’s in your model.
+
+## What ops should you include on mobile?
+
+There are hundreds of operations available in TensorFlow, and each one has
+multiple implementations for different data types.
+On mobile platforms, the size
+of the executable binary that’s produced after compilation is important, because
+app download bundles need to be as small as possible for the best user
+experience. If all of the ops and data types are compiled into the TensorFlow
+library then the total size of the compiled library can be tens of megabytes, so
+by default only a subset of ops and data types are included.
+
+That means that if you load a model file that’s been trained on a desktop
+machine, you may see the error “No OpKernel was registered to support Op” when
+you load it on mobile. The first thing to try is to make sure you’ve stripped
+out any training-only nodes, since the error will occur at load time even if the
+op is never executed. If you’re still hitting the same problem once that’s done,
+you’ll need to look at adding the op to your built library.
+
+The criteria for including ops and types fall into several categories:
+
+- Are they only useful in back-propagation, for gradients? Since mobile is
+  focused on inference, we don’t include these.
+
+- Are they useful mainly for other training needs, such as checkpoint saving?
+  These we leave out.
+
+- Do they rely on frameworks that aren’t always available on mobile, such as
+  libjpeg? To avoid extra dependencies we don’t include ops like `DecodeJpeg`.
+
+- Are there types that aren’t commonly used? We don’t include boolean variants
+  of ops for example, since we don’t see much use of them in typical inference
+  graphs.
+
+These ops are trimmed by default to optimize for inference on mobile, but it is
+possible to alter some build files to change the default. After altering the
+build files, you will need to recompile TensorFlow. See below for more details
+on how to do this, and also see optimizing binary size
+for more on reducing your binary size.
+
+### Locate the implementation
+
+Operations are broken into two parts. The first is the op definition, which
+declares the signature of the operation: which inputs, outputs, and attributes
+it has. These take up very little space, and so all are included by default. The
+implementations of the op computations are done in kernels, which live in the
+`tensorflow/core/kernels` folder. You need to compile the C++ file containing
+the kernel implementation of the op you need into the library. To figure out
+which file that is, you can search for the operation name in the source
+files.
+
+[Here’s an example search in GitHub](https://github.com/search?utf8=%E2%9C%93&q=repo%3Atensorflow%2Ftensorflow+extension%3Acc+path%3Atensorflow%2Fcore%2Fkernels+REGISTER+Mul&type=Code&ref=searchresults).
+
+You’ll see that this search is looking for the `Mul` op implementation, and it
+finds it in `tensorflow/core/kernels/cwise_op_mul_1.cc`. You need to look for
+macros beginning with `REGISTER`, with the op name you care about as one of the
+string arguments (the sketch below shows what one of these registrations
+typically looks like).
+
+In this case, the implementations are actually broken up across multiple `.cc`
+files, so you’d need to include all of them in your build.
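+
+To give you a sense of what to look for, a kernel registration generally boils
+down to a `REGISTER_KERNEL_BUILDER` call along the lines of the sketch
+below. The elementwise kernels in the real source wrap this in short
+convenience macros, so the exact text in `cwise_op_mul_1.cc` looks a little
+different, but it expands to registrations of this shape:
+
+    // Registers the float CPU kernel for the "Mul" op; a registration like
+    // this exists for each supported device and data type combination.
+    REGISTER_KERNEL_BUILDER(
+        Name("Mul").Device(DEVICE_CPU).TypeConstraint<float>("T"),
+        BinaryOp<CPUDevice, functor::mul<float>>);
+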
If you’re more +comfortable using the command line for code search, here’s a grep command that +also locates the right files if you run it from the root of your TensorFlow +repository: + +`grep 'REGISTER.*"Mul"' tensorflow/core/kernels/*.cc` + +### Add the implementation to the build + +If you’re using Bazel, and building for Android, you’ll want to add the files +you’ve found to +the +[`android_extended_ops_group1`](https://www.tensorflow.org/code/tensorflow/core/kernels/BUILD#L3565) or +[`android_extended_ops_group2`](https://www.tensorflow.org/code/tensorflow/core/kernels/BUILD#L3632) targets. You +may also need to include any .cc files they depend on in there. If the build +complains about missing header files, add the .h’s that are needed into +the +[`android_extended_ops`](https://www.tensorflow.org/code/tensorflow/core/kernels/BUILD#L3525) target. + +If you’re using a makefile targeting iOS, Raspberry Pi, etc, go to +[`tensorflow/contrib/makefile/tf_op_files.txt`](https://www.tensorflow.org/code/tensorflow/contrib/makefile/tf_op_files.txt) and +add the right implementation files there.
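+
+As a rough illustration, continuing the `Mul` example from above, adding the
+kernels to the makefile build means appending the files you located to that
+list, one path per line. The exact set of files can vary between TensorFlow
+versions, so treat these paths as an example rather than a definitive list:
+
+    tensorflow/core/kernels/cwise_op_mul_1.cc
+    tensorflow/core/kernels/cwise_op_mul_2.cc
+
+Whichever build you use, remember to recompile the TensorFlow library for your
+platform afterwards so that the newly added kernels are actually linked in.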