# What you will learn
* How to optimize your model.
* How to compress your model.
* How to run it in a pre-made Android app.

### Sanity Check

In [None]:
%%bash
wget https://media1.britannica.com/eb-media/82/73182-004-B826BA69.jpg
mv 73182-004-B826BA69.jpg rose.jpg

### How to use the model we just built?
For background on how a GraphDef proto, a SaverDef proto, and a set of variable values are combined to output a single GraphDef, read the [TensorFlow tool developer's guide](https://www.tensorflow.org/extend/tool_developers/).

If you ever need to convert a GraphDef to a SavedModel, see [this Stack Overflow question](https://stackoverflow.com/questions/44329185/convert-a-graph-proto-pb-pbtxt-to-a-savedmodel-for-use-in-tensorflow-serving-o).

In [1]:
%%bash
python -m scripts.label_image \
  --graph=tf_files/retrained_graph.pb  \
  --image=rose.jpg


Evaluation time (1-image): 0.179s

roses 0.998492
tulips 0.00149764
sunflowers 9.80894e-06
daisy 8.05707e-08
dandelion 5.84384e-09


2017-12-14 19:48:11.679839: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 19:48:11.679971: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-12-14 19:48:11.680026: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.


### Optimize for inference
To avoid problems caused by unsupported training ops, the TensorFlow installation includes a tool, `optimize_for_inference`, that removes all nodes that aren't needed for a given set of inputs and outputs.
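The pruning idea can be illustrated on a toy graph. The sketch below is not how `optimize_for_inference` is implemented; the node names and the dictionary-based graph are hypothetical, and it only shows how walking backward from the output nodes identifies the nodes worth keeping.

```python
# Toy illustration only: node names and graph layout are hypothetical,
# and this is not the actual optimize_for_inference implementation.

def reachable_from_outputs(graph, output_names):
    """Return the set of nodes that the given outputs depend on.

    `graph` maps each node name to the list of node names it reads from.
    """
    keep = set()
    stack = list(output_names)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        # Visit every node this one reads from.
        stack.extend(graph[name])
    return keep


# A tiny graph with an inference path and some training-only nodes.
toy_graph = {
    "input": [],
    "conv": ["input"],
    "final_result": ["conv"],
    "labels": [],
    "cross_entropy": ["final_result", "labels"],
    "train_step": ["cross_entropy"],
}

kept = reachable_from_outputs(toy_graph, ["final_result"])
pruned = sorted(set(toy_graph) - kept)
print("kept:", sorted(kept))
print("pruned:", pruned)
```

Everything that `final_result` does not depend on, such as the loss and the training step, drops out of the graph.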

The script also does a few other optimizations that help speed up the model, such as merging explicit batch normalization operations into the convolutional weights to reduce the number of calculations. This can give a 30% speed up, depending on the input model.
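To see why merging batch normalization into the weights saves work, consider a scalar stand-in for one channel of a convolution. This is a minimal sketch under stated assumptions: the function names, the `eps` value, and all the numbers are hypothetical, and the real tool operates on weight tensors rather than single floats.

```python
import math

# Scalar sketch of batch-norm folding. All names and values are
# hypothetical; the real tool works on full weight tensors.

def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-3):
    """Unfused path: a 'convolution' (w * x + b) followed by batch norm."""
    z = w * x + b
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Precompute a weight and bias that absorb the batch-norm step."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


w, b = 0.5, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 4.0

w_folded, b_folded = fold_batch_norm(w, b, gamma, beta, mean, var)

x = 2.0
unfused = conv_then_bn(x, w, b, gamma, beta, mean, var)
fused = w_folded * x + b_folded  # one multiply-add, no separate BN op
print(abs(unfused - fused) < 1e-12)
```

Because the batch-norm statistics are constants at inference time, the folded weight and bias can be computed once, and each inference then skips the normalization arithmetic.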

In [1]:
%%bash
python -m tensorflow.python.tools.optimize_for_inference \
  --input=tf_files/retrained_graph.pb \
  --output=tf_files/optimized_graph.pb \
  --input_names="input" \
  --output_names="final_result"

In [3]:
%%bash
ls -ltrh tf_files/

total 11M
drwxr-xr-x 3 root root 4.0K Dec  6 17:58 flower_photos
-rw-r--r-- 1 root root   40 Dec  6 17:58 retrained_labels.txt
-rw-r--r-- 1 root root 5.3M Dec  6 17:58 retrained_graph.pb
-rw-r--r-- 1 root root 5.3M Dec  6 19:38 optimized_graph.pb


### Verify the optimized model
To check that optimize_for_inference hasn't altered the output of the network, compare the label_image output for retrained_graph.pb with that of optimized_graph.pb:

In [4]:
%%bash
python -m scripts.label_image \
    --graph=tf_files/optimized_graph.pb \
    --image=rose.jpg


Evaluation time (1-image): 0.158s

roses 0.998492
tulips 0.00149764
sunflowers 9.80894e-06
daisy 8.05707e-08
dandelion 5.84384e-09


2017-12-06 19:40:45.867020: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX


### Check the compression baseline
The retrained model is still about 5.3MB in size at this point, as the `ls` listing above shows. That download size may be a limiting factor for any app that includes it.

Every mobile app distribution system compresses the package before distribution, so test how much the graph can be compressed using the `gzip` command:

In [12]:
%%bash
gzip -c tf_files/optimized_graph.pb > tf_files/optimized_graph.pb.gz

gzip -l tf_files/optimized_graph.pb.gz

         compressed        uncompressed  ratio uncompressed_name
            5027947             5460013   7.9% tf_files/optimized_graph.pb


Most of the space in the graph is taken up by the weights, which are large blocks of floating point numbers. Each weight has a slightly different floating point value, with very little regularity.

But compression works by exploiting regularity in the data, which explains why gzip saves only 7.9% here.
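The effect is easy to reproduce without a model. The sketch below is an illustration under assumptions: the random values merely stand in for trained weights, and `savings` is a hypothetical helper that uses Python's `zlib` module (the same DEFLATE algorithm gzip uses) to compare an irregular block of 32-bit floats with a perfectly regular one.

```python
import random
import struct
import zlib

# Illustration only: random values stand in for trained weights.

def savings(data):
    """Fraction of bytes saved by DEFLATE, the algorithm gzip uses."""
    return 1.0 - len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)
n = 50_000

# Irregular data: every float32 value is slightly different.
irregular = struct.pack(f"<{n}f", *(rng.gauss(0.0, 0.05) for _ in range(n)))

# Regular data: the same number of float32 values, all identical.
regular = struct.pack(f"<{n}f", *([0.05] * n))

print(f"irregular floats: {savings(irregular):.1%} saved")
print(f"repeated float:   {savings(regular):.1%} saved")
```

The irregular block should compress only slightly, while the repeated value compresses almost completely, which is the gap that quantization tries to close.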

### Quantize the network weights
Many models are now deployed in commercial applications. The computational demands of training grow with the number of researchers, but the cycles needed for inference grow in proportion to the number of users. That means pure inference efficiency has become a burning issue for a lot of teams. That is where quantization comes in. It's an umbrella term that covers a lot of different techniques to store numbers and perform calculations on them in more compact formats than 32-bit floating point.

Training neural networks is done by applying many tiny nudges to the weights, and these small increments typically need floating point precision to work. Taking a pre-trained model and running inference is very different. One of the magical qualities of deep networks is that they tend to cope very well with high levels of noise in their inputs. If you think about recognizing an object in a photo you've just taken, the network has to ignore all the CCD noise, lighting changes, and other non-essential differences between it and the training examples it's seen before, and focus on the important similarities instead. This ability means that they seem to treat low-precision calculations as just another source of noise, and still produce accurate results even with numerical formats that hold less information.

More details are [here](https://www.tensorflow.org/performance/quantization).

The `quantize_graph` script used below applies quantization without any changes to the structure of the network; it simply quantizes the constants in place. For quantization during training, see [this research paper](https://arxiv.org/abs/1609.07061).
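As a rough illustration of rounding constants in place, the sketch below snaps a list of float weights to evenly spaced levels while keeping them stored as 32-bit floats. The choice of 256 levels, the random stand-in weights, and the helper names are assumptions for illustration; they are not taken from the `quantize_graph` source.

```python
import random
import struct
import zlib

# Sketch only: 256 levels and random stand-in weights are assumptions.

def round_to_levels(weights, levels=256):
    """Snap each weight to the nearest of `levels` evenly spaced values."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)
    if step == 0:
        return list(weights)  # all weights identical, nothing to round
    return [lo + round((w - lo) / step) * step for w in weights]


def packed(weights):
    """Serialize weights as little-endian float32, as a graph file might."""
    return struct.pack(f"<{len(weights)}f", *weights)


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(50_000)]
rounded = round_to_levels(weights)

raw_size = len(packed(weights))
before = len(zlib.compress(packed(weights), 9))
after = len(zlib.compress(packed(rounded), 9))
max_error = max(abs(a - b) for a, b in zip(weights, rounded))

print("uncompressed bytes (unchanged):", raw_size, len(packed(rounded)))
print("compressed bytes before/after: ", before, after)
print("largest rounding error:        ", max_error)
```

The uncompressed size stays the same because every value is still a 32-bit float, but the rounded data contains at most 256 distinct values, so the compressor finds far more repetition.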

In [6]:
%%bash
python -m scripts.quantize_graph \
  --input=tf_files/optimized_graph.pb \
  --output=tf_files/rounded_graph.pb \
  --output_node_names=final_result \
  --mode=weights_rounded

In [7]:
%%bash
gzip -c tf_files/rounded_graph.pb > tf_files/rounded_graph.pb.gz
gzip -l tf_files/rounded_graph.pb.gz

         compressed        uncompressed  ratio uncompressed_name
            1633004             5460032  70.1% tf_files/rounded_graph.pb


### Verify the quantized model
Before you continue, verify that the quantization process hasn't had too negative an effect on the model's performance.

In [9]:
%%bash
python -m scripts.label_image \
  --image=rose.jpg \
  --graph=tf_files/rounded_graph.pb


Evaluation time (1-image): 0.180s

roses 0.985175
tulips 0.0147941
daisy 2.88691e-05
sunflowers 1.78422e-06
dandelion 5.62923e-07


2017-12-06 19:46:30.783505: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
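Comparing the two score lists by eye works for one image, but the check can also be scripted. The sketch below pastes in the scores printed above for `optimized_graph.pb` and `rounded_graph.pb`; `parse_scores` is a hypothetical helper, and it assumes each output line is a label followed by a score.

```python
# Sketch only: parse_scores is a hypothetical helper that assumes each
# line has the form "<label> <score>".

def parse_scores(text):
    """Turn label_image-style output lines into a {label: score} dict."""
    scores = {}
    for line in text.strip().splitlines():
        label, value = line.rsplit(None, 1)
        scores[label] = float(value)
    return scores


# Scores copied from the optimized_graph.pb run above.
baseline = parse_scores("""
roses 0.998492
tulips 0.00149764
sunflowers 9.80894e-06
daisy 8.05707e-08
dandelion 5.84384e-09
""")

# Scores copied from the rounded_graph.pb run above.
quantized = parse_scores("""
roses 0.985175
tulips 0.0147941
daisy 2.88691e-05
sunflowers 1.78422e-06
dandelion 5.62923e-07
""")

top_baseline = max(baseline, key=baseline.get)
top_quantized = max(quantized, key=quantized.get)
max_drift = max(abs(baseline[k] - quantized[k]) for k in baseline)

print("top label unchanged:", top_baseline == top_quantized)
print("largest score change:", max_drift)
```

For this image the top label is still `roses` and no score moves by more than about 0.013, although the order of the two lowest-ranked flowers, daisy and sunflowers, swaps. A single image is only a spot check, not an accuracy measurement.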


In [10]:
%%bash
ls -ltrh tf_files/

total 18M
drwxr-xr-x 3 root root 4.0K Dec  6 17:58 flower_photos
-rw-r--r-- 1 root root   40 Dec  6 17:58 retrained_labels.txt
-rw-r--r-- 1 root root 5.3M Dec  6 17:58 retrained_graph.pb
-rw-r--r-- 1 root root 5.3M Dec  6 19:38 optimized_graph.pb
-rw-r--r-- 1 root root 5.3M Dec  6 19:44 rounded_graph.pb
-rw-r--r-- 1 root root 1.6M Dec  6 19:45 rounded_graph.pb.gz
