##### *Copyright 2021 Google LLC*
*Licensed under the Apache License, Version 2.0 (the "License")*

In [None]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Retrain EfficientDet-Lite2

In this tutorial, we'll retrain the EfficientDet-Lite2 object detection model (derived from [EfficientDet](https://ai.googleblog.com/2020/04/efficientdet-towards-scalable-and.html)) using the [TensorFlow Lite Model Maker library](https://www.tensorflow.org/lite/guide/model_maker), and then compile it to run on the [Coral Edge TPU](https://www.coral.ai/products/).

We'll retrain the model using your custom dataset in the TFRecord format.



## Compute resources

Run the following code to make sure you're using a GPU.

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

If you have Colab Pro, you can access high-memory VMs when they are available. To use a high-memory runtime, select the Runtime > "Change runtime type" menu, and then select High-RAM in the Runtime shape dropdown.

You can see how much memory you have available at any time by running the following code.

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of total RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('To enable a high-RAM runtime, select the Runtime > "Change runtime type"')
  print('menu, and then select High-RAM in the Runtime shape dropdown. Then, ')
  print('re-execute this cell.')
else:
  print('You are using a high-RAM runtime!')

## Import the required packages

In [None]:
!pip install -q tflite-model-maker

In [None]:
import numpy as np
import os

from tflite_model_maker.config import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import object_detector

import tensorflow as tf
assert tf.__version__.startswith('2')

tf.get_logger().setLevel('ERROR')
from absl import logging
logging.set_verbosity(logging.ERROR)

## Load the training data


Create a directory to store the datasets.

In [None]:
%mkdir ~/content
%mkdir ~/content/dataset

Label your images and convert your datasets to the TFRecord format as `rocket_train.tfrecord` and `rocket_validation.tfrecord`. Then, save `rocket_train.tfrecord` and `rocket_validation.tfrecord` to the dataset directory `~/content/dataset`.

Before loading the data, confirm that both files are in place. The following cell prints a message for each file that is missing:

In [None]:
! test ! -f ~/content/dataset/rocket_train.tfrecord && echo "~/content/dataset/rocket_train.tfrecord not found"
! test ! -f ~/content/dataset/rocket_validation.tfrecord && echo "~/content/dataset/rocket_validation.tfrecord not found"

Model Maker requires that we load our dataset using the [`DataLoader`](https://www.tensorflow.org/lite/api_docs/python/tflite_model_maker/object_detector/DataLoader) API, which supports the TFRecord format. Load the training and validation data from their locations.

In [None]:
label_map = {1: 'Rocket'}
train_size = 791
validation_size = 170

train_data = object_detector.DataLoader(tfrecord_file_patten=os.path.expanduser('~/content/dataset/rocket_train.tfrecord'), size=train_size, label_map=label_map)
validation_data = object_detector.DataLoader(tfrecord_file_patten=os.path.expanduser('~/content/dataset/rocket_validation.tfrecord'), size=validation_size, label_map=label_map)
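The `size` values above are hard-coded, so they need to match the number of records in each file. As a minimal sketch (not part of Model Maker), the count can be read from the TFRecord framing itself, where each record is stored as an 8-byte little-endian length, a 4-byte length checksum, the payload, and a 4-byte payload checksum. The helper name `count_tfrecord_records` is hypothetical, and the sketch skips the checksums rather than verifying them.

```python
import struct


def count_tfrecord_records(path):
    """Counts records in a TFRecord file by walking its length-prefixed framing.

    The checksums are skipped, not verified, so this only detects truncation.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header)
            # Length checksum (4 bytes) + payload + payload checksum (4 bytes).
            expected = 4 + length + 4
            if len(f.read(expected)) != expected:
                raise ValueError("truncated record body")
            count += 1
    return count


# Usage with this tutorial's paths (commented out because the files are yours):
# import os
# train_size = count_tfrecord_records(
#     os.path.expanduser('~/content/dataset/rocket_train.tfrecord'))
```

If the count disagrees with `train_size` or `validation_size`, update the constants before creating the `DataLoader` objects.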

## Launch TensorBoard

Enable the TensorBoard notebook extension.

In [None]:
%load_ext tensorboard

Create the directory to save the TFLite model checkpoint information.



In [None]:
%mkdir ~/content/checkpoints

TensorBoard is optional but provides very helpful visualizations of your training progress and accuracy evaluations.

Because TensorBoard runs as a web server on the local machine, and we're actually running this on a Colab virtual environment, we'll use a tool called ngrok to make this server accessible with a public URL:

In [None]:
%cd ~/content
! wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
! unzip -o ngrok-stable-linux-amd64.zip
# Replace YOUR_NGROK_AUTHTOKEN with the authtoken from your own ngrok account.
! ./ngrok authtoken YOUR_NGROK_AUTHTOKEN

In [None]:
import time

# Start TensorBoard so we can monitor the training process.
get_ipython().run_line_magic(
    'tensorboard',
    '--logdir {} --host 0.0.0.0 --port 6006'
    .format(os.path.expanduser('~/content/checkpoints'))
)
# Open an ngrok tunnel to the TensorBoard port in the background.
get_ipython().system_raw('./ngrok http 6006 &')
print('Click this link to view training progress in TensorBoard:')
time.sleep(1)  # give ngrok a moment to register the tunnel
! curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
print("Don't worry about the error page below; TensorBoard is running in the background.")
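The one-line parser in the cell above indexes `['tunnels'][0]` directly, so it fails if ngrok has not registered a tunnel within the one-second wait. Here is a small sketch of a more forgiving parser for the same JSON shape. The function name `first_public_url` and the sample payloads are hypothetical and are used only for illustration.

```python
import json


def first_public_url(tunnels_json):
    """Returns the first tunnel's public URL, or None if no tunnel is listed yet."""
    tunnels = json.loads(tunnels_json).get("tunnels", [])
    return tunnels[0]["public_url"] if tunnels else None


# Hypothetical payloads in the shape that the curl command reads.
ready = '{"tunnels": [{"public_url": "https://example.ngrok.io"}]}'
not_ready = '{"tunnels": []}'

print(first_public_url(ready))
print(first_public_url(not_ready))
```

When the result is `None`, waiting a little longer and querying `http://localhost:4040/api/tunnels` again is usually enough.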

## Create and train the model

Model Maker supports the EfficientDet-Lite family of object detection models that are compatible with the Edge TPU. (EfficientDet-Lite is derived from [EfficientDet](https://ai.googleblog.com/2020/04/efficientdet-towards-scalable-and.html), which offers state-of-the-art accuracy in a small model size.) There are several model sizes you can choose from:

| Model architecture | Size (MB)* | Latency (ms)** | Average Precision*** |
|--------------------|------------|----------------|----------------------|
| EfficientDet-Lite0 | 4.4        | 37             | 25.69%               |
| EfficientDet-Lite1 | 5.8        | 49             | 30.55%               |
| EfficientDet-Lite2 | 7.2        | 69             | 33.97%               |
| EfficientDet-Lite3 | 11.4       | 116            | 37.70%               |
| EfficientDet-Lite4 | 19.9       | 260            | 41.96%               |

<i>* File size of the integer quantized models. <br/>** Latency measured on Pixel 4 using 4 threads on CPU. <br/>*** Average Precision is the mAP (mean Average Precision) on the COCO 2017 validation dataset.</i>

Beware that the bigger models (Lite3 and Lite4) do not fit onto the Edge TPU's onboard memory, so you'll see even greater latency when using those due to the cost of fetching data from the host system memory. Maybe this extra latency is okay for your application, but if it's not and you require the precision of the larger models, then you can [pipeline the model across multiple Edge TPUs](https://coral.ai/docs/edgetpu/pipeline/) (more about this when we compile the model below).

For this tutorial, we'll use Lite2:

In [None]:
spec = object_detector.EfficientDetLite2Spec(model_dir=os.path.expanduser('~/content/checkpoints'))

The [`EfficientDetLite2Spec`](https://www.tensorflow.org/lite/api_docs/python/tflite_model_maker/object_detector/EfficientDetLite2Spec) constructor also supports several arguments that specify training options, such as the max number of detections (default is 25 for the TF Lite model). The constructor can also set the number of training epochs and the batch size, but we'll specify those in the next steps instead.

Run the following JavaScript code in the browser console to stop Colab from disconnecting:

```
function ConnectButton(){
    console.log("Connect pushed"); 
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click() 
}
setInterval(ConnectButton,60000);
```

Now we need to create our model according to the model spec, load our dataset into the model, specify training parameters, and begin training. 

Using Model Maker, we accomplish all of that with [`create()`](https://www.tensorflow.org/lite/api_docs/python/tflite_model_maker/object_detector/create):

In [None]:
epochs = 300
batch_size = 32

model = object_detector.create(train_data=train_data, 
                               model_spec=spec, 
                               validation_data=validation_data, 
                               epochs=epochs, 
                               batch_size=batch_size, 
                               train_whole_model=True)

## Choose the best model

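Review the performance metrics in TensorBoard and choose the best-performing checkpoint that isn't overfitting, that is, one from before the point where the validation loss starts rising while the training loss keeps falling.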
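As a rough sketch of that selection rule, the snippet below picks the epoch with the lowest validation loss from a list of values read off TensorBoard. The loss values are made up for illustration, the helper name `pick_best_epoch` is hypothetical, and the sketch assumes that checkpoint numbers correspond to epoch numbers, which is worth confirming against the files in `~/content/checkpoints`.

```python
def pick_best_epoch(val_losses):
    """Returns the 1-based epoch with the lowest validation loss (earliest on ties)."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Hypothetical per-epoch validation losses: falling at first, then rising
# again once the model starts to overfit.
val_losses = [0.92, 0.71, 0.64, 0.60, 0.63, 0.69]

best_epoch = pick_best_epoch(val_losses)
checkpoint_name = 'ckpt-{}'.format(best_epoch)  # assumed naming, see note above
print(best_epoch, checkpoint_name)
```

In practice you may also want to weigh the COCO-style metrics, not only the loss, before settling on a checkpoint.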

In [None]:
best_checkpoint = 125  # checkpoint number chosen from the TensorBoard curves

# Rebuild the model without training, then load the chosen checkpoint's weights.
best_model = object_detector.create(train_data=train_data,
                                    model_spec=spec,
                                    validation_data=validation_data,
                                    epochs=epochs,
                                    batch_size=batch_size,
                                    train_whole_model=True,
                                    do_train=False)
best_model.model.load_weights(
    os.path.expanduser('~/content/checkpoints/ckpt-{}'.format(best_checkpoint)))

## Evaluate the best model

Now we'll evaluate the selected checkpoint on the validation dataset, which was held out from the training data.

The [`evaluate()`](https://www.tensorflow.org/lite/api_docs/python/tflite_model_maker/object_detector/ObjectDetector#evaluate) method provides output in the style of [COCO evaluation metrics](https://cocodataset.org/#detection-eval):

In [None]:
best_model.evaluate(data=validation_data, batch_size=batch_size)

Pass `batch_size=32` explicitly, as shown above. Otherwise, the evaluation can run out of memory.

## Export to TensorFlow Lite

Next, we'll export the model to the TensorFlow Lite format. By default, the [`export()`](https://www.tensorflow.org/lite/api_docs/python/tflite_model_maker/object_detector/ObjectDetector#export) method performs [full integer post-training quantization](https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization), which is exactly what we need for compatibility with the Edge TPU. (Model Maker uses the same dataset we gave to our model spec as a representative dataset, which is required for full-int quantization.)

We just need to specify the export directory and format. By default, it exports to TF Lite, but we also want a labels file, so we declare both:

In [None]:
%mkdir ~/content/models
best_model.export(export_dir=os.path.expanduser('~/content/models'),
                  tflite_filename='efficientdet-lite2-rocket-quant.tflite',
                  label_filename='rocket-labels.txt',
                  export_format=[ExportFormat.TFLITE, ExportFormat.LABEL])
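To make "full integer quantization" more concrete, here is a small worked sketch of the affine int8 mapping that TensorFlow Lite uses, where a real value is approximated as `scale * (q - zero_point)`. The `scale` and `zero_point` values below are hypothetical. In the exported model they are chosen per tensor during conversion, using the representative dataset.

```python
def quantize(x, scale, zero_point):
    """Maps a real value to int8 using q = round(x / scale) + zero_point, clipped."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Recovers the approximate real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Hypothetical quantization parameters for a single tensor.
scale, zero_point = 0.02, -10

x = 1.2345
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
error = abs(x - x_hat)
print(q, x_hat, error)
```

For values inside the representable range, the round-trip error is at most half of `scale`, which is the "reduced numerical precision" that can shift detection scores slightly after export.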

### Evaluate the TF Lite model

Exporting the model to TensorFlow Lite can affect the model accuracy, due to the reduced numerical precision from quantization and because the original TensorFlow model uses per-class [non-max suppression (NMS)](https://www.coursera.org/lecture/convolutional-neural-networks/non-max-suppression-dvrjH) for post-processing, while the TF Lite model uses global NMS, which is faster but less accurate.

Therefore you should always evaluate the exported TF Lite model and be sure it still meets your requirements:

In [None]:
best_model.evaluate_tflite(os.path.expanduser('~/content/models/efficientdet-lite2-rocket-quant.tflite'), validation_data)
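The difference between per-class and global NMS mentioned above can be illustrated with a minimal greedy implementation. This is only a sketch of the general technique, not the code that TensorFlow or TF Lite actually runs. The boxes, scores, and class ids are made up.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5, per_class=True):
    """Greedy NMS over (box, score, class_id) tuples, highest score first.

    With per_class=True, a box is only suppressed by a kept box of the same
    class. With per_class=False (global NMS), any overlapping kept box
    suppresses it.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, class_id = det
        suppressed = any(
            iou(box, k[0]) >= iou_threshold
            and (not per_class or k[2] == class_id)
            for k in kept)
        if not suppressed:
            kept.append(det)
    return kept


# Hypothetical detections: the first two overlap heavily but differ in class.
detections = [
    ((0, 0, 10, 10), 0.9, 0),
    ((1, 1, 10, 10), 0.8, 1),
    ((20, 20, 30, 30), 0.7, 0),
]

print(len(nms(detections, per_class=True)))
print(len(nms(detections, per_class=False)))
```

Global NMS drops the second box even though it belongs to a different class, which is how it can lose valid detections. With a single label, as in this tutorial's `label_map`, the two variants select the same boxes in this sketch.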

### Test it on a new image

Just to be sure of things, let's run an inference with the TF Lite model ourselves. 

To simplify our code, we'll use the [PyCoral API](https://coral.ai/docs/reference/py/):

In [None]:
! python3 -m pip install --extra-index-url https://google-coral.github.io/py-repo/ pycoral

In [None]:
# Set the model files
MODEL_FILE = os.path.expanduser('~/content/models/efficientdet-lite2-rocket-quant.tflite')
LABELS_FILE = os.path.expanduser('~/content/models/rocket-labels.txt')
DETECTION_THRESHOLD = 0.2

from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

import tflite_runtime.interpreter as tflite 
from pycoral.adapters import common
from pycoral.adapters import detect
from pycoral.utils.dataset import read_label_file

def draw_objects(draw, objs, labels):
  """Draws the bounding box and label for each object."""
  COLORS = np.random.randint(0, 255, size=(len(labels), 3), dtype=np.uint8)
  for obj in objs:
    bbox = obj.bbox
    color = tuple(int(c) for c in COLORS[obj.id])
    draw.rectangle([(bbox.xmin, bbox.ymin), (bbox.xmax, bbox.ymax)],
                   outline=color, width=15)
    font = ImageFont.truetype("LiberationSans-Regular.ttf", size=90)
    draw.text((bbox.xmin + 20, bbox.ymin + 20),
              '%s\n%.2f' % (labels.get(obj.id, obj.id), obj.score),
              fill=color, font=font)

# Load the TF Lite model
labels = read_label_file(LABELS_FILE)
interpreter = tflite.Interpreter(MODEL_FILE)
interpreter.allocate_tensors()

# Resize the image to the model's input size
image = Image.open(os.path.expanduser('~/content/models/rocket0.jpg'))
_, scale = common.set_resized_input(
    interpreter, image.size, lambda size: image.resize(size, Image.LANCZOS))

# Run inference and draw boxes
interpreter.invoke()
objs = detect.get_objects(interpreter, DETECTION_THRESHOLD, scale)
draw_objects(ImageDraw.Draw(image), objs, labels)

# Show the results
width = 400
height_ratio = image.height / image.width
image.resize((width, int(width * height_ratio)))

## Compile for the Edge TPU

First we need to download the Edge TPU Compiler:

In [None]:
! curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

! echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

! sudo apt-get update

! sudo apt-get install edgetpu-compiler

Before compiling the `.tflite` file for the Edge TPU, it's important to consider whether your model will fit into the Edge TPU memory. 

The Edge TPU has approximately 8 MB of SRAM for [caching model parameters](https://coral.ai/docs/edgetpu/compiler/#parameter-data-caching), so any model close to or over 8 MB will not fit onto the Edge TPU memory. That means the inference times are longer, because some model parameters must be fetched from the host system memory.

One way to eliminate the extra latency is to use [model pipelining](https://coral.ai/docs/edgetpu/pipeline/), which splits the model into segments that can run on separate Edge TPUs in series. This can significantly reduce the latency for big models.

The following table provides recommendations for the number of Edge TPUs to use with each EfficientDet-Lite model.

| Model architecture | Minimum TPUs | Recommended TPUs |
|--------------------|-------|-------|
| EfficientDet-Lite0 | 1     | 1     |
| EfficientDet-Lite1 | 1     | 1     |
| EfficientDet-Lite2 | 1     | 2     |
| EfficientDet-Lite3 | 2     | 2     |
| EfficientDet-Lite4 | 2     | 3     |
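As a rough pre-check of the memory point above, the exported file size can be compared against the approximate 8 MB figure. This is only a heuristic sketch: the helper name `may_fit_on_edgetpu` is hypothetical, and file size is merely a proxy for the parameter data that the Edge TPU caches, so a model "close to" the limit may still not fit.

```python
import os

EDGETPU_CACHE_MB = 8  # approximate SRAM figure quoted in the text above


def may_fit_on_edgetpu(model_path, limit_mb=EDGETPU_CACHE_MB):
    """Returns (size_mb, fits), where fits means the file is under limit_mb.

    Heuristic only: it compares the .tflite file size with the limit and does
    not inspect how the compiler actually places the parameters.
    """
    size_mb = os.path.getsize(model_path) / 1e6
    return size_mb, size_mb < limit_mb


# Usage with this tutorial's exported model (commented out because it depends
# on the export step having run):
# size_mb, fits = may_fit_on_edgetpu(os.path.expanduser(
#     '~/content/models/efficientdet-lite2-rocket-quant.tflite'))
# print('{:.1f} MB, under the limit: {}'.format(size_mb, fits))
```

If the result is borderline, the table above is the safer guide for choosing the number of Edge TPUs.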

If you need extra Edge TPUs for your model, then update `NUMBER_OF_TPUS` here:

In [None]:
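NUMBER_OF_TPUS = 1

# Compile the exported (not yet compiled) model. The compiler writes its
# *_edgetpu.tflite output to the current directory.
%cd ~/content/models
!edgetpu_compiler efficientdet-lite2-rocket-quant.tflite --num_segments=$NUMBER_OF_TPUS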

**Beware when using multiple segments:** The Edge TPU Compiler divides the model such that all segments have roughly equal amounts of parameter data, but that does not mean all segments have the same latency. Especially when dividing an SSD model such as EfficientDet, this results in a latency imbalance between segments, because SSD models have a large post-processing op that actually executes on the CPU, not on the Edge TPU. So although segmenting your model this way is better than running the whole model on just one Edge TPU, we recommend that you segment the EfficientDet-Lite model using our [profiling-based partitioner tool](https://github.com/google-coral/libcoral/tree/master/coral/tools/partitioner#profiling-based-partitioner-for-the-edge-tpu-compiler), which measures each segment's latency on the Edge TPU and then iteratively adjusts the segmentation sizes to provide balanced latency between all segments.
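To see why balanced latency matters, consider a simplified model of a pipeline in which each segment works on a different input at the same time, so the slowest segment sets the pace. The latencies below are hypothetical, and the sketch ignores transfer overhead between segments.

```python
def pipeline_stats(segment_latencies_ms):
    """Summarizes a pipeline whose throughput is limited by its slowest segment."""
    bottleneck_ms = max(segment_latencies_ms)
    return {
        "bottleneck_ms": bottleneck_ms,
        "end_to_end_ms": sum(segment_latencies_ms),
        "max_throughput_fps": 1000.0 / bottleneck_ms,
    }


# Two hypothetical two-segment splits with the same end-to-end latency.
balanced = pipeline_stats([50.0, 50.0])
imbalanced = pipeline_stats([20.0, 80.0])

print(balanced["max_throughput_fps"], imbalanced["max_throughput_fps"])
```

Both splits take 100 ms end to end, but under this model the imbalanced one sustains fewer inferences per second because the 80 ms segment becomes the bottleneck.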