# Notebook for Training Model
This notebook contains all of the commands, environment setup, and code to begin training the model.

## 1. Generating TensorFlow Records for Data
Generating TFRecords is a prerequisite for training the model.
A TFRecord stores the images and their annotations in a serialized format that
TensorFlow can parse during training and that can be distributed across platforms.
See the `src/notebooks/Generate TF Record` notebook for more detail.

In [1]:
from __future__ import division
from __future__ import print_function
from __future__ import absolute_import

import os
import io
import pandas as pd

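from tensorflow.python.framework.versions import VERSION
# Compare the major version numerically; comparing version strings
# lexicographically would misorder e.g. "10.0.0" and "2.0.0a0".
if int(VERSION.split(".")[0]) >= 2:
    import tensorflow.compat.v1 as tf
else:
    import tensorflow as tf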

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

In [2]:
csv_input = "../data/train_labels.csv"
image_dir = "../data/images"
output_path = "../model/train.record"
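
The cells that follow read the columns `filename`, `class`, `xmin`, `ymin`, `xmax`, and `ymax` from the label CSV. A quick check of the file can catch a missing column before the conversion is run. The following is a minimal sketch using only the standard library; the helper name `summarize_labels` and the sample rows are hypothetical and stand in for `../data/train_labels.csv`.

```python
import csv
import io
from collections import Counter

# Columns read by split() and create_tf_record() in this notebook.
REQUIRED_COLUMNS = {"filename", "class", "xmin", "ymin", "xmax", "ymax"}


def summarize_labels(csv_file):
    """Return the number of boxes per image, or raise if columns are missing."""
    reader = csv.DictReader(csv_file)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError("Missing columns: {}".format(sorted(missing)))
    return Counter(row["filename"] for row in reader)


# Hypothetical sample data; replace with open(csv_input) for the real file.
sample = io.StringIO(
    "filename,class,xmin,ymin,xmax,ymax\n"
    "a.jpg,military tank,10,20,110,220\n"
    "a.jpg,civilian car,5,5,50,50\n"
    "b.jpg,military truck,0,0,64,64\n"
)
boxes_per_image = summarize_labels(sample)
print(dict(boxes_per_image))
```

Extra columns in the real file, if any, are ignored by this check.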

In [3]:
# Mapping from class name to integer ID; IDs start at 1.
CLASS_IDS = {
    'military tank': 1,
    'military aircraft': 2,
    'military truck': 3,
    'civilian aircraft': 4,
    'civilian car': 5,
    'military helicopter': 6,
}

def class_text_to_int(row_label):
    # Unknown labels return None, as before.
    return CLASS_IDS.get(row_label)

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)

    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]
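
The integer IDs returned by `class_text_to_int` need to agree with the label map that the configuration refers to later (`labelmap.pbtxt`). As a rough sketch, assuming that file follows the usual `item { id, name }` text layout of the Object Detection API, the same mapping could be rendered as shown below. The names `LABEL_IDS` and `build_label_map` are hypothetical, and the notebook does not state how its actual label map was produced.

```python
# Same label-to-ID mapping as class_text_to_int() in this notebook.
LABEL_IDS = {
    "military tank": 1,
    "military aircraft": 2,
    "military truck": 3,
    "civilian aircraft": 4,
    "civilian car": 5,
    "military helicopter": 6,
}


def build_label_map(label_ids):
    """Render the mapping as label map text, ordered by ID."""
    items = []
    for name, class_id in sorted(label_ids.items(), key=lambda kv: kv[1]):
        items.append(
            "item {{\n  id: {}\n  name: '{}'\n}}\n".format(class_id, name)
        )
    return "\n".join(items)


label_map_text = build_label_map(LABEL_IDS)
print(label_map_text)
```

Keeping the mapping in one place makes it harder for the record IDs and the label map to drift apart.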

In [4]:
def create_tf_record(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()

    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    # Add all of the objects to the arrays.
    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
        
    tf_record = tf.train.Example(features=tf.train.Features(feature = {
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes)
    }))

    return tf_record
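
The loop in `create_tf_record` divides each pixel coordinate by the image width or height, so the stored box corners are fractions of the image size. The sketch below isolates that step in plain Python. The helper `normalize_box` and its range checks are assumptions added for illustration; the notebook itself performs no such validation.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Scale pixel box corners to fractions of the image size."""
    if width <= 0 or height <= 0:
        raise ValueError("Image dimensions must be positive")
    box = (xmin / width, ymin / height, xmax / width, ymax / height)
    # Extra checks (not in the notebook): corners must lie inside the image
    # and the box must have a positive area.
    if not all(0.0 <= value <= 1.0 for value in box):
        raise ValueError("Box lies outside the image: {}".format(box))
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError("Box has no area: {}".format(box))
    return box


# A 320x240 image with a box from (80, 60) to (160, 180) in pixels.
normalized = normalize_box(80, 60, 160, 180, 320, 240)
print(normalized)  # (0.25, 0.25, 0.5, 0.75)
```

A check like this surfaces annotation errors, such as swapped corners, before they are written into the record.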

In [5]:
writer = tf.python_io.TFRecordWriter(output_path)
path = os.path.join(os.getcwd(), image_dir)

print("Reading CSV label file...")
examples = pd.read_csv(csv_input)
grouped = split(examples, 'filename')

print("Beginning compilation...")

for group in grouped:
    tf_record = create_tf_record(group, path)
    writer.write(tf_record.SerializeToString())

writer.close()
output_path = os.path.join(os.getcwd(), output_path)
print(f"Successfully created the TFRecords: \n{output_path}")

Reading CSV label file...
Beginning compilation...
Successfully created the TFRecords: 
/Users/wpach/Dropbox/School/USC/Fall 2022/CSCE-585/Project/src/notebooks/../model/train.record


## 2. Downloading Model and Configuration

In [6]:
%cd ../model/content

/Users/wpach/Dropbox/School/USC/Fall 2022/CSCE-585/Project/src/model/content


First, the model itself must be downloaded. This model is specifically used for object detection and classification, though others are also available.

In [7]:
!wget http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/mobilenet_v2.tar.gz
!tar -xvf mobilenet_v2.tar.gz
!rm mobilenet_v2.tar.gz

--2022-09-21 19:14:02--  http://download.tensorflow.org/models/object_detection/classification/tf2/20200710/mobilenet_v2.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 2607:f8b0:4000:810::2010, 74.125.136.128
Connecting to download.tensorflow.org (download.tensorflow.org)|2607:f8b0:4000:810::2010|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8404070 (8.0M) [application/x-tar]
Saving to: ‘mobilenet_v2.tar.gz’


2022-09-21 19:14:04 (23.5 MB/s) - ‘mobilenet_v2.tar.gz’ saved [8404070/8404070]

x mobilenet_v2/
x mobilenet_v2/mobilenet_v2.ckpt-1.index
x mobilenet_v2/checkpoint
x mobilenet_v2/mobilenet_v2.ckpt-1.data-00001-of-00002
x mobilenet_v2/mobilenet_v2.ckpt-1.data-00000-of-00002


Next, the configuration must be downloaded.

In [8]:
!wget https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config
!mv ssd_mobilenet_v2_320x320_coco17_tpu-8.config mobilenet_v2.config

--2022-09-21 19:14:09--  https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4484 (4.4K) [text/plain]
Saving to: ‘ssd_mobilenet_v2_320x320_coco17_tpu-8.config’


2022-09-21 19:14:09 (19.4 MB/s) - ‘ssd_mobilenet_v2_320x320_coco17_tpu-8.config’ saved [4484/4484]



TensorFlow provides a simple [script](https://blog.tensorflow.org/2021/01/custom-object-detection-in-browser.html) that automatically adjusts the configuration; it is used below. These values can be tuned for higher accuracy at the cost of longer training, or for experimentation.
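
To illustrate the general idea of editing a pipeline configuration as text, the sketch below rewrites a few fields with regular expressions. It is not the notebook's own cell: the helper `set_field` is hypothetical, and `sample_config` is a shortened, made-up excerpt rather than the downloaded file.

```python
import re

# Hypothetical, heavily shortened excerpt of a pipeline config.
sample_config = """
model {
  ssd {
    num_classes: 90
  }
}
train_config {
  batch_size: 512
  num_steps: 50000
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
}
"""


def set_field(config_text, field, value):
    """Replace the value of every `field: value` occurrence in the text."""
    pattern = r'(\b{}:\s*)("[^"]*"|\S+)'.format(re.escape(field))
    # A function replacement avoids escape handling in the value.
    return re.sub(pattern, lambda match: match.group(1) + value, config_text)


updated = sample_config
updated = set_field(updated, "num_classes", "6")
updated = set_field(updated, "batch_size", "96")
updated = set_field(updated, "num_steps", "7500")
updated = set_field(
    updated, "fine_tune_checkpoint", '"mobilenet_v2/mobilenet_v2.ckpt-1"'
)
print(updated)
```

Because the pattern matches every occurrence of a field name, a field that appears in several sections of a real configuration (for example in both training and evaluation settings) would be changed everywhere, which may not be intended.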

In [None]:
num_classes = 6
batch_size = 96
num_steps = 7500
num_eval_steps = 1000

train_record_path = "../train.record"
test_record_path = "../test.record"
model_dir = "../training"
labelmap_path = "../labelmap.pbtxt"
pipeline_config_path = "mobilenet_v2.config"
fine_tune_checkpoint = "mobilenet_v2/mobilenet_v2.ckpt-1"

import re

with open(pipeline_config_path) as f:
    config = f.read()

with open(pipeline_config_path, 'w') as f:
    