# Train Model
Folder structure was inspired by this [Blog Post](https://neptune.ai/blog/how-to-train-your-own-object-detector-using-tensorflow-object-detection-api)

### Load Packages

In [34]:
#!pip uninstall -y tensorflow keras

Found existing installation: tensorflow 2.6.0
Uninstalling tensorflow-2.6.0:
  Successfully uninstalled tensorflow-2.6.0
Found existing installation: keras 2.6.0
Uninstalling keras-2.6.0:
  Successfully uninstalled keras-2.6.0


In [72]:
import os
import time

### Setup
- Go over to [Tensorflow Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md) and select the model we want to fine-tune
- Select which model to train and copy the path of the url e.g. *efficientdet_d1_coco17_tpu-32.tar.gz*

In [15]:
WORKSPACE_PATH = "workspace"
DATA_PATH = os.path.join(WORKSPACE_PATH, "data")
PRE_TRAINED_MODELS_PATH = os.path.join(WORKSPACE_PATH, "pre_trained_models")

MODELS_PATH = os.path.join(WORKSPACE_PATH, "models")
BASE_MODEL_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"
MODEL_NAME = "efficientdet_d1_coco17_tpu-32"

PIPELINE_CONFIG_PATH = os.path.join(MODELS_PATH, MODEL_NAME, "pipeline.config")

In [37]:
if not os.path.exists(WORKSPACE_PATH):
    os.mkdir(WORKSPACE_PATH)

if not os.path.exists(MODELS_PATH):
    os.mkdir(MODELS_PATH)
    
if not os.path.exists(DATA_PATH):
    os.mkdir(DATA_PATH)
    
if not os.path.exists(PRE_TRAINED_MODELS_PATH):
    os.mkdir(PRE_TRAINED_MODELS_PATH)
    
if not os.path.exists(os.path.join(MODELS_PATH, MODEL_NAME)):
    os.mkdir(os.path.join(MODELS_PATH, MODEL_NAME))

### Download Tensorflow Object Detection API
1. Download all files required for the Tensorflow Object Detection API
2. Download and move *builder.py* that is required for the API to work (fix for Schlaubox)
3. Compile Protos
4. Test whether TFOD API was successfully installed

In [38]:
!git clone https://github.com/tensorflow/models

fatal: destination path 'models' already exists and is not an empty directory.


In [2]:
%cd models/research/
!wget https://storage.googleapis.com/odml-dataset/others/setup.py
!pip install -q --user .
%cd ../..

/home/jovyan/top-down-object-detection/models/research
--2023-02-13 12:13:22--  https://storage.googleapis.com/odml-dataset/others/setup.py
Resolving storage.googleapis.com (storage.googleapis.com)... 216.58.212.176, 142.250.184.240, 142.250.185.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|216.58.212.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1401 (1.4K) [text/x-python]
Saving to: ‘setup.py.2’


2023-02-13 12:13:22 (23.9 MB/s) - ‘setup.py.2’ saved [1401/1401]

/home/jovyan/top-down-object-detection


In [7]:
!wget https://raw.githubusercontent.com/protocolbuffers/protobuf/main/python/google/protobuf/internal/builder.py
!mv builder.py /home/jovyan/.local/lib/python3.9/site-packages/google/protobuf/internal/

--2023-02-13 12:15:32--  https://raw.githubusercontent.com/protocolbuffers/protobuf/main/python/google/protobuf/internal/builder.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5188 (5.1K) [text/plain]
Saving to: ‘builder.py’


2023-02-13 12:15:32 (36.2 MB/s) - ‘builder.py’ saved [5188/5188]



In [9]:
# Compile protos.
!protoc models/research/object_detection/protos/*.proto --python_out=models/research

# Remove this file since it prevents the API to work in the Schlaubox
!rm models/research/opt/conda/lib/python3.9/site-packages/tensorflow/core/kernels/libtfkernel_sobol_op.so

# Test if the Object Dectection API is working correctly
!python3 models/research/object_detection/builders/model_builder_tf2_test.py

object_detection/protos/flexible_grid_anchor_generator.proto: File not found.
object_detection/protos/grid_anchor_generator.proto: File not found.
object_detection/protos/multiscale_anchor_generator.proto: File not found.
object_detection/protos/ssd_anchor_generator.proto: File not found.
models/research/object_detection/protos/anchor_generator.proto:5:1: Import "object_detection/protos/flexible_grid_anchor_generator.proto" was not found or had errors.
models/research/object_detection/protos/anchor_generator.proto:6:1: Import "object_detection/protos/grid_anchor_generator.proto" was not found or had errors.
models/research/object_detection/protos/anchor_generator.proto:7:1: Import "object_detection/protos/multiscale_anchor_generator.proto" was not found or had errors.
models/research/object_detection/protos/anchor_generator.proto:8:1: Import "object_detection/protos/ssd_anchor_generator.proto" was not found or had errors.
models/research/object_detection/protos/anchor_generator.proto:1

### Create TFRecords
1. Upload zip of folders with train, validation and test files (images and annotations)
2. Extract zip
3. Create TFRecords using ***generate_tf_record.py***
----
*Note: Have to create TFRecords here since creating these locally and afterwards uploading them seems to corrupt the files*

In [47]:
%%capture
!unzip {DATA_PATH}/splits.zip -d {DATA_PATH}

In [53]:
!python {WORKSPACE_PATH}'/scripts/generate_tfrecord.py' -x {DATA_PATH}'/train' -l {DATA_PATH}'/label_map.pbtxt' -o {DATA_PATH}'/train.tfrecord'
!python {WORKSPACE_PATH}'/scripts/generate_tfrecord.py' -x {DATA_PATH}'/val' -l {DATA_PATH}'/label_map.pbtxt' -o {DATA_PATH}'/val.tfrecord'
!python {WORKSPACE_PATH}'/scripts/generate_tfrecord.py' -x {DATA_PATH}'/test' -l {DATA_PATH}'/label_map.pbtxt' -o {DATA_PATH}'/test.tfrecord'

Successfully created the TFRecord file: workspace/data/val.tfrecord
Successfully created the TFRecord file: workspace/data/test.tfrecord


In [57]:
!ls {DATA_PATH} -l --block-size=M

total 2782M
-rw-r--r-- 1 jovyan users    1M Feb 12 23:01 label_map.pbtxt
-rw-r--r-- 1 jovyan users 1234M Feb 13 13:02 splits.zip
drwxr-xr-x 2 jovyan users    1M Feb 13 11:05 test
-rw-r--r-- 1 jovyan users  163M Feb 13 13:14 test.tfrecord
drwxr-xr-x 2 jovyan users    1M Feb 13 11:06 train
-rw-r--r-- 1 jovyan users 1078M Feb 13 13:12 train.tfrecord
drwxr-xr-x 2 jovyan users    1M Feb 13 11:05 val
-rw-r--r-- 1 jovyan users  309M Feb 13 13:14 val.tfrecord


### Download pre-trained model
1. Download model specified in **MODEL_NAME**
2. Extract the zipped model
3. Remove the zip-file we no longer need
4. Copy pipeline.config to our models/MODEL_NAME folder to configure it to our own liking

In [26]:
!wget {BASE_MODEL_URL}/{MODEL_NAME}.tar.gz -P {PRE_TRAINED_MODELS_PATH} 
!tar -xvzf {PRE_TRAINED_MODELS_PATH}/{MODEL_NAME}.tar.gz -C {PRE_TRAINED_MODELS_PATH}
!rm -rf {PRE_TRAINED_MODELS_PATH}/{MODEL_NAME}.tar.gz

--2023-02-12 22:46:17--  http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 142.250.185.208, 2a00:1450:4001:812::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|142.250.185.208|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51839363 (49M) [application/x-tar]
Saving to: ‘workspace/pre_trained_models/efficientdet_d1_coco17_tpu-32.tar.gz’


2023-02-12 22:46:22 (10.6 MB/s) - ‘workspace/pre_trained_models/efficientdet_d1_coco17_tpu-32.tar.gz’ saved [51839363/51839363]

efficientdet_d1_coco17_tpu-32/
efficientdet_d1_coco17_tpu-32/checkpoint/
efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.data-00000-of-00001
efficientdet_d1_coco17_tpu-32/checkpoint/checkpoint
efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.index
efficientdet_d1_coco17_tpu-32/pipeline.config
efficientdet_d1_coco17_tpu-32/saved_model/
efficientdet_d1_coco17_tpu-32/s

### Update parameters in the pipeline.config
- **Batch Size:** 4, 8
- **Learning Rate:** 1e-3, 5e-4, 1e-4
- **Epochs:** 10, 25, 50

-------

*Note: The pipeline.config does not have a epochs parameter. Instead we have to calculate the **num_steps** based on the epochs we want to use:*

- ***num_steps = epochs * (num_samples / batch_size)***
- ***epochs = num_steps / (num_samples / batch_size)***


In [10]:
# Manipulate pipeline.config
from object_detection.utils import config_util

In [108]:
NUM_SAMPLES = len([x for x in os.listdir(os.path.join(DATA_PATH, 'train')) if x.endswith('.xml')])
NUM_CLASSES = 10

EPOCHS = 10
BATCH_SIZE = 4
LEARNING_RATE =  1e-3
USE_AUGMENTATION = False

NUM_STEPS = int(EPOCHS * (NUM_SAMPLES / BATCH_SIZE))
COMBINATION_NAME = f"epochs_{EPOCHS}-batch_size_{BATCH_SIZE}-learning_rate_{LEARNING_RATE}-aug_{USE_AUGMENTATION}"

if not os.path.isdir(os.path.join(MODELS_PATH, MODEL_NAME, COMBINATION_NAME)):
    os.mkdir(os.path.join(MODELS_PATH, MODEL_NAME, COMBINATION_NAME))

In [113]:
!cp {PRE_TRAINED_MODELS_PATH}/{MODEL_NAME}/pipeline.config {MODELS_PATH}/{MODEL_NAME}/{COMBINATION_NAME}
config = config_util.get_configs_from_pipeline_file(PIPELINE_CONFIG_PATH)

config['model'].ssd.num_classes = NUM_CLASSES
config['train_config'].fine_tune_checkpoint_type = "detection"
config['train_config'].batch_size = BATCH_SIZE
config['train_config'].num_steps = NUM_STEPS
config['train_config'].fine_tune_checkpoint = os.path.join(PRE_TRAINED_MODELS_PATH, MODEL_NAME, "checkpoint", "ckpt-0")
config['train_config'].optimizer.momentum_optimizer.learning_rate.cosine_decay_learning_rate.learning_rate_base = LEARNING_RATE
config['train_config'].optimizer.momentum_optimizer.learning_rate.cosine_decay_learning_rate.warmup_learning_rate = LEARNING_RATE / 5
config['train_config'].optimizer.momentum_optimizer.learning_rate.cosine_decay_learning_rate.warmup_steps = int(NUM_STEPS * 0.01)
config['train_input_config'].label_map_path = os.path.join(DATA_PATH, 'label_map.pbtxt')
config['train_input_config'].tf_record_input_reader.input_path[0] = os.path.join(DATA_PATH, 'train.tfrecord')
config['eval_input_config'].label_map_path = os.path.join(DATA_PATH, 'label_map.pbtxt')
config['eval_input_config'].tf_record_input_reader.input_path[0] = os.path.join(DATA_PATH, 'val.tfrecord')

# Remove data augmentation
#del config['train_config'].data_augmentation_options[0]
#del config['train_config'].data_augmentation_options[0]

# Save updated config
config = config_util.create_pipeline_proto_from_configs(config)
config_util.save_pipeline_config(config, os.path.join(MODELS_PATH, MODEL_NAME, COMBINATION_NAME))

INFO:tensorflow:Writing pipeline config file to workspace/models/efficientdet_d1_coco17_tpu-32/epochs_10-batch_size_4-learning_rate_0.001-aug_False/pipeline.config


### Start training
1. Move the model_main_tf2.py training script to our workspace folder
2. Start the training process
3. Directly after starting, run the **eval.ipynb** for evaluation

In [19]:
!cp models/research/object_detection/model_main_tf2.py {WORKSPACE_PATH}

In [114]:
start_time = time.time()

!python3 workspace/model_main_tf2.py \
    --pipeline_config_path=workspace/models/{MODEL_NAME}/{COMBINATION_NAME}/pipeline.config \
    --model_dir=workspace/models/{MODEL_NAME}/{COMBINATION_NAME} \
    --alsologtostderr

end_time = time.time()
elapsed_time = (end_time - start_time) / 60

print(f"Elapsed time: {round(elapsed_time, 2)} minutes")

caused by: ['/home/jovyan/.local/lib/python3.9/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl5mutexC1Ev']
caused by: ['/home/jovyan/.local/lib/python3.9/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZNK10tensorflow4data11DatasetBase3GetEPNS_15OpKernelContextElPSt6vectorINS_6TensorESaIS5_EE']
 The versions of TensorFlow you are currently using is 2.6.0 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
2023-02-13 16:16:01.449025: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following 