# TensorFlow Object Detection API and AWS SageMaker

In this notebook, you will train and evaluate different models using the [TensorFlow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [AWS SageMaker](https://aws.amazon.com/sagemaker/).

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported to the TFRecord format, following the layout described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images are saved at a resolution of 640x640.
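If you download one of the TFRecord files locally, you can check how many records it holds without loading TensorFlow. The sketch below relies only on the TFRecord framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload). It does not verify the CRCs or decode the examples, and the function name is a hypothetical helper, not part of this project.

```python
import struct


def count_tfrecord_records(path):
    """Count the records in a TFRecord file by walking its length-prefixed framing."""
    count = 0
    with open(path, "rb") as f:
        while True:
            # Each record starts with an 8-byte length followed by a 4-byte CRC.
            header = f.read(12)
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            # The payload is followed by a 4-byte CRC of the data (not checked here).
            payload = f.read(length + 4)
            if len(payload) < length + 4:
                raise ValueError("truncated record payload")
            count += 1
    return count


# Example call, assuming a locally downloaded file with a hypothetical name:
# print(count_tfrecord_records("segment-example.tfrecord"))
```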

In [1]:
%%capture
%pip install tensorflow_io sagemaker -U

In [2]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


Save the IAM role in a variable called `role`. It is needed later to launch the training job.

In [3]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::339840815706:role/service-role/AmazonSageMaker-ExecutionRole-20231111T220972


In [5]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
          'val': 's3://cd2688-object-detection-tf2/val/'} 

# Set this to a folder in your own S3 bucket where the TensorBoard logs will be stored.
tensorboard_s3_prefix = 's3://jludacity-object-detection-project/logs/'
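A malformed S3 path usually surfaces only after the training job has started, so it can help to validate the URIs up front. The following is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper, and it checks the shape of the URI, not whether the bucket exists or is readable.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, prefix); raise on malformed input."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Same channel paths as in the cell above.
inputs = {
    "train": "s3://cd2688-object-detection-tf2/train/",
    "val": "s3://cd2688-object-detection-tf2/val/",
}
for channel, uri in inputs.items():
    bucket, prefix = split_s3_uri(uri)
    print(f"{channel}: bucket={bucket} prefix={prefix}")
```

You can pass `tensorboard_s3_prefix` through the same function before launching the job.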

## Container

To train the model, you first need to build a [Docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* get the exporter and training scripts from the repository
* build the Docker image and push it
* print the container name

In [6]:
%%bash

# clone the repo and get the scripts
git clone https://github.com/tensorflow/models.git docker/models

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
cp docker/models/research/object_detection/model_main_tf2.py source_dir

Cloning into 'docker/models'...
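Before building the image, you may want to confirm that `source_dir` contains everything the training job relies on. The sketch below assumes the four files named in this notebook (the two copied scripts, `pipeline.config` and `run_training.sh`) are the ones that matter; `missing_source_files` is a hypothetical helper.

```python
from pathlib import Path

# Files this notebook expects in source_dir (assumption: no others are required).
REQUIRED_FILES = [
    "exporter_main_v2.py",
    "model_main_tf2.py",
    "pipeline.config",
    "run_training.sh",
]


def missing_source_files(source_dir="source_dir", required=REQUIRED_FILES):
    """Return the names from `required` that are not present in `source_dir`."""
    root = Path(source_dir)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_source_files()` from the notebook folder should return an empty list once the copy step above has succeeded.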


In [7]:
# build and push the docker image. This code can be commented out after being run once.
# This will take around 10 mins.
image_name = 'tf2-object-detection'
!sh ./docker/build_and_push.sh $image_name

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building image with name tf2-object-detection
Sending build context to Docker daemon  743.4MB
Step 1/14 : FROM tensorflow/tensorflow:2.13.0-gpu
2.13.0-gpu: Pulling from tensorflow/tensorflow

Status: Downloaded newer image for tensorflow/tensorflow:2.13.0-gpu
 ---> 6bdca089cc38
Step 2/14 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Running in 6fad33614784
Removing intermediate container 6fad33614784
 ---> 3f08f2a9fc83
Step 3/14 : RUN apt-get update && apt-get install -y     git     gpg-agent     python3-cairocffi     protobuf-compiler     python3-pil     python3-lxml     python3-tk     libgl1-mesa-dev     wget
 ---> Running in 61a2c9b33a42
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease [1581 B]
Get:3 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:4 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [29.3 kB]
Ign:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages [1271 kB]
Get:7 http:

Get:24 http://archive.ubuntu.com/ubuntu focal/main amd64 fonts-dejavu-core all 2.37-1 [1041 kB]
Get:25 http://archive.ubuntu.com/ubuntu focal/main amd64 fontconfig-config all 2.13.1-2ubuntu3 [28.8 kB]
Get:26 http://archive.ubuntu.com/ubuntu focal/main amd64 libfontconfig1 amd64 2.13.1-2ubuntu3 [114 kB]
Get:27 http://archive.ubuntu.com/ubuntu focal/main amd64 libxrender1 amd64 1:0.9.10-1 [18.7 kB]
Get:28 http://archive.ubuntu.com/ubuntu focal/main amd64 libxft2 amd64 2.3.3-0ubuntu1 [39.2 kB]
Get:29 http://archive.ubuntu.com/ubuntu focal/main amd64 x11-common all 1:7.7+19ubuntu14 [22.3 kB]
Get:30 http://archive.ubuntu.com/ubuntu focal/main amd64 libxss1 amd64 1:1.2.3-1 [8140 B]
Get:31 http://archive.ubuntu.com/ubuntu focal/main amd64 libtk8.6 amd64 8.6.10-1 [714 kB]
Get:32 http://archive.ubuntu.com/ubuntu focal/main amd64 tk8.6-blt2.5 amd64 2.5.3+dfsg-4 [572 kB]
Get:33 http://archive.ubuntu.com/ubuntu focal/main amd64 blt amd64 2.5.3+dfsg-4 [4944 B]
Get:34 http://archive.ubuntu.com/ubunt

Get:102 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libwebpdemux2 amd64 0.6.1-2ubuntu0.20.04.3 [9560 B]
Get:103 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libwebpmux3 amd64 0.6.1-2ubuntu0.20.04.3 [19.5 kB]
Get:104 http://archive.ubuntu.com/ubuntu focal/main amd64 libxcb-randr0 amd64 1.14-2 [16.3 kB]
Get:105 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libxslt1.1 amd64 1.1.34-4ubuntu0.20.04.1 [151 kB]
Get:106 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 mesa-vulkan-drivers amd64 21.2.6-0ubuntu0.1~20.04.2 [5788 kB]
Get:107 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-soupsieve all 1.9.5+dfsg-1 [29.1 kB]
Get:108 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-bs4 all 4.8.2-1 [83.0 kB]
Get:109 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 python3-ply all 3.11-3ubuntu0.1 [46.3 kB]
Get:110 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-pycparser all 2.19-1ubuntu1 [71.0 kB]
Get:111 http://arch

Selecting previously unselected package liberror-perl.
Preparing to unpack .../034-liberror-perl_0.17029-1_all.deb ...
Unpacking liberror-perl (0.17029-1) ...
Selecting previously unselected package git-man.
Preparing to unpack .../035-git-man_1%3a2.25.1-1ubuntu3.11_all.deb ...
Unpacking git-man (1:2.25.1-1ubuntu3.11) ...
Selecting previously unselected package git.
Preparing to unpack .../036-git_1%3a2.25.1-1ubuntu3.11_amd64.deb ...
Unpacking git (1:2.25.1-1ubuntu3.11) ...
Selecting previously unselected package libpixman-1-0:amd64.
Preparing to unpack .../037-libpixman-1-0_0.38.4-0ubuntu2.1_amd64.deb ...
Unpacking libpixman-1-0:amd64 (0.38.4-0ubuntu2.1) ...
Selecting previously unselected package libxcb-render0:amd64.
Preparing to unpack .../038-libxcb-render0_1.14-2_amd64.deb ...
Unpacking libxcb-render0:amd64 (1.14-2) ...
Selecting previously unselected package libxcb-shm0:amd64.
Preparing to unpack .../039-libxcb-shm0_1.14-2_amd64.deb ...
Unpacking libxcb-shm0:amd64 (1.14-2) ...
S

Selecting previously unselected package libglx-dev:amd64.
Preparing to unpack .../080-libglx-dev_1.3.2-1~ubuntu0.20.04.2_amd64.deb ...
Unpacking libglx-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Selecting previously unselected package libgl-dev:amd64.
Preparing to unpack .../081-libgl-dev_1.3.2-1~ubuntu0.20.04.2_amd64.deb ...
Unpacking libgl-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Selecting previously unselected package libegl-dev:amd64.
Preparing to unpack .../082-libegl-dev_1.3.2-1~ubuntu0.20.04.2_amd64.deb ...
Unpacking libegl-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Selecting previously unselected package libjpeg-turbo8:amd64.
Preparing to unpack .../083-libjpeg-turbo8_2.0.3-0ubuntu1.20.04.3_amd64.deb ...
Unpacking libjpeg-turbo8:amd64 (2.0.3-0ubuntu1.20.04.3) ...
Selecting previously unselected package libjpeg8:amd64.
Preparing to unpack .../084-libjpeg8_8c-2ubuntu8_amd64.deb ...
Unpacking libjpeg8:amd64 (8c-2ubuntu8) ...
Selecting previously unselected package libjbig0:amd64.
Prepari

Setting up libxau6:amd64 (1:1.0.9-0ubuntu1) ...
Setting up wget (1.20.3-1ubuntu2) ...
Setting up libglvnd0:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up libprotobuf-lite17:amd64 (3.6.1.3-2ubuntu5.2) ...
Setting up python3-olefile (0.46-2) ...
Setting up python3-ply (3.11-3ubuntu0.1) ...
Setting up libgdk-pixbuf2.0-common (2.40.0+dfsg-3ubuntu0.4) ...
Setting up x11-common (1:7.7+19ubuntu14) ...
invoke-rc.d: could not determine current runlevel
invoke-rc.d: policy-rc.d denied execution of start.
Setting up libsensors-config (1:3.6.0-2ubuntu1.1) ...
Setting up less (551-1ubuntu0.1) ...
Setting up libcurl3-gnutls:amd64 (7.68.0-1ubuntu2.20) ...
Setting up libcbor0.6:amd64 (0.6.0-0ubuntu1) ...
Setting up libpthread-stubs0-dev:amd64 (0.4-1) ...
Setting up libjbig0:amd64 (2.1-3.1ubuntu0.20.04.1) ...
Setting up python3-webencodings (0.5.1-1ubuntu1) ...
Setting up libopengl0:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up python3-pycparser (2.19-1ubuntu1) ...
Setting up liberror-perl (0.17029-1)

Processing /home/tensorflow/models/research
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting avro-python3 (from object-detection==0.1)
  Downloading avro-python3-1.10.2.tar.gz (38 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting apache-beam (from object-detection==0.1)
  Downloading apache_beam-2.51.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.1 kB)
Collecting pillow==9.5 (from object-detection==0.1)
  Downloading Pillow-9.5.0-cp38-cp38-manylinux_2_28_x86_64.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 94.2 MB/s eta 0:00:00
Collecting matplotlib (from object-detection==0.1)
  Downloading matplotlib-3.7.3-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.7 kB)
Collecting Cython (from object-detection==0.1)
  Downloading Cython-3.0.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Collecting crcmod<2.0,>=1.7 (from apache-beam->object-detection==0.1)
  Downloading crcmod-1.7.tar.gz (89 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.7/89.7 kB 18.9 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting orjson<4,>=3.9.7 (from apache-beam->object-detection==0.1)
  Downloading orjson-3.9.10-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (49 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.3/49.3 kB 9.4 MB/s eta 0:00:00
Collecting dill<0.3.2,>=0.3.1.1 (from apache-beam->object-detection==0.1)
  Downloading dill-0.3.1.1.tar.gz (151 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 152.0/152.0 kB 29.3 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting cloudpickle~=2.2.1 (from apache-beam->object-detection==0.1)
  Downloading cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Collecting fastavro<2,>=0.

Collecting dm-tree~=0.1.1 (from tensorflow-model-optimization>=0.4.1->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading dm_tree-0.1.8-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (152 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 152.9/152.9 kB 29.7 MB/s eta 0:00:00
Collecting scikit-learn>=0.21.3 (from seqeval->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading scikit_learn-1.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting array-record (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading array_record-0.4.0-py38-none-any.whl.metadata (502 bytes)
Collecting click (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting etils>=0.9.0 (from etils[enp,epath]>=0.9.0->tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading etils-1.3.0-py3-none-any.whl.meta

   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.1/301.1 kB 47.6 MB/s eta 0:00:00
Downloading cycler-0.12.1-py3-none-any.whl (8.3 kB)
Downloading fastavro-1.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 119.6 MB/s eta 0:00:00
Downloading fasteners-0.19-py3-none-any.whl (18 kB)
Downloading fonttools-4.44.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.6/4.6 MB 91.8 MB/s eta 0:00:00
Downloading google_api_python_client-2.107.0-py2.py3-none-any.whl (12.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 131.9 MB/s eta 0:00:00
Downloading importlib_resources-6.1.1-py3-none-any.whl (33 kB)
Downloading kiwisolver-1.4.5-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 103.9 MB/s eta 0:00:00
Downloading opencv_python-4.8.1.78-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86

  Building wheel for docopt (setup.py): finished with status 'done'
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13707 sha256=1f118ba659128df5f211eb7913ee85fd4a8f6b7b2f7cde9b4005633f5aac6a3f
  Stored in directory: /root/.cache/pip/wheels/56/ea/58/ead137b087d9e326852a851351d1debf4ada529b6ac0ec4e8c
  Building wheel for promise (setup.py): started
  Building wheel for promise (setup.py): finished with status 'done'
  Created wheel for promise: filename=promise-2.3-py3-none-any.whl size=21484 sha256=f27e5dfb15633d4498fd0d7e63d8abc49e2d1b88ebdf650d67b128a2023ee8e4
  Stored in directory: /root/.cache/pip/wheels/54/aa/01/724885182f93150035a2a91bce34a12877e8067a97baaf5dc8
Successfully built object-detection avro-python3 crcmod dill hdfs kaggle seqeval pyjsparser docopt promise
Installing collected packages: text-unidecode, sentencepiece, pytz, pyjsparser, py-cpuinfo, gin-config, docopt, dm-tree, crcmod, zstandard, uritemplate, tzdata, tqdm, toml, threadpoolctl, t

Collecting greenlet>=2.0.0 (from gevent->sagemaker-training)
  Downloading greenlet-3.0.1-cp38-cp38-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (3.7 kB)
Downloading boto3-1.28.84-py3-none-any.whl (135 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.8/135.8 kB 27.3 MB/s eta 0:00:00
Downloading botocore-1.31.84-py3-none-any.whl (11.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 122.9 MB/s eta 0:00:00
Downloading paramiko-3.3.1-py3-none-any.whl (224 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.8/224.8 kB 41.0 MB/s eta 0:00:00
Downloading gevent-23.9.1-cp38-cp38-manylinux_2_28_x86_64.whl (6.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 130.5 MB/s eta 0:00:00
Downloading cryptography-41.0.5-cp37-abi3-manylinux_2_28_x86_64.whl (4.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 97.9 MB/s eta 0:00:00
Downloading greenlet-3.0.1-cp38-cp38-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (618 kB)

To verify that the image was correctly pushed to the [Elastic Container Registry](https://aws.amazon.com/ecr/), you can look it up in the AWS Management Console. The example screenshot below shows three different images pushed to ECR; in your account you should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [8]:
# display the container name
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    container = f.readline().strip()  # strip() is safe even if the trailing newline is missing

print(container)

339840815706.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20231112032838
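The container name printed above follows the usual ECR layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. If you want to confirm that the image lives in the account and region you expect, a small parser such as the sketch below can help. The regular expression is an assumption that covers tagged URIs like the one above; it does not handle digest references (`@sha256:...`).

```python
import re

# Pattern for tagged ECR image URIs (assumption: standard public AWS partition).
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+):(?P<tag>.+)$"
)


def parse_ecr_uri(uri):
    """Return a dict with account, region, repository and tag for an ECR image URI."""
    match = ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a tagged ECR image URI: {uri!r}")
    return match.groupdict()


info = parse_ecr_uri(
    "339840815706.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20231112032838"
)
print(info["region"], info["repository"], info["tag"])
```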


## Pre-trained model from model zoo

As is common practice, we are not training from scratch; we will use a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time is limited for this project, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
* Faster R-CNN ResNet50 V1 640x640
* EfficientDet D1 640x640
* Faster R-CNN ResNet152 V1 640x640

The code below downloads and extracts the EfficientDet D1 model. Adjust it if you experiment with other architectures.

In [9]:
%%bash
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint
wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint

--2023-11-12 03:39:22--  http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.253.63.207, 142.250.31.207, 142.251.111.207, ...
Connecting to download.tensorflow.org (download.tensorflow.org)|172.253.63.207|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51839363 (49M) [application/x-tar]
Saving to: ‘/tmp/efficientdet.tar.gz’

efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.data-00000-of-00001
efficientdet_d1_coco17_tpu-32/checkpoint/checkpoint
efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.index
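When you switch to another architecture, the archive name and its top-level folder change, which makes the hard-coded `tar` arguments above easy to get wrong. The sketch below is one possible Python alternative: it copies every file found at `<model_name>/checkpoint/` into a destination folder, whatever the model name is. `extract_checkpoint` is a hypothetical helper, and it assumes the model zoo archives keep the same `<model_name>/checkpoint/` layout as the EfficientDet one.

```python
import shutil
import tarfile
from pathlib import Path, PurePosixPath


def extract_checkpoint(archive_path, dest_dir):
    """Copy '<model_name>/checkpoint/<file>' entries of a .tar.gz into dest_dir.

    Returns the sorted list of extracted file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            parts = PurePosixPath(member.name).parts
            # Keep only regular files located directly under <model_name>/checkpoint/.
            if member.isfile() and len(parts) == 3 and parts[1] == "checkpoint":
                target = dest / parts[2]
                with archive.extractfile(member) as source, open(target, "wb") as out:
                    shutil.copyfileobj(source, out)
                extracted.append(target.name)
    return sorted(extracted)


# Example call with the paths used in the cell above:
# extract_checkpoint("/tmp/efficientdet.tar.gz", "source_dir/checkpoint")
```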


## Edit pipeline.config file

The [`pipeline.config`](source_dir/pipeline.config) file in the `source_dir` folder must be updated whenever you experiment with a different model. Sample config files for the supported architectures are available [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

> Note: The provided `pipeline.config` file works with the `EfficientDet` model. You will need to modify it when working with other models.
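When adapting a sample config, the fields you most often touch are `num_classes`, `batch_size`, `fine_tune_checkpoint`, `fine_tune_checkpoint_type`, `label_map_path` and `input_path`. The sketch below edits such fields at the text level with a regular expression. It is an illustration only: `set_config_field` is a hypothetical helper, it replaces every occurrence of the field (for example `batch_size` in both the train and eval sections), and parsing the file as a protobuf is the more robust route.

```python
import re


def set_config_field(config_text, field, value):
    """Replace every `field: <old>` line in a pipeline.config text with `field: value`.

    `value` is inserted verbatim, so string values must include their quotes.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(field)}[ \t]*:[ \t]*).*$", flags=re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(f"field {field!r} not found")
    return new_text


# Shortened, made-up excerpt in the pipeline.config text format.
sample = """train_config {
  batch_size: 64
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
"""
updated = set_config_field(sample, "batch_size", "8")
updated = set_config_field(updated, "fine_tune_checkpoint", '"checkpoint/ckpt-0"')
print(updated)
```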

## Launch Training Job

Now that we have a dataset, a Docker image and pretrained model weights, we can launch the training job. To do so, we create a [SageMaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) in which we specify the container name, the name of the config file, the number of training steps, and so on.

The `run_training.sh` script does the following:
* train the model for `num_train_steps` 
* evaluate over the val dataset
* export the model

Different metrics are displayed during the evaluation phase, including the mean average precision. Use these metrics to quantify your model's performance and to compare results across experiments.
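Mean average precision is built on the intersection over union (IoU) between a predicted box and a ground-truth box: a detection counts as correct only when its IoU exceeds a chosen threshold. As a rough illustration of that building block, the sketch below computes IoU for two boxes. It assumes boxes are given as `(ymin, xmin, ymax, xmax)`; it is not the evaluation code used by the training job.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    # Corners of the overlapping rectangle (may be empty).
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```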

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon Sagemaker dashboard in the Web UI.
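
If you prefer to check on the job from code rather than the dashboard, a minimal polling sketch could look like the one below. It assumes a boto3 SageMaker client, whose `describe_training_job` call returns `TrainingJobStatus` and `SecondaryStatus`; the helper `wait_for_training_job` and its defaults are hypothetical.

```python
import time

# Statuses after which a training job no longer changes.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=60, max_polls=120):
    """Poll a training job until it reaches a terminal status and return it."""
    for _ in range(max_polls):
        description = client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        secondary = description.get("SecondaryStatus", "n/a")
        print(f"{job_name}: {status} ({secondary})")
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


# Possible usage (requires AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("sagemaker")
# wait_for_training_job(client, estimator.latest_training_job.name)
```

Taking the client as a parameter keeps the function easy to exercise with a stub object when no AWS account is available.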

In [None]:
# TensorBoard event files written under /opt/training/ in the container
# are uploaded to the S3 prefix below.
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    # Hyperparameters made available to the entry-point script
    hyperparameters={
        "model_dir": "/opt/training",
        "pipeline_config_path": "pipeline.config",
        "num_train_steps": "2000",
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.g5.xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

# Launch the training job with the 'train' and 'val' channels defined in `inputs`
estimator.fit(inputs)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
Using provided s3_resource


INFO:sagemaker:Creating training-job with name: tf2-object-detection-2023-11-12-03-46-06-707


2023-11-12 03:46:08 Starting - Starting the training job...
2023-11-12 03:46:23 Starting - Preparing the instances for training......
2023-11-12 03:47:27 Downloading - Downloading input data...
2023-11-12 03:47:49 Training - Downloading the training image.........
2023-11-12 03:49:29 Training - Training image download completed. Training in progress...
2023-11-12 03:49:46,482 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-11-12 03:49:46,509 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-11-12 03:49:46,536 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-11-12 03:49:46,548 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train",
        "val": "/opt/ml/input/data/val"
    

Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W1112 03:50:01.958314 139903762192192 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
Instructions for updating:
Use `tf.cast` instead.
W1112 03:50:05.686354 139903762192192 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
I1112 03:50:14.064810 139875225126656 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (

INFO:tensorflow:Step 300 per-step time 0.672s
I1112 03:55:02.989130 139903762192192 model_lib_v2.py:705] Step 300 per-step time 0.672s
INFO:tensorflow:{'Loss/classification_loss': 0.34643736,
 'Loss/localization_loss': 0.022779623,
 'Loss/regularization_loss': 0.02954728,
 'Loss/total_loss': 0.39876425,
 'learning_rate': 0.010480001}
I1112 03:55:02.989407 139903762192192 model_lib_v2.py:708] {'Loss/classification_loss': 0.34643736,
 'Loss/localization_loss': 0.022779623,
 'Loss/regularization_loss': 0.02954728,
 'Loss/total_loss': 0.39876425,
 'learning_rate': 0.010480001}
INFO:tensorflow:Step 400 per-step time 0.672s
I1112 03:56:10.232573 139903762192192 model_lib_v2.py:705] Step 400 per-step time 0.672s
INFO:tensorflow:{'Loss/classification_loss': 0.26459628,
 'Loss/localization_loss': 0.018402964,
 'Loss/regularization_loss': 0.029554484,
 'Loss/total_loss': 0.31255373,
 'learning_rate': 0.0136400005}
I1112 03:56:10
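
The step and loss lines in a log like the one above can be collected into a small table for comparing runs. The sketch below is one possible parser, written against the line format visible in this output; the function name `parse_training_log` and the `per_step_time_s` key are made up for the example, and the regular expressions may need adjusting if the log format differs.

```python
import re

STEP_RE = re.compile(r"Step (\d+) per-step time ([\d.]+)s")
METRIC_RE = re.compile(r"'([\w/]+)':\s*([-+0-9.eE]+)")


def parse_training_log(log_text):
    """Map each logged step to its metrics; repeated lines overwrite each other."""
    metrics_by_step = {}
    current_step = None
    for line in log_text.splitlines():
        step_match = STEP_RE.search(line)
        if step_match:
            current_step = int(step_match.group(1))
            entry = metrics_by_step.setdefault(current_step, {})
            entry["per_step_time_s"] = float(step_match.group(2))
            continue
        if current_step is not None:
            for name, value in METRIC_RE.findall(line):
                metrics_by_step[current_step][name] = float(value)
    return metrics_by_step


# Excerpt copied from the training output above.
sample_log = """INFO:tensorflow:Step 300 per-step time 0.672s
INFO:tensorflow:{'Loss/classification_loss': 0.34643736,
 'Loss/localization_loss': 0.022779623,
 'Loss/regularization_loss': 0.02954728,
 'Loss/total_loss': 0.39876425,
 'learning_rate': 0.010480001}
INFO:tensorflow:Step 400 per-step time 0.672s
INFO:tensorflow:{'Loss/classification_loss': 0.26459628,
 'Loss/localization_loss': 0.018402964,
 'Loss/regularization_loss': 0.029554484,
 'Loss/total_loss': 0.31255373,
 'learning_rate': 0.0136400005}
"""

parsed = parse_training_log(sample_log)
for step, metrics in sorted(parsed.items()):
    print(step, metrics["Loss/total_loss"], metrics["learning_rate"])
```

Because each message is logged twice, storing metrics in a dictionary keyed by step keeps a single copy per step.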

You should see your training job in the AWS console, as shown below:
![Training jobs example](../data/example_trainings.png)


## Improve on the initial model

Most likely, this initial experiment did not yield optimal results. However, you can make several changes to the `pipeline.config` file to improve the model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file lists the data augmentation methods available in the TF Object Detection API. Justify your choice of augmentations in the write-up.
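
To give a sense of what such a change looks like, the sketch below renders a few `data_augmentation_options` blocks in protobuf text format, ready to paste into the `train_config` section of `pipeline.config`. The option and parameter names come from `preprocessor.proto`; the helper `render_augmentations` and the particular selection and values are only an illustrative assumption, not a recommended recipe.

```python
def render_augmentations(options):
    """Render data_augmentation_options blocks in protobuf text format."""
    blocks = []
    for name, params in options:
        lines = ["  data_augmentation_options {", f"    {name} {{"]
        for key, value in params.items():
            lines.append(f"      {key}: {value}")
        lines += ["    }", "  }"]
        blocks.append("\n".join(lines))
    return "\n".join(blocks)


# Illustrative selection; tune or replace it based on your own experiments.
augmentations = [
    ("random_horizontal_flip", {}),
    ("random_adjust_brightness", {"max_delta": 0.2}),
    ("random_adjust_contrast", {"min_delta": 0.8, "max_delta": 1.25}),
    ("random_rgb_to_gray", {"probability": 0.1}),
]

augmentation_text = render_augmentations(augmentations)
print(augmentation_text)
```

Photometric options such as brightness and contrast changes are a plausible starting point for driving footage with varying lighting, but that is a hypothesis to test against the validation metrics.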

Keep in mind that the following options are also available:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
* experiment with the architecture. The TF Object Detection API model zoo offers many architectures. Keep in mind that the `pipeline.config` file is specific to each architecture, so you will have to edit it.
* visualize results on the test frames using the `2_deploy_model` notebook available in this repository.
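
For the optimizer item, it helps to plot or print the learning-rate schedule before launching a job. The sketch below implements a cosine decay with linear warmup in plain Python. The parameter values (base rate 0.08, warmup rate 0.001, 2500 warmup steps, 300000 total steps) are assumptions: they were chosen because they reproduce the learning rates printed in the training log (about 0.01048 at step 300 and 0.01364 at step 400), but the actual values should be read from your `pipeline.config`.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base, total_steps,
                             warmup_learning_rate, warmup_steps):
    """Linear warmup to learning_rate_base, then cosine decay towards zero."""
    if step < warmup_steps:
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * min(progress, 1.0)))


# Assumed schedule parameters (not shown in this notebook).
schedule = dict(learning_rate_base=0.08, total_steps=300000,
                warmup_learning_rate=0.001, warmup_steps=2500)

for step in (0, 300, 400, 2000, 2500):
    print(step, round(cosine_decay_with_warmup(step, **schedule), 6))
```

If these assumed values match your config, a 2000-step run ends before the 2500 warmup steps are over, so the learning rate never reaches its base value. Shortening the warmup and total steps to fit the training budget is one experiment worth considering.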

In the cell below, write down all the approaches you experimented with, why you chose them, and what you would have done with more time and resources. Justify your choices using the TensorBoard visualizations (take screenshots and insert them in your write-up), the metrics on the evaluation set, and the animation you generated with [this tool](../2_run_inference/2_deploy_model.ipynb).

In [None]:
# your write-up goes here.