# WPILib ML Notebook


## Introduction

By using this notebook, you can train a TensorFlow Lite model for use on a Raspberry Pi and Google Coral USB Accelerator. We've designed this process to be as simple as possible. If you find an issue with this notebook, please create a new issue report on our GitHub page, where you downloaded this notebook.

## Workflow

This section explains the three steps for getting a trained model running on your hardware: getting data, training, and inference.

### Getting Data

WPILib provides thousands of labelled images for this year's game, which you can download here. You can also train with custom data using this notebook. The instructions below describe how to gather and label your own data.

1. Plug a USB Camera into your laptop, and run a script similar to record_video.py, which simply makes an mp4 from the camera stream.
2. Create a [supervise.ly](supervise.ly) account. This is a very nice tool for labelling data.
3. (Optional) You can add other teammates to your Supervise.ly workspace by clicking 'Members' on the left and then 'INVITE' at the top.
4. Choose a workspace to work in, in the 'Workspaces' tab.
5. Upload the official WPILib labelled data to your workspace. Download the tar here, extract it, then click 'IMPORT DATA' or 'UPLOAD' inside your workspace. Change the import plugin to Supervisely, drag in the extracted FOLDER, give the project a name, and click import.
6. Upload your own video to your workspace. Click 'UPLOAD' when inside of your workspace, change your import plugin to video, drag in your video, give the project a name, and click import.
7. Click into your newly imported dataset. Use the rectangle tool to draw boxes around the objects you wish to label.
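
Step 1 mentions a script similar to record_video.py. That script is not reproduced here, so the following is only a minimal sketch of what such a recorder could look like, assuming OpenCV (`opencv-python`) is installed and the USB camera is device 0. The function names, the default file name `capture.mp4`, and the `mp4v` codec are illustrative assumptions, not the contents of the actual WPILib script.

```python
"""Record a short clip from a USB camera to an mp4 file (illustrative sketch)."""


def frames_to_record(fps: float, seconds: float) -> int:
    """Number of frames needed to cover `seconds` of video at `fps`."""
    if fps <= 0 or seconds <= 0:
        raise ValueError("fps and seconds must be positive")
    return int(round(fps * seconds))


def record(out_path: str = "capture.mp4", device: int = 0,
           fps: float = 30.0, seconds: float = 10.0) -> int:
    """Write frames from camera `device` to `out_path`; return frames written."""
    # Imported here so the helper above can be used without OpenCV installed.
    import cv2

    cap = cv2.VideoCapture(device)
    if not cap.isOpened():
        raise RuntimeError(f"could not open camera {device}")

    # Use the camera's own frame size so the writer accepts every frame.
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # codec availability varies by platform
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

    written = 0
    try:
        for _ in range(frames_to_record(fps, seconds)):
            ok, frame = cap.read()
            if not ok:  # camera unplugged or stream ended
                break
            writer.write(frame)
            written += 1
    finally:
        # Always release the devices so the mp4 is finalized properly.
        cap.release()
        writer.release()
    return written
```

Calling `record("capture.mp4", seconds=30)` would then produce roughly 30 seconds of footage that can be uploaded in step 6.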

### Training

1. Download your datasets from Supervise.ly. Select the "json and jpeg" option.
2. Upload your tar to a new folder in an Amazon S3 bucket, or a brand new S3 bucket.
3. Create a new SageMaker notebook instance, and open the WPILib notebook.
4. Change estimator.fit() to use your new dataset, by specifying the folder in which the tar is stored.
5. Run each block of the notebook in order.
6. Training should take roughly 45 minutes. If you do not change anything in the notebook, it should not take longer than an hour. In the Training Job, CPU usage should be around 690% for most of the training when running on an ml.c4.2xlarge. If it is much lower, something went wrong.
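
Steps 2 and 4 above can also be scripted. The sketch below assumes `boto3` is installed and AWS credentials are configured; the helper names, the bucket `my-team-bucket`, and the folder `custom-data` are hypothetical placeholders rather than anything defined by this notebook.

```python
import os


def s3_uri(bucket: str, prefix: str = "") -> str:
    """Build the s3:// URI of a bucket folder, suitable for estimator.fit()."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name")
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def upload_dataset(tar_path: str, bucket: str, prefix: str = "") -> str:
    """Upload the Supervise.ly tar into bucket/prefix and return the folder URI."""
    # Imported here so s3_uri() can be used without boto3 or credentials.
    import boto3

    name = os.path.basename(tar_path)
    folder = prefix.strip("/")
    key = f"{folder}/{name}" if folder else name
    boto3.client("s3").upload_file(tar_path, bucket, key)
    return s3_uri(bucket, prefix)
```

With these helpers, the call in step 4 might become `estimator.fit(upload_dataset("dataset.tar", "my-team-bucket", "custom-data"))`, which passes the folder URI `s3://my-team-bucket/custom-data`.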

### Inference

1. Go to the training job in SageMaker, scroll to the bottom, and find the output S3 location.
2. Download the tar file from the bucket, extract it, and get your .tflite file.
3. Put the .tflite file on your Raspberry Pi by plugging the SD card into your computer and dragging the file into /home/pi.
4. Run the Python script using `python3 object_detection.py --model output.tflite`.
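
The extraction in step 2 can be done with Python's standard `tarfile` module. This is a minimal sketch; the function name is an assumption, and it simply takes the first `.tflite` file it finds in the archive, whatever the archive's internal layout is.

```python
import os
import tarfile


def extract_tflite(tar_path: str, dest_dir: str) -> str:
    """Extract the first .tflite member of tar_path into dest_dir; return its path."""
    # tarfile.open() detects gzip compression automatically in read mode.
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".tflite"):
                # Flatten the member name so the file lands directly in dest_dir.
                member.name = os.path.basename(member.name)
                archive.extract(member, dest_dir)
                return os.path.join(dest_dir, member.name)
    raise FileNotFoundError(f"no .tflite file found in {tar_path}")
```

The returned path is the file to copy to /home/pi in step 3.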


## Notebook

### Building and registering the container

This code block runs a script that builds a Docker image and pushes it to Amazon ECR. The training instance uses this image so that all required dependencies and WPILib files are in place.

In [78]:
%%sh
#!/usr/bin/env bash
docker system prune --force > /dev/null 2>&1

# The name of our algorithm
algorithm_name=sagemaker-tf-wpi2

cd container

chmod -R +x coral/

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t "${algorithm_name}" . --no-cache > /dev/null 2>&1
docker tag "${algorithm_name}" "${fullname}" > /dev/null 2>&1

docker push "${fullname}" > /dev/null 2>&1


Total reclaimed space: 0B
Login Succeeded
Sending build context to Docker daemon  65.02kB
Step 1/16 : FROM tensorflow/tensorflow:1.12.0-rc2-devel
 ---> f643a5376d9c
Step 2/16 : RUN git clone https://github.com/tensorflow/models.git &&     mv models /tensorflow/models
 ---> Running in 88d1d1d730d2
Cloning into 'models'...
Removing intermediate container 88d1d1d730d2
 ---> 1f883f22e959
Step 3/16 : RUN apt-get update -qq > /dev/null &&     apt-get install -y python python-tk python3 python3-pip -qq > /dev/null
 ---> Running in 271b7d2d2e59
debconf: delaying package configuration, since apt-utils is not installed
Removing intermediate container 271b7d2d2e59
 ---> 608d1a9c8c80
Step 4/16 : RUN apt-get update -qq > /dev/null &&     apt-get install -y --no-install-recommends nginx curl -qq > /dev/null
 ---> Running in 752dd175ded9
debconf: delaying package configuration, since apt-utils is not installed
Removing intermediate container 752dd175ded9
 ---> dae35625face


### Get execution role

This gets the notebook instance's execution role, used for communicating with the training instance.

In [79]:
from sagemaker import get_execution_role

role = get_execution_role()

### Training on SageMaker

Training a model on SageMaker with the Python SDK works much like training locally. The main difference is that `train_instance_type` is set to one of the [supported EC2 instance types](https://aws.amazon.com/sagemaker/pricing/instance-types/) instead of `local`.

In addition, we must specify the URL of the ECR image that we pushed above.

Finally, the training dataset has to be in Amazon S3, and its S3 URL is passed into the `fit()` call.

Let's first fetch the ECR image URL that corresponds to the image we just built and pushed.

In [80]:
import boto3

client = boto3.client('sts')
account = client.get_caller_identity()['Account']

my_session = boto3.session.Session()
region = my_session.region_name

algorithm_name = 'sagemaker-tf-wpi2'

ecr_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)

print(ecr_image)

766711008027.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tf-wpi2:latest


This last step launches the training instance (an ml.c4.2xlarge by default) and begins training with the data specified in `fit()`.
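
As background for the cell that follows: SageMaker hands the estimator's `hyperparameters` to a custom training container as a JSON file at `/opt/ml/input/config/hyperparameters.json`, with every value stored as a string. The WPILib container code is not shown in this notebook, so the sketch below is only an assumption about how a training script might read `train-steps`; the function name and the fallback of 500 are hypothetical.

```python
import json

# Standard location of hyperparameters inside a SageMaker training container.
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"


def read_train_steps(path: str = HYPERPARAMETERS_PATH, default: int = 500) -> int:
    """Return 'train-steps' as an int; SageMaker stores every value as a string."""
    with open(path) as f:
        params = json.load(f)
    steps = int(params.get("train-steps", default))
    if steps <= 0:
        raise ValueError("train-steps must be positive")
    return steps
```

Converting to `int` explicitly matters here because `{'train-steps': 500}` arrives in the container as `{"train-steps": "500"}`.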


In [None]:
from sagemaker.estimator import Estimator

# The type of computer used for training. The default is recommended. It costs 45 cents/hour to run.
instance_type = 'ml.c4.2xlarge'

# The number of training steps to run. 500 is a safe number. With the default instance, it should take 45 minutes.
hyperparameters = {'train-steps': 500}

# The estimator object, using our execution role, training instance, the ECR image, and the specified training steps
estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name=ecr_image,
                      hyperparameters=hyperparameters)

# Change this bucket if you want to train with your own data. The WPILib bucket contains thousands of high quality labeled images.
# s3://wpilib
estimator.fit("s3://wpilib")


2019-07-29 19:43:34 Starting - Starting the training job...
2019-07-29 19:43:37 Starting - Launching requested ML instances......
2019-07-29 19:44:38 Starting - Preparing the instances for training...
2019-07-29 19:45:18 Downloading - Downloading input data...
2019-07-29 19:45:52 Training - Downloading the training image........
Downloading model
Preparing checkpoint

2019-07-29 19:47:18 Training - Training image download completed. Training in progress.
Successfully created the TFRecords: /opt/ml/input/data/training/train.record
Successfully created the TFRecords: /opt/ml/input/data/training/eval.record
Data made.
item {
id: 1
name: "stickyvelcro"
}
item {
id: 2
name: "redrobot"
}
item {
id: 3
name: "hole"
}
item {
id: 4
name: "bluerobot"
}
item {
id: 5
name: "ali

### The output

Go to the Training Jobs tab of SageMaker. Click on the newest Completed job. Scroll to the bottom. The S3 bucket containing the trained .tflite file (inside of a tar file) can be found there.
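
The same location can usually be read from the notebook instead of the console: after `fit()` returns, the estimator's `model_data` attribute should hold the S3 URI of the output tar. The sketch below assumes `boto3` is available with credentials configured; the helper names and the default file name `model.tar.gz` are illustrative assumptions.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/object' into (bucket, key)."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a full S3 object URI: {uri!r}")
    return parsed.netloc, key


def download_model(uri: str, dest: str = "model.tar.gz") -> str:
    """Download the training output tar from S3 to a local file."""
    # Imported here so split_s3_uri() can be used without boto3 or credentials.
    import boto3

    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, dest)
    return dest
```

For example, `download_model(estimator.model_data)` would fetch the tar that contains the trained .tflite file.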
