# WPILib ML Notebook


## Introduction

By using this notebook, you can train a TensorFlow Lite model for use on a Raspberry Pi and Google Coral USB Accelerator. We've designed this process to be as simple as possible. If you find an issue with this notebook, please create a new issue report on our [GitHub page](https://github.com/wpilibsuite/CoralSagemaker), where you downloaded this notebook.

### Training

1. Download the WPILib dataset as a .tar file [here](https://github.com/wpilibsuite/CoralSagemaker/releases/download/v1/WPILib.tar).
2. Upload your .tar file to a new folder in an Amazon S3 bucket, or a brand new S3 bucket.
3. Create a new SageMaker notebook instance, and open the WPILib notebook.
4. Change `estimator.fit()` in the last code cell to use your new dataset by specifying the S3 location of the folder in which the .tar is stored.
5. Run the code block.
6. Training should take roughly 10 minutes and cost roughly \$0.55 on the GPU instance, or roughly 45 minutes and \$0.45 on the CPU instance. If you change nothing in the notebook other than the S3 location, training should not take longer than an hour.
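
Before uploading in step 2, it can help to confirm that the downloaded archive is intact. The following is a minimal sketch using only Python's standard `tarfile` module. The helper name `summarize_tar` is hypothetical, and the sketch makes no assumption about what the archive contains; it only reports how many entries there are and which top-level names appear.

```python
import tarfile
from pathlib import Path, PurePosixPath


def summarize_tar(path):
    """Return (entry_count, sorted top-level names) for a .tar archive.

    Raises ValueError if the file cannot be read as a tar archive.
    """
    path = Path(path)
    if not tarfile.is_tarfile(path):
        raise ValueError(f"{path} is not a readable tar archive")
    with tarfile.open(path) as archive:
        names = archive.getnames()
    # PurePosixPath drops "." components, so "./data/a.txt" and "data/a.txt"
    # both report "data" as their top-level name.
    top_level = sorted(
        {PurePosixPath(name).parts[0] for name in names if PurePosixPath(name).parts}
    )
    return len(names), top_level


if __name__ == "__main__":
    # "WPILib.tar" is the file name from step 1; adjust the path if needed.
    archive_path = Path("WPILib.tar")
    if archive_path.exists():
        count, top = summarize_tar(archive_path)
        print(f"{count} entries; top-level names: {top}")
```

If the helper raises `ValueError`, the download was probably interrupted and the file should be fetched again before it is uploaded to S3.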

## Notebook


This step launches the training instance (an `ml.p3.2xlarge` by default for GPU, or an `ml.c4.2xlarge` for CPU) and begins training with the data specified in `fit()`.

This section has several configurable values.
You need to change the argument of `estimator.fit(...)` to the location of the data used for training (the bucket you uploaded the .tar to). It should be in the format `"s3://BUCKET-NAME"`.
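
As a rough illustration of the expected format, the sketch below builds the string to pass to `estimator.fit(...)` from a bucket name and an optional folder. The helper `training_data_uri` and the bucket name `my-team-bucket` are hypothetical. The check covers only the basic S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit), not every rule AWS enforces.

```python
import re

# Basic S3 bucket naming rules only; AWS enforces a few more
# (for example, a name must not be formatted like an IP address).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def training_data_uri(bucket, folder=""):
    """Build an "s3://BUCKET-NAME" or "s3://BUCKET-NAME/folder" string."""
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"{bucket!r} does not look like a bucket name")
    folder = folder.strip("/")
    return f"s3://{bucket}/{folder}" if folder else f"s3://{bucket}"


print(training_data_uri("wpilib"))                      # the default bucket
print(training_data_uri("my-team-bucket", "datasets"))  # hypothetical bucket and folder
```

Passing only the bucket name, without the `s3://` prefix, keeps the helper from producing a doubled prefix such as `s3://s3://wpilib`.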


from sagemaker.estimator import Estimator
from sagemaker import get_execution_role


# Uses the GPU by default; set to False to use the CPU
use_gpu = True

role = get_execution_role()

instance_type = None
algorithm_name = None

if not use_gpu:
    instance_type = 'ml.c4.2xlarge'
    algorithm_name = 'sagemaker-tf-wpi'
else:
    instance_type = 'ml.p3.2xlarge'
    algorithm_name = 'wpi-gpu'

# The number of epochs to train for. 1000 is a safe number. With the default GPU instance,
# training should take roughly 10 minutes (roughly 45 minutes on the CPU instance).
# Batch size is the number of images in a round of training. 32 is a safe bet with the default GPU instance.
hyperparameters = {'epochs': 1000,
                  'batch_size': 32}

ecr_image = "249838237784.dkr.ecr.us-east-1.amazonaws.com/{}:latest".format(algorithm_name)

# The estimator object, using the notebook's execution role, the training instance type, the ECR image, and the hyperparameters above
estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name=ecr_image,
                      hyperparameters=hyperparameters)

# Change this bucket if you want to train with your own data. The WPILib bucket contains thousands of high-quality labeled images.
# s3://wpilib
estimator.fit("s3://wpilib")



2019-12-17 23:02:31 Starting - Starting the training job...
2019-12-17 23:02:32 Starting - Launching requested ML instances......
2019-12-17 23:03:38 Starting - Preparing the instances for training...
2019-12-17 23:04:28 Downloading - Downloading input data...
2019-12-17 23:04:39 Training - Downloading the training image...........
Downloading model

2019-12-17 23:06:43 Training - Training image download completed. Training in progress.
Successfully created the TFRecords: /opt/ml/input/data/training/train.record
Successfully created the TFRecords: /opt/ml/input/data/training/eval.record
Records generated.
Beginning training on Docker image
Converting checkpoint to tflite
Compiling model for Edge TPU

2019-12-17 23:13:22 Uploading - Uploading generated training model
2019-12-17 23:13:22 Completed - Training job completed
Training seconds: 534
Billable seconds: 534
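
The billable seconds reported above can be turned into a rough cost estimate. The sketch below back-computes hourly rates from the figures in the Training steps (\$0.55 for about 10 minutes on GPU, \$0.45 for about 45 minutes on CPU). These implied rates are an assumption for illustration, not official AWS pricing, and the function name `estimate_cost` is hypothetical.

```python
# Hourly rates implied by the estimates in the Training steps.
# Assumption: these are derived figures, not official AWS prices.
IMPLIED_RATE_PER_HOUR = {
    "gpu": 0.55 / (10 / 60),  # $0.55 for ~10 minutes, about $3.30 per hour
    "cpu": 0.45 / (45 / 60),  # $0.45 for ~45 minutes, about $0.60 per hour
}


def estimate_cost(billable_seconds, instance="gpu"):
    """Estimate the training cost in dollars from billable seconds."""
    return billable_seconds / 3600 * IMPLIED_RATE_PER_HOUR[instance]


# 534 is the "Billable seconds" value from the log above.
print(f"Estimated cost: ${estimate_cost(534):.2f}")
```

For the 534-second GPU run logged here, the estimate comes to a little under \$0.50, consistent with the rough figure quoted in the Training steps.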
