## Computer Vision I Final Project

In this walkthrough, we will read the training and test data and create a submission file for your final project. Once you have trained your model and generated predictions, submit your model's .csv output to the class [Leaderboard](https://leaderboard.corp.amazon.com/tasks/312).

## Set up SageMaker

Are your models taking too long to train? Use a P2 instance as described in the README.

## Data Access

In this part, we will see how to read the training and test data.

## 1. Training data

In [None]:
# Let's read in our training data. The ASINs correspond to the IDs on Leaderboard.
import pandas as pd
import urllib.request

# Download the pickled training set to a local temporary file
urllib.request.urlretrieve(
    "https://d8mrh1kj01ho9.cloudfront.net/workshop/cv1/data/day1/training_data.pkl",
    "/tmp/training_data.pkl"
)

df = pd.read_pickle("/tmp/training_data.pkl")

In [None]:
# Let's see what kind of data we're working with
import matplotlib.pyplot as plt

plt.imshow(df['data'][90])

Our labels correspond to the following classes:

* Class 0: *Inconclusive*
* Class 1: *Two wheels*
* Class 2: *Four wheels*
* Class 3: *Not luggage*
* Class 4: *Zero wheels*
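
When inspecting the data or your predictions, it can help to map the integer labels to their names. The following is a minimal sketch, assuming the `label` column holds the integers listed above. `CLASS_NAMES` and the toy frame `toy_df` are hypothetical names introduced here for illustration and are not part of the project data.

```python
import pandas as pd

# Hypothetical lookup table built from the class list above.
CLASS_NAMES = {
    0: "Inconclusive",
    1: "Two wheels",
    2: "Four wheels",
    3: "Not luggage",
    4: "Zero wheels",
}

# Toy stand-in for the training frame; the real df is assumed to have a 'label' column.
toy_df = pd.DataFrame({"label": [1, 2, 2, 4, 0]})

# Map integer labels to readable names and count the examples per class.
toy_df["label_name"] = toy_df["label"].map(CLASS_NAMES)
class_counts = toy_df["label_name"].value_counts()
print(class_counts)
```

Running the same two lines on the real `df` would show how balanced the classes are, which is worth knowing before you choose a model or a metric.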

In [None]:
# Let's take a look at this data in more detail and then start working. Remember 'label' is our target variable/column
df.loc[90]

## 2. Test Data

In [None]:
# If you're unsure of how to submit to Leaderboard, no problem.
# You'll use the training data loaded above to build your ML model and then predict on the test data below:

urllib.request.urlretrieve(
    "https://d8mrh1kj01ho9.cloudfront.net/workshop/cv1/data/day1/test_data.pkl",
    "/tmp/test_data.pkl"
)
test_df = pd.read_pickle("/tmp/test_data.pkl")
plt.imshow(test_df['data'][90])

In [None]:
test_df.head()

## Sample zero submission file

In [None]:
# Below is an example submission of a very poor model

urllib.request.urlretrieve(
    "https://d8mrh1kj01ho9.cloudfront.net/workshop/cv1/data/day1/sample_model_output.csv",
    "/tmp/sample_model_output.csv"
)
test_submission = pd.read_csv('/tmp/sample_model_output.csv', header=0)
test_submission.head(5)

## Your submission file

In [None]:
import pandas as pd

# Build the submission frame with the two columns Leaderboard expects
result_df = pd.DataFrame(columns=['ID', 'label'])
result_df["ID"] = test_df["ID"]
# Replace the sample "zero submission" labels below with your own model's predictions
result_df["label"] = test_submission['label'].values

result_df.to_csv("results_cv_project.csv", index=False)
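
The cell above reuses the sample labels. With your own model, you would fill the `label` column from its predictions instead. Here is a minimal sketch, assuming your model returns one row of class scores per test image; the `scores` array, `toy_ids`, and `toy_result_df` are hypothetical stand-ins, and your model's actual output format may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical model output: one row of scores per image, one column per class (5 classes).
scores = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.0, 0.1, 0.7, 0.1, 0.1],
    [0.2, 0.1, 0.1, 0.1, 0.5],
])
# Hypothetical IDs; in the notebook these would come from test_df["ID"].
toy_ids = ["id_a", "id_b", "id_c"]

# Pick the highest-scoring class for each image.
predicted_labels = scores.argmax(axis=1)

# Assemble the same two-column layout used for the submission file.
toy_result_df = pd.DataFrame({"ID": toy_ids, "label": predicted_labels})
print(toy_result_df)
```

The order of the predictions must match the order of the IDs in `test_df`, so avoid shuffling the test data before predicting.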

If you navigate to the day1/results folder in the Jupyter file browser, you can select `results_cv_project.csv` and download it locally. Or just click this [link...](./results_cv_project.csv)
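
Before uploading, a quick sanity check on the submission frame can catch simple mistakes. The sketch below assumes the file should have exactly the `ID` and `label` columns, one row per test example in the original order, and integer labels from 0 to 4. These checks are inferred from the sample submission, not from a documented Leaderboard specification, and `check_submission` is a hypothetical helper.

```python
import pandas as pd

def check_submission(submission, expected_ids, num_classes=5):
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    # Assumed layout: exactly the columns 'ID' and 'label', in that order.
    if list(submission.columns) != ["ID", "label"]:
        problems.append("columns must be exactly ['ID', 'label']")
        return problems
    # Every test example should appear once, in the original order.
    if len(submission) != len(expected_ids):
        problems.append("row count differs from the test set")
    elif list(submission["ID"]) != list(expected_ids):
        problems.append("IDs differ from the test set or are out of order")
    # Labels must be present and within the known class range.
    if submission["label"].isna().any():
        problems.append("missing labels")
    elif not submission["label"].isin(list(range(num_classes))).all():
        problems.append("labels outside 0..num_classes-1")
    return problems

# Toy frames for illustration only.
good = pd.DataFrame({"ID": ["a", "b", "c"], "label": [0, 3, 4]})
bad = pd.DataFrame({"ID": ["a", "b", "c"], "label": [0, 7, 4]})

print(check_submission(good, ["a", "b", "c"]))
print(check_submission(bad, ["a", "b", "c"]))
```

In the notebook, the equivalent call would be `check_submission(result_df, test_df["ID"])`.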

## Getting our model output into Leaderboard

We now have our model's output .csv and are ready to upload it to Leaderboard:

1. Search for your class [Leaderboard instance](https://leaderboard.corp.amazon.com/) and go to the 'Make a Submission' section.
2. Upload your local file and include your notebook version URL for tracking.
3. Your score should now appear on the public leaderboard. Marvel at how much room for improvement there is.