# Perceptive Automata Data Science Coding Challenge 2018

## Introduction

Hi, there! 

This is the Coding Challenge for the Summer 2018 Apprenticeship at Perceptive Automata. This challenge focuses primarily on software skills and machine learning. I've tried to minimize the number of imports, and the ones that remain are all fairly standard packages. If you find yourself spending lots of time installing things before you can even start, shoot me an email, as that is not the point of this challenge, and I'll try to help you get up and coding as fast as I can.

There are instructions throughout that should help guide you through the challenge. You should be able to run all of the code on your machine, without any long runtimes. Also, if you find any errors or typos, please be sure to email me at **avery@perceptiveautomata.com**.

This should be clear, but it's worth reiterating: **all work should be your own!** Please make sure your code is clear and commented so we understand what you did, and when you send the challenge back, include a brief description of your approach.

We're excited to see what you come up with!  

## Problems
- [1. Rain Collection](#1.-Rain-Collection)
- [2. Pedestrian Crossing Prediction](#2.-Pedestrian-Crossing-Prediction)

### Imports

In [14]:
import ast
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

In [2]:
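# Newer versions of sklearn (0.18 and later) keep train_test_split in model_selection
try:
    from sklearn.model_selection import train_test_split
# Older versions of sklearn (the cross_validation module was removed in 0.20)
except ImportError:
    from sklearn.cross_validation import train_test_split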

# 1. Rain Collection

In a 2D city, there is a series of rectangular buildings that are built one up against the other. They all have flat roofs. However, since the buildings range in height, when it rains, the water collects on the roofs of buildings that have taller buildings on both sides.

The best way to see this is through a picture:

![alt Rain Collection Diagram](rain_diagram.png)

In the first picture, no water can be collected, since it will just run off. In the second picture, two units of water can be collected. And if you add a 4-tall tower on the right side, you get the result shown in the third picture.

A researcher wants to know how much water is collected on all of the roofs when it rains. Please help them by writing a function that takes in an array of building heights and efficiently calculates how much rain can be collected.

In [3]:
def calculate_rain_collection(building_heights):
    """Calculate the amount of rain that can be stored on the roofs of the city.

    :param building_heights: an ordered array of building heights
    :return: the total amount of rain that can be captured
    """
    
    total_rain = 0
    
    # TODO: Write an efficient method that calculates how much rain can be stored

    return total_rain

## Tests

Here are some basic tests. Please feel free to add more tests of your own.

In [None]:
assert calculate_rain_collection([1]) == 0
assert calculate_rain_collection([3,2,1]) == 0
assert calculate_rain_collection([2,1,3,1,2]) == 2
assert calculate_rain_collection([2,1,3,1,2,4,3]) == 4
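When adding tests of your own, it can help to have a slow but obviously correct reference to compare against. The sketch below is one such reference, not the efficient method the challenge asks for: it rescans the whole array for every building, so it is quadratic in the number of buildings. The name `rain_collection_brute_force` is made up for this illustration.

```python
def rain_collection_brute_force(building_heights):
    """Slow reference: water above a building is limited by the shorter of the
    tallest building to its left and the tallest building to its right."""
    total = 0
    for i, height in enumerate(building_heights):
        # Both slices include building i, so the water level is never below its roof.
        left_max = max(building_heights[:i + 1])
        right_max = max(building_heights[i:])
        total += min(left_max, right_max) - height
    return total


# Cross-check against the cases from the basic tests above.
for heights, expected in [([1], 0), ([3, 2, 1], 0), ([2, 1, 3, 1, 2], 2)]:
    assert rain_collection_brute_force(heights) == expected
```

A reference like this is only meant for checking an efficient implementation on small or randomly generated inputs.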

### Describe your solution



(Write a few sentences here about what you did, any problems you had, how you fixed them, etc.)



# 2. Pedestrian Crossing Prediction

For the following problem, we use the open-source JAAD dataset, which you can read more about here if you are interested: http://data.nvision2.eecs.yorku.ca/JAAD_dataset/

The dataset consists of tracked people in videos from a car's dashcam. Each of these people has been carefully annotated with a number of different attributes, such as whether they are stopped, moving fast, or moving slow. For this problem we will try to predict whether or not the pedestrian crossed the street. You will use the bounding boxes of the pedestrians, along with the other actions that they take, to try to predict this for a test set.

## Dataframe

Each row of the dataframe that we construct for you consists of some metadata (the video id and the ped id, so that you can match rows up with the JAAD videos) and an ordered list of frames where that pedestrian appears.

* frame_numbers - These should be continuous, with no gaps in these lists. The other fields all align with the frame_numbers field.
* bounding_boxes - This field is a series of boxes that aligns with the frame_numbers field. Each box is constructed of [box x, box y, box width, box height], where x and y represent the upper left-hand corner of the box.
* moving slow, stopped, handwave, look, clear path, moving fast, looking, standing, slow down, nod, speed up - The annotated attributes you will use to train the model. Each is a list, aligned with the frame_numbers field, of whether or not the attribute is true for that frame number.
* crossing - Whether or not the pedestrian is crossing for the corresponding frame number.
* cross_overall - This is the field that you will try to predict. It is whether or not the person crossed at any point in the sequence.
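To make the layout described above concrete, here is a small sketch that checks the alignment of the per-frame lists and converts a box from its upper-left representation to a center point. The row values are invented for illustration, and the helper names `check_alignment` and `box_center` are hypothetical, not part of the challenge code.

```python
# Hypothetical row shaped like the fields described above (values are made up).
row = {
    "frame_numbers": [3, 4, 5],
    "bounding_boxes": [[100, 200, 20, 60], [104, 200, 20, 62], [110, 201, 22, 64]],
    "looking": [False, True, True],
}


def check_alignment(parsed_row):
    """Return True if every list has one entry per frame and frames have no gaps."""
    frames = parsed_row["frame_numbers"]
    same_length = all(len(values) == len(frames) for values in parsed_row.values())
    contiguous = all(b - a == 1 for a, b in zip(frames, frames[1:]))
    return same_length and contiguous


def box_center(box):
    """Convert [x, y, width, height] (x, y = upper-left corner) to the box center."""
    x, y, width, height = box
    return (x + width / 2.0, y + height / 2.0)


centers = [box_center(box) for box in row["bounding_boxes"]]
```

Running a check like this on each row before feature extraction is one way to confirm that the lists really do line up frame by frame.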

In [44]:
pedestrians_df = pd.read_csv('pedestrian_df.csv')
for col_name in ['bounding_boxes', 'frame_numbers', 'moving slow', 'stopped', 'handwave', 'look', 'clear path', 'crossing', 'moving fast', 'looking', 'standing', 'slow down', 'nod', 'speed up']:
    pedestrians_df[col_name] = pedestrians_df[col_name].apply(ast.literal_eval)
pedestrians_df.head()

Unnamed: 0,video_id,ped_ind,frame_numbers,bounding_boxes,moving slow,stopped,handwave,look,clear path,crossing,moving fast,looking,standing,slow down,nod,speed up,cross_overall
0,video_0071,1,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1209, 598, 51, 191], [1214, 598, 52, 192], [...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False
1,video_0071,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1249, 621, 51, 127], [1254, 620, 51, 129], [...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
2,video_0204,1,"[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...","[[1135, 673, 28, 97], [1139, 672, 29, 92], [11...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
3,video_0204,3,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[906, 670, 35, 65], [906, 672, 32, 65], [907,...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
4,video_0204,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1152, 657, 42, 114], [1158, 657, 42, 117], [...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False


In [47]:
df = pedestrians_df.drop(['cross_overall', 'crossing'], axis=1)
y = pedestrians_df['cross_overall']

## Extract Features

In [48]:
def get_start_box_x(bounding_box_list):
    return bounding_box_list[0][0]

def did_attr(attr_list):
    for attr in attr_list:
        if attr:
            return True
        
    return False

df['start_box_x'] = df['bounding_boxes'].apply(get_start_box_x)
df['did_nod'] = df['nod'].apply(did_attr)
df['did_look'] = df['look'].apply(did_attr)

In [49]:
# <YOUR CODE HERE!>
# You should extract a whole bunch of features here and add them to 'X' below.
# Please feel free to use anything from frame_numbers, bounding_boxes, or the attributes to train your model.  
# Don't use 'cross_overall' or 'crossing' to train your model (obviously).
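As a starting point for the placeholder above, here is a minimal sketch of a few per-sequence features that could be computed from the list-valued columns. The function names and the choice of features are assumptions made for illustration; whether any of them actually helps the model is something to check empirically.

```python
def fraction_true(attr_list):
    """Share of frames in which an attribute is true (0.0 for an empty list)."""
    if not attr_list:
        return 0.0
    return sum(1 for flag in attr_list if flag) / float(len(attr_list))


def horizontal_displacement(bounding_box_list):
    """Signed change in box x between the first and last frame of the sequence."""
    return bounding_box_list[-1][0] - bounding_box_list[0][0]


def mean_box_height(bounding_box_list):
    """Average box height, a rough proxy for how close the pedestrian is."""
    heights = [box[3] for box in bounding_box_list]
    return sum(heights) / float(len(heights))


# Possible usage with the dataframe from this notebook (column names as loaded above):
# df['frac_looking'] = df['looking'].apply(fraction_true)
# df['x_displacement'] = df['bounding_boxes'].apply(horizontal_displacement)
# df['mean_box_height'] = df['bounding_boxes'].apply(mean_box_height)
```

Each helper takes one cell's list and returns a single number, so it plugs into `apply` the same way `get_start_box_x` and `did_attr` do.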

In [55]:
# Set up X
X = df[['start_box_x', 'did_nod', 'did_look']]


## Split Train and Validate Set

In [56]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.33)

## Make a Model

In [57]:
RFC = RandomForestClassifier(n_estimators=10)
RFC.fit(X_train, y_train)
print(RFC.feature_importances_)

[ 0.98171046  0.00364978  0.01463976]


In [58]:
# <YOUR CODE HERE!>
# You should try a whole bunch of different types of models and parameters

## Test your Model

How well did you do?

In [59]:
print('Test Set Score: {}'.format(RFC.score(X_test, y_test)))

Test Set Score: 0.677272727273
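An accuracy score is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most common label; the function name `majority_baseline_accuracy` is made up for this illustration, and the label list shown is invented rather than taken from the dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / float(len(labels))


# Made-up labels for illustration; in the notebook this could be called on y_test.
example_labels = [True, True, False, True]
baseline = majority_baseline_accuracy(example_labels)
```

If the model's test score is not clearly above this baseline, the features are probably not carrying much signal yet.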


## Analysis



Please write up a bit about what you did and what you found.

