## Breakdown of StudentAdmissions Project

I'm confused! I've been drinking ML content from the Udacity firehose, and for the last couple of projects I had to peek at the answers. So I'm going to try to break this lab down into the smallest bits possible. Here we go.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# read the admissions data
data = pd.read_csv('student_data.csv')
data.head()

Unnamed: 0,admit,gre,gpa,rank
0,0,380,3.61,3
1,1,660,3.67,3
2,1,800,4.0,1
3,1,640,3.19,4
4,0,520,2.93,4


### Pre-process the Input Data

1. The `rank` column needs to be one-hot encoded
2. The `gre` and `gpa` columns need to be scaled

In [3]:
# one-hot encode the rank
one_hot_data = pd.get_dummies(data['rank'], prefix='rank')
one_hot_data = pd.concat([data, one_hot_data], axis=1)
one_hot_data = one_hot_data.drop(['rank'], axis=1)
one_hot_data.head()

Unnamed: 0,admit,gre,gpa,rank_1,rank_2,rank_3,rank_4
0,0,380,3.61,0,0,1,0
1,1,660,3.67,0,0,1,0
2,1,800,4.0,1,0,0,0
3,1,640,3.19,0,0,0,1
4,0,520,2.93,0,0,0,1


In [4]:
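# work on an explicit copy so the scaling below leaves one_hot_data untouched
scaled_data = one_hot_data.copy()

# divide by 800 and 4.0 so that gre and gpa both fall in the 0-1 range
scaled_data['gre'] /= 800
scaled_data['gpa'] /= 4.0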

scaled_data.head()

Unnamed: 0,admit,gre,gpa,rank_1,rank_2,rank_3,rank_4
0,0,0.475,0.9025,0,0,1,0
1,1,0.825,0.9175,0,0,1,0
2,1,1.0,1.0,1,0,0,0
3,1,0.8,0.7975,0,0,0,1
4,0,0.65,0.7325,0,0,0,1
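The two pre-processing steps can also be bundled into one helper. This is only a sketch: the function name `preprocess`, its `gre_max` and `gpa_max` parameters, and the toy rows (copied from the head of the table above) are my own assumptions, not part of the lab.

```python
import pandas as pd

def preprocess(df, gre_max=800, gpa_max=4.0):
    """One-hot encode `rank` and scale `gre`/`gpa` into the 0-1 range.

    Hypothetical helper: it builds a new DataFrame, so `df` is left unchanged.
    """
    out = pd.concat(
        [df.drop(columns="rank"),
         pd.get_dummies(df["rank"], prefix="rank", dtype=int)],
        axis=1,
    )
    out["gre"] = out["gre"] / gre_max
    out["gpa"] = out["gpa"] / gpa_max
    return out

# Toy rows taken from the first lines of the table above (not the full data).
toy = pd.DataFrame({
    "admit": [0, 1, 1],
    "gre": [380, 660, 800],
    "gpa": [3.61, 3.67, 4.0],
    "rank": [3, 3, 1],
})
processed = preprocess(toy)
print(processed)
```

Because the toy frame only contains ranks 1 and 3, only `rank_1` and `rank_3` columns are created here; the full dataset produces all four.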


### Split the Data Into Training/Testing Sets
The training set will be a random 90% of the data, and the testing set will be the remaining 10%.

In [5]:
# Randomly select an array of indexes
sample = np.random.choice(scaled_data.index, size=int(len(scaled_data)*0.9), replace=False)

# select the sampled index labels for training and drop them to get the test data
# (.loc is used because `sample` holds index labels, not positions)
train_data, test_data = scaled_data.loc[sample], scaled_data.drop(sample)

print("Number of training samples is", len(train_data))
print("Number of testing samples is", len(test_data))

Number of training samples is 360
Number of testing samples is 40
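The split above changes on every run because nothing seeds the random choice. A minimal sketch of a repeatable version, assuming pandas' `DataFrame.sample` with a fixed `random_state`; the `demo` frame and the seed value are stand-ins of mine, not the lab's data:

```python
import numpy as np
import pandas as pd

# Stand-in frame with 400 rows, the same length as the admissions data.
demo = pd.DataFrame({
    "admit": np.arange(400) % 2,
    "gre": np.linspace(0.3, 1.0, 400),
})

# random_state fixes the shuffle, so the same rows are picked on every run.
train_demo = demo.sample(frac=0.9, random_state=42)

# Everything that was not sampled becomes the test set.
test_demo = demo.drop(train_demo.index)

print(len(train_demo), len(test_demo))
```

Dropping by `train_demo.index` guarantees the two sets never share a row.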


### Split the data into features (X) and targets (y)

The features are the input data (the matrix X), and the targets (y) are the corresponding actual admission outcomes, found in the `admit` column.

In [6]:
# Split data for the training and testing sets
features = train_data.drop('admit', axis=1)
targets = train_data['admit']

features_test = test_data.drop('admit', axis=1)
targets_test = test_data['admit']

In [7]:
# targets will be a labeled Series
print(type(targets))
targets.head()

<class 'pandas.core.series.Series'>


274    0
86     0
225    1
303    1
74     0
Name: admit, dtype: int64

In [8]:
# features will be a DataFrame
print(type(features))
features.head()

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,gre,gpa,rank_1,rank_2,rank_3,rank_4
274,0.65,0.78,0,1,0,0
86,0.75,0.83,0,1,0,0
225,0.9,0.875,0,0,1,0
303,0.85,0.995,0,1,0,0
74,0.9,0.8625,0,0,0,1


### Training the Network

Break down the training function to see how it works in detail.
```
def train_nn(features, targets, epochs, learnrate):
    pass
```

In [9]:
features.shape

(360, 6)

In [10]:
n_records, n_features = features.shape
learnrate = 0.5 / n_records

# setting up random weights to start
np.random.seed(42)
weights = np.random.normal(scale = (1 / n_features ** .5), size = n_features)
weights

array([ 0.2027827 , -0.05644616,  0.26441774,  0.62177434, -0.09559271,
       -0.09558601])
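The `scale` argument of `np.random.normal` is the standard deviation of the distribution the weights are drawn from, so with six features the starting weights are spread around zero with a standard deviation of about 0.41. A quick sketch to see that, drawing many more values than the network needs; the seed and the sample size are arbitrary choices of mine:

```python
import numpy as np

n_features = 6                     # matches features.shape[1] above
scale = 1 / n_features ** 0.5      # 1 / sqrt(6)

rng = np.random.default_rng(0)     # arbitrary seed, only for this sketch
many_weights = rng.normal(scale=scale, size=100_000)

# The empirical standard deviation should land close to the requested scale.
print(scale, many_weights.std())
```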

In [11]:
# loop for some number of epochs

# for each epoch the goal is to update the weights
#  - run the data through the current network to get an output for each input
#  - compute the error for each
#  - use the error and gradient descent to compute the change in w
#  - update the weights

# initialize the change in weights vector
delta_w = np.zeros(weights.shape)

# loop through all features and targets

# unroll the loop
# for x, y in zip(features.values, targets):

x, y = features.values[0], targets.values[0]
(x, y)

(array([ 0.65,  0.78,  0.  ,  1.  ,  0.  ,  0.  ]), 0)

In [12]:
# the linear combination performed by the node on one input
h = np.dot(x, weights)
h

0.70955508430787895

In [13]:
# The neural network output (y_hat): the activation function applied to h,
# using sigmoid(h) = 1 / (1 + exp(-h)) as the activation function
# (outputs from here on are shown rounded to six significant figures)
output = 1 / (1 + np.exp(-h))
output

0.670303

In [14]:
# The error: just the difference between the target and the output
# (y - y_hat)
error = y - output
error

-0.670303

In [15]:
# The output gradient: the derivative of the activation function at h,
# sigmoid_prime(h) = sigmoid(h) * (1 - sigmoid(h))
output_gradient = output * (1 - output)
output_gradient

0.220997

In [16]:
# The error term (lowercase delta)
error_term = error * output_gradient
error_term

-0.148135

In [17]:
# Now the gradient descent step, computing delta w:
# each weight's change is scaled by the input x that it multiplies
delta_w += learnrate * error_term * x
delta_w

array([-0.00013373, -0.00016048,  0.        , -0.00020574,  0.        ,
        0.        ])
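Rolling the single-record steps back into loops gives something like the sketch below. It is my own reassembly, not the lab's reference solution: it uses the standard sigmoid `1 / (1 + exp(-h))`, scales each weight change by its input `x`, and divides by `n_records` at update time instead of folding that into `learnrate`. The `seed` parameter and the toy data are assumptions for illustration.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def sigmoid_prime(h):
    s = sigmoid(h)
    return s * (1 - s)

def train_nn(features, targets, epochs, learnrate, seed=42):
    """Train a single sigmoid unit with batch gradient descent.

    features: 2-D array (n_records, n_features); targets: 1-D array of 0/1.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_records, n_features = features.shape

    # random starting weights, scaled as in cell In [10]
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=1 / n_features ** 0.5, size=n_features)

    for _ in range(epochs):
        delta_w = np.zeros(weights.shape)
        for x, y in zip(features, targets):
            h = np.dot(x, weights)                  # linear combination
            output = sigmoid(h)                     # y_hat
            error = y - output                      # y - y_hat
            error_term = error * sigmoid_prime(h)   # lowercase delta
            delta_w += error_term * x               # accumulate over records
        # one averaged update per epoch
        weights += learnrate * delta_w / n_records
    return weights

# Toy problem: the second column is a constant 1 that acts as a bias input.
X = np.array([[0.0, 1.0], [1.0, 1.0], [0.2, 1.0], [0.9, 1.0]])
y = np.array([0, 1, 0, 1])
w = train_nn(X, y, epochs=2000, learnrate=0.5)
print(sigmoid(X @ w).round(2))
```

With the notebook's data this would be called as `train_nn(features.values, targets.values, epochs, learnrate)`, with `epochs` and `learnrate` left to experiment with.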