## Specifications

Define and train a feedforward neural network on the Boston Housing Price dataset, then use it to make predictions, with the following specifications:

* `3` hidden layers each with `15` neurons
* Hidden layer activations `relu`
* Single-neuron output layer with `relu` activation
* Optimization algorithm `Adam`
* Loss function mean absolute error (`mae`)
* Learning rate = `0.01`
* Epochs = `20`
* Batch size = `10`
* Use 15% of the training data as your validation set

We have not yet seen how to specify some of the above parameters, so you will need to consult the Keras documentation (https://keras.io/).  This will be an important resource going forward, so make sure you familiarize yourself with it.

In [1]:
%tensorflow_version 1.x
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.datasets import boston_housing
from keras.optimizers import Adam
from sklearn.preprocessing import StandardScaler

Using TensorFlow backend.


## Load Dataset

In [2]:
# Load dataset
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print("Training Dataset Size: ", x_train.shape)
print("Training Labels Size: ", y_train.shape)
print("Testing Dataset Size: ", x_test.shape)
print("Testing Labels Size: ", y_test.shape)

Downloading data from https://s3.amazonaws.com/keras-datasets/boston_housing.npz
Training Dataset Size:  (404, 13)
Training Labels Size:  (404,)
Testing Dataset Size:  (102, 13)
Testing Labels Size:  (102,)


In [0]:
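# Standardize input features to zero mean and unit variance
# (most values end up within a few units of 0, but they are not bounded to [-1, 1])
# See: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)  # learn mean/std from the training data only
x_test = scaler.transform(x_test)        # apply the training statistics to the test data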
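
To make the scaling step concrete, here is a minimal pure-Python sketch of the same idea on a single feature column. The helper names (`fit_standardizer`, `standardize`) and the toy values are hypothetical and are not part of scikit-learn; the sketch assumes the population standard deviation, which is what `StandardScaler` uses by default.

```python
from statistics import fmean, pstdev

def fit_standardizer(column):
    """Return (mean, std) computed from the training values only."""
    return fmean(column), pstdev(column)

def standardize(column, mean, std):
    """Shift by the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in column]

train_col = [2.0, 4.0, 6.0, 8.0]  # hypothetical training feature values
test_col = [5.0, 10.0]            # hypothetical test feature values

mean, std = fit_standardizer(train_col)
train_scaled = standardize(train_col, mean, std)
# The test column is scaled with the *training* mean and std.
test_scaled = standardize(test_col, mean, std)

print(train_scaled)
print(test_scaled)
```

The scaled training column has mean 0 and standard deviation 1, while the scaled test column generally does not, because it is expressed in the training data's units.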

## Build and Train model

In [4]:
# Define our neural network

model = Sequential()

# Add your neural network layers here
# model.add(...)
# ...
model.add(Dense(15, activation="relu", input_dim=13))
model.add(Dense(15, activation="relu"))
model.add(Dense(15, activation="relu"))
model.add(Dense(1, activation="relu"))

model.summary()




Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 15)                210       
_________________________________________________________________
dense_2 (Dense)              (None, 15)                240       
_________________________________________________________________
dense_3 (Dense)              (None, 15)                240       
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 16        
=================================================================
Total params: 706
Trainable params: 706
Non-trainable params: 0
_________________________________________________________________
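
The `Param #` column can be checked by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the counts from the summary; `dense_params` is a hypothetical helper written for illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 13 input features, three hidden layers of 15 units, 1 output unit
layer_sizes = [13, 15, 15, 15, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [210, 240, 240, 16]
print(total)      # 706
```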


In [5]:
# Compile your model and define your loss and optimization algorithm
# model.compile(...)
model.compile(loss='mae', optimizer=Adam(lr=0.01))




In [6]:
# Fit your model on the training data
# history = model.fit(...)
history = model.fit(x_train, y_train, batch_size=10, epochs=20, validation_split=0.15)
history.history




Train on 343 samples, validate on 61 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


{'loss': [15.199913080510175,
  4.368505451491553,
  3.264775250813009,
  3.042085761579063,
  2.8423628859200214,
  2.6389777060847934,
  2.45906331553056,
  2.3928128824984713,
  2.6712100933314065,
  2.3730051939758545,
  2.5253775578546107,
  2.2414974165379826,
  2.368541115574517,
  2.261973062340094,
  2.2089286166794446,
  2.0801818005892696,
  2.1201563083048116,
  2.0315894896365463,
  2.0090102087652024,
  2.024790171631571],
 'val_loss': [7.667207968039591,
  3.817678772035192,
  4.025673944441999,
  3.3599057119400775,
  2.905565269657823,
  3.291010106196169,
  3.257595202961906,
  2.6124556885390984,
  3.2517469281055886,
  2.635693096723713,
  3.0311007265184746,
  3.253472984814253,
  2.5338252177003953,
  2.389275218619675,
  2.326743817720257,
  2.5567000498537156,
  2.241882601722342,
  2.260517241524868,
  2.469722575828677,
  2.2893205705236217]}
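
The sample counts in the log above (`Train on 343 samples, validate on 61 samples`) follow from `validation_split=0.15` applied to the 404 training rows. The arithmetic below is a sketch that assumes the split index is truncated to an integer, which is consistent with the log; the variable names are illustrative. Per the Keras documentation, the validation samples are taken from the end of the provided data, before shuffling.

```python
import math

n_samples = 404          # rows in x_train
validation_split = 0.15
batch_size = 10

# Assumed rule: the first `split_at` rows train, the remainder validate.
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at

# Number of weight updates per epoch (the last batch may be smaller).
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 343 61 35
```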

## Predict

In [0]:
# Evaluate our trained model on the test set
# model.evaluate(...)
model.evaluate(x_test, y_test)



3.5495626599180934
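
Because the model was compiled with `loss='mae'` and no extra metrics, the number returned by `model.evaluate` is the mean absolute error on the test set. A minimal pure-Python sketch of that quantity, using made-up values rather than the real predictions:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |target - prediction| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [20.0, 15.0, 30.0]  # hypothetical targets
y_pred = [22.0, 14.0, 27.0]  # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # (2 + 1 + 3) / 3 = 2.0
```

The targets in this dataset are median home values in thousands of dollars, so a test MAE of about 3.55 means the predictions are off by roughly $3,550 on average.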

In [0]:
# Generate the actual predictions for our test set
predictions = model.predict(x_test)
predictions[:10]

array([[ 8.181675],
       [16.993532],
       [18.825325],
       [29.409508],
       [22.34674 ],
       [18.796259],
       [27.41534 ],
       [20.158863],
       [17.21129 ],
       [20.429728]], dtype=float32)

## Questions

Once you have filled in the above and are able to train your model and make predictions, work through the following questions individually or in pairs in class, and we'll review them as a group.

Hint: When trying different values, it's usually a good idea to re-run your entire notebook by clicking `Kernel -> Restart & Run All`.

1. How many learnable parameters does your model have?  How does this compare with a multiple linear regression model?
1. Did we produce a reasonable model?  How can we interpret our loss function?
1. Why do we need to use `StandardScaler`?  Try re-running your solution without it.  What happens?
1. Why do we re-use the same `StandardScaler` instance for the test data?
1. What happens when we use too large a learning rate (say, `1.0`)?  What about one that is too small (`0.00001`)?  Try it!
1. What happens when we use too few hidden layers? Try to use `0` hidden layers instead of `3`.  What is the minimum to get a decent result?
1. What happens when you use too many hidden layers?  Try to use `15` hidden layers instead of `3`.  What happens?  Try again with `sigmoid` activation function.
1. Try `batch_size` of `0` and `100`.  What is the difference?
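
As a starting point for thinking about the learning-rate question, the toy sketch below runs plain gradient descent on `f(w) = w**2` with three step sizes. It is only an illustration under simplifying assumptions: the notebook uses `Adam`, which adapts its step sizes, so the network's behavior will differ in detail. The function name and starting point are hypothetical.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize f(w) = w**2 with plain gradient descent; return every w visited."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # gradient descent update
        path.append(w)
    return path

for lr in (1.0, 0.1, 0.00001):
    final_w = gradient_descent(lr)[-1]
    print(f"lr={lr}: final w = {final_w:.5f}")
```

With `lr=1.0` each update overshoots the minimum and `w` flips sign forever without getting closer to 0; with `lr=0.00001` it barely moves from its starting value in 20 steps; with `lr=0.1` it approaches 0 steadily.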

## Summary

As you can see, neural networks are *very* sensitive to their hyper-parameters (architecture, learning rate, activation functions, etc.).  It's important to develop an intuitive understanding of these parameters, since you will have to tune them carefully when building your own networks.  When a network produces poor results, it is often because these hyper-parameters have not been properly tuned.