# CS412 - Machine Learning - 2021
## Homework 2
100 pts


## Goal

The goal of this homework is two-fold:

*   Gain experience with neural network approaches
*   Gain experience with the Keras library

## Dataset
You are going to use a house price dataset that we prepared for you, which contains four independent variables (predictors) and one target variable. The task is to predict the target variable (house price) from the predictors (house attributes).


Download the data from SuCourse. Reserve 10% of the training data for validation and use the rest for development (learning your models). The official test data we provide (1,200 samples) should only be used for testing at the end, not for model selection.

## Task 
Build a regressor with a neural network that has only one hidden layer, using the Keras library function calls to predict house prices in the provided dataset.

Your code should follow the given skeleton and try the indicated parameters.

## Preprocessing and Meta-parameters
You should try 10, 50 and 100 as the hidden node count.

You should decide on the learning rate (step size). You can try values such as 0.001, 0.01, and 0.1, but you may need to increase it if learning is very slow, or decrease it if you see the loss increase.

You can use either sigmoid or ReLU activations for the hidden nodes (indicate which one you used with your results). You should know what to use for the output-layer activation, the input and output layer sizes, and a suitable loss function.

## Software: 

Keras is a library that we will use mainly for deep learning, but it also provides basic neural network functionality.

You may find the necessary function references here: 

http://scikit-learn.org/stable/supervised_learning.html
https://keras.io/api/

When you search for Dense, for instance, you should easily find the relevant function and an explanation of its parameters.

## Submission: 

Fill this notebook. Write the report section at the end.

You should prepare a separate PDF document as your homework (name hw1-CS412-yourname.pdf), which consists of the report (Part 8) of the notebook for easy viewing, and include a link to your notebook within the PDF report. Make sure to use the share link obtained from the top right, and **be sure to share with Sabancı University first**, as otherwise there will be access problems.

## 1) Initialize

*   First make a copy of the notebook given to you as a starter.

*   Make sure you choose Connect from the upper right.


## 2) Load training dataset

* Upload the datasets (train.csv, test.csv) provided on SuCourse to your Google Drive and read them using Google Drive's mount functions.
You may find the necessary functions here:
https://colab.research.google.com/notebooks/io.ipynb

In [None]:
from google.colab import drive
drive.mount('/content/drive/') 
# click on the url that pops up and give the necessary authorizations

Mounted at /content/drive/


In [None]:
ls

[0m[01;34mdrive[0m/  [01;34msample_data[0m/




*   Set your notebook's working directory to the path where the datasets are uploaded (cd is the Linux command for changing directory).
*   You may need to use cd drive/MyDrive depending on the path to your datasets on Google Drive. (Don't comment out the code in the cells when using Linux commands.)






In [None]:
cd drive/My\ Drive/cs412-hw2

/content/drive/My Drive/cs412-hw2


* List the files in the current directory.

In [None]:
ls

test.csv  train.csv


## 3) Understanding the dataset

There are a lot of functions that can be used to learn more about this dataset

- What is the shape of the training set (num of samples X number of attributes) ***[shape function can be used]***

- Display attribute names ***[columns function can be used]***

- Display the first 5 rows from training dataset ***[head or sample functions can be used]***

..

In [None]:
# import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

train_df = pd.read_csv("train.csv")
# show first 10 elements of the training data
train_df.head(10)

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,251,5,west,low,925701.721399
1,211,3,west,high,622237.482636
2,128,5,east,low,694998.182376
3,178,3,east,high,564689.015926
4,231,3,west,low,811222.970379
5,253,5,north,high,766250.032506
6,101,1,north,low,512749.401548
7,242,1,north,high,637010.760148
8,174,5,west,high,638136.374869
9,328,2,south,high,787704.988273


In [None]:
# print the shape of data

print("Data dimensionality is: ",train_df.shape)
print("Attribute names: ",train_df.columns)
# also give some statistics about the data like mean, standard deviation etc.

print("Statistics:")
train_df.describe()


Data dimensionality is:  (4800, 5)
Attribute names:  Index(['sqmtrs', 'nrooms', 'view', 'crime_rate', 'price'], dtype='object')
Statistics:


Unnamed: 0,sqmtrs,nrooms,price
count,4800.0,4800.0,4800.0
mean,225.033542,2.983958,725757.0
std,71.851436,1.421251,151041.1
min,100.0,1.0,356498.5
25%,163.0,2.0,617953.6
50%,226.0,3.0,729299.9
75%,287.0,4.0,838928.4
max,349.0,5.0,1076067.0


In [None]:
train_df.head()  #first 5 rows

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,251,5,west,low,925701.721399
1,211,3,west,high,622237.482636
2,128,5,east,low,694998.182376
3,178,3,east,high,564689.015926
4,231,3,west,low,811222.970379


In [None]:
train_df['nrooms'].value_counts()

2    991
5    976
1    973
3    952
4    908
Name: nrooms, dtype: int64

In [None]:
train_df['sqmtrs'].value_counts()

107    32
240    31
212    31
328    30
116    30
       ..
142    10
170    10
329     9
117     9
176     9
Name: sqmtrs, Length: 250, dtype: int64

In [None]:
train_df['price'].value_counts()

8.008052e+05    1
5.957551e+05    1
5.357551e+05    1
6.171843e+05    1
1.048316e+06    1
               ..
5.659009e+05    1
9.021545e+05    1
5.362886e+05    1
5.813591e+05    1
1.024339e+06    1
Name: price, Length: 4800, dtype: int64

In [None]:
train_df['view'].value_counts()   #categorical

east     1252
south    1212
north    1204
west     1132
Name: view, dtype: int64

In [None]:
train_df['crime_rate'].value_counts() #categorical

high    2406
low     2394
Name: crime_rate, dtype: int64

## 4) Preprocessing Steps

As some of the features (predictive variables) in this dataset are categorical (non-numeric), you need to do some preprocessing for those features.

Please read 7-features.pdf under SuCourse for converting categorical variables to dummy variables. You can use as many **dummy or indicator variables** as there are categories within one feature. You can also look at pandas' get_dummies or keras.utils.to_categorical functions.
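As a rough illustration of what dummy (indicator) variables look like, the sketch below encodes a few `view` values by hand in plain Python. The helper name `one_hot` is hypothetical and is not part of pandas or Keras; in the notebook itself `pd.get_dummies` does this work.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator per category."""
    # Categories that actually occur in the data, sorted alphabetically
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


views = ["west", "east", "north", "south"]
columns, encoded = one_hot(views)
for view, row in zip(views, encoded):
    # Each row has exactly one 1, in the column of its own category
    print(view, dict(zip(columns, row)))
```

Each categorical value becomes a row with a single 1, so no artificial ordering (such as east < north) is imposed on the categories.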

In neural networks, scaling of the features is important, because the features affect the net input of a neuron as a whole. You should use sklearn's **MinMaxScaler** for this task, which scales the variables between 0 and 1 by default. (Remember that the mean-squared error loss tends to be extremely large with unscaled features.)
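To make the scaling concrete, here is a minimal plain-Python sketch of the min-max formula x' = (x - min) / (max - min). The function name `min_max_scale` is an assumption used only for illustration; the sample values are a few `sqmtrs` entries together with the minimum (100) and maximum (349) reported for the training data.

```python
def min_max_scale(column):
    """Scale a list of numbers linearly so the min maps to 0.0 and the max to 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# A few sqmtrs values, including the training minimum (100) and maximum (349)
sqmtrs = [100, 251, 211, 128, 349]
scaled = min_max_scale(sqmtrs)
print([round(s, 6) for s in scaled])
```

For example, 251 maps to (251 - 100) / (349 - 100), which is roughly 0.606.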


In [None]:
# encode the categorical variables
# select the categorical features that have two or more classes
multi_label_features=['view','crime_rate']
train_Scaled=pd.get_dummies(data=train_df,columns=multi_label_features)

# scale the features between 0-1

from sklearn.preprocessing import MinMaxScaler
msc = MinMaxScaler()

train_Scaled[['sqmtrs','nrooms','view_east','view_north','view_south','view_west','crime_rate_high','crime_rate_low']]=msc.fit_transform(train_Scaled[['sqmtrs','nrooms','view_east','view_north','view_south','view_west','crime_rate_high','crime_rate_low']])
pd_train_Scaled_upd=pd.DataFrame(train_Scaled)
pd_train_Scaled_upd.head()

Unnamed: 0,sqmtrs,nrooms,price,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low
0,0.606426,1.0,925701.721399,0.0,0.0,0.0,1.0,0.0,1.0
1,0.445783,0.5,622237.482636,0.0,0.0,0.0,1.0,1.0,0.0
2,0.11245,1.0,694998.182376,1.0,0.0,0.0,0.0,0.0,1.0
3,0.313253,0.5,564689.015926,1.0,0.0,0.0,0.0,1.0,0.0
4,0.526104,0.5,811222.970379,0.0,0.0,0.0,1.0,0.0,1.0


In [None]:
X=pd_train_Scaled_upd.drop(labels=['price'], axis=1)   #target is dropped
X.head()


Unnamed: 0,sqmtrs,nrooms,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low
0,0.606426,1.0,0.0,0.0,0.0,1.0,0.0,1.0
1,0.445783,0.5,0.0,0.0,0.0,1.0,1.0,0.0
2,0.11245,1.0,1.0,0.0,0.0,0.0,0.0,1.0
3,0.313253,0.5,1.0,0.0,0.0,0.0,1.0,0.0
4,0.526104,0.5,0.0,0.0,0.0,1.0,0.0,1.0


In [None]:
Y=pd_train_Scaled_upd[['price']]  #target
Y.head()

Unnamed: 0,price
0,925701.721399
1,622237.482636
2,694998.182376
3,564689.015926
4,811222.970379


Don't forget to split the training data to obtain a validation set. **Use random_state=42**

In [None]:
# split 90-10
from sklearn.model_selection import train_test_split
xtrain,xvalid,ytrain,yvalid=train_test_split(X,Y,train_size=0.9,random_state=42)  # separate validation and training
print(xtrain.shape,xvalid.shape,ytrain.shape,yvalid.shape)

(4320, 8) (480, 8) (4320, 1) (480, 1)


## 5) Train neural networks on development data and do model selection using the validation data


* Train a neural network with **one hidden layer** (try 3 different values for the number of neurons in that hidden layer: 25, 50, and 100). You will need to correctly choose the optimizer and the loss function that this model will train with. Use a batch_size of 64 and train each model for 30 epochs.

* Train another neural network with two hidden layers with meta-parameters of your choice. Again, use batch_size as 64 and train the model for 30 epochs. 

* **Bonus (5 pts)** Train a KNN or a Decision Tree model with your own choice of meta parameters to predict the house prices.
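To make the one-hidden-layer architecture concrete, the following NumPy sketch runs a forward pass through a network with 8 inputs, a ReLU hidden layer, and a single linear output, and counts its trainable parameters. It is only an illustration under assumed names (`init_params`, `forward`, `count_params`) with random weights and fake inputs; it does not train anything and is not how Keras is implemented internally.

```python
import numpy as np


def init_params(n_inputs, n_hidden, seed=0):
    """Random weights for: n_inputs -> n_hidden (ReLU) -> 1 (linear)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """ReLU hidden layer followed by a linear output unit."""
    hidden = np.maximum(0.0, X @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]  # shape (n_samples, 1)


def count_params(params):
    """Total number of weights and biases."""
    return sum(p.size for p in params.values())


# 4 fake rows with 8 features each (two numeric plus six indicator columns)
X_demo = np.random.default_rng(1).random((4, 8))
for n_hidden in (25, 50, 100):
    params = init_params(8, n_hidden)
    print(n_hidden, forward(params, X_demo).shape, count_params(params))
```

With 25 hidden nodes the count is (8 x 25 + 25) + (25 + 1) = 251 parameters, which should match what `model.summary()` reports for the corresponding Keras model.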


In [None]:
# train one-hidden layered neural networks
# define your model architecture

from keras import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD
import time

#models with # of hidden nodes=25 

#0.1 learning rate
model_hidden25_learning1=Sequential()
model_hidden25_learning1.add(Dense(25, activation="relu", input_shape=(8,)))
model_hidden25_learning1.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden25_learning1.summary()

#0.01 learning rate
model_hidden25_learning01=Sequential()
model_hidden25_learning01.add(Dense(25, activation="relu", input_shape=(8,)))
model_hidden25_learning01.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden25_learning01.summary()

#0.001 learning rate
model_hidden25_learning001=Sequential()
model_hidden25_learning001.add(Dense(25, activation="relu", input_shape=(8,)))
model_hidden25_learning001.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden25_learning001.summary()


In [None]:
#models with # of hidden nodes=50 

#0.1 learning rate
model_hidden50_learning1=Sequential()
model_hidden50_learning1.add(Dense(50, activation="relu", input_shape=(8,)))
model_hidden50_learning1.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden50_learning1.summary()

#0.01 learning rate
model_hidden50_learning01=Sequential()
model_hidden50_learning01.add(Dense(50, activation="relu", input_shape=(8,)))
model_hidden50_learning01.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden50_learning01.summary()

#0.001 learning rate
model_hidden50_learning001=Sequential()
model_hidden50_learning001.add(Dense(50, activation="relu", input_shape=(8,)))
model_hidden50_learning001.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden50_learning001.summary()

In [None]:
#models with # of hidden nodes=100

#0.1 learning rate
model_hidden100_learning1=Sequential()
model_hidden100_learning1.add(Dense(100, activation="relu", input_shape=(8,)))
model_hidden100_learning1.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden100_learning1.summary()

#0.01 learning rate
model_hidden100_learning01=Sequential()
model_hidden100_learning01.add(Dense(100, activation="relu", input_shape=(8,)))
model_hidden100_learning01.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden100_learning01.summary()

#0.001 learning rate
model_hidden100_learning001=Sequential()
model_hidden100_learning001.add(Dense(100, activation="relu", input_shape=(8,)))
model_hidden100_learning001.add(Dense(1, activation='linear')) #since it is a regression problem
#model_hidden100_learning001.summary()

In [None]:
# compile your model with an optimizer - learning rate 0.001
adam_001 = Adam(learning_rate=0.001)

start_time1 = time.time()
model_hidden25_learning001.compile(optimizer = adam_001, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden25_learning001.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=25 0.001 learning: ", time.time()-start_time1);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=25 0.001 learning:  10.835812330245972


In [None]:
adam_001 = Adam(learning_rate=0.001)

start_time2 = time.time()
model_hidden50_learning001.compile(optimizer = adam_001, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden50_learning001.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=50 0.001 learning: ", time.time()-start_time2);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=50 0.001 learning:  8.306734800338745


In [None]:
adam_001 = Adam(learning_rate=0.001)

start_time3 = time.time()
model_hidden100_learning001.compile(optimizer = adam_001, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden100_learning001.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=100 0.001 learning: ", time.time()-start_time3);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=100 0.001 learning:  10.74315857887268


In [None]:
# compile your model with an optimizer - learning rate 0.01
adam_01 = Adam(learning_rate=0.01)

start_time1 = time.time()
model_hidden25_learning01.compile(optimizer = adam_01, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden25_learning01.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=25 0.01 learning: ", time.time()-start_time1);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=25 0.01 learning:  10.747376680374146


In [None]:
adam_01 = Adam(learning_rate=0.01)

start_time2 = time.time()
model_hidden50_learning01.compile(optimizer = adam_01, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden50_learning01.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=50 0.01 learning: ", time.time()-start_time2);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=50 0.01 learning:  10.748280048370361


In [None]:
adam_01 = Adam(learning_rate=0.01)

start_time1 = time.time()
model_hidden100_learning01.compile(optimizer = adam_01, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden100_learning01.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=100 0.01 learning: ", time.time()-start_time1);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=100 0.01 learning:  10.733232021331787


In [None]:
adam_1 = Adam(learning_rate=0.1)

start_time1 = time.time()
model_hidden25_learning1.compile(optimizer = adam_1, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden25_learning1.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=25 0.1 learning: ", time.time()-start_time1);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=25 0.1 learning:  10.320895671844482


In [None]:
adam_1 = Adam(learning_rate=0.1)

start_time2 = time.time()
model_hidden50_learning1.compile(optimizer = adam_1, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden50_learning1.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=50 0.1 learning: ", time.time()-start_time2);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=50 0.1 learning:  10.744277238845825


In [None]:
adam_1 = Adam(learning_rate=0.1)

start_time3 = time.time()
model_hidden100_learning1.compile(optimizer = adam_1, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_hidden100_learning1.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)
print("execution time for model nodes=100 0.1 learning: ", time.time()-start_time3);

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30
execution time for model nodes=100 0.1 learning:  8.275849103927612


In [None]:
# train a two-hidden layered neural network
#model 4 two hidden layer # of hidden nodes=25
model_twolayers_Hidden25_learning001=Sequential()
model_twolayers_Hidden25_learning001.add(Dense(25, activation="relu", input_shape=(8,)))
model_twolayers_Hidden25_learning001.add(Dense(25, activation="relu"))
model_twolayers_Hidden25_learning001.add(Dense(1, activation='linear')) #since it is a regression problem
#model_twolayers_Hidden25_learning001.summary()

# compile your model with an optimizer
adam001 = Adam(learning_rate=0.001)
model_twolayers_Hidden25_learning001.compile(optimizer = adam001, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_twolayers_Hidden25_learning001.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f10602de6d0>

In [None]:
# train a two-hidden layered neural network
#model 4 two hidden layer # of hidden nodes=50
model_twolayers_Hidden50_learning001=Sequential()
model_twolayers_Hidden50_learning001.add(Dense(50, activation="relu", input_shape=(8,)))
model_twolayers_Hidden50_learning001.add(Dense(50, activation="relu"))
model_twolayers_Hidden50_learning001.add(Dense(1, activation='linear')) #since it is a regression problem
#model_twolayers_Hidden50_learning001.summary()

# compile your model with an optimizer
adam001 = Adam(learning_rate=0.001)
model_twolayers_Hidden50_learning001.compile(optimizer = adam001, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_twolayers_Hidden50_learning001.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f106017abd0>

In [None]:
# train a two-hidden layered neural network
#model 4 two hidden layer # of hidden nodes=100
model_twolayers_Hidden100_learning001=Sequential()
model_twolayers_Hidden100_learning001.add(Dense(100, activation="relu", input_shape=(8,)))
model_twolayers_Hidden100_learning001.add(Dense(100, activation="relu"))
model_twolayers_Hidden100_learning001.add(Dense(1, activation='linear')) #since it is a regression problem
#model_twolayers_Hidden100_learning001.summary()

# compile your model with an optimizer
adam001 = Adam(learning_rate=0.001)
model_twolayers_Hidden100_learning001.compile(optimizer = adam001, loss = 'mean_squared_error', metrics=['accuracy'])
# fit the model on training data
model_twolayers_Hidden100_learning001.fit(xtrain, ytrain, batch_size=64, epochs=30,shuffle=True, verbose=1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x7f10603e4190>

In [None]:
#decision tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

# decision tree with max depth 20, min samples split 100
dt=DecisionTreeRegressor(max_depth=20, min_samples_split=100)
print(dt.fit(xtrain,ytrain))

#knn with k=3
knn_1=KNeighborsRegressor(n_neighbors=3)
print(knn_1.fit(xtrain,ytrain))

DecisionTreeRegressor(max_depth=20, min_samples_split=100)
KNeighborsRegressor(n_neighbors=3)


## 6) Test your trained classifiers on the Validation set
Test your trained classifiers on the validation set and print the mean squared errors.
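Because the target (price) is kept in its original units, mean squared errors are expressed in squared price units and can look very large. The sketch below, using made-up toy prices rather than the dataset, computes the MSE by hand and also its square root (RMSE); RMSE is not required by the homework and is shown only as an aid to interpretation.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy prices (not from the dataset); every prediction is off by 50,000
y_true = [900_000.0, 600_000.0, 700_000.0]
y_pred = [850_000.0, 650_000.0, 750_000.0]

error = mse(y_true, y_pred)
print("MSE :", error)             # in squared price units
print("RMSE:", math.sqrt(error))  # back in price units
```

Here an error of 50,000 per house already gives an MSE of 2.5 billion, so values in the billions do not by themselves indicate a broken model.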


In [None]:
# tests on validation
from sklearn.metrics import mean_squared_error
ypred_val_model1_1=model_hidden25_learning001.predict(xvalid)
print("Model with hidden nodes=25 learning rate 0.001: ",mean_squared_error(yvalid, ypred_val_model1_1))

ypred_val_model1_2=model_hidden25_learning01.predict(xvalid)
print("Model with hidden nodes=25 learning rate 0.01: ",mean_squared_error(yvalid, ypred_val_model1_2))

ypred_val_model1_3=model_hidden25_learning1.predict(xvalid)
print("Model with hidden nodes=25: learning rate 0.1:",mean_squared_error(yvalid, ypred_val_model1_3))

print("\n")

ypred_val_model2_1=model_hidden50_learning001.predict(xvalid)
print("Model with hidden nodes=50 learning rate 0.001: ",mean_squared_error(yvalid, ypred_val_model2_1))

ypred_val_model2_2=model_hidden50_learning01.predict(xvalid)
print("Model with hidden nodes=50 learning rate 0.01: ",mean_squared_error(yvalid, ypred_val_model2_2))

ypred_val_model2_3=model_hidden50_learning1.predict(xvalid)
print("Model with hidden nodes=50: learning rate 0.1:",mean_squared_error(yvalid, ypred_val_model2_3))

print("\n")

ypred_val_model3_1=model_hidden100_learning001.predict(xvalid)
print("Model with hidden nodes=100 learning rate 0.001: ",mean_squared_error(yvalid, ypred_val_model3_1))

ypred_val_model3_2=model_hidden100_learning01.predict(xvalid)
print("Model with hidden nodes=100 learning rate 0.01: ",mean_squared_error(yvalid, ypred_val_model3_2))

ypred_val_model3_3=model_hidden100_learning1.predict(xvalid)
print("Model with hidden nodes=100: learning rate 0.1:",mean_squared_error(yvalid, ypred_val_model3_3))

Model with hidden nodes=25 learning rate 0.001:  541393047137.27734
Model with hidden nodes=25 learning rate 0.01:  282709350136.288
Model with hidden nodes=25: learning rate 0.1: 542179368013.8864


Model with hidden nodes=50 learning rate 0.001:  540235766989.50275
Model with hidden nodes=50 learning rate 0.01:  385626648447.6432
Model with hidden nodes=50: learning rate 0.1: 3142919648.2428823


Model with hidden nodes=100 learning rate 0.001:  538762827554.7875
Model with hidden nodes=100 learning rate 0.01:  276592542402.69495
Model with hidden nodes=100: learning rate 0.1: 1480239932.3440094


In [None]:
#test validation on two hidden layer models.
# I only used a learning rate of 0.001 for the two-hidden-layer models since it gives the least error.

ypred_val_model4_1=model_twolayers_Hidden25_learning001.predict(xvalid)
print("Model with two hidden layers each has # of nodes=25 learning rate 0.001: ",mean_squared_error(yvalid, ypred_val_model4_1))

ypred_val_model4_2=model_twolayers_Hidden50_learning001.predict(xvalid)
print("Model with two hidden layers each has # of nodes=50 learning rate 0.001: ",mean_squared_error(yvalid, ypred_val_model4_2))

ypred_val_model4_3=model_twolayers_Hidden100_learning001.predict(xvalid)
print("Model with two hidden layers each has # of nodes=100 learning rate 0.001: ",mean_squared_error(yvalid, ypred_val_model4_3))

Model with two hidden layers each has # of nodes=25 learning rate 0.001:  462180123089.03125
Model with two hidden layers each has # of nodes=50 learning rate 0.001:  270524833620.74155
Model with two hidden layers each has # of nodes=100 learning rate 0.001:  21749074098.96465


In [None]:
#evaluate decision tree and knn on validation

#decision tree evaluation
ypred_val_dt=dt.predict(xvalid)
print("decision tree max depth 20 min samples split 100: ",mean_squared_error(yvalid, ypred_val_dt))

#knn evaluation
ypred_val_knn=knn_1.predict(xvalid)
print("knn with k=3: ",mean_squared_error(yvalid, ypred_val_knn))

decision tree max depth 20 min samples split 100:  374562253.66377574
knn with k=3:  45096923.734956056


## 7) Test your classifier on Test set

- Load test data
- Apply same pre-processing as training data (encoding categorical variables, scaling)
- Predict the labels of testing data **using the best model that you have selected according to your validation results** and report the mean squared error. 
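"Same pre-processing" is usually taken to mean that the scaling statistics are learned on the training data and then reused on the test data, rather than recomputed. The plain-Python sketch below illustrates that idea with a hypothetical `TrainFittedScaler` class and illustrative values; with scikit-learn the equivalent pattern is calling `fit` (or `fit_transform`) on the training features and `transform` on the test features.

```python
class TrainFittedScaler:
    """Hypothetical helper: learns min/max on training data, reuses them later."""

    def fit(self, column):
        self.lo, self.hi = min(column), max(column)
        return self

    def transform(self, column):
        # Guard against a constant training column
        span = (self.hi - self.lo) or 1.0
        return [(x - self.lo) / span for x in column]


train_sqmtrs = [100, 251, 211, 349]  # illustrative training values
test_sqmtrs = [169, 233, 340]        # illustrative test values

# Statistics come from the training column only
scaler = TrainFittedScaler().fit(train_sqmtrs)
test_scaled = scaler.transform(test_sqmtrs)
print([round(v, 6) for v in test_scaled])
```

Reusing the training minimum and maximum keeps a given square-meter value mapped to the same scaled number in both sets, which is what the trained network expects.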

In [None]:
# test results
test_df = pd.read_csv("test.csv")
test_df.head()

Unnamed: 0,sqmtrs,nrooms,view,crime_rate,price
0,349,3,south,high,836553.5
1,169,1,west,high,512741.6
2,233,3,south,high,663880.6
3,340,4,north,low,1000086.0
4,199,2,east,low,745015.1


In [None]:
# encode the categorical variables
# select the categorical features that have two or more classes
multi_label_features=['view','crime_rate']
test_Scaled=pd.get_dummies(data=test_df,columns=multi_label_features)

# scale the features with the scaler fitted on the training data (msc),
# so that train and test are mapped with the same min/max values
scaled_cols=['sqmtrs','nrooms','view_east','view_north','view_south','view_west','crime_rate_high','crime_rate_low']
test_Scaled[scaled_cols]=msc.transform(test_Scaled[scaled_cols])
pd_test_Scaled_upd=pd.DataFrame(test_Scaled)
pd_test_Scaled_upd.head()

Unnamed: 0,sqmtrs,nrooms,price,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low
0,1.0,0.5,836553.5,0.0,0.0,1.0,0.0,1.0,0.0
1,0.277108,0.0,512741.6,0.0,0.0,0.0,1.0,1.0,0.0
2,0.534137,0.5,663880.6,0.0,0.0,1.0,0.0,1.0,0.0
3,0.963855,0.75,1000086.0,0.0,1.0,0.0,0.0,0.0,1.0
4,0.39759,0.25,745015.1,1.0,0.0,0.0,0.0,0.0,1.0


In [None]:
xtest=pd_test_Scaled_upd.drop(labels=['price'], axis=1)   #target is dropped
xtest.head()


Unnamed: 0,sqmtrs,nrooms,view_east,view_north,view_south,view_west,crime_rate_high,crime_rate_low
0,1.0,0.5,0.0,0.0,1.0,0.0,1.0,0.0
1,0.277108,0.0,0.0,0.0,0.0,1.0,1.0,0.0
2,0.534137,0.5,0.0,0.0,1.0,0.0,1.0,0.0
3,0.963855,0.75,0.0,1.0,0.0,0.0,0.0,1.0
4,0.39759,0.25,1.0,0.0,0.0,0.0,0.0,1.0


In [None]:
ytest=pd_test_Scaled_upd[['price']]  #target
ytest.head()

Unnamed: 0,price
0,836553.5
1,512741.6
2,663880.6
3,1000086.0
4,745015.1


In [None]:
# Evaluate the model on the test data using `evaluate`
# the one that has 100 nodes and 0.1 learning rate gives the best result in one layer

print('Evaluate on test data')
ypred_test_model=model_hidden100_learning1.predict(xtest)
# note: price was never scaled, so the factor of 100 below only rescales the printed MSE
print("Model with hidden nodes=100 and learning rate 0.1: ",mean_squared_error(ytest, ypred_test_model)*100)

Evaluate on test data
Model with hidden nodes=100 and learning rate 0.1:  146082728149.01855


In [None]:
scores = model_hidden100_learning1.evaluate(xtest, ytest, verbose=0)
print("%s: %.2f%%" % (model_hidden100_learning1.metrics_names[1], scores[1]*100))
print("%s: %.2f%%" % (model_hidden100_learning1.metrics_names[0], scores[0]*100))

accuracy: 0.00%
loss: 146082752000.00%


## 8) Report Your Results

**Notebook should be RUN:** As training and testing may take a long time, we may just look at your notebook results without running the code again; so make sure **each cell is run**, so outputs are there.

**Report:** Write a **1-2 page summary** of your approach to this problem **below**. 

**Must include statements such as those below:**
**(Remove the text in parentheses, below, and include your own report)**

  In this homework, the problem is to implement several neural network models with different parameters and find the one that predicts house prices with the least error. The dataset contains the square meters, number of rooms, view, crime rate of the location, and price of distinct homes.

  The training dataset has 4800 rows and 5 columns: sqmtrs, nrooms, view, crime_rate and price. Price is the target column. The view and crime_rate columns are categorical. Since machine learning models require numeric input variables, we have to encode them before developing the model. This stage is called feature extraction and preprocessing. After encoding the categorical columns with get_dummies, all features are scaled to the 0-1 range with MinMaxScaler. This normalization is important because we do not want one variable to dominate just because its magnitude is larger.

  After these steps, the target is separated from the dataframe. Using train_test_split, 90% of the training set is kept for training and 10% is reserved for validation.
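
As a rough illustration of the 90/10 split, the sketch below shuffles row indices and holds out 10% of them. It is a plain-Python stand-in for train_test_split, and the function name `train_val_split` and the seed are assumptions made for this example.

```python
import random


def train_val_split(rows, val_fraction=0.1, seed=42):
    """Shuffle row indices and hold out val_fraction of the rows for validation."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(rows) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(rows) if i not in val_idx]
    val = [r for i, r in enumerate(rows) if i in val_idx]
    return train, val


# Stand-in for the 4800 training rows described above.
rows = list(range(4800))
train_rows, val_rows = train_val_split(rows)
```

With 4800 rows this leaves 4320 rows for training and 480 rows for validation.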

  Since this is a regression problem, mean squared error is a suitable loss function. For regression, the output node of the network should use a linear activation function. I chose the ReLU activation function for the hidden layer nodes.
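
The architecture described above (ReLU hidden layer, linear output, mean squared error) can be sketched without any library as follows. The weights are hypothetical values chosen to make the arithmetic easy to follow; they are not the trained weights of the notebook's models.

```python
def relu(z):
    """ReLU activation: pass positive values through, clip negatives to 0."""
    return max(0.0, z)


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden ReLU layer followed by a single linear output node."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    # Linear output: weighted sum of hidden activations, no squashing.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical network with 2 inputs and 2 hidden nodes.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 1.0]
output_weights = [2.0, 3.0]
output_bias = 10.0

prediction = forward([1.0, 2.0], hidden_weights, hidden_biases,
                     output_weights, output_bias)
loss = mse([20.0], [prediction])
```

Because the output node is linear, the prediction is not limited to a fixed range, which is what a price target needs.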

  Note that I did not scale the target (price) during preprocessing.


**Add your observations as follows** (keep the questions for easy grading/context) in the report part of your notebook.

**Observations**

- Try a few learning rates for N=25 hidden neurons,  train for the indicated amount of epochs. Comment on what happens when learning rate is large or small? What is a good number/range for the learning rate?
  

number of nodes/ learning rate  | 0.001|0.01|0.1
-------------------|-------------------|-------------------|-------------------
N=25   |541393047137.27 | 282709350136.2|542179368013.88

  I implemented one hidden layer with 25 nodes and trained it with different learning rates. The learning rate scales the step size of each weight update. For N=25, the minimum error is obtained with a learning rate of 0.01. If the learning rate is too large, the updates can overshoot the optimum, since each step is larger. If the learning rate is too small, many more iterations are needed to converge to the optimal solution. Learning rates are typically chosen between 10^(-6) and 1.0, with 0.1 or 0.01 as common starting values.
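
The effect of the step size can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2, which is only an illustration and not the network's loss; the rate 1.1 is a hypothetical "too large" value that was not tried in the notebook.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimise f(w) = w**2 (gradient 2*w) and return the final w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # step size is proportional to the learning rate
    return w


# Small, moderate, and too-large learning rates on the same toy problem.
final_w = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}
```

With the small rate the iterate has barely moved from the start after 50 steps, with the moderate rate it is very close to the minimum at 0, and with the too-large rate each step overshoots by more than the previous one, so the iterate diverges.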

- Use that learning rate and vary the number of hidden neurons for the given values and try the indicated number of epochs. Give the validation mean squared errors for different approach and meta-parameters tried **in a table** and state which one you selected as your model. How many hidden neurons give the best model? 

  One hidden layer:

number of nodes/ learning rate  | 0.001|0.01|0.1
-------------------|-------------------|-------------------|-------------------
N=25   | 541393047137.27|	282709350136.2 |	542179368013.88
N=50   |540235766989.50|	385626648447.64	|3142919648.24
N=100  |538762827554.78|	276592542402.69|	**1480239932.34**


   Two hidden layers with the same number of nodes (these were implemented only with learning rate 0.001):

number of nodes/ learning rate  | 0.001
-------------------|-------------------
N=25   |462180123089.03
N=50   | 270524833620.74 
N=100  |21749074098.96

The model with one hidden layer of 100 nodes and a learning rate of 0.1 gives the best result: its mean squared error is the smallest among the neural network models above. Note that while too few nodes lead to a high error, too many nodes may overfit.
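
The selection step can be written down explicitly. The sketch below copies the one-hidden-layer validation errors from the table above into a dictionary and picks the configuration with the smallest value; the variable names are chosen for this example only.

```python
# Validation MSE keyed by (hidden nodes, learning rate), copied from the table above.
validation_mse = {
    (25, 0.001): 541393047137.27,
    (25, 0.01): 282709350136.2,
    (25, 0.1): 542179368013.88,
    (50, 0.001): 540235766989.50,
    (50, 0.01): 385626648447.64,
    (50, 0.1): 3142919648.24,
    (100, 0.001): 538762827554.78,
    (100, 0.01): 276592542402.69,
    (100, 0.1): 1480239932.34,
}

# Model selection: the configuration with the lowest validation error wins.
best_config = min(validation_mse, key=validation_mse.get)
best_nodes, best_rate = best_config
```

Selecting on validation error rather than test error keeps the test set untouched until the final evaluation.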

- State what your test results are with the chosen approach and meta-parameters: e.g. "We have obtained the best results on the validation set with the ..........approach using a value of ...... for .... parameter. The result of this model on the test data is ..... % accuracy."

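  I obtained the best result on the validation set with the neural network model that has one hidden layer with 100 nodes and a learning rate of 0.1. The evaluation cell prints the test mean squared error multiplied by 100 (146082728149.01855), so the mean squared error of this model on the test set is about 1460827281.49.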

- How slow is learning? Any other problems?

  When the number of nodes is kept constant and the learning rate is changed, more time is expected to be spent with a small learning rate because of the small step sizes. The measured times follow this order, although the differences are small:

model  | measured execution time
-------------------|-------------------
One layer, hidden node=25 rate=0.001 | 10.83 seconds
One layer, hidden node=25 rate=0.01  | 10.74  seconds
One layer, hidden node=25 rate=0.1 | 10.32 seconds
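
Execution times like those above can be collected with a small wrapper around the training call. The sketch below is one possible way to do it; the helper name `timed` is hypothetical, and `sum` stands in for the actual training function.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; in the notebook this would be the model training call.
result, seconds = timed(sum, range(1_000_000))
```

Differences of a few tenths of a second between runs can also come from normal run-to-run variation, so repeating each measurement gives a more reliable comparison.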

In addition to the neural network models, I also implemented k-NN and decision tree models. Their mean squared errors are as follows:

model  | mean squared error
-------------------|-------------------
Decision tree (max depth 20, min samples split 100) | 374562253.66
k-NN with k=3 | **45096923.73**
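
For reference, k-NN regression with k=3 predicts the average target of the three closest training points. The sketch below is a minimal plain-Python version with hypothetical toy data; it is not the implementation used to obtain the number in the table.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict the mean target of the k training points closest to query."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical scaled features and prices, not from the homework dataset.
train_x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0]]
train_y = [100.0, 110.0, 120.0, 500.0]

prediction = knn_predict(train_x, train_y, [0.05, 0.05])
```

The distant fourth point does not affect the prediction, since only the three nearest neighbours are averaged.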

