Peer-graded Assignment: Build a Regression Model in Keras Part B

Created by Kuan Yew Cheng, Date: 23.10.2020

In [4]:
# import important libraries

import pandas as pd
import numpy as np

Load the concrete_data dataset into a dataframe

In [5]:
# download the data and read it into a pandas dataframe
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

   Cement  Blast Furnace Slag  Fly Ash  Water  Superplasticizer  Coarse Aggregate  Fine Aggregate  Age  Strength
0   540.0                 0.0      0.0  162.0               2.5            1040.0           676.0   28     79.99
1   540.0                 0.0      0.0  162.0               2.5            1055.0           676.0   28     61.89
2   332.5               142.5      0.0  228.0               0.0             932.0           594.0  270     40.27
3   332.5               142.5      0.0  228.0               0.0             932.0           594.0  365     41.05
4   198.6               132.4      0.0  192.0               0.0             978.4           825.5  360     44.30


As we know from lab 3, the data is already clean and ready to be used to build our model.

Split data into predictors and target

In [6]:
# all columns except Strength are the predictors; Strength is the target
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']]
target = concrete_data['Strength']

Next, normalize the predictors before feeding them to the neural network model

In [7]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

# number of predictors
n_cols = predictors_norm.shape[1] 
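
The normalization above is a z-score: each column has its mean subtracted and is then divided by its standard deviation. A minimal sketch on a small made-up matrix (the values are illustrative, not taken from concrete_data) checks that every normalized column ends up with mean close to 0 and standard deviation 1. NumPy is used here with ddof=1 to mirror the pandas default for std().

```python
import numpy as np

# Toy predictor matrix (made-up values): 4 samples, 2 features
X = np.array([[540.0, 162.0],
              [332.5, 228.0],
              [198.6, 192.0],
              [266.0, 228.0]])

# z-score per column; ddof=1 mirrors the pandas default used by predictors.std()
col_mean = X.mean(axis=0)
col_std = X.std(axis=0, ddof=1)
X_norm = (X - col_mean) / col_std

# after normalization each column should have mean ~0 and std 1
norm_mean = X_norm.mean(axis=0)
norm_std = X_norm.std(axis=0, ddof=1)
print(norm_mean, norm_std)
```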

Build our neural network model

In [8]:
# import Keras library and other related libraries
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import mean_squared_error

Using TensorFlow backend.


First, define the model

In [9]:
# define regression model
def regression_model():
    #create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
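
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 8 predictors, and the helper dense_params is a hypothetical name used only for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumed number of predictors
hidden = dense_params(n_cols, 10)  # hidden layer with 10 units
output = dense_params(10, 1)       # output layer with 1 unit
total_params = hidden + output
print(hidden, output, total_params)
```

The total should agree with what model.summary() reports for the model defined above.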

Split the normalized dataset into training and test sets

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)
print ('Train set:', X_train.shape,  y_train.shape)
print ('Test set:', X_test.shape,  y_test.shape)

Train set: (721, 8) (721,)
Test set: (309, 8) (309,)


Build the model (it is fitted on the training data in the next step)

In [11]:
# build the model
model = regression_model()

Error Evaluation

In [12]:
mean_squared_errors = []

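# repeat training and evaluation 50 times
# note: the same model and the same train/test split are reused, so each
# iteration continues training from the weights left by the previous one
for i in range(50):
    # fit the model for 50 more epochs
    model.fit(X_train, y_train, epochs=50, verbose=2)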

    # Evaluate the model on the test data
    y_hat = model.predict(X_test)
    mse = mean_squared_error(y_test, y_hat)
    mean_squared_errors.append(mse)

print(mean_squared_errors)

Epoch 1/50
 - 1s - loss: 1562.2593
Epoch 2/50
 - 0s - loss: 1545.7309
Epoch 3/50
 - 0s - loss: 1528.4956
Epoch 4/50
 - 0s - loss: 1510.8425
Epoch 5/50
 - 0s - loss: 1492.2270
Epoch 6/50
 - 0s - loss: 1472.6384
Epoch 7/50
 - 0s - loss: 1451.8715
Epoch 8/50
 - 0s - loss: 1429.4721
Epoch 9/50
 - 0s - loss: 1406.2254
Epoch 10/50
 - 0s - loss: 1381.2606
Epoch 11/50
 - 0s - loss: 1355.8123
Epoch 12/50
 - 0s - loss: 1328.5653
Epoch 13/50
 - 0s - loss: 1300.3521
Epoch 14/50
 - 0s - loss: 1271.0378
Epoch 15/50
 - 0s - loss: 1241.0034
Epoch 16/50
 - 0s - loss: 1210.0667
Epoch 17/50
 - 0s - loss: 1178.2830
Epoch 18/50
 - 0s - loss: 1146.2552
Epoch 19/50
 - 0s - loss: 1113.4484
Epoch 20/50
 - 0s - loss: 1080.1337
Epoch 21/50
 - 0s - loss: 1046.5034
Epoch 22/50
 - 0s - loss: 1013.0683
Epoch 23/50
 - 0s - loss: 978.9582
Epoch 24/50
 - 0s - loss: 945.2003
Epoch 25/50
 - 0s - loss: 911.0627
Epoch 26/50
 - 0s - loss: 877.5505
Epoch 27/50
 - 0s - loss: 844.1580
Epoch 28/50
 - 0s - loss: 811.3541
Epoch 2
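
The loop above reuses one model and one train/test split, so every pass continues training from the previous weights. If each repetition is instead meant to be an independent experiment, with a fresh split and a fresh model every time, the structure could look like the sketch below. That intent is an assumption. To keep the sketch runnable without Keras, an ordinary least-squares fit stands in for regression_model(), and the synthetic data and helper names (split, fit_fresh_model, predict) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 200 samples, 8 normalized predictors
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=200)

def split(X, y, test_size, rng):
    """Shuffle the rows and hold out a fraction test_size for testing."""
    idx = rng.permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def fit_fresh_model(X_train, y_train):
    """Stand-in for building and fitting a new model: least squares with bias."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted coefficients (last entry is the bias term)."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

mse_list = []
for _ in range(50):
    # new split and new model on every repetition, so the runs are independent
    X_tr, X_te, y_tr, y_te = split(X, y, 0.3, rng)
    coef = fit_fresh_model(X_tr, y_tr)
    mse_list.append(np.mean((y_te - predict(coef, X_te)) ** 2))

print(len(mse_list), np.mean(mse_list), np.std(mse_list))
```

With Keras, the equivalent change would be to call train_test_split and regression_model() inside the loop before model.fit.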

In [13]:
# mean of the 50 test-set mean squared errors
# (the label says "epochs", but this is the mean over the 50 repetitions)
print("Mean after 50 epochs: %3.3f"  %(np.mean(mean_squared_errors)))

Mean after 50 epochs: 57.234


In [14]:
# standard deviation of the 50 test-set mean squared errors
print("Std after 50 epochs: %3.3f"  %(np.std(mean_squared_errors)))

Std after 50 epochs: 45.163
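
One detail worth knowing when reading the figure above: np.std divides by n by default (ddof=0), whereas the sample standard deviation divides by n - 1 (ddof=1). The sketch below uses made-up error values, not the results of this notebook, to show the difference.

```python
import numpy as np

# Made-up MSE values for illustration only (not the results of this notebook)
errors = [52.1, 48.7, 61.3, 55.0, 49.9]

mean_mse = np.mean(errors)
std_population = np.std(errors)      # ddof=0, the NumPy default
std_sample = np.std(errors, ddof=1)  # divides by n - 1 instead of n

print("mean: %.3f" % mean_mse)
print("std (ddof=0): %.3f, std (ddof=1): %.3f" % (std_population, std_sample))
```

For 50 values the two estimates differ by only about 1%, so the choice does not change the conclusion here.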


After normalizing the data, the mean of the mean squared errors in step B decreased to 57.23, compared with 110.4 in step A.

The standard deviation of the mean squared errors, however, is higher than with the unnormalized data: the errors are spread more widely across the runs, which makes the accuracy less consistent.
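
To put the reported numbers in proportion, the short calculation below works out the relative decrease in the mean error from step A to step B, and the ratio of the standard deviation to the mean for step B. Only the three figures quoted in this notebook are used; the variable names are illustrative.

```python
mean_a = 110.4   # mean MSE reported for step A
mean_b = 57.234  # mean MSE reported for step B
std_b = 45.163   # standard deviation of the MSEs in step B

# fraction by which the mean error dropped from step A to step B
relative_drop = (mean_a - mean_b) / mean_a

# spread of the step B errors relative to their mean
coeff_variation_b = std_b / mean_b

print("Relative decrease in mean MSE: %.1f%%" % (100 * relative_drop))
print("Std / mean for step B: %.2f" % coeff_variation_b)
```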