<a href="https://cognitiveclass.ai"><img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DL0101EN-SkillsNetwork/images/IDSN-logo.png" width="400"> </a>

<h1 align=center><font size = 5>Regression Models with Keras</font></h1>


 

<a id="item31"></a>


## Download and Clean Dataset


Let's start by importing the <em>pandas</em> and <em>NumPy</em> libraries.


In [112]:
# All Libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented. 
# If you run this notebook on a different environment, e.g. your desktop, you may need to uncomment and install certain libraries.

#!pip install numpy==1.21.4
#!pip install pandas==1.3.4
#!pip install keras==2.1.6

In [113]:
import pandas as pd
import numpy as np

import warnings
warnings.simplefilter('ignore', FutureWarning)


<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>


Let's download the data and read it into a <em>pandas</em> dataframe.


In [114]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Let's check how many data points we have.


In [115]:
concrete_data.shape

(1030, 9)

So, there are 1,030 samples to train our model on. Because the dataset is small, we have to be careful not to overfit the training data.


Let's check the dataset for any missing values. We first look at the summary statistics, then count the null entries in each column.


In [116]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [117]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.
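
To illustrate what `isnull().sum()` would report if a value were actually missing, here is a minimal sketch on a small made-up dataframe. The `toy` frame and its values are hypothetical and are not taken from the concrete dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Water value (for illustration only).
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, np.nan, 192.0],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# count() skips missing cells, so it falls below the number of rows.
water_count = toy["Water"].count()
print(water_count)
```

In the concrete dataset every column reports 0 missing values and a count of 1030, which is why no cleaning step is needed here.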


#### Split data into predictors and target


The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.


In [118]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
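
The same split can also be written with `DataFrame.drop`, which returns a copy without the named column. The sketch below shows this on a small made-up dataframe; `toy`, `toy_predictors` and `toy_target` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical two-row frame that mimics the layout of the concrete data.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

# Every column except the target becomes a predictor.
toy_predictors = toy.drop(columns="Strength")
# The target is the single Strength column, returned as a Series.
toy_target = toy["Strength"]

print(list(toy_predictors.columns))
print(toy_target.name)
```

Either form leaves the original dataframe unchanged, so the choice is mostly a matter of readability.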

<a id="item2"></a>


Let's do a quick sanity check of the predictors and the target dataframes.


In [119]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [120]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Let's save the number of predictors to *n_cols* since we will need this number when building our network.


In [121]:
n_cols = predictors.shape[1] # number of predictors
print("number of columns =", n_cols)

number of columns = 8


<a id="item1"></a>


<a id='item32'></a>


## Import Keras


In [122]:
import keras

Let's import the rest of the packages from the Keras library that we will need to build our regression model.


In [123]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>


## Build a Neural Network


Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.


In [124]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    #model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function creates a model that has one hidden layer of 10 hidden units.
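
To make this architecture concrete, the NumPy sketch below counts the trainable parameters of a network with 8 inputs, 10 ReLU hidden units and 1 linear output, and runs one forward pass. It is only an illustration of the arithmetic: the random weights, the `forward` helper and the `batch` array are assumptions made for this example, not what Keras initializes or learns.

```python
import numpy as np

n_cols = 8      # number of predictors, as computed earlier
n_hidden = 10   # units in the hidden layer

# Each dense layer holds one weight per input-unit pair plus one bias per unit.
hidden_params = n_cols * n_hidden + n_hidden   # 8*10 + 10
output_params = n_hidden * 1 + 1               # 10*1 + 1
total_params = hidden_params + output_params
print(total_params)

# Illustrative random weights (assumption: not Keras's actual initialization).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(n_cols, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    """Hidden layer with ReLU, then a linear output unit."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return hidden @ W2 + b2

# Hypothetical batch of 5 samples with 8 features each.
batch = rng.normal(size=(5, n_cols))
predictions = forward(batch)
print(predictions.shape)
```

The output layer has no activation, which suits regression because the prediction can take any real value.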


<a id="item4"></a>


<a id='item34'></a>


## Train and Test the Network


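Before creating the model, let's split the data into training and test sets, holding out 30% of the samples for testing. Then we will call the function to create our model.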


In [125]:
# split the data
import sklearn
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(721, 8)
(721,)
(309, 8)
(309,)
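
The sizes above follow directly from a 70/30 split of 1030 rows. As a rough sketch of the idea (not scikit-learn's actual implementation), a shuffled split can be expressed with a random permutation of row indices. The seed and the variable names below are assumptions chosen for illustration.

```python
import numpy as np

n_samples = 1030       # rows in the concrete dataset
test_fraction = 0.3    # same fraction as test_size above

# Shuffle the row indices, then cut the shuffled list in two.
rng = np.random.default_rng(42)   # arbitrary seed, for repeatability
shuffled = rng.permutation(n_samples)
n_test = int(round(test_fraction * n_samples))
test_idx = shuffled[:n_test]
train_idx = shuffled[n_test:]

print(len(train_idx), len(test_idx))
```

Because the split is random, each call to `train_test_split` yields different subsets unless its `random_state` argument is set.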


In [126]:
# build the model
model = regression_model()

Next, we will train the model with the *fit* method and, at the same time, track its loss on the held-out data. We pass the 30% test split as validation data and train the model for 50 epochs.


In [127]:
# fit the model
model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs=50, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 3s - loss: 21918.6767 - val_loss: 11430.7758
Epoch 2/50
 - 0s - loss: 8766.2087 - val_loss: 5398.9121
Epoch 3/50
 - 0s - loss: 5496.2561 - val_loss: 4079.9123
Epoch 4/50
 - 0s - loss: 4504.0893 - val_loss: 3423.9352
Epoch 5/50
 - 0s - loss: 3841.7936 - val_loss: 2973.1243
Epoch 6/50
 - 0s - loss: 3371.9071 - val_loss: 2597.5245
Epoch 7/50
 - 0s - loss: 2978.4341 - val_loss: 2301.3650
Epoch 8/50
 - 0s - loss: 2656.2482 - val_loss: 2048.0977
Epoch 9/50
 - 0s - loss: 2378.4162 - val_loss: 1830.5407
Epoch 10/50
 - 0s - loss: 2134.9477 - val_loss: 1642.1615
Epoch 11/50
 - 0s - loss: 1921.8316 - val_loss: 1473.4080
Epoch 12/50
 - 0s - loss: 1730.6573 - val_loss: 1327.2984
Epoch 13/50
 - 0s - loss: 1564.9221 - val_loss: 1193.3499
Epoch 14/50
 - 0s - loss: 1410.5737 - val_loss: 1072.4709
Epoch 15/50
 - 0s - loss: 1273.2740 - val_loss: 965.0422
Epoch 16/50
 - 0s - loss: 1148.5465 - val_loss: 870.9440
Epoch 17/50
 - 0s - loss: 1040.4133

<keras.callbacks.History at 0x7f855ddf9b90>

In [128]:
# evaluate the model
from sklearn.metrics import mean_squared_error
print("Error =",mean_squared_error(y_test, model.predict(X_test)))

Error = 166.7348624181761
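
The error reported above is the mean of the squared differences between the true and predicted strengths. The sketch below computes it by hand on made-up numbers; `y_true` and `y_pred` are hypothetical values, with `y_pred` shaped `(n, 1)` as a single-output Keras model returns from `predict`.

```python
import numpy as np

# Hypothetical targets and predictions (not taken from the lab's output).
y_true = np.array([3.0, 5.0, 2.0])          # shape (3,)
y_pred = np.array([[2.5], [5.0], [4.0]])    # shape (3, 1)

# Subtracting the arrays directly would broadcast to a 3x3 matrix,
# so flatten the predictions to shape (3,) first.
broadcast_shape = (y_true - y_pred).shape
mse = np.mean((y_true - y_pred.ravel()) ** 2)

print(broadcast_shape)
print(mse)
```

`sklearn.metrics.mean_squared_error` handles the shape mismatch itself, so the flattening step only matters when the metric is computed manually.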


In [129]:
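# run_model below expects predictors_norm, which is not defined earlier in this notebook.
# Assumption: it is the z-score normalized version of the predictors.
predictors_norm = (predictors - predictors.mean()) / predictors.std()

# do the same 50 times and store the score in np array
def run_model():
    # new random 70/30 split on every call
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)
    model = regression_model()
    model.fit(X_train, y_train, validation_data = (X_test, y_test), epochs=50, verbose=2)
    # evaluate returns the loss, i.e. the mean squared error on the test split
    score = model.evaluate(X_test, y_test, verbose=0)
    return score

scores = np.zeros(50)
for loop in range(np.size(scores)):
    print(f"-------------------- loop {loop}  ----------------------")
    scores[loop] = run_model()
    print("score = ",scores[loop])
mean = np.mean(scores)
std_dev = np.std(scores)
print("Mean:", mean)
print("Standard Deviation:", std_dev)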

-------------------- loop 0  ----------------------
Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 3s - loss: 1559.5296 - val_loss: 1596.0505
Epoch 2/50
 - 0s - loss: 1544.1596 - val_loss: 1580.2091
Epoch 3/50
 - 0s - loss: 1528.8945 - val_loss: 1564.1635
Epoch 4/50
 - 0s - loss: 1513.4001 - val_loss: 1548.1753
Epoch 5/50
 - 0s - loss: 1497.8289 - val_loss: 1531.7101
Epoch 6/50
 - 0s - loss: 1481.6244 - val_loss: 1515.0231
Epoch 7/50
 - 0s - loss: 1465.1432 - val_loss: 1497.4118
Epoch 8/50
 - 0s - loss: 1447.8021 - val_loss: 1479.1319
Epoch 9/50
 - 0s - loss: 1429.7705 - val_loss: 1459.9961
Epoch 10/50
 - 0s - loss: 1410.8778 - val_loss: 1439.7320
Epoch 11/50
 - 0s - loss: 1390.5961 - val_loss: 1418.5179
Epoch 12/50
 - 0s - loss: 1369.4125 - val_loss: 1395.6591
Epoch 13/50
 - 0s - loss: 1346.5465 - val_loss: 1372.2751
Epoch 14/50
 - 0s - loss: 1323.0105 - val_loss: 1347.0913
Epoch 15/50
 - 0s - loss: 1297.7253 - val_loss: 1321.5391
Epoch 16/50
 - 0s - loss: 1271.8520 - val

In [130]:
# Result for A :
# Mean: 371.83864011400334
# Standard Deviation: 113.7774432852637
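
A note on how the summary statistics are computed: `np.std` returns the population standard deviation by default (`ddof=0`). The sketch below compares it with the sample standard deviation (`ddof=1`) on a small made-up list of scores; the values in `toy_scores` are hypothetical and unrelated to the results above.

```python
import numpy as np

# Hypothetical scores, chosen so the arithmetic is easy to follow.
toy_scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean_score = np.mean(toy_scores)
population_std = np.std(toy_scores)        # divides by n (default, ddof=0)
sample_std = np.std(toy_scores, ddof=1)    # divides by n - 1

print(mean_score, population_std, sample_std)
```

With 50 repetitions the two estimates differ only slightly, but it helps to state which one is reported.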


## Change Log

|  Date (YYYY-MM-DD) |  Version | Changed By  |  Change Description |
|---|---|---|---|
| 2020-09-21  | 2.0  | Srishti  |  Migrated Lab to Markdown and added to course repo in GitLab |



<hr>

## <h3 align="center"> © IBM Corporation 2020. All rights reserved. </h3>


This notebook is part of a course on **Coursera** called *Introduction to Deep Learning & Neural Networks with Keras*. If you accessed this notebook outside the course, you can take this course online by clicking [here](https://cocl.us/DL0101EN_Coursera_Week3_LAB1).


<hr>

Copyright &copy; 2019 [IBM Developer Skills Network](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).
