## Download and Clean Dataset

In [1]:
import pandas as pd
import numpy as np

This notebook uses the dataset provided for the assignment.

The dataset describes the compressive strength of concrete samples as a function of the volumes of the ingredients used to make them. The ingredients are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

Read the dataset into a pandas DataFrame.

In [2]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


The first concrete sample has 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. It is 28 days old and has a compressive strength of 79.99 MPa.

Check the total number of data points.

In [3]:
concrete_data.shape

(1030, 9)

The dataset has 1,030 samples and 9 columns. Because this number of samples is small, the model may overfit the training data. Before training, check the dataset for missing values.

In [4]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

Data looks good, so let's begin building the model.

The target variable is the concrete sample strength. The predictors are all the other columns.

In [22]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [23]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalize the data by subtracting each column's mean and dividing by its standard deviation.

In [24]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
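
As a sanity check, the first normalized Cement value can be reproduced by hand from the `describe()` output shown earlier. This is a minimal sketch in plain Python; the variable names are illustrative, and the constants are copied from the tables in this notebook. Note that pandas' `std()` computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`), so the two would give slightly different z-scores.

```python
# Values copied from the describe() output for the Cement column.
cement_mean = 281.167864
cement_std = 104.506364   # pandas default: sample standard deviation (ddof=1)

# Cement value of the first sample, from concrete_data.head().
cement_first = 540.0

# z-score: subtract the column mean, then divide by the column std.
z = (cement_first - cement_mean) / cement_std
print(round(z, 4))  # 2.4767
```

The result agrees with the 2.476712 shown in the first row of `predictors_norm.head()`.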


In [25]:
n_cols = predictors_norm.shape[1] # number of predictors

In [11]:
import keras

Using TensorFlow backend.


In [12]:
from keras.models import Sequential
from keras.layers import Dense

In [26]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function creates a model with one hidden layer of 10 neurons and a ReLU activation function, followed by a single linear output neuron. It uses the Adam optimizer and the mean squared error (MSE) as the loss function.

Import `train_test_split` from scikit-learn to randomly split the data into training and test sets.
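
To make the architecture concrete, here is a rough NumPy sketch of the same forward pass: 8 inputs, 10 ReLU hidden units, and 1 linear output. It is not how Keras implements `Dense`; the weights are random stand-ins rather than trained values, and the names `W1`, `b1`, `W2`, `b2`, and `forward` are made up for illustration.

```python
import numpy as np

n_cols = 8      # number of predictor columns in this notebook
n_hidden = 10   # units in the hidden Dense layer

rng = np.random.default_rng(0)
# Random weights stand in for trained ones (illustration only).
W1 = rng.normal(size=(n_cols, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Hidden layer with ReLU, then one linear output unit."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU activation
    return hidden @ W2 + b2                # no activation: regression output

X = rng.normal(size=(5, n_cols))  # five fake normalized samples
y_hat = forward(X)

# Trainable parameters: 8*10 + 10 in the hidden layer, 10*1 + 1 in the output.
n_params = W1.size + b1.size + W2.size + b2.size
print(y_hat.shape, n_params)  # (5, 1) 101
```

With only 101 trainable parameters, the network is small relative to the roughly 1,000 available samples.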

In [14]:
from sklearn.model_selection import train_test_split

Split the data into training and test sets, holding out 30% of the data for testing.

In [27]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
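
For intuition, a 70/30 random split can be sketched with a shuffled index array. This is an illustration of the idea only, not scikit-learn's implementation, and the rows it selects will differ from those chosen by `train_test_split(..., random_state=42)`. The second half shows an optional variation that this notebook does not use: computing the normalization statistics from the training rows only, so that no information from the test rows enters preprocessing. The predictor matrix `X` here is synthetic.

```python
import math
import numpy as np

n_samples = 1030   # rows in concrete_data
test_size = 0.3

rng = np.random.default_rng(42)       # fixed seed for a reproducible shuffle
indices = rng.permutation(n_samples)  # shuffled row positions

n_test = math.ceil(test_size * n_samples)
test_idx = indices[:n_test]
train_idx = indices[n_test:]
print(len(train_idx), len(test_idx))  # 721 309

# Optional variation (assumption, not done in this notebook):
# normalize with statistics computed on the training rows only.
X = rng.normal(loc=50.0, scale=10.0, size=(n_samples, 8))  # synthetic predictors
train_mean = X[train_idx].mean(axis=0)
train_std = X[train_idx].std(axis=0, ddof=1)  # ddof=1 matches the pandas default
X_train_norm = (X[train_idx] - train_mean) / train_std
X_test_norm = (X[test_idx] - train_mean) / train_std
```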

Create the new model.

In [28]:
model = regression_model()

In [29]:
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Epoch 1/50
 - 1s - loss: 1570.6187
Epoch 2/50
 - 0s - loss: 1553.9174
Epoch 3/50
 - 0s - loss: 1537.2339
Epoch 4/50
 - 0s - loss: 1520.3501
Epoch 5/50
 - 0s - loss: 1502.9992
Epoch 6/50
 - 0s - loss: 1485.2572
Epoch 7/50
 - 0s - loss: 1466.4388
Epoch 8/50
 - 0s - loss: 1446.8799
Epoch 9/50
 - 0s - loss: 1426.3049
Epoch 10/50
 - 4s - loss: 1404.5653
Epoch 11/50
 - 3s - loss: 1382.2150
Epoch 12/50
 - 0s - loss: 1358.6266
Epoch 13/50
 - 0s - loss: 1334.0378
Epoch 14/50
 - 0s - loss: 1308.7524
Epoch 15/50
 - 0s - loss: 1282.4236
Epoch 16/50
 - 0s - loss: 1255.2937
Epoch 17/50
 - 0s - loss: 1227.3633
Epoch 18/50
 - 0s - loss: 1198.4703
Epoch 19/50
 - 0s - loss: 1169.1865
Epoch 20/50
 - 0s - loss: 1139.0894
Epoch 21/50
 - 0s - loss: 1108.6009
Epoch 22/50
 - 0s - loss: 1077.2385
Epoch 23/50
 - 0s - loss: 1045.4397
Epoch 24/50
 - 0s - loss: 1013.1827
Epoch 25/50
 - 0s - loss: 980.9313
Epoch 26/50
 - 0s - loss: 947.8398
Epoch 27/50
 - 0s - loss: 915.3482
Epoch 28/50
 - 0s - loss: 882.2335
Epoch

<keras.callbacks.History at 0x7f7ffc095518>

Evaluate model on the test data.

In [30]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



309.12910352787156
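
To put this number in context, it helps to convert the MSE to a root mean squared error in MPa and to compare it with the spread of the target. The sketch below uses the Strength standard deviation from the `describe()` output, which was computed on the full dataset rather than on the test split, so the comparison is only approximate.

```python
import math

test_mse = 309.12910352787156  # value returned by model.evaluate above
strength_std = 16.705742       # Strength std from describe() (full dataset)

rmse = math.sqrt(test_mse)         # error in MPa instead of MPa squared
baseline_mse = strength_std ** 2   # rough MSE of always predicting the mean

print(round(rmse, 2), round(baseline_mse, 2))  # 17.58 279.08
```

The test MSE is of the same order as the variance of the target, which suggests that after 50 epochs the network does not yet do better than predicting the mean strength. This is consistent with the training loss still falling steadily in the log above.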

Compute the mean squared error between the predicted concrete strength and the actual concrete strength.

In [31]:
from sklearn.metrics import mean_squared_error

In [32]:
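# mean_squared_error returns a single scalar for this split
mean_square_error = mean_squared_error(y_test, y_pred)
# with only one value, the mean equals that value and the standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)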

309.12910176516675 0.0
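
The same quantity can be computed by hand as the average of the squared differences between actual and predicted values. One detail to watch is that Keras' `predict` returns an array of shape `(n, 1)`, while the target has shape `(n,)`. The sketch below uses the first five Strength values from this notebook and made-up predictions to show why the predictions should be flattened first.

```python
import numpy as np

# First five Strength values from the notebook.
y_true = np.array([79.99, 61.89, 40.27, 41.05, 44.30])
# Made-up predictions with shape (5, 1), as model.predict would return.
y_pred = np.array([[70.0], [60.0], [45.0], [40.0], [50.0]])

# Without flattening, broadcasting yields a (5, 5) matrix of differences.
wrong_shape = (y_true - y_pred).shape

# Flatten the predictions so both arrays have shape (5,).
errors = y_true - y_pred.ravel()
mse = np.mean(errors ** 2)

print(wrong_shape, round(mse, 4))  # (5, 5) 31.8675
```

`sklearn.metrics.mean_squared_error` accepts the `(n, 1)` predictions directly, so the notebook's call does not need the explicit `ravel()`.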


Create a list of 50 mean squared errors, one per random train/test split, and report the mean and standard deviation of those errors.
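
If the repetitions are meant to be independent experiments, a common pattern is to build a fresh model before each fit so that no weights carry over from an earlier split. The sketch below shows that pattern under stated assumptions: `LeastSquaresModel` and `build_model` are hypothetical stand-ins so the example runs without Keras, and the data is synthetic. In this notebook, the factory call would be `regression_model()`.

```python
import numpy as np

class LeastSquaresModel:
    """Hypothetical stand-in for the Keras network (illustration only)."""

    def __init__(self):
        self.coef = None

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])  # add an intercept column
        self.coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict(self, X):
        A = np.column_stack([X, np.ones(len(X))])
        return A @ self.coef

def build_model():
    # In the notebook this would return regression_model().
    return LeastSquaresModel()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic predictors
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

n_repeats = 5
mses = []
for seed in range(n_repeats):
    # New random 70/30 split for each repetition.
    idx = np.random.default_rng(seed).permutation(len(X))
    test_idx, train_idx = idx[:60], idx[60:]

    model = build_model()  # fresh, untrained model every repetition
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    mses.append(np.mean((y[test_idx] - y_pred) ** 2))

mses = np.array(mses)
print("Mean:", mses.mean(), "Standard deviation:", mses.std())
```

Rebuilding the model each time also keeps rows that were in an earlier training split from influencing the score on a later test split.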

In [33]:
total_mean_squared_errors = 50
epochs = 100
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
    # new random 70/30 split for each repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # note: `model` is the instance created earlier and is not re-created here,
    # so training continues from the weights left by the previous iteration
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 113.54814777559447
MSE 2: 93.92407678400428
MSE 3: 62.859329211287516
MSE 4: 54.26362807388059
MSE 5: 54.07575380146311
MSE 6: 54.468947006274966
MSE 7: 53.76220762382433
MSE 8: 41.07629016765113
MSE 9: 43.51987383203599
MSE 10: 42.88581021009526
MSE 11: 42.53389522712979
MSE 12: 38.73286757731515
MSE 13: 48.60494113045603
MSE 14: 49.64437439063606
MSE 15: 37.94171388249567
MSE 16: 36.79723016189526
MSE 17: 40.679899993452054
MSE 18: 39.639864764167264
MSE 19: 37.63290759583507
MSE 20: 39.42082572369128
MSE 21: 34.20414263530842
MSE 22: 40.479873972031676
MSE 23: 30.842693365893318
MSE 24: 34.96686878636431
MSE 25: 36.35072313080328
MSE 26: 37.3257827882242
MSE 27: 33.027043462956996
MSE 28: 33.576709123876874
MSE 29: 40.092150580150026
MSE 30: 37.96800371435468
MSE 31: 34.92854372043054
MSE 32: 33.97640865288892
MSE 33: 35.37924754812494
MSE 34: 36.56914665938195
MSE 35: 35.20201002201216
MSE 36: 43.01155139713226
MSE 37: 31.007265464387665
MSE 38: 37.8229997752168
MSE 39: 35.5