## Download and Clean Dataset

In [1]:
import pandas as pd
import numpy as np

This notebook uses the dataset provided for the assignment.

The dataset records the compressive strength of different concrete samples, together with the volumes of the ingredients used to make them and the age of each sample. The ingredients are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

Read the dataset into a pandas DataFrame.

In [2]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


The first concrete sample has 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. It is 28 days old and has a compressive strength of 79.99 MPa.

Check the total number of data points.

In [3]:
concrete_data.shape

(1030, 9)

There are 1,030 samples, each with 8 predictor columns and 1 target column. With so few samples, the model may overfit the training data. First, check the dataset for any missing values.

In [4]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

No column has missing values, so let's begin building the model.

The target variable is the concrete sample strength. The predictors are all the other columns.

In [6]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [7]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalize the data by subtracting each column's mean and dividing by its standard deviation.

In [9]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
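
As a sanity check on the normalization, the first row can be reproduced by hand from the summary statistics printed by describe() earlier. The sketch below is dependency-free and uses a hypothetical helper named z_score; the means and standard deviations are copied from the describe() output, so the results should agree with the table only up to rounding.

```python
# Summary statistics copied from the describe() output above
cement_mean, cement_std = 281.167864, 104.506364
age_mean, age_std = 45.662136, 63.169912


def z_score(value, mean, std):
    """Standardize a value: subtract the column mean, divide by the column std."""
    return (value - mean) / std


# First sample: Cement = 540.0, Age = 28
cement_z = z_score(540.0, cement_mean, cement_std)
age_z = z_score(28, age_mean, age_std)
print(round(cement_z, 4), round(age_z, 4))
```

Note that pandas computes std() as the sample standard deviation (dividing by n - 1), so the normalized columns have unit sample variance rather than unit population variance.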


In [10]:
n_cols = predictors_norm.shape[1] # number of predictors

In [11]:
import keras

Using TensorFlow backend.


In [12]:
from keras.models import Sequential
from keras.layers import Dense

In [13]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function creates a model with three hidden layers, each with 10 neurons and a ReLU activation function, followed by a single linear output neuron. It uses the Adam optimizer and mean squared error (MSE) as the loss function.
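
To get a feel for how small this network is, the number of trainable parameters can be counted by hand. The sketch below uses a hypothetical helper, dense_param_count, and assumes 8 input features (the 8 predictor columns shown earlier); it does not call Keras.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per neuron
    return total


# Assumes 8 predictors: 8 -> 10 -> 10 -> 10 -> 1
n_params = dense_param_count([8, 10, 10, 10, 1])
print(n_params)  # 90 + 110 + 110 + 11 = 321
```

With roughly 1,000 samples and a few hundred parameters, the model is modest in size, though overfitting is still worth watching for.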

Import train_test_split from scikit-learn to randomly split the data into training and test sets.

In [15]:
from sklearn.model_selection import train_test_split

Split the data into training and test sets, holding out 30% of the data for testing.

In [16]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
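
For intuition, a 30% hold-out split amounts to shuffling the row indices and setting a fraction of them aside. The sketch below is a dependency-free illustration with a hypothetical helper, holdout_split; it is not scikit-learn's implementation, and the same seed will not reproduce the permutation that random_state=42 produces there.

```python
import math
import random


def holdout_split(n_samples, test_size, seed):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_samples)  # size of the held-out set
    return indices[n_test:], indices[:n_test]


# 1030 rows, as reported by concrete_data.shape
train_idx, test_idx = holdout_split(n_samples=1030, test_size=0.3, seed=42)
print(len(train_idx), len(test_idx))  # 721 309
```

So with 1,030 samples, about 721 rows are used for training and 309 for testing.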

Create the new model.

In [17]:
model = regression_model()

Instructions for updating:
Colocations handled automatically by placer.


In [18]:
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Instructions for updating:
Use tf.cast instead.
Epoch 1/50
 - 10s - loss: 1572.6916
Epoch 2/50
 - 0s - loss: 1534.9316
Epoch 3/50
 - 0s - loss: 1483.7497
Epoch 4/50
 - 0s - loss: 1401.5156
Epoch 5/50
 - 0s - loss: 1271.5138
Epoch 6/50
 - 0s - loss: 1074.8051
Epoch 7/50
 - 0s - loss: 792.9571
Epoch 8/50
 - 0s - loss: 517.1376
Epoch 9/50
 - 0s - loss: 345.9755
Epoch 10/50
 - 0s - loss: 278.8050
Epoch 11/50
 - 0s - loss: 253.3032
Epoch 12/50
 - 0s - loss: 236.1263
Epoch 13/50
 - 0s - loss: 221.8939
Epoch 14/50
 - 0s - loss: 210.4722
Epoch 15/50
 - 0s - loss: 200.9279
Epoch 16/50
 - 0s - loss: 193.7031
Epoch 17/50
 - 0s - loss: 186.1091
Epoch 18/50
 - 0s - loss: 179.6063
Epoch 19/50
 - 2s - loss: 173.6108
Epoch 20/50
 - 0s - loss: 168.4175
Epoch 21/50
 - 0s - loss: 163.7575
Epoch 22/50
 - 0s - loss: 159.3620
Epoch 23/50
 - 0s - loss: 155.3939
Epoch 24/50
 - 0s - loss: 151.4038
Epoch 25/50
 - 0s - loss: 149.6572
Epoch 26/50
 - 0s - loss: 144.7069
Epoch 27/50
 - 0s - loss: 142.4111
Epoch 28/

<keras.callbacks.History at 0x7f0553f2b5f8>

Evaluate the model on the test data.

In [19]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



103.33865670015896

Compute the mean squared error between the predicted concrete strength and the actual concrete strength.

In [21]:
from sklearn.metrics import mean_squared_error

In [22]:
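mean_square_error = mean_squared_error(y_test, y_pred) # a single scalar MSE for this split
# with only one MSE value, the mean equals that value and the standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)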

103.33865804414553 0.0
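
The MSE is in squared units, which makes it hard to interpret directly. A minimal sketch of the calculation on made-up toy values follows, along with the root of the test MSE reported above, which is in the same units as Strength (MPa). The helper mse and the toy lists are illustrative assumptions, not part of the notebook.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between paired values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy values, not taken from the concrete dataset
toy_actual = [30.0, 40.0, 50.0]
toy_predicted = [28.0, 43.0, 50.0]
toy_mse = mse(toy_actual, toy_predicted)  # (4 + 9 + 0) / 3

# Root of the test MSE printed above, in MPa
rmse = math.sqrt(103.33865804414553)
print(toy_mse, round(rmse, 2))
```

An RMSE of roughly 10 MPa means a typical prediction on this split is off by about 10 MPa, against a mean strength of about 36 MPa.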


Create a list of 50 mean squared errors, one per random train/test split, and report the mean and standard deviation of those errors.

In [23]:
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
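    # NOTE: the same model object is reused, so each fit() continues from the previous iteration's weights
    model.fit(X_train, y_train, epochs=epochs, verbose=0)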
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 66.32374083880082
MSE 2: 66.01571620706602
MSE 3: 47.850103680755716
MSE 4: 45.095218251052415
MSE 5: 43.49912455784079
MSE 6: 44.8270061579337
MSE 7: 44.34345218201671
MSE 8: 34.858913211760786
MSE 9: 38.11652037240926
MSE 10: 34.64212005269566
MSE 11: 36.58817945560591
MSE 12: 29.49202158535954
MSE 13: 40.277185199330155
MSE 14: 42.18627274152145
MSE 15: 33.47652719244602
MSE 16: 30.29399860715403
MSE 17: 37.309655038284255
MSE 18: 34.33878219629183
MSE 19: 34.95750863035134
MSE 20: 35.53563742035801
MSE 21: 28.317794207230357
MSE 22: 31.782992372235046
MSE 23: 27.241675997243345
MSE 24: 28.892511151755127
MSE 25: 32.811533943349104
MSE 26: 32.351512834863755
MSE 27: 31.000075158177843
MSE 28: 28.68013778242093
MSE 29: 34.589612757118005
MSE 30: 34.46157867777309
MSE 31: 31.810032680968252
MSE 32: 29.15793544189058
MSE 33: 29.65261465523235
MSE 34: 30.414968459737338
MSE 35: 30.155794927603218
MSE 36: 35.69994164747713
MSE 37: 25.78942338702748
MSE 38: 33.80114721403153
MSE 39