# Importing the Dataset and Necessary Libraries

Let's import the pandas and NumPy libraries.

In [36]:
import pandas as pd
import numpy as np

We will use the dataset provided in the assignment.

The dataset describes the compressive strength of concrete samples based on the amounts of the different ingredients used to make them, together with the age of each sample. The ingredients are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

Let's read the dataset into a pandas dataframe.

In [37]:
concrete_data = pd.read_csv("C:/Users/hp/Downloads/concrete_data.csv")
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Let's check how many data points we have.

In [38]:
concrete_data.shape

(1030, 9)

Let's use the `describe()` method to get summary statistics of the data.

In [39]:
concrete_data.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |
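To make the rows of the `describe()` output concrete, here is a small sketch on a made-up four-value series (not the concrete data). It shows that `std` is the sample standard deviation (`ddof=1`) and that the quartiles are computed with linear interpolation between neighbouring values, which is pandas' default.

```python
import pandas as pd

# A tiny, made-up column used only to illustrate what describe() reports
s = pd.Series([1.0, 2.0, 3.0, 4.0], name="example")
summary = s.describe()

# Rows: count, mean, std (sample std, ddof=1), min, 25%, 50%, 75%, max
print(summary)
```

For example, the 25% value here is 1.75: it sits three quarters of the way between the first and second sorted values.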


Let's check whether the dataset contains any missing values.

In [40]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data does not contain any missing values, so we are ready to build the model.
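For comparison, here is what the same check reports when a value *is* missing. This is a minimal sketch on a hypothetical three-row frame with one `NaN` inserted by hand. `isnull().sum()` counts the missing entries per column, and `isnull().values.any()` collapses the check into a single boolean.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Cement value, for illustration only
df = pd.DataFrame({
    "Cement": [540.0, np.nan, 332.5],
    "Water": [162.0, 162.0, 228.0],
})

missing = df.isnull().sum()             # missing values per column
has_missing = df.isnull().values.any()  # True if any cell is missing
print(missing)
print(has_missing)
```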

## Splitting the Data into Predictors and Target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [41]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] 
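The boolean-mask selection above keeps every column except `Strength`. `DataFrame.drop(columns=...)` is an equivalent and arguably more readable way to do this. The sketch below checks that the two give the same result on a small hypothetical stand-in frame (`sample`), built from a few columns of the first two rows shown earlier.

```python
import pandas as pd

# Small stand-in for concrete_data (a few columns from the first two rows)
sample = pd.DataFrame({
    "Cement": [540.0, 540.0],
    "Water": [162.0, 162.0],
    "Age": [28, 28],
    "Strength": [79.99, 61.89],
})

# Boolean-mask selection, as in the notebook
cols = sample.columns
predictors_mask = sample[cols[cols != "Strength"]]

# Equivalent selection with DataFrame.drop
predictors_drop = sample.drop(columns="Strength")
target = sample["Strength"]

print(predictors_drop.equals(predictors_mask))
```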

Let's preview the predictors.

In [42]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


Let's preview the target.

In [43]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Normalizing the predictors: subtract each column's mean and divide by its standard deviation.

In [44]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
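The normalization above is a z-score, applied column by column. Here is a minimal sketch on a made-up column `x` (not from the dataset) showing the same formula and one detail worth knowing: pandas' `.std()` defaults to the sample standard deviation (`ddof=1`), whereas NumPy's `np.std` defaults to the population standard deviation (`ddof=0`). Mixing the two gives slightly different scalings.

```python
import numpy as np
import pandas as pd

# Made-up column of values, just to show what the normalization does
df = pd.DataFrame({"x": [2.0, 4.0, 6.0]})

# Same formula as the notebook: subtract the mean, divide by the std
df_norm = (df - df.mean()) / df.std()

# pandas .std() uses ddof=1 (sample standard deviation) by default ...
pandas_std = df["x"].std()
# ... while np.std uses ddof=0 (population standard deviation) by default
numpy_std = np.std(df["x"].to_numpy())

print(df_norm["x"].tolist())  # [-1.0, 0.0, 1.0]
print(pandas_std, numpy_std)
```

After this transformation, each normalized column has mean 0 and (sample) standard deviation 1.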


Let's store the number of predictors, which we need for the model's input shape.

In [45]:
n_cols = predictors_norm.shape[1] # number of predictors
n_cols

8

# Import Keras

In [46]:
import keras

Let's import the `Sequential` model and the `Dense` layer to build our model.

In [47]:
from keras.models import Sequential
from keras.layers import Dense

## Creating and Compiling the Model

The function below creates a model with one hidden layer of 10 neurons and a ReLU activation function. It uses the Adam optimizer and mean squared error as the loss function.

In [48]:
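def model():
    # One hidden layer of 10 ReLU units, followed by a single linear output unit
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))

    # Adam optimizer with mean squared error loss (a regression problem)
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model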
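As a quick sanity check on the size of this network, a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below works this out for the architecture above. `dense_params` is a hypothetical helper introduced for illustration. With Keras itself, `model.summary()` or `model.count_params()` on the built model should report the same total.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer:
    n_inputs * n_units weights plus n_units biases."""
    return n_inputs * n_units + n_units

n_cols = 8                          # number of predictors, as computed above
hidden = dense_params(n_cols, 10)   # 8*10 + 10 = 90
output = dense_params(10, 1)        # 10*1 + 1 = 11
total = hidden + output
print(total)  # 101
```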

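As mentioned, we use scikit-learn to randomly split the data into training and test sets, holding out 30% of the data for testing. Note that the split uses the raw `predictors`, not `predictors_norm`, so this model is trained on unnormalized data.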

In [49]:
from sklearn.model_selection import train_test_split

In [50]:
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)
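Conceptually, a random train/test split shuffles the row indices and holds out a fraction of them for testing. The sketch below is a hypothetical NumPy helper (`random_split_indices`), not scikit-learn's implementation. It rounds the test size up, which appears to match how `train_test_split` treats a float `test_size`, but the shuffled indices will not match scikit-learn's because a different random number generator is used.

```python
import math

import numpy as np

def random_split_indices(n_samples, test_fraction=0.3, seed=42):
    """Hypothetical helper: shuffle row indices and hold out a fraction.

    Returns (train_indices, test_indices); the test size is rounded up.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 1030 rows, as in this dataset, with 30% held out for testing
train_idx, test_idx = random_split_indices(1030, test_fraction=0.3, seed=42)
print(len(train_idx), len(test_idx))  # 721 309
```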

## Train and Test the Network

Let's call the function now to create our model.

In [51]:
model = model()  # note: this rebinds the name `model` from the builder function to the compiled network

Now we'll train the model for 50 epochs.

In [52]:
epochs=50
model.fit(X_train,y_train,epochs=epochs,verbose=1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x250b4a19c30>
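Because `fit` is called without `batch_size`, Keras uses its default of 32 samples per batch. The arithmetic below estimates how many weight updates this training run performs. It assumes the 70/30 split leaves 721 training rows (1030 samples minus 309 held out); the last batch of each epoch is smaller than 32.

```python
import math

n_train = 721     # assumed training rows: 1030 samples minus 309 test rows
batch_size = 32   # Keras default when fit() is called without batch_size
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)  # last batch is partial
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 23 1150
```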

Next, we evaluate the model on the test data and generate predictions for the test set.

In [53]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



504.9945983886719

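Now let's compute the mean squared error between the predicted and the actual concrete strength with scikit-learn. Because the model was compiled with mean squared error as its loss, this should match the value returned by `evaluate`.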

In [54]:
from sklearn.metrics import mean_squared_error

In [55]:
mean_square_error = mean_squared_error(y_test, y_pred)
# With a single MSE value, its mean is the value itself and its standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)

504.9946219278907 0.0
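The mean squared error is the average of the squared differences between actual and predicted values. If you compute it by hand instead of with scikit-learn, watch the shapes. Keras `predict` returns a column of shape `(n, 1)`, while the target Series has shape `(n,)`, and subtracting them directly broadcasts to an `(n, n)` matrix. The sketch below uses made-up numbers to show the pitfall and the fix (flatten the predictions first).

```python
import numpy as np

# Made-up targets and predictions; Keras predict() returns shape (n, 1)
y_true = np.array([10.0, 20.0, 30.0])        # shape (3,)
y_pred = np.array([[12.0], [18.0], [33.0]])  # shape (3, 1)

# Pitfall: direct subtraction broadcasts to a (3, 3) matrix
broadcast_shape = (y_true - y_pred).shape

# Flatten predictions first, then average the squared errors
mse = np.mean((y_true - y_pred.ravel()) ** 2)
print(broadcast_shape, mse)
```

Here the errors are -2, 2 and -3, so the MSE is (4 + 4 + 9) / 3.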


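Now let's create a list of 50 mean squared errors by repeating the split, train, and evaluate steps 50 times with a different random split each time, and report the mean and standard deviation of these errors. Note that the loop keeps training the same `model` object instead of building a new one on each iteration, so training carries over between iterations. This likely explains why the MSE drops sharply over the first iterations.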
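If each repetition should instead start from an untrained network, the loop needs a fresh model per iteration. Here is a minimal sketch of that structure. `repeated_holdout_mse` is a hypothetical helper that expects NumPy arrays (for example `predictors.to_numpy()` and `target.to_numpy()`) and uses NumPy's permutation in place of `train_test_split`. The `fit_and_predict` callable is assumed to build, train, and predict with a *new* model on every call; a trivial mean-predicting stand-in is used here so the sketch runs without Keras. As in the notebook, `np.std` gives the population standard deviation.

```python
import numpy as np

def repeated_holdout_mse(X, y, fit_and_predict, n_repeats=50, test_fraction=0.3):
    """Hypothetical helper: repeat split -> train -> evaluate, collect MSEs.

    fit_and_predict(X_train, y_train, X_test) must build and train a new
    model on every call, so no training carries over between repeats.
    Returns (mean, standard deviation) of the collected MSEs.
    """
    n = len(y)
    n_test = int(np.ceil(n * test_fraction))
    mses = []
    for seed in range(n_repeats):
        idx = np.random.default_rng(seed).permutation(n)  # new split each time
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        preds = np.ravel(fit_and_predict(X[train_idx], y[train_idx], X[test_idx]))
        mses.append(np.mean((y[test_idx] - preds) ** 2))
    mses = np.array(mses)
    return mses.mean(), mses.std()

# Stand-in "model": predicts the mean of the training targets
def mean_baseline(X_train, y_train, X_test):
    return np.full(len(X_test), y_train.mean())

X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20, dtype=float)
mean_mse, std_mse = repeated_holdout_mse(X_demo, y_demo, mean_baseline, n_repeats=5)
print(mean_mse, std_mse)
```

With Keras, `fit_and_predict` could call a model-building function (under a name that is not later reassigned), run `fit` on the training split, and return `predict` on the test split.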

In [None]:
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
for i in range(total_mean_squared_errors):
    # A different random split on every iteration
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # Note: `model` is the network created earlier, so its weights keep
    # training from where the previous iteration left off
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print(f"MSE {i + 1}: {MSE}")
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

# Summarize the collected errors (np.std is the population standard deviation)
mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print(f"Below is the mean and standard deviation of {total_mean_squared_errors} mean squared errors without normalized data. Total number of epochs for each training is: {epochs}\n")
print(f"Mean: {mean}")
print(f"Standard Deviation: {standard_deviation}")

MSE 1: 217.55230712890625
MSE 2: 171.79225158691406
MSE 3: 116.91492462158203
MSE 4: 117.08979034423828
MSE 5: 116.7727279663086
MSE 6: 93.5916748046875
MSE 7: 102.82447052001953
MSE 8: 81.08439636230469
MSE 9: 87.00141143798828
MSE 10: 75.576416015625
MSE 11: 76.06765747070312
MSE 12: 56.01353454589844
MSE 13: 59.104129791259766
MSE 14: 55.396446228027344
MSE 15: 50.9366340637207
MSE 16: 55.96770095825195
MSE 17: 48.884254455566406
MSE 18: 52.959049224853516
MSE 19: 48.4449577331543
MSE 20: 53.68843078613281
MSE 21: 43.49403762817383
MSE 22: 45.845890045166016
MSE 23: 49.471561431884766
MSE 24: 47.83351516723633
MSE 25: 60.662330627441406
MSE 26: 49.36996078491211
MSE 27: 54.68592071533203
MSE 28: 51.46609878540039
MSE 29: 52.55145263671875
MSE 30: 51.30960464477539
MSE 31: 58.97365951538086
MSE 32: 43.20169448852539
MSE 33: 48.53458023071289
MSE 34: 49.4242057800293
MSE 35: 47.03175735473633
MSE 36: 52.98643493652344
MSE 37: 55.064144134521484
MSE 38: 53.05228042602539
MSE 39: 49.441