# Regression Model: Concrete Compressive Strength

In this project, we build a regression model with the Keras deep learning library, then increase the number of training epochs and the number of hidden layers to see how each change affects the model's performance.

The data can be found here: https://cocl.us/concrete_data. The predictors in the concrete strength data include:

* Cement
* Blast furnace slag
* Fly ash
* Water
* Superplasticizer
* Coarse aggregate
* Fine aggregate
* Age (in days)

In [1]:
## A Build a baseline model 

In [2]:
import keras
from keras.layers import Dense, Input, Flatten
from keras.models import Sequential
#from keras.utils import to_categorical

In [3]:
import pandas as pd
import numpy as np

In [4]:
concrete_data=pd.read_csv("concrete_data.csv")
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


So the first concrete sample contains, per cubic meter of mix, 540 kg of cement, 0 kg of blast furnace slag, 0 kg of fly ash, 162 kg of water, 2.5 kg of superplasticizer, 1040 kg of coarse aggregate, and 676 kg of fine aggregate. This mix, at 28 days old, has a compressive strength of 79.99 MPa.

In [5]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


Let's check the dataset for any missing values.

In [6]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

#### Split data into predictors and target
The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [7]:
concrete_data_columns = concrete_data.columns
X=concrete_data[concrete_data_columns[concrete_data_columns!="Strength"]]
y=concrete_data['Strength']

In [8]:
X.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [9]:
y.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

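The number of predictor columns is stored in `X_cols`; it sets the input shape of the network defined below. Normalization (subtracting the mean and dividing by the standard deviation) is applied later, in part B.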

In [10]:
X_cols=X.shape[1]

### Split Data into Train and Test data
Randomly split the data into training and test sets, holding out 30% of the data for testing, with the train_test_split helper function from Scikit-learn.
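
To make the hold-out idea concrete, the sketch below shuffles row indices with NumPy and reserves 30% of them for testing. It illustrates the concept only and is not Scikit-learn's implementation; the function name `holdout_split` and the synthetic arrays are assumptions made for this example.

```python
import numpy as np

def holdout_split(X, y, test_size=0.3, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing.

    Hypothetical helper for illustration; in the notebook this job is
    done by sklearn.model_selection.train_test_split.
    """
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    n_test = int(np.ceil(test_size * n_samples))  # rows reserved for testing
    shuffled = rng.permutation(n_samples)         # random order of row indices
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in with the same shape as the concrete data (1030 rows, 8 predictors).
X_demo = np.arange(1030 * 8, dtype=float).reshape(1030, 8)
y_demo = np.arange(1030, dtype=float)

X_tr, X_te, y_tr, y_te = holdout_split(X_demo, y_demo, test_size=0.3, seed=42)
print(X_tr.shape, X_te.shape)
```

Changing `seed` gives a different partition of the same rows, which is the role `random_state` plays in the evaluation loop further down.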

In [11]:
from sklearn.model_selection import train_test_split

In [12]:
#X_train,X_test,Y_train,Y_test=train_test_split(X, y, test_size=0.3, random_state=42)

In [13]:
def regression_model():
    model=Sequential()
    model.add(Input(shape=(X_cols,)))
    model.add(Dense(10,activation='relu'))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
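
For intuition about what this network computes, here is a NumPy sketch of one forward pass through the same layout: 10 ReLU units followed by a single linear output. The weights are random placeholders rather than trained Keras weights, the input width of 8 is taken from the eight predictor columns shown earlier, and the helper name `forward` is an assumption for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 8   # predictor columns in X (assumed from the table shown earlier)
n_hidden = 10    # units in the single hidden layer

# Random placeholder parameters; Keras would learn these during fit().
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Hidden layer with ReLU, then a linear output unit (no activation)."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU(x) = max(0, x)
    return hidden @ W2 + b2                # one predicted value per row

X_batch = rng.normal(size=(5, n_features))  # five made-up samples
y_hat = forward(X_batch)
print(y_hat.shape)  # (5, 1)
```

The output layer has no activation because the target is a continuous value. The `(n, 1)` output shape is also why the evaluation loop calls `.flatten()` on the predictions before computing the error.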

### Train and Test the Network

In [14]:
#model=regression_model()

In [15]:
#model.fit(X_train,Y_train,validation_split=0.3,epochs=50, verbose=2)

In [16]:
#model.evaluate(X_test,Y_test)

In [17]:
from sklearn.metrics import mean_squared_error
mse_list=[]
for i in range(50):
    # Step 1: Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=i)

    # Step 2: Build and train the model
    model = regression_model()
    model.fit(X_train, y_train, epochs=50, verbose=0, batch_size=32)

    # Step 3: Evaluate the model
    y_pred = model.predict(X_test).flatten()
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)

# Step 4: Calculate mean and standard deviation of MSEs
mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)

print(f"Mean MSE: {mean_mse}")
print(f"Standard Deviation of MSE: {std_mse}")

Mean MSE: 498.3267670747983
Standard Deviation of MSE: 753.3397834571666
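
The metric collected in the loop is the mean squared error: the average of the squared differences between the true and predicted strengths. The sketch below computes it by hand with NumPy. The true values are the first five strengths shown earlier, while the predictions are made-up numbers used only for illustration, and the helper name `mse` is an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared residuals, the same quantity the loop stores per run."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [79.99, 61.89, 40.27, 41.05, 44.30]  # first five strengths from the data
y_pred = [70.00, 60.00, 45.00, 40.00, 50.00]  # hypothetical predictions

error = mse(y_true, y_pred)
rmse = error ** 0.5  # back in the units of the target (MPa)
print(f"MSE: {error:.3f}, RMSE: {rmse:.3f}")
```

Because the MSE is in squared units, its square root is easier to compare against the strength range of 2.33 to 82.6 MPa reported by `describe()`.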


## B Normalize the data

In [18]:
X_norm=(X-X.mean())/X.std()
X_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
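
As a check on the table above, the standardized values can be reproduced by hand from the summary statistics printed by `describe()`. The numbers below are copied from that output for the first sample; the helper name `z_score` is an assumption for this example.

```python
def z_score(value, mean, std):
    """Standardize one value: subtract the column mean, divide by the column std."""
    return (value - mean) / std

# First sample, using the mean and std rows reported by describe().
cement_z = z_score(540.0, 281.167864, 104.506364)
age_z = z_score(28.0, 45.662136, 63.169912)

print(round(cement_z, 4), round(age_z, 4))
```

Both values agree with the first row of `X_norm.head()` to about four decimal places. Note that pandas `std()` uses the sample standard deviation (`ddof=1`) by default, whereas `np.std` defaults to `ddof=0`.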


In [19]:
mse_list_norm=[]
for i in range(50):
    # Step 1: Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3, random_state=i)

    # Step 2: Build and train the model
    model = regression_model()
    model.fit(X_train, y_train, epochs=50, verbose=0, batch_size=32)

    # Step 3: Evaluate the model
    y_pred = model.predict(X_test).flatten()
    mse = mean_squared_error(y_test, y_pred)
    mse_list_norm.append(mse)

# Step 4: Calculate mean and standard deviation of MSEs
mean_mse_norm = np.mean(mse_list_norm)
std_mse_norm = np.std(mse_list_norm)

print(f"Mean MSE: {mean_mse_norm}")
print(f"Standard Deviation of MSE: {std_mse_norm}")

Mean MSE: 335.3864669643563
Standard Deviation of MSE: 62.60773672770934


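Normalizing the predictors lowers the mean MSE from about 498.3 to about 335.4 and shrinks the standard deviation across the 50 runs from about 753.3 to about 62.6, compared with the baseline model.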
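
The normalization in this section uses the mean and standard deviation of the full dataset, so the test rows contribute to those statistics. A common variant, sketched here, computes the statistics on the training rows only and reuses them for the test rows. The arrays are synthetic and the function name `standardize_with_train_stats` is hypothetical; this is not what the notebook ran.

```python
import numpy as np

def standardize_with_train_stats(X_train, X_test):
    """Scale both splits using statistics computed from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default
    return (X_train - mean) / std, (X_test - mean) / std

rng = np.random.default_rng(0)
# Synthetic predictors on very different scales, like cement vs. superplasticizer.
X_all = rng.normal(loc=[280.0, 6.0], scale=[100.0, 6.0], size=(1000, 2))
X_train, X_test = X_all[:700], X_all[700:]

X_train_s, X_test_s = standardize_with_train_stats(X_train, X_test)
print(X_train_s.mean(axis=0).round(3), X_train_s.std(axis=0, ddof=1).round(3))
```

After scaling, the training columns have mean 0 and standard deviation 1 by construction, while the test columns are only approximately standardized because they were scaled with statistics they did not help compute.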

## C Increase the number of epochs

In [20]:
mse_list_norm=[]
for i in range(50):
    # Step 1: Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3, random_state=i)

    # Step 2: Build and train the model
    model = regression_model()
    model.fit(X_train, y_train, epochs=100, verbose=0, batch_size=32)

    # Step 3: Evaluate the model
    y_pred = model.predict(X_test).flatten()
    mse = mean_squared_error(y_test, y_pred)
    mse_list_norm.append(mse)

# Step 4: Calculate mean and standard deviation of MSEs
mean_mse_norm = np.mean(mse_list_norm)
std_mse_norm = np.std(mse_list_norm)

print(f"Mean MSE: {mean_mse_norm}")
print(f"Standard Deviation of MSE: {std_mse_norm}")

Mean MSE: 164.6315094014099
Standard Deviation of MSE: 19.86561782084384


Doubling the number of epochs from 50 to 100 lowers the mean MSE from about 335.4 to about 164.6 and the standard deviation from about 62.6 to about 19.9.
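
One way to see why more epochs can help: each epoch is another pass of gradient updates, so an under-trained model keeps improving until it converges. The sketch below swaps the Keras network for plain linear regression trained by full-batch gradient descent in NumPy on synthetic data. The function name `train_linear`, the learning rate, and the data are all assumptions for illustration, not a reproduction of the results above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                     # synthetic, already standardized
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)  # linear target plus small noise

def train_linear(X, y, epochs, lr=0.01):
    """Full-batch gradient descent on the MSE; returns the final training MSE."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        residual = X @ w - y
        grad = 2.0 * X.T @ residual / len(y)  # gradient of mean((Xw - y)^2)
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

loss_50 = train_linear(X, y, epochs=50)
loss_100 = train_linear(X, y, epochs=100)
print(f"50 epochs: {loss_50:.4f}, 100 epochs: {loss_100:.4f}")
```

The gain is not guaranteed to continue. Once training has converged, extra epochs add cost and can overfit, which is why the notebook measures error on held-out test data rather than on the training set.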

## D Increase the number of hidden layers

In [21]:
def regression_model_3():
    model=Sequential()
    model.add(Input(shape=(X_cols,)))
    model.add(Dense(10,activation='relu'))
    model.add(Dense(10,activation='relu'))
    model.add(Dense(10,activation='relu'))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
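
The extra layers add capacity, which can be quantified by counting trainable parameters: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper `count_dense_params` below is a hypothetical utility written for this example, and the input width of 8 assumes the eight predictor columns shown earlier.

```python
def count_dense_params(n_inputs, layer_units):
    """Total weights and biases for a stack of fully connected layers."""
    total = 0
    n_in = n_inputs
    for n_out in layer_units:
        total += n_in * n_out + n_out  # weight matrix plus bias vector
        n_in = n_out                   # this layer's output feeds the next
    return total

one_hidden = count_dense_params(8, [10, 1])            # layout of regression_model
three_hidden = count_dense_params(8, [10, 10, 10, 1])  # layout of regression_model_3
print(one_hidden, three_hidden)  # 101 321
```

In Keras, `model.summary()` reports the parameter count of a built model, which can be used to confirm these figures.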

In [22]:
mse_list_norm=[]
for i in range(50):
    # Step 1: Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3, random_state=i)

    # Step 2: Build and train the model
    model = regression_model_3()
    model.fit(X_train, y_train, epochs=50, verbose=0, batch_size=32)

    # Step 3: Evaluate the model
    y_pred = model.predict(X_test).flatten()
    mse = mean_squared_error(y_test, y_pred)
    mse_list_norm.append(mse)

# Step 4: Calculate mean and standard deviation of MSEs
mean_mse_norm = np.mean(mse_list_norm)
std_mse_norm = np.std(mse_list_norm)

print(f"Mean MSE: {mean_mse_norm}")
print(f"Standard Deviation of MSE: {std_mse_norm}")

Mean MSE: 130.40592878419469
Standard Deviation of MSE: 15.207411614847857


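With three hidden layers of 10 nodes each and 50 epochs, the mean MSE falls to about 130.4 (standard deviation about 15.2). That is lower than both the one-hidden-layer model trained for 50 epochs (about 335.4) and the same model trained for 100 epochs (about 164.6) on the normalized data.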
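
The evaluation loops in parts A to D differ only in the input data, the model builder, and the number of epochs, so they could be folded into one helper. The sketch below shows the shape of such a helper using NumPy only: `repeated_mse` and `fit_predict` are hypothetical names, an ordinary least-squares fit stands in for the Keras model, and the data are synthetic.

```python
import numpy as np

def repeated_mse(fit_predict, X, y, n_runs=50, test_size=0.3):
    """Repeat a random hold-out split n_runs times; return mean and std of test MSE.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test.
    """
    n_test = int(np.ceil(test_size * len(X)))
    mses = []
    for seed in range(n_runs):
        order = np.random.default_rng(seed).permutation(len(X))
        test_idx, train_idx = order[:n_test], order[n_test:]
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        mses.append(np.mean((y[test_idx] - y_pred) ** 2))
    return float(np.mean(mses)), float(np.std(mses))

def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in model: linear regression with an intercept via np.linalg.lstsq."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([X_test, np.ones(len(X_test))]) @ coef

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=300)

mean_mse, std_mse = repeated_mse(least_squares_fit_predict, X_demo, y_demo, n_runs=10)
print(f"Mean MSE: {mean_mse:.3f}, Std: {std_mse:.3f}")
```

With Keras, `fit_predict` would build the model, call `fit` with the chosen number of epochs, and return `model.predict(X_test).flatten()`, so each experiment becomes a single call with a different callable.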