#### Part A. Build a baseline model (5 marks) 

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- The Adam optimizer, with mean squared error as the loss function.

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1-3 a total of 50 times to create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
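
Steps 1 to 5 amount to a repeated hold-out evaluation. As a minimal sketch of that protocol only, the example below uses synthetic data and a placeholder predictor (the training-set mean) in place of the Keras network, so it runs in well under a second. The names `repeated_holdout_mse`, `X_toy`, and `y_toy` are illustrative assumptions and not part of the assignment.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def repeated_holdout_mse(X, y, n_repeats=50, test_size=0.3):
    """Return one test-set MSE per random train/test split."""
    mse_values = []
    for seed in range(n_repeats):
        # Step 1: hold out 30% of the rows for testing
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        # Step 2 (placeholder): "train" by taking the training-set mean.
        # In the assignment, this is where the Keras model would be fitted.
        y_pred = np.full(len(y_test), y_train.mean())
        # Step 3: score the predictions on the held-out rows
        mse_values.append(mean_squared_error(y_test, y_pred))
    return mse_values  # Step 4: a list with n_repeats entries


# Synthetic stand-in data (assumption: 100 rows, 8 predictors)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 8))
y_toy = X_toy.sum(axis=1) + rng.normal(scale=0.1, size=100)

mse_values = repeated_holdout_mse(X_toy, y_toy)

# Step 5: summarise the scores across repeats
print(len(mse_values), np.mean(mse_values), np.std(mse_values))
```

Seeding each split with the loop index is one way to get 50 distinct splits that are also reproducible; leaving `random_state` unset gives a fresh split on every run instead.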

In [1]:
# Import necessary libraries
import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation

from sklearn.metrics import mean_squared_error  # To calculate the Mean Squared Error
from sklearn.model_selection import train_test_split  # For splitting the dataset
from tensorflow.keras.layers import Dense  # To define the layers of the neural network
from tensorflow.keras.models import Sequential  # To build the neural network
from tensorflow.keras.optimizers import Adam  # Optimizer for training the model

In [2]:
# Load the dataset
url = "concrete_data.csv"  # Path to the dataset
data = pd.read_csv(url)  # Load data into a pandas DataFrame

In [3]:
data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [4]:
# Data contains columns such as:
# Cement, Blast Furnace Slag, Fly Ash, Water, Superplasticizer, Coarse Aggregate, Fine Aggregate, Age, Strength
# The target variable (y) is 'Strength'

# Split data into predictors (X) and target variable (y)
X = data.drop("Strength", axis=1)  # Features/predictors
y = data["Strength"]  # Target variable

In [5]:
# Define the baseline model function
def baseline_model(X, y):
    """
    Build and evaluate a baseline regression model using Keras.
    The model has:
    - One hidden layer with 10 nodes and ReLU activation.
    - Adam optimizer and mean squared error loss.
    This process is repeated 50 times to calculate mean and standard deviation of MSE.

    Parameters:
    X: Features (predictors)
    y: Target variable (concrete strength)

    Returns:
    Mean and standard deviation of the MSEs from 50 iterations.
    """
    mse_list = []  # List to store MSEs from each iteration

    # Repeat the process 50 times
    for _ in range(50):
        # Draw a fresh random 70/30 train/test split on each iteration.
        # (Seeding from np.random.randint(0, 100) allows only 100 distinct splits, so some would repeat.)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        
        # Build the neural network model
        model = Sequential([
            Dense(10, activation='relu', input_shape=(X_train.shape[1],)),  # Hidden layer with 10 nodes
            Dense(1)  # Output layer with a single node (for regression)
        ])
        
        # Compile the model with Adam optimizer and mean squared error loss
        model.compile(optimizer=Adam(), loss='mean_squared_error')
        
        # Train the model on the training data for 50 epochs
        model.fit(X_train, y_train, epochs=50, verbose=0)
        
        # Evaluate the model on the test data
        y_pred = model.predict(X_test, verbose=0)  # Predict on the test set
        mse = mean_squared_error(y_test, y_pred)  # Calculate Mean Squared Error
        mse_list.append(mse)  # Append the MSE to the list
    
    # Return the mean and standard deviation of the MSEs
    return np.mean(mse_list), np.std(mse_list)
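
Note that `np.std` defaults to the population standard deviation (`ddof=0`). The assignment does not say whether the population or the sample form is expected. As a small illustration with made-up MSE values (not results from this notebook), the two differ only in the divisor:

```python
import numpy as np

# Made-up MSE values, for illustration only
mse_values = np.array([100.0, 120.0, 140.0, 160.0])

# NumPy default (ddof=0): divide the summed squared deviations by n
population_std = np.std(mse_values)

# Sample form (ddof=1): divide by n - 1 instead
sample_std = np.std(mse_values, ddof=1)

print(population_std, sample_std)
```

With 50 repeats the sample form is larger by a factor of sqrt(50/49), about 1%, so the choice barely affects the reported figure.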

In [6]:
# Run the baseline model and calculate the mean and standard deviation of MSE
mean_a, std_a = baseline_model(X, y)

In [7]:
print("Part A - Baseline Model: Mean MSE =", mean_a, "Std MSE =", std_a)

Part A - Baseline Model: Mean MSE = 283.67766803016343 Std MSE = 236.16119813621864
