# Case Study 5 (Part - I): DNN Regression (PyTorch) — Concrete Compressive Strength

**Goal:** predict **concrete compressive strength (MPa)** from 8 input features (mixture components + age).

We build the single regression DNN required by the activity:

**Model:** **8 → 12 → 6 → 1**
- Hidden blocks: **Linear → BatchNorm → ReLU → Dropout**
- Output: **Linear** (no activation for regression)
- Loss: **MSE**
- Overfitting mitigation: **Dropout + L2 (weight decay) + BatchNorm**

At the end, we show:
- Train/Test metrics: **MSE, MAE, R²** (in **original MPa units**)
- A table of **10 random test samples**: predicted vs actual (MPa)


In [1]:
# =========================
# 0) Imports + setup
# =========================
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DEVICE


'cpu'

## 1) Load data

File: `Concrete_Data_Yeh.csv`  
(If your file is elsewhere, update `CSV_PATH`.)


In [2]:
# =========================
# 1) Load dataset
# =========================
CSV_PATH = "Concrete_Data_Yeh.csv"
df = pd.read_csv(CSV_PATH)

print("Shape:", df.shape)
df.head()


Shape: (1030, 9)


| | cement | slag | flyash | water | superplasticizer | coarseaggregate | fineaggregate | age | csMPa |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


## 2) Pick features and target

**Inputs (8):** cement, slag, flyash, water, superplasticizer, coarseaggregate, fineaggregate, age  
**Target (1):** csMPa


In [3]:
# =========================
# 2) Features + target
# =========================
FEATURES = ["cement", "slag", "flyash", "water", "superplasticizer",
            "coarseaggregate", "fineaggregate", "age"]
TARGET = "csMPa"

# Keep only needed columns, force numeric, and drop missing rows
df_sub = df[FEATURES + [TARGET]].copy()
df_sub = df_sub.apply(pd.to_numeric, errors="coerce")
df_sub = df_sub.replace([np.inf, -np.inf], np.nan).dropna().reset_index(drop=True)

X = df_sub[FEATURES].to_numpy(np.float32)
y = df_sub[[TARGET]].to_numpy(np.float32)   # shape (N, 1)

print("Clean rows:", len(df_sub))


Clean rows: 1030


## 3) Train/test split (70% / 30%)

We keep the test set “unseen” until evaluation.


In [4]:
# =========================
# 3) Split
# =========================
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=SEED
)

print("Train:", X_train.shape, y_train.shape)
print("Test :", X_test.shape,  y_test.shape)


Train: (721, 8) (721, 1)
Test : (309, 8) (309, 1)


## 4) Standard scaling (X and y)

Why scale?
- DNN training is usually smoother when inputs (and sometimes targets) are on similar scales.

Important:
- Fit scalers on **train** only
- Apply the same transform to **test**
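
To make the "fit on train only" rule concrete, here is a minimal NumPy sketch of what standard scaling does. The toy values are made up for illustration; the real notebook uses `StandardScaler`, which computes the same mean and population standard deviation.

```python
import numpy as np

# Toy data standing in for one feature column (values are made up).
train = np.array([10.0, 20.0, 30.0, 40.0])
test = np.array([25.0, 50.0])

# "Fit": statistics come from the training split only.
mu = train.mean()
sigma = train.std()  # population std (ddof=0), as StandardScaler uses

# "Transform": the same mu and sigma are reused for both splits.
train_s = (train - mu) / sigma
test_s = (test - mu) / sigma

# Inverse transform: maps scaled values back to the original units.
test_back = test_s * sigma + mu

print(train_s.mean(), train_s.std())
print(test_s)
print(test_back)
```

The scaled test values need not have mean 0 or std 1, because the test split never contributes to `mu` or `sigma`.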


In [5]:
# =========================
# 4) Scale X and y (fit on train only)
# =========================
x_scaler = StandardScaler()
X_train_s = x_scaler.fit_transform(X_train)
X_test_s  = x_scaler.transform(X_test)

y_scaler = StandardScaler()
y_train_s = y_scaler.fit_transform(y_train)
y_test_s  = y_scaler.transform(y_test)

# Quick sanity checks (scaled train ~ mean 0, std 1)
print("X train mean (approx):", X_train_s.mean(axis=0))
print("X train std  (approx):", X_train_s.std(axis=0))
print("y train mean (approx):", y_train_s.mean(axis=0))
print("y train std  (approx):", y_train_s.std(axis=0))


X train mean (approx): [-9.2052389e-08  9.8376596e-09 -1.7691253e-08 -1.1689455e-07
 -1.7261372e-07  1.2334276e-07  3.4925759e-07  2.1080700e-08]
X train std  (approx): [1.0000004  0.99999917 1.0000018  0.9999998  1.0000012  0.9999997
 0.99999994 0.9999982 ]
y train mean (approx): [-3.0422342e-08]
y train std  (approx): [0.9999999]


## 5) Torch DataLoaders

We use mini-batches during training.
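
As a quick sanity check on what "mini-batches" means here, the arithmetic below works out how many batches one epoch contains for 721 training rows and the batch size of 512 used in the next cell. It assumes the `DataLoader` default `drop_last=False`, so the final partial batch is kept.

```python
import math

n_train = 721      # training rows after the 70/30 split
batch_size = 512   # value passed to DataLoader in the next cell

# With drop_last=False, the last (smaller) batch is still used.
n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size

print(n_batches, last_batch)  # 2 209
```

With only two optimizer steps per epoch, a large epoch count is less extreme than it first looks; a smaller batch size would give more updates per epoch.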


In [6]:
# =========================
# 5) Torch tensors + loaders
# =========================
Xtr = torch.tensor(X_train_s, dtype=torch.float32)
ytr = torch.tensor(y_train_s, dtype=torch.float32)

Xte = torch.tensor(X_test_s, dtype=torch.float32)
yte = torch.tensor(y_test_s, dtype=torch.float32)

train_loader = DataLoader(TensorDataset(Xtr, ytr), batch_size=512, shuffle=True)


## 6) Define the DNN (8 → 12 → 6 → 1)

Key ideas:
- **Linear layers** learn weighted sums
- **ReLU** adds nonlinearity
- **BatchNorm** stabilizes training
- **Dropout** reduces overfitting
- **L2** is applied via `weight_decay` (keeps weights smaller)
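
The first four ideas can be sketched in plain NumPy as one hidden block. This is an illustrative sketch, not the PyTorch implementation: the weights are random placeholders, and BatchNorm here always uses batch statistics (PyTorch switches to running averages in eval mode, which is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_block(x, W, b, gamma, beta, p_drop, training, eps=1e-5):
    """Linear -> BatchNorm -> ReLU -> Dropout on one mini-batch (illustrative only)."""
    z = x @ W + b                                        # Linear: weighted sums plus bias
    mean, var = z.mean(axis=0), z.var(axis=0)            # per-feature batch statistics
    z = gamma * (z - mean) / np.sqrt(var + eps) + beta   # BatchNorm (batch stats only)
    a = np.maximum(z, 0.0)                               # ReLU
    if training and p_drop > 0:
        keep = rng.random(a.shape) >= p_drop             # keep each unit with prob 1 - p_drop
        a = a * keep / (1.0 - p_drop)                    # inverted dropout: rescale survivors
    return a

x = rng.normal(size=(4, 8))          # 4 samples, 8 features
W = rng.normal(size=(8, 12)) * 0.1   # hypothetical weights for the 8 -> 12 layer
b = np.zeros(12)
gamma, beta = np.ones(12), np.zeros(12)

out_eval = hidden_block(x, W, b, gamma, beta, p_drop=0.1, training=False)
out_train = hidden_block(x, W, b, gamma, beta, p_drop=0.1, training=True)
print(out_eval.shape)  # (4, 12)
```

In training mode each activation is either zeroed or scaled by `1 / (1 - p_drop)`, so the expected activation matches what the network sees at evaluation time.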


In [7]:
# =========================
# 6) Model: 8 -> 12 -> 6 -> 1
# =========================
class DNNRegressor(nn.Module):
    def __init__(self, in_dim=8, hidden1=12, hidden2=6, dropout_p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden1),
            nn.BatchNorm1d(hidden1),
            nn.ReLU(),
            nn.Dropout(dropout_p),
            nn.Linear(hidden1, hidden2),
            nn.BatchNorm1d(hidden2),
            nn.ReLU(),
            nn.Dropout(dropout_p),
            nn.Linear(hidden2, 1)
        )

    def forward(self, x):
        return self.net(x)

model = DNNRegressor(
    in_dim=X_train_s.shape[1],
    hidden1=12,
    hidden2=6,
    dropout_p=0.1
).to(DEVICE)
model


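DNNRegressor(
  (net): Sequential(
    (0): Linear(in_features=8, out_features=12, bias=True)
    (1): BatchNorm1d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.1, inplace=False)
    (4): Linear(in_features=12, out_features=6, bias=True)
    (5): BatchNorm1d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): Dropout(p=0.1, inplace=False)
    (8): Linear(in_features=6, out_features=1, bias=True)
  )
)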
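
For a sense of how small this network is, the sketch below counts the trainable parameters of an 8 → 12 → 6 → 1 model with BatchNorm after each hidden layer. The helper names are hypothetical; the result should agree with `sum(p.numel() for p in model.parameters())` for the class defined above. BatchNorm running statistics are buffers, not parameters, so they are not counted.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out      # weights + biases

def batchnorm_params(n_features):
    return 2 * n_features            # learnable scale (gamma) + shift (beta)

layers = [8, 12, 6, 1]
total = 0
for i, (n_in, n_out) in enumerate(zip(layers[:-1], layers[1:])):
    total += linear_params(n_in, n_out)
    is_hidden = i < len(layers) - 2  # the output layer has no BatchNorm
    if is_hidden:
        total += batchnorm_params(n_out)

print(total)  # 229
```

With 721 training rows and 229 parameters, the model is modest in size, which is one reason light regularization (dropout of 0.1, small weight decay) is a reasonable starting point.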

## 7) Train the model

- Loss: **MSE**
- Optimizer: **Adam**
- Regularization: **L2** via `weight_decay`
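
To show what `weight_decay` contributes, here is a minimal sketch using a plain gradient step. In `torch.optim.Adam`, `weight_decay * param` is added to the gradient before the adaptive update; the Adam moment rescaling itself is omitted here, and all numbers (including the exaggerated decay of 0.1) are made up for visibility.

```python
import numpy as np

def grad_with_l2(w, grad_loss, weight_decay):
    """Gradient the optimizer sees once the L2 term is folded in."""
    return grad_loss + weight_decay * w

w = np.array([2.0, -3.0])
grad_loss = np.array([0.5, 0.5])   # made-up gradient of the MSE term
lr, weight_decay = 0.1, 0.1        # decay exaggerated vs. the notebook's 1e-4

g = grad_with_l2(w, grad_loss, weight_decay)
w_new = w - lr * g                 # plain gradient step, not Adam

print(g)      # [0.7 0.2]
print(w_new)  # [ 1.93 -3.02]
```

The extra term always points toward zero in proportion to each weight's size, which is why large weights are penalized more.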


In [8]:
# =========================
# 7) Training loop
# =========================
def train(model, train_loader, epochs=400, lr=1e-3, weight_decay=1e-4):
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)  # <-- L2
    loss_fn = nn.MSELoss()  # <-- regression loss

    for epoch in range(1, epochs + 1):
        losses = []
        for xb, yb in train_loader:
            xb, yb = xb.to(DEVICE), yb.to(DEVICE)

            opt.zero_grad()
            pred = model(xb)
            loss = loss_fn(pred, yb)
            loss.backward()
            opt.step()

            losses.append(loss.item())

        # Logged loss is averaged in train mode, so dropout is still active here
        if epoch == 1 or epoch % 500 == 0:
            print(f"Epoch {epoch:3d} | train MSE (scaled y): {np.mean(losses):.6f}")

train(model, train_loader, epochs=2000, lr=1e-2, weight_decay=1e-4)


Epoch   1 | train MSE (scaled y): 1.262070
Epoch 500 | train MSE (scaled y): 0.238167
Epoch 1000 | train MSE (scaled y): 0.263896
Epoch 1500 | train MSE (scaled y): 0.234202
Epoch 2000 | train MSE (scaled y): 0.247789


## 8) Predict + evaluate (original MPa units)

We trained on **scaled y**, so we:
1. predict in scaled units  
2. inverse-transform back to **MPa**  
3. compute MSE/MAE/R² on the **MPa-scale** values (MSE is in MPa², MAE in MPa, R² is unitless)
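
The three steps can be sketched by hand with NumPy. The numbers are made up, and `mu` and `sigma` stand in for the fitted target scaler's mean and standard deviation; the notebook itself uses `y_scaler.inverse_transform` and the scikit-learn metric functions.

```python
import numpy as np

mu, sigma = 35.0, 16.0                               # hypothetical scaler statistics
y_true = np.array([20.0, 35.0, 50.0, 60.0])          # MPa
y_pred_scaled = np.array([-1.0, 0.25, 0.75, 1.5])    # step 1: model output (scaled)

y_pred = y_pred_scaled * sigma + mu                  # step 2: back to MPa

err = y_pred - y_true                                # step 3: metrics in MPa
mse = np.mean(err ** 2)                              # units: MPa^2
mae = np.mean(np.abs(err))                           # units: MPa
r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Cross-check: MSE in MPa^2 equals sigma^2 times the MSE in scaled units.
y_true_scaled = (y_true - mu) / sigma
mse_scaled = np.mean((y_pred_scaled - y_true_scaled) ** 2)

print(mse, mae, round(r2, 4))   # 6.75 2.25 0.9706
print(sigma ** 2 * mse_scaled)  # 6.75
```

R² is unchanged by the inverse transform because it is a ratio of two squared-error sums that scale by the same factor.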


In [9]:
# =========================
# 8) Predict + metrics (MPa)
# =========================
def predict_scaled(model, X_tensor):
    model.eval()
    with torch.no_grad():
        return model(X_tensor.to(DEVICE)).cpu().numpy()

# Predictions in scaled units
yhat_train_s = predict_scaled(model, Xtr)
yhat_test_s  = predict_scaled(model, Xte)

# Back to MPa
yhat_train = y_scaler.inverse_transform(yhat_train_s)
yhat_test  = y_scaler.inverse_transform(yhat_test_s)

def metrics(y_true, y_pred):
    return {
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2":  r2_score(y_true, y_pred),
    }

tr = metrics(y_train, yhat_train)
te = metrics(y_test,  yhat_test)

print("Train (MPa): MSE={MSE:.3f} | MAE={MAE:.3f} | R2={R2:.3f}".format(**tr))
print("Test  (MPa): MSE={MSE:.3f} | MAE={MAE:.3f} | R2={R2:.3f}".format(**te))


Train (MPa): MSE=45.881 | MAE=5.266 | R2=0.837
Test  (MPa): MSE=50.559 | MAE=5.474 | R2=0.813
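
Since MSE is in squared units, taking the square root gives an RMSE in MPa that is easier to compare against the MAE. A short sketch using the MSE values printed above:

```python
import math

mse_train, mse_test = 45.881, 50.559   # MPa^2, from the printed metrics above

rmse_train = math.sqrt(mse_train)
rmse_test = math.sqrt(mse_test)

print(f"{rmse_train:.2f} {rmse_test:.2f}")  # 6.77 7.11
```

RMSE is always at least as large as MAE; a wide gap between the two suggests a few large errors dominate.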


## 9) 10 random test predictions (MPa)

Predicted vs actual for 10 unseen samples.


In [10]:
# =========================
# 9) 10 random test samples table (MPa)
# =========================
rng = np.random.default_rng(SEED)
idx = rng.choice(len(y_test), size=10, replace=False)

table = pd.DataFrame({
    "actual_MPa": y_test[idx].ravel(),
    "pred_MPa":   yhat_test[idx].ravel(),
})
table["abs_error"] = np.abs(table["pred_MPa"] - table["actual_MPa"])
table


| | actual_MPa | pred_MPa | abs_error |
|---|---|---|---|
| 0 | 36.299999 | 37.251465 | 0.951466 |
| 1 | 39.049999 | 37.010887 | 2.039112 |
| 2 | 11.17 | 10.441101 | 0.728899 |
| 3 | 15.09 | 20.881783 | 5.791782 |
| 4 | 12.25 | 23.799126 | 11.549126 |
| 5 | 41.889999 | 30.4739 | 11.4161 |
| 6 | 39.700001 | 36.317764 | 3.382236 |
| 7 | 13.22 | 16.635313 | 3.415313 |
| 8 | 35.34 | 35.943069 | 0.603069 |
| 9 | 55.549999 | 57.223984 | 1.673985 |


### Quick takeaways:
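- Scale inputs (and optionally targets), fitting the scalers on the training split only
- Keep the regression output layer linear (no activation)
- Compare train vs test metrics to spot overfitting; here test R² (0.813) is close to train R² (0.837)
- A small prediction table is an easy sanity check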
