<a href="https://colab.research.google.com/github/urness/CS167Fall2025/blob/main/Day17_Multilayer_Perceptrons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CS167: Day17
## Multilayer Perceptrons

#### CS167: Machine Learning, Fall 2025


In [None]:
# Mount your drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# load the libraries
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.neural_network import MLPClassifier


# Gradient Descent

# Boston Housing Dataset

- CRIM - per capita crime rate by town
- ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS - proportion of non-retail business acres per town
- CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX - nitric oxides concentration (parts per 10 million)
- RM - average number of rooms per dwelling
- AGE - proportion of owner-occupied units built prior to 1940
- DIS - weighted distances to five Boston employment centres
- RAD - index of accessibility to radial highways
- TAX - full-value property-tax rate per \$10,000
- PTRATIO - pupil-teacher ratio by town
- LSTAT - % lower status of the population
- MEDV - median value of owner-occupied homes in \$1000's (the target)

In [None]:
# Code using SGD on Boston Housing Dataset:

# load the data
housing_df = pd.read_csv("/content/drive/MyDrive/CS167/datasets/boston_housing.csv")
predictors = housing_df.columns.drop("MEDV")
target = "MEDV"

# split the data
train_data, test_data, train_sln, test_sln = \
       train_test_split(housing_df[predictors], housing_df[target], test_size=0.2, random_state=41)

# fit scikit-learn's SGDRegressor with its default settings
sgd = SGDRegressor()
sgd.fit(train_data, train_sln)
predictions = sgd.predict(test_data)

r2_value = r2_score(test_sln, predictions)
print("SGD Regression R2 : ", r2_value)

Whoa. That's pretty bad. What's going on here?

- [`sklearn` User Guide on Stochastic Gradient Descent](https://scikit-learn.org/stable/modules/sgd.html)
- Documentation: [`sklearn.linear_model.SGDRegressor()`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html)
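The scikit-learn user guide on SGD notes that the method is sensitive to feature scaling. The sketch below illustrates that effect on synthetic data from `make_regression` (an assumption standing in for the housing CSV), with one column multiplied by 1000 to loosely mimic large-valued predictors such as TAX. It compares a raw `SGDRegressor` against one wrapped in `make_pipeline(StandardScaler(), ...)`; the exact numbers will not match the Boston results.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear data (an assumption, not the Boston Housing CSV)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
# Inflate one feature's scale so it is ~1000x larger than the others
X[:, 0] *= 1000

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)

# Unscaled: the large feature can make SGD's weight updates blow up
try:
    raw_sgd = SGDRegressor(random_state=0).fit(X_train, y_train)
    print("Unscaled R2:", r2_score(y_test, raw_sgd.predict(X_test)))
except ValueError as err:
    # scikit-learn raises ValueError on numeric overflow or non-finite predictions
    print("Unscaled SGD failed:", err)

# Scaled: the pipeline fits StandardScaler on the training split only
scaled_sgd = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
scaled_sgd.fit(X_train, y_train)
scaled_r2 = r2_score(y_test, scaled_sgd.predict(X_test))
print("Scaled R2:", scaled_r2)
```

The pipeline form is one design choice among several; calling `fit` and `transform` on a `StandardScaler` by hand, as the later cells in this notebook do, is equivalent.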

In [None]:
# Your code here

# Multilayer Perceptrons

In [None]:
import pandas as pd
import numpy
from sklearn.model_selection import train_test_split

# load the data
iris_df = pd.read_csv("/content/drive/MyDrive/CS167/datasets/irisData.csv")

# split the dataset
predictors = iris_df.columns.drop('species')
target = "species"
train_data, test_data, train_sln, test_sln = train_test_split(iris_df[predictors], iris_df[target], test_size=0.2, random_state=41)

# normalize the data: fit the scaler on the training data only, then apply it to both splits
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train_data)
train_data_norm = scaler.transform(train_data)
test_data_norm = scaler.transform(test_data)
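For reference, `StandardScaler` rescales each column to z = (x - mean) / std, where the mean and standard deviation are learned in `fit`. This minimal check uses hypothetical toy matrices (not the iris CSV) to show that the manual NumPy computation matches `transform`, and that the test rows are scaled with the *training* statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrices standing in for train_data / test_data
train = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]])
test = np.array([[5.0, 3.4], [6.7, 3.1]])

scaler = StandardScaler().fit(train)

# Statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

manual_train = (train - mean) / std
manual_test = (test - mean) / std  # test rows reuse the training mean/std

print(np.allclose(manual_train, scaler.transform(train)))  # True
print(np.allclose(manual_test, scaler.transform(test)))    # True
```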



# Build an MLP using `sklearn`
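It can help to see what a fitted `MLPClassifier` actually computes. This sketch uses scikit-learn's bundled `load_iris` (an assumption, instead of the course CSV) and an arbitrary 10-unit hidden layer, then rebuilds `predict_proba` by hand from the fitted `coefs_` and `intercepts_`: a ReLU hidden layer (the default `activation='relu'`) followed by a softmax output, which scikit-learn uses for multiclass targets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# One hidden layer of 10 ReLU units (size chosen arbitrarily for illustration)
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=41).fit(X, y)

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# coefs_[i] and intercepts_[i] are the weights and biases feeding layer i + 1
hidden = relu(X @ mlp.coefs_[0] + mlp.intercepts_[0])
probs = softmax(hidden @ mlp.coefs_[1] + mlp.intercepts_[1])

print(np.allclose(probs, mlp.predict_proba(X)))  # True
print(mlp.out_activation_)  # softmax
```

Each extra entry in `hidden_layer_sizes` adds one more `relu(previous @ W + b)` step before the output layer.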

# In-Class Exercise

### In the code below, change the parameters passed to `MLPClassifier()` (keep `random_state=41`) to improve test accuracy.

- Describe the changes you made that ultimately helped improve performance.
- Why do you think the changes you made helped?


## Exercise #1 -- MLP Classifier

In [None]:
# Set up MLP
from sklearn.neural_network import MLPClassifier
from sklearn import metrics
from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

mlp = MLPClassifier(random_state=41, hidden_layer_sizes=(80,), max_iter=20)
mlp.fit(train_data_norm, train_sln)
predictions = mlp.predict(test_data_norm)

print("Accuracy: ", metrics.accuracy_score(test_sln, predictions))

# Confusion Matrix
# use the classifier's own class order for both the matrix and the axis labels,
# so the rows/columns always line up with display_labels
vals = mlp.classes_ ## possible classification values (species)
conf_mat = metrics.confusion_matrix(test_sln, predictions, labels=vals)

disp = ConfusionMatrixDisplay(confusion_matrix=conf_mat, display_labels=vals)
disp.plot(cmap=plt.cm.Blues)
plt.show()
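One way to organize your experiments is to loop over a few candidate settings and record whether each run converged. This is a minimal sketch, assuming scikit-learn's bundled `load_iris` (so the split may not match the CSV split above) and a hypothetical list of settings; it uses `warnings.catch_warnings` to detect the `ConvergenceWarning` that scikit-learn issues when training stops at `max_iter` before converging, which is expected for the `max_iter=20` setting.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Bundled iris data (an assumption; the split may differ from the course CSV)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hypothetical settings to compare; replace with your own
settings = [
    {"hidden_layer_sizes": (80,), "max_iter": 20},
    {"hidden_layer_sizes": (80,), "max_iter": 500},
    {"hidden_layer_sizes": (20, 20), "max_iter": 500},
]

results = []  # (params, test accuracy, converged?) for each setting
for params in settings:
    # Record ConvergenceWarnings instead of printing them
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        mlp = MLPClassifier(random_state=41, **params).fit(X_train, y_train)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    acc = accuracy_score(y_test, mlp.predict(X_test))
    results.append((params, acc, converged))
    print(params, f"accuracy={acc:.3f}", "converged" if converged else "hit max_iter")
```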

## Exercise #2 -- MLP Regressor
1. Read in the Boston Housing dataset
2. Normalize your data
3. Use an [`MLPRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor) to predict the median home value, `MEDV`
4. Experiment with the parameters to see how high an R<sup>2</sup> score you can get. Can you beat my high score of R<sup>2</sup> = 0.6241535?

In [None]:
from sklearn import metrics

# Starter code for an MLPRegressor on the Boston Housing dataset
# load the data
housing_df = pd.read_csv("/content/drive/MyDrive/CS167/datasets/boston_housing.csv")
predictors = housing_df.columns.drop("MEDV")
target = "MEDV"

# split the data
train_data, test_data, train_sln, test_sln = \
       train_test_split(housing_df[predictors], housing_df[target], test_size=0.2, random_state=41)

# Normalize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler() # creates new StandardScaler object
scaler.fit(train_data) # learn each predictor's mean and standard deviation from the training data only
train_data_normalized = scaler.transform(train_data) # apply normalization to training data predictors
test_data_normalized = scaler.transform(test_data) # apply the training-set statistics to testing data predictors
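If you would like to try the pipeline style, here is a minimal sketch of scaling plus an `MLPRegressor` chained with `make_pipeline`. It uses synthetic nonlinear data from `make_friedman1` (an assumption, not the Boston CSV), and `hidden_layer_sizes=(50,)` and `max_iter=2000` are arbitrary starting values rather than tuned ones, so its R<sup>2</sup> is not comparable to the housing high score.

```python
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic nonlinear regression data (an assumption standing in for the housing data)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)

# The pipeline fits StandardScaler on X_train only, then trains the MLP on scaled inputs
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=41),
)
model.fit(X_train, y_train)
mlp_r2 = r2_score(y_test, model.predict(X_test))
print("MLP R2:", mlp_r2)
```

Parameters of the wrapped regressor can still be changed through the pipeline, e.g. `model.set_params(mlpregressor__alpha=0.01)`, which keeps the scaling step tied to every experiment.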