[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tuankhoin/COMP30027-Practical-Solutions/blob/main/2022/Week%211.ipynb)

In [None]:
from google.colab import drive 
drive.mount('/content/gdrive')
path = "gdrive/My Drive/COMP30027 (T)/W11/"

Mounted at /content/gdrive


######  The University of Melbourne, School of Computing and Information Systems
# COMP30027 Machine Learning, 2022 Semester 1

## Week 11 - Neural Networks

Why you may want to attend:
- The reason 69% of the students enrolled in this subject 👌
- Some cool terminology that you can use to flex today:
  - Multi-Layer Perceptron
  - Standardization
  - Pipeline
  - Hyperparameter Tuning with Grid Search
- It is examinable (oh no!)


### NOTE:  You will need `scikit-learn` version 0.18.1 or newer for its neural network support.


### Exercise 1.
The Multilayer Perceptron is available from (newer builds of) `scikit-learn` as `sklearn.neural_network.MLPClassifier`.


In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from collections import Counter

### Exercise 1.(a) 
Build a default Multilayer Perceptron to classify the `Iris` data. Evaluate its cross-validation accuracy.

In [None]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
print('X:', X.shape, 'y:', set(y))


clf = MLPClassifier(max_iter=2000)

print('cross-val acc:', np.mean(cross_val_score(clf, X, y, cv=5)))
clf.fit(X, y)


X: (150, 4) y: {0, 1, 2}
cross-val acc: 0.9800000000000001


MLPClassifier(max_iter=2000)

### Exercise 1.(b) 
Check the `coefs_` and `n_layers_` attributes of the fitted classifier to examine the resulting neural network.

In [None]:
#print(clf.coefs_)
print('parameter shapes:',[p.shape for p in clf.coefs_])
print('num layers:', clf.n_layers_)

parameter shapes: [(4, 100), (100, 3)]
num layers: 3
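
To make the link between these two attributes concrete, here is a minimal sketch that counts the parameters of the default network on `Iris`. It assumes the default single hidden layer of 100 units; `random_state=0` and the variable names are choices made for this illustration only. `intercepts_` holds the bias vectors that go with each weight matrix in `coefs_`.

```python
from sklearn import datasets
from sklearn.neural_network import MLPClassifier

X, y = datasets.load_iris(return_X_y=True)
clf = MLPClassifier(max_iter=2000, random_state=0).fit(X, y)

# n_layers_ counts the input and output layers too: 1 hidden layer + 2 = 3
n_hidden = clf.n_layers_ - 2

# One weight matrix and one bias vector per connection between layers
n_weights = sum(w.size for w in clf.coefs_)       # 4*100 + 100*3
n_biases = sum(b.size for b in clf.intercepts_)   # 100 + 3
print('hidden layers:', n_hidden)
print('weights:', n_weights, 'biases:', n_biases, 'total:', n_weights + n_biases)
```

So `n_layers_` is always one more than `len(clf.coefs_)`, because each weight matrix sits between two consecutive layers.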


### Exercise 2.
One important issue with this Multilayer Perceptron is that it is sensitive to the scale of the input attribute values.
### Exercise 2.(a) Standardization
Read up on the `StandardScaler`, and re-scale the `Iris` data so that each attribute has a *mean* of 0 and a *variance* of 1. Evaluate and examine the resulting neural network built on the re-scaled data.
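
As a quick sanity check of what standardisation does, the sketch below compares `StandardScaler` with the formula $(x - \text{mean}) / \text{std}$ computed by hand. The names `X_scaled` and `X_manual` are made up for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X, _ = datasets.load_iris(return_X_y=True)

X_scaled = StandardScaler().fit_transform(X)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also the default of numpy's std()
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print('same as manual formula:', np.allclose(X_scaled, X_manual))
print('column means:', X_scaled.mean(axis=0).round(6))
print('column variances:', X_scaled.var(axis=0).round(6))
```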


In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
clf = MLPClassifier(max_iter=2000)
# This is cheating: the mean and variance are estimated from the whole data set, so every test fold leaks into the scaling
print('Cross-val cheating standardised features acc:', np.mean(cross_val_score(clf, scaler.fit_transform(X), y, cv=5))) 


Cross-val cheating standardised features acc: 0.9666666666666668


Why is it not good?

*Because by scaling the whole dataset, you leak a bit of test information into the training process. It's like peeking at a few cards in the deck.*

- Solution? Just do it after splitting!
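- Too many steps? Here comes `Pipeline`!

<details>
<summary>What does a Pipeline do?</summary>
It chains the preprocessing steps and the final estimator into a single object, so calling <code>fit</code> on it fits every step, in order, on the data you pass in (for example, just the training fold).
</details>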

> The Pipeline is built using a list of `(key, value)` pairs, where the key is a string containing the name you want to give this step and value is an estimator object:
```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
```
If you don't need to name them:
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer
make_pipeline(Binarizer(), MultinomialNB())
```
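
When the steps are not named explicitly, `make_pipeline` generates the names for you. The sketch below shows those generated names and how a step's parameters can be reached through them. The scaler-plus-MLP combination is chosen here to match this tutorial rather than taken from the quoted documentation.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000))

# make_pipeline names each step after its lowercased class name
step_names = list(pipe.named_steps)
print(step_names)

# Parameters of a step are addressed as <step name>__<parameter>
pipe.set_params(mlpclassifier__alpha=0.01)
print(pipe.named_steps['mlpclassifier'].alpha)
```

The same `<step name>__<parameter>` convention is what a grid search uses when the estimator it tunes is a pipeline.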


### Exercise 2.(c) 
(Harder) Calculating the _mean_ and _variance_ on the entire data set (before splitting into train/test sets) is cheating slightly. Write a re-scale function that calculates the scaling factors for the training data, and applies the scaler to the test data. Then, write a wrapper function that uses this to cross-validate.



In [None]:
clf = MLPClassifier(max_iter=2000)
#this way we don't cheat. Read more on pipelines https://scikit-learn.org/stable/modules/compose.html
pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])
print('Cross-val noncheating standardised features acc:', np.mean(cross_val_score(pipeline, X, y, cv=5)))


Cross-val noncheating standardised features acc: 0.9666666666666668


*You might not see a reduction in performance with the noncheating method, but in general you should estimate the scaling on the training data only (`fit_transform`) and then apply that same transformation to the test data (`transform`).*

*Standardisation did not improve accuracy here either, possibly because the neural network is not well tuned in terms of regularisation and the number and size of its layers.*
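
For comparison, here is a rough sketch of the hand-written version the exercise asks for: a re-scale function plus a cross-validation wrapper. The function names `rescale` and `cross_val_rescaled` are invented for this example, and the use of `StratifiedKFold` without shuffling is an assumption made to mirror what `cross_val_score` does for classifiers with an integer `cv`.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier


def rescale(X_train, X_test):
    """Standardise both sets using statistics from the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant attributes
    return (X_train - mean) / std, (X_test - mean) / std


def cross_val_rescaled(clf, X, y, n_splits=5):
    """Cross-validate clf, re-estimating the scaling inside every fold."""
    y = np.asarray(y)
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        X_train, X_test = rescale(X[train_idx], X[test_idx])
        model = clone(clf)  # fresh, unfitted copy for each fold
        model.fit(X_train, y[train_idx])
        scores.append(model.score(X_test, y[test_idx]))
    return np.array(scores)


X, y = datasets.load_iris(return_X_y=True)
scores = cross_val_rescaled(MLPClassifier(max_iter=2000), X, y)
print('Manual noncheating acc:', scores.mean())
```

The `Pipeline` does the same bookkeeping for you, which is why it is usually the more convenient option.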

### Exercise 3 
You can coerce the Multilayer Perceptron to have specifically–sized hidden layers using the `hidden_layer_sizes` parameter.
### Exercise 3.(a) 
Train a Multilayer Perceptron on the two-class `Abalone` data, and examine the resulting neural network.


In [None]:
def convert_class(raw, num_class=2):
    raw = int(raw)
    if num_class == 2:
        return 0 if raw<=10 else 1
    elif num_class == 3:
        return 0 if raw<=8 else 1 if 9<=raw<=10 else 2
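    elif num_class == 29:
        return raw
    # Fail loudly instead of silently returning None for other values
    raise ValueError(f'unsupported num_class: {num_class}')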

def load_abalone(addsex=False, num_class=2, path=''):
    X, y = [], []
    with open(path + 'abalone.data', 'r') as fin:
        for line in fin:
            # strip() removes the newline without eating a character if the last line has none
            atts = line.strip().split(",")
            if not addsex:
                X.append(atts[1:-1])
            else:
                sex = atts[0]
                if sex == "M": sex = 0
                elif sex=="I": sex = 1
                elif sex=="F": sex = 2
                else: sex = 3
                
                X.append([sex] + atts[1:-1])
            y.append(convert_class(atts[-1], num_class))
    X = np.array(X, dtype=float)
    return X, y

# Remove 'path' argument if you are running the Notebook locally
X, y = load_abalone(addsex=False, num_class=2, path=path)
print('X:', X.shape, 'y:', set(y))

clf = MLPClassifier(max_iter=2000)
clf.fit(X,y)
print([p.shape for p in clf.coefs_])

X: (4177, 7) y: {0, 1}
[(7, 100), (100, 1)]
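
Note that the last weight matrix has a single output column even though there are two classes. The sketch below illustrates why, using synthetic two-class data as a stand-in for `Abalone` (an assumption, so that it runs without the data file). The names `binary` and `multi` are chosen for this example only.

```python
from sklearn import datasets
from sklearn.neural_network import MLPClassifier

# Synthetic two-class data with 7 attributes, standing in for Abalone
X_bin, y_bin = datasets.make_classification(
    n_samples=200, n_features=7, random_state=0)
binary = MLPClassifier(max_iter=500, random_state=0).fit(X_bin, y_bin)

X_multi, y_multi = datasets.load_iris(return_X_y=True)
multi = MLPClassifier(max_iter=2000, random_state=0).fit(X_multi, y_multi)

# Two classes: one logistic output unit. Three classes: one softmax unit per class.
for name, model in [('binary', binary), ('3-class', multi)]:
    print(name, model.coefs_[-1].shape, model.out_activation_, model.n_outputs_)
```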


### Exercise 3.(b) 
(Harder) Change the size and/or number of hidden layers. How are the resulting weights affected? Can you discern any relationship between the weights for layers of varying sizes?

In [None]:
clf = MLPClassifier(hidden_layer_sizes=[10, 10, 4], max_iter=2000)
clf.fit(X, y)
print([p.shape for p in clf.coefs_])

[(7, 10), (10, 10), (10, 4), (4, 1)]
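
The pattern in these shapes can be written down directly: each weight matrix has as many rows as the layer it comes from and as many columns as the layer it feeds. A small sketch, where `expected_shapes` is a helper invented for this illustration:

```python
def expected_shapes(n_inputs, hidden_layer_sizes, n_outputs):
    """Weight-matrix shapes for a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each matrix maps one layer (rows) to the next (columns)
    return list(zip(sizes[:-1], sizes[1:]))


shapes = expected_shapes(7, [10, 10, 4], 1)
print(shapes)
print('weights:', sum(rows * cols for rows, cols in shapes))
```

With these sizes the three-hidden-layer network has 214 weights, compared with 800 for the default single layer of 100 units on the same 7 attributes.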


### Exercise 4. 
There are a couple of different tunable parameters for the `MLPClassifier`, mostly dealing with the weight optimisation. However, it is often worthwhile to tune the regularisation parameter (α).
### Exercise 4.(a) 
Try varying orders of α between 10 and 10e−5 for a Multilayer Perceptron built on the two-class `Abalone` data. How much variance in cross-validation accuracy do you observe?
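
One way to build such a grid is with one value per order of magnitude. In this sketch the endpoints ($10^{-7}$ up to $10$) are an assumption, chosen to cover the range in the exercise with some margin. It also checks how Python reads a literal such as `10e-5`.

```python
import numpy as np

# 9 values, one per order of magnitude, from 1e-7 up to 10
alphas = np.logspace(-7, 1, num=9)
print(alphas)

# In Python, the literal 10e-5 means 10 x 10^-5, which is 1e-4
print(10e-5 == 1e-4)
```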


In [None]:
import tqdm
alphas = [np.power(10.0, i) for i in range(-7, 2)]
print(alphas)

for alpha in tqdm.tqdm(alphas, position=0, leave=True):
    clf = MLPClassifier(max_iter=2000, alpha=alpha)
    pipeline = Pipeline([('transformer', scaler), ('estimator', clf)])
    scores = cross_val_score(pipeline, X, y, cv=5, n_jobs=-1)
    print(f'\nalpha: 1e{np.log10(alpha):.0f}\t mean_acc: {np.mean(scores):.5f}\t standard_dev_acc: {np.std(scores):.5f}')

[1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]

alpha: 1e-7	 mean_acc: 0.79004	 standard_dev_acc: 0.01472
alpha: 1e-6	 mean_acc: 0.78573	 standard_dev_acc: 0.01284
alpha: 1e-5	 mean_acc: 0.78812	 standard_dev_acc: 0.01383
alpha: 1e-4	 mean_acc: 0.78549	 standard_dev_acc: 0.01013
alpha: 1e-3	 mean_acc: 0.79004	 standard_dev_acc: 0.01322
alpha: 1e-2	 mean_acc: 0.78525	 standard_dev_acc: 0.01331
alpha: 1e-1	 mean_acc: 0.78477	 standard_dev_acc: 0.01495
alpha: 1e0	 mean_acc: 0.77879	 standard_dev_acc: 0.02362
alpha: 1e1	 mean_acc: 0.74240	 standard_dev_acc: 0.03103

100%|██████████| 9/9 [01:46<00:00, 11.86s/it]


### Exercise 4.(b) Hyperparameter Tuning with Grid Search
Read up on the `GridSearchCV` utility to help you tune the performance of the *Multilayer Perceptron*. Split the data into a training-and-tuning partition and a test partition. What value of the regularisation parameter does `GridSearchCV` come up with? How does the test accuracy compare to the default (un-tuned) `MLPClassifier`?

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

X_train, X_devtest, y_train, y_devtest = train_test_split(X, y, test_size=0.4, random_state=42)
X_dev, X_test, y_dev, y_test = train_test_split(X_devtest, y_devtest, test_size=0.5, random_state=42)

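# Caution: if the cells were run in order, `clf` is the last model from the alpha sweep (alpha=10.0),
# not a default MLPClassifier. Re-create it with MLPClassifier(max_iter=2000) for a truly un-tuned baseline.
clf.fit(X_train, y_train)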
print('MLP acc without tuning:', clf.score(X_test, y_test))

hidden_sizes = [[100], [10, 10]]
# MLPClassifier arguments, each with the list of values to search over
param_grid = {'alpha': alphas, 'hidden_layer_sizes':hidden_sizes}


gs = GridSearchCV(estimator=clf,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=3,
                  n_jobs=1, # verbose output may not show up when jobs run in parallel
                  verbose=1 # higher values print more detail
                  )

gs.fit(X_train, y_train)
best_params = gs.best_params_
print('best_params', best_params)

clf = MLPClassifier(max_iter=2000, **best_params)
clf.fit(X_train, y_train)
print('acc with best params:', clf.score(X_test, y_test))


MLP acc without tuning: 0.7799043062200957
Fitting 3 folds for each of 18 candidates, totalling 54 fits
best_params {'alpha': 0.01, 'hidden_layer_sizes': [100]}
acc with best params: 0.7811004784688995
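
The search above runs on unscaled features. If you also want the standardisation from Exercise 2 inside the search, a pipeline can be passed as the estimator. The following is only a sketch under that assumption: `Iris` stands in for `Abalone` so that it runs without the data file, and the three-value grid and `random_state` settings are arbitrary choices for the example.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Iris stands in for Abalone so the sketch runs without the data file
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

pipeline = Pipeline([('transformer', StandardScaler()),
                     ('estimator', MLPClassifier(max_iter=2000, random_state=0))])

# Step parameters are addressed as <step name>__<parameter>
param_grid = {'estimator__alpha': [1e-4, 1e-2, 1.0]}

gs = GridSearchCV(pipeline, param_grid, scoring='accuracy', cv=3)
gs.fit(X_train, y_train)

print('best_params', gs.best_params_)
# With the default refit=True, gs is already refitted on all of X_train
print('test acc:', gs.score(X_test, y_test))
```

Because of the refit, scoring through `gs` directly avoids having to build and train a second classifier from `best_params_`.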


In [None]:
import pandas as pd
pd.DataFrame([gs.cv_results_['param_alpha'],
              gs.cv_results_['param_hidden_layer_sizes'],
              gs.cv_results_['mean_test_score'],
              gs.cv_results_['mean_fit_time'],
              gs.cv_results_['rank_test_score']], 
             index=['alpha','Hidden layer size','Mean test score', 'Mean fit time', 'Ranking'])

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| alpha | 0.0000001 | 0.0000001 | 0.000001 | 0.000001 | 0.00001 | 0.00001 | 0.0001 | 0.0001 | 0.001 | 0.001 | 0.01 | 0.01 | 0.1 | 0.1 | 1.0 | 1.0 | 10.0 | 10.0 |
| Hidden layer size | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] | [100] | [10, 10] |
| Mean test score | 0.786515 | 0.781729 | 0.779338 | 0.785317 | 0.787311 | 0.788511 | 0.784126 | 0.780933 | 0.78851 | 0.778537 | 0.789309 | 0.781731 | 0.779336 | 0.781332 | 0.752591 | 0.772149 | 0.687548 | 0.653236 |
| Mean fit time | 2.616513 | 1.252661 | 2.536339 | 1.676932 | 2.851301 | 1.756532 | 2.420419 | 1.330499 | 3.681538 | 1.454044 | 2.741483 | 1.790358 | 1.936064 | 1.313662 | 1.630248 | 1.359696 | 0.780962 | 0.657655 |
| Ranking | 5 | 9 | 12 | 6 | 4 | 2 | 7 | 11 | 3 | 14 | 1 | 8 | 13 | 10 | 16 | 15 | 17 | 18 |


In [None]:
# What more can I add?
print('\n'.join(gs.cv_results_.keys()))

mean_fit_time
std_fit_time
mean_score_time
std_score_time
param_alpha
param_hidden_layer_sizes
params
split0_test_score
split1_test_score
split2_test_score
mean_test_score
std_test_score
rank_test_score
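
Since `cv_results_` is a dictionary of equal-length arrays, any of these keys can become a column. A minimal sketch, using a small stand-in search on `Iris` so that it is self-contained (the grid and the selected columns are choices made for this example):

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Small stand-in search on Iris so the sketch is self-contained
X, y = datasets.load_iris(return_X_y=True)
gs = GridSearchCV(MLPClassifier(max_iter=2000, random_state=0),
                  param_grid={'alpha': [1e-4, 1e-2, 1.0]},
                  scoring='accuracy', cv=3)
gs.fit(X, y)

# cv_results_ is a dict of equal-length arrays, so it maps directly to columns
results = pd.DataFrame(gs.cv_results_)
columns = ['param_alpha', 'mean_test_score', 'std_test_score',
           'mean_fit_time', 'rank_test_score']
print(results[columns].sort_values('rank_test_score'))
```

This gives one row per candidate, which tends to be easier to read than one column per candidate when the grid is large.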
