# Advice

When a trained ML algorithm makes unacceptably large errors in its predictions, things to try:  
- Get more training examples
- Try smaller sets of features
- Get additional features
- Add polynomial features ($x^1\dots x^N$)
- Increase/decrease $\alpha$

## Machine learning diagnostics
Diagnostics are tests that indicate what would (and would not) help improve the algorithm's performance/accuracy.

# Evaluating performance
Consider a data set of housing prices:

- Split the dataset into a 'training set' and a 'test set' ($70\% / 30\%$). Train the model on the former and evaluate it on the latter. The two sets contain $m_{\rm train}$ and $m_{\rm test}$ examples: $\vec{x}_{\rm train}^{(1)},\dots,\vec{x}_{\rm train}^{(m_{\rm train})}$ and $\vec{x}_{\rm test}^{(1)},\dots,\vec{x}_{\rm test}^{(m_{\rm test})}$.
- Fit the parameters by minimising the usual cost function over the training dataset: $J(\vec{w},b)=\dots$
- Evaluate the model by computing the cost function over the test dataset, **without** the regularization term $\sum_{j=1}^{n}w_j^2$: $J_{\rm test}(\vec{w},b) = \frac{1}{2m_{\rm test}} \Big[ \sum_{i=1}^{m_{\rm test}} \big( f_{\vec{w},b}(\vec{x}_{\rm test}^{(i)}) - y_{\rm test}^{(i)} \big)^2 \Big]$.
- Compute the **training error**, also without the regularization term: $J_{\rm train}(\vec{w},b) = \frac{1}{2m_{\rm train}} \Big[ \sum_{i=1}^{m_{\rm train}} \big( f_{\vec{w},b}(\vec{x}_{\rm train}^{(i)}) - y_{\rm train}^{(i)} \big)^2 \Big]$.

Thus, if the model overfits, the training error will be small while the error on the test set will be large.
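
The train/test procedure above can be sketched with NumPy. This is only a minimal sketch under assumptions that are not in the notes: the data is synthetic (a noisy line standing in for housing prices), the model has a single feature, the fit is an unregularized least-squares fit via `np.polyfit`, and the helper name `squared_error_cost` is hypothetical.

```python
import numpy as np

def squared_error_cost(w, b, x, y):
    """J(w, b) = 1/(2m) * sum((w*x + b - y)^2), with no regularization term."""
    m = x.shape[0]
    residuals = w * x + b - y
    return float(np.sum(residuals ** 2) / (2 * m))

# Synthetic data (assumption): a noisy line standing in for housing prices.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=100)

# Shuffle, then split 70% / 30% into training and test sets.
idx = rng.permutation(x.shape[0])
m_train = int(round(0.7 * x.shape[0]))
train_idx, test_idx = idx[:m_train], idx[m_train:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

# Fit w and b on the training set only (least squares, degree-1 polynomial).
w, b = np.polyfit(x_train, y_train, deg=1)

# Evaluate both errors without any regularization term.
j_train = squared_error_cost(w, b, x_train, y_train)
j_test = squared_error_cost(w, b, x_test, y_test)
print(f"J_train = {j_train:.3f}, J_test = {j_test:.3f}")
```

For a model that generalizes well the two numbers should be of similar size; a `J_test` much larger than `J_train` is the overfitting signature described above.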

For a **classification problem**, instead of the logistic loss, one can use the **fraction of the test set** (and of the training set) that the algorithm has misclassified.
$$
\hat{y} = 
\begin{cases}
1 \text{ if } f_{\vec{w},b}(\vec{x}^{(i)}) \geq 0.5 \\ 
0 \text{ if } f_{\vec{w},b}(\vec{x}^{(i)}) < 0.5 \\ 
\end{cases}
$$

then count the examples with $\hat{y}\neq y$; $J_{\rm test}$ and $J_{\rm train}$ are the `fractions` of misclassified examples in the test and training sets, respectively.
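
The misclassification fraction can be sketched as follows. The function name `misclassification_fraction` and the example numbers are hypothetical, and the sketch assumes that an output of exactly $0.5$ is assigned to class 1.

```python
import numpy as np

def misclassification_fraction(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    y_hat = (np.asarray(probabilities) >= threshold).astype(int)
    return float(np.mean(y_hat != np.asarray(labels)))

# Hypothetical model outputs f(x) and true labels for a small test set.
f_test = np.array([0.9, 0.2, 0.6, 0.4, 0.5])
y_test = np.array([1, 0, 0, 1, 1])

j_test = misclassification_fraction(f_test, y_test)
print(j_test)  # 2 of the 5 examples are misclassified -> 0.4
```

The same function applied to the training outputs and labels gives $J_{\rm train}$.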



# Automatic model selection:

Training error is not a good representation of the model performance.  
We need to measure how well the model generalizes, i.e. evaluate it on data that was not used for training.  

For instance, start with the simplest polynomial and increase the degree $d$ step by step:

$$
\text{1D poly : } f_{\vec{w},b}(\vec{x}) = w_1x + b \rightarrow w^{<1>}, b^{<1>} \rightarrow J_{\rm test}(w^{<1>}, b^{<1>})\\
\dots \\
\text{ND poly : } f_{\vec{w},b}(\vec{x}) = \sum_{i=1}^N w_i x^{i} + b \rightarrow w^{<N>}, b^{<N>} \rightarrow J_{\rm test}(w^{<N>}, b^{<N>})
$$

One can compare all the $J_{\rm test}$ values and choose the model with the smallest one. However, that $J_{\rm test}$ is likely an `underestimate of the actual error`, because an extra parameter, the polynomial degree $d$, was chosen based on the **test set**.

## Automatic model selection 
### Training / cross validation / test set

Split the data into 3 subsets: $60\%$ training set ($m_{\rm train}$ examples), $20\%$ cross validation set ($m_{\rm cv}$ examples) and $20\%$ test set ($m_{\rm test}$ examples).  
`Cross validation set` (also called the validation or development set): used to cross-check the accuracy of each candidate model.
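
A three-way split like this can be sketched with NumPy alone. The helper name `split_train_cv_test`, its parameters and the toy data are assumptions made for illustration; the notes do not prescribe a particular implementation.

```python
import numpy as np

def split_train_cv_test(x, y, fractions=(0.6, 0.2), seed=0):
    """Shuffle the data and split it into training, cross validation and test sets."""
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    idx = rng.permutation(m)
    n_train = int(round(fractions[0] * m))
    n_cv = int(round(fractions[1] * m))
    # Whatever remains after the first two parts becomes the test set.
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (x[train], y[train]), (x[cv], y[cv]), (x[test], y[test])

# Toy data (assumption): 50 examples with one feature.
x = np.arange(50, dtype=float)
y = 2.0 * x
(x_train, y_train), (x_cv, y_cv), (x_test, y_test) = split_train_cv_test(x, y)
print(len(x_train), len(x_cv), len(x_test))  # 30 10 10
```

Shuffling before splitting matters when the data is stored in some order (for example sorted by price), since otherwise the three subsets would not be drawn from the same distribution.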

For each set, compute the error (with **no regularization term**!):
$$
\text{Training error: } \dots \\
\text{Cross validation error: } \dots \\
\text{Test error: } \dots \\
$$

Then, for each candidate model, fit the parameters on the training set and evaluate the error on the cross validation set:
$$
\text{1D poly : } f_{\vec{w},b}(\vec{x}) = w_1x + b \rightarrow w^{<1>}, b^{<1>} \rightarrow J_{\rm cv}(w^{<1>}, b^{<1>})\\
\dots \\
\text{ND poly : } f_{\vec{w},b}(\vec{x}) = \sum_{i=1}^N w_i x^{i} + b \rightarrow w^{<N>}, b^{<N>} \rightarrow J_{\rm cv}(w^{<N>}, b^{<N>})
$$

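The model with the lowest cross validation error $J_{\rm cv}$ is the best option.  
The `generalization error` is then estimated by evaluating the chosen model on the **test data**, which played no role in fitting the parameters or in choosing the model.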

> Pick model with the smallest cross validation error
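
The whole selection loop can be sketched as below. This is a hedged illustration, not the course's implementation: the data is a synthetic noisy quadratic, the candidate models are unregularized polynomial fits from `np.polyfit` with degrees 1 to 6, and the helper name `mse_cost` is hypothetical.

```python
import numpy as np

def mse_cost(coeffs, x, y):
    """1/(2m) * sum((f(x) - y)^2) for a polynomial with the given coefficients."""
    residuals = np.polyval(coeffs, x) - y
    return float(np.sum(residuals ** 2) / (2 * x.shape[0]))

# Synthetic data (assumption): a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=200)
y = 0.5 * x ** 2 - x + 1.0 + rng.normal(0.0, 0.5, size=200)

# 60% / 20% / 20% split into training, cross validation and test sets.
idx = rng.permutation(200)
train, cv, test = np.split(idx, [120, 160])

# Fit one model per degree d on the training set; score each on the CV set.
degrees = range(1, 7)
models = {d: np.polyfit(x[train], y[train], deg=d) for d in degrees}
j_cv = {d: mse_cost(models[d], x[cv], y[cv]) for d in degrees}

# Pick the degree with the smallest cross validation error.
best_d = min(j_cv, key=j_cv.get)

# The test set was not used for fitting or for choosing d, so J_test of the
# chosen model serves as the estimate of the generalization error.
j_test = mse_cost(models[best_d], x[test], y[test])
print(f"best degree = {best_d}, J_cv = {j_cv[best_d]:.3f}, J_test = {j_test:.3f}")
```

Because the degree is chosen on the cross validation set, `j_cv[best_d]` is itself a slightly optimistic number; only `j_test` is untouched by any choice made along the way.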



```python
# for array computations and loading data
import numpy as np

# for building linear regression models and preparing data
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# for building and training neural networks
import tensorflow as tf

# custom functions
import utils

# reduce display precision on numpy arrays
np.set_printoptions(precision=2)

# suppress warnings
tf.get_logger().setLevel('ERROR')
tf.autograph.set_verbosity(0)
```

