# Programming Session 1 (January 24, 2025)

### A Brief Introduction to Machine Learning

Machine learning can be categorized into three types: unsupervised, supervised, and reinforcement learning. 

Unsupervised learning works with unlabeled data to detect patterns and characterize the data. Clustering and dimensionality reduction both fall under unsupervised learning.

Supervised learning uses labeled data to predict a target. Its two main tasks are regression (predicting a continuous value) and classification (predicting a category).

A third category of machine learning is reinforcement learning. It mimics the trial-and-error learning process that humans use to achieve their goals.

In this course, we'll discuss the following:

-- Machine Learning Pipelines: data collection and data engineering, machine learning implementation (our main focus), evaluations, and reports.

-- Bias-Variance Trade-off: bias (accuracy), variance (precision: getting consistent results). This part deals with sensitivity to datasets and model configurations.

-- Overfitting: Train (parameter estimation)/Validation (hyperparameter tuning)/Test (evaluation) Split. We usually follow the 80-20 train/test split.
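
The three-way split in the last bullet can be sketched with two calls to `train_test_split`. The 60/20/20 proportions and the toy data are assumptions made for illustration; the notes themselves mention only the 80-20 train/test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 samples, 1 feature
X = np.arange(100).reshape(-1, 1)
y = 3 * X.ravel() + 1

# First hold out 20% of the data as the test set (final evaluation only)
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Then split the remaining 80% into train and validation.
# 25% of the remaining 80% is 20% of the full data, giving 60/20/20 overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The training set is used to estimate parameters, the validation set to compare hyperparameter choices, and the test set is touched only once at the end.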

### Recall: Regression

$y = Xw$

$\implies w = (X^T X)^{-1} X^Ty$  
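
For reference, the closed form above is what one gets by minimizing the squared error. The steps below are the standard least-squares derivation, written out under the assumption that $X^T X$ is invertible:

$L(w) = || y - Xw ||_{2}^{2} = (y - Xw)^T (y - Xw)$

$\nabla_{w} L = -2 X^T (y - Xw) = 0$

$\implies X^T X w = X^T y$

$\implies w = (X^T X)^{-1} X^T y$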

## Linear Regression

### Intro Example 

In [5]:
import numpy as np

X = np.array([1, 2, 3, 4, 5]) #independent var
y = np.array([4, 7, 10, 13, 16]) #dependent var


XD = np.c_[np.ones(X.shape[0]), X]  # design matrix: a column of ones (intercept) next to X

In [6]:
XD

array([[1., 1.],
       [1., 2.],
       [1., 3.],
       [1., 4.],
       [1., 5.]])

In [11]:
w_best = np.linalg.inv(XD.T.dot(XD)).dot(XD.T).dot(y)
w_best

array([1., 3.])

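Note that 1 is the intercept while 3 is the slope. For our next example, we make the second feature an exact multiple of the first, so the columns of the design matrix are linearly dependent and $X^{T}X$ is singular.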

In [16]:
X_1 = np.array([1, 2, 3, 4, 5])
X_2 = 2 * X_1

XD = np.c_[np.ones(X_1.shape[0]), X_1, X_2]  # X_2 = 2 * X_1, so the columns are linearly dependent
XD

array([[ 1.,  1.,  2.],
       [ 1.,  2.,  4.],
       [ 1.,  3.,  6.],
       [ 1.,  4.,  8.],
       [ 1.,  5., 10.]])

In [17]:
w_best = np.linalg.inv(XD.T.dot(XD)).dot(XD.T).dot(y)
w_best

LinAlgError: Singular matrix

What can we do in this case? If $X^{T}X$ is singular, we use the Moore-Penrose pseudoinverse in place of the inverse.

Consider the system of linear equations:
$A\vec{x} = \vec{b}$.

Ideally, $A$ is nonsingular, and thus $\vec{x} = A^{-1}\vec{b}$.

If $A$ is singular, we want to minimize $|| A\vec{x} - \vec{b}||_{2}$. That is, we want to find $A^{+}$ s.t. $|| A\vec{x} - \vec{b}||_{2} \geq || A\vec{z} - \vec{b}||_{2}$ for every $\vec{x}$, where $\vec{z} = A^{+}\vec{b}$.

If $A$ is nonsingular, then $A^{+} = A^{-1}$.
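
Both properties can be checked numerically. The following is a small sketch with made-up matrices (`A`, `A_sing`, and `b` are illustrative choices, not from the session): it compares `np.linalg.pinv` with `np.linalg.inv` on a nonsingular matrix, then checks that $\vec{z} = A^{+}\vec{b}$ has a residual no larger than that of randomly perturbed candidates when the matrix is singular.

```python
import numpy as np

# Nonsingular case: the pseudoinverse coincides with the ordinary inverse
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
same_as_inverse = np.allclose(np.linalg.pinv(A), np.linalg.inv(A))
print(same_as_inverse)  # True

# Singular case: the second row is twice the first, so A_sing has rank 1
A_sing = np.array([[1.0, 2.0],
                   [2.0, 4.0]])
b = np.array([1.0, 1.0])  # not in the column space: no exact solution exists

z = np.linalg.pinv(A_sing) @ b  # least-squares solution z = A^+ b
res_z = np.linalg.norm(A_sing @ z - b)

# Compare against 1000 random candidates x = z + noise
rng = np.random.default_rng(0)
res_others = [
    np.linalg.norm(A_sing @ (z + rng.normal(size=2)) - b)
    for _ in range(1000)
]
z_is_best = res_z <= min(res_others) + 1e-12
print(z_is_best)  # True
```

The random comparison is only a sanity check, not a proof; the inequality itself follows from the definition of the pseudoinverse.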

In [22]:
X_1 = np.array([1, 2, 3, 4, 5])
X_2 = 2 * X_1

XD = np.c_[np.ones(X_1.shape[0]), X_1, X_2]

try:
    w_best = np.linalg.inv(XD.T.dot(XD)).dot(XD.T).dot(y)
except np.linalg.LinAlgError:
    # inv() raised because X^T X is singular; fall back to the pseudoinverse
    print('Warning: Singular matrix, pseudoinverse used')
    w_best = np.linalg.pinv(XD.T.dot(XD)).dot(XD.T).dot(y)



In [21]:
w_best

array([1. , 0.6, 1.2])
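
As a cross-check, the same least-squares solution can be obtained without forming $X^{T}X$ at all, by applying `np.linalg.pinv` to the design matrix directly or by calling `np.linalg.lstsq`. This sketch rebuilds the data from the cells above; the variable names `w_pinv` and `w_lstsq` are introduced here for illustration.

```python
import numpy as np

X_1 = np.array([1, 2, 3, 4, 5])
X_2 = 2 * X_1
y = np.array([4, 7, 10, 13, 16])
XD = np.c_[np.ones(X_1.shape[0]), X_1, X_2]

# Pseudoinverse of the design matrix itself: w = XD^+ y
w_pinv = np.linalg.pinv(XD) @ y

# lstsq returns the minimum-norm least-squares solution for rank-deficient input
w_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(XD, y, rcond=None)

print(w_pinv)
print(w_lstsq)
print(np.allclose(XD @ w_pinv, y))  # the fit still reproduces y exactly
```

Because `X_2 = 2 * X_1`, any weights with $w_1 + 2w_2 = 3$ fit the data equally well; the pseudoinverse picks the one with the smallest norm, which is why the slope of 3 is spread across the two collinear columns.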

### Using `sklearn`

In [24]:
from sklearn.linear_model import LinearRegression

In [31]:
X = np.array([1, 2, 3, 4, 5]) #independent var
y = np.array([4, 7, 10, 13, 16]) #dependent var

#Step 1: Initialize the linear regression model
model = LinearRegression()

#Step 2: Fit the model to data. For sklearn, X is required to be 2-dimensional.
X = np.array([1, 2, 3, 4, 5]).reshape(-1,1) #reshaping X
model.fit(X,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [33]:
dir(model) #A way to list the model's attributes and methods

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_decision_function',
 '_estimator_type',
 '_get_param_names',
 '_preprocess_data',
 '_residues',
 '_set_intercept',
 'coef_',
 'copy_X',
 'fit',
 'fit_intercept',
 'get_params',
 'intercept_',
 'n_jobs',
 'normalize',
 'predict',
 'rank_',
 'score',
 'set_params',
 'singular_']

In [40]:
w_best = model.coef_
intercept = model.intercept_
print(f"Slope: {w_best[0]}")
print(f"Intercept: {intercept}")

Slope: 3.000000000000001
Intercept: 0.9999999999999964


In [44]:
X_1 = np.array([1, 2, 3, 4, 5])
X_2 = 2 * X_1

XD = np.c_[X_1, X_2]

model.fit(XD,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [46]:
w_best = model.coef_
intercept = model.intercept_
print(f"Slope: {w_best}") # this refers to w_1 and w_2
print(f"Intercept: {intercept}") # this is w_0

Slope: [0.6 1.2]
Intercept: 1.0


## Train-Test Split

In [69]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

$R^2$ is the coefficient of determination:

$R^2 = 1- \frac{SS_{R}}{SS_{T}}$

where $SS_{R}$ is the sum of squares of the residuals and $SS_{T}$ is the total sum of squares.

$\implies R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$

where $\hat{y}_i$ is the predicted value for $y_i$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$.
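
The formula translates almost line by line into NumPy. The helper name `r_squared` below is hypothetical (it is not part of `sklearn`); the two calls at the end show the boundary cases of a perfect predictor and a predictor that always outputs the mean.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_R / SS_T."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # SS_R: residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # SS_T: total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([4, 7, 10, 13, 16])

# Perfect predictions leave no residual, so R^2 = 1
print(r_squared(y_true, y_true))  # 1.0

# Always predicting the mean makes SS_R equal to SS_T, so R^2 = 0
print(r_squared(y_true, np.full(5, y_true.mean())))  # 0.0
```

A model that does worse than the mean predictor gets a negative $R^2$, so the value is not restricted to the interval from 0 to 1.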

In [76]:
np.random.seed(0) #Set seed for reproducibility

# Data

X = np.array(list(range(1, 11))).reshape(-1, 1)
y = np.array([3 * n for n in range(10)]) + np.random.normal(0, 1, 10)

y

array([ 1.76405235,  3.40015721,  6.97873798, 11.2408932 , 13.86755799,
       14.02272212, 18.95008842, 20.84864279, 23.89678115, 27.4105985 ])

In [77]:
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

model = LinearRegression()
model.fit(X_train, y_train)

w_best = model.coef_
intercept = model.intercept_
print(f"Slope: {w_best[0]}")
print(f"Intercept: {intercept}")

Slope: 2.8369468245195097
Intercept: -1.3105001099203442


In [78]:
#Testing

y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)

print(f"Actual Values: {y_test}") 
print(f"Predicted Values: {y_pred}")
print(f"R2: {r2}")

Actual Values: [ 6.97873798 23.89678115]
Predicted Values: [ 7.20034036 24.22202131]
R2: 0.9989176949332313
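
The functions `mean_squared_error` and `r2_score` imported earlier can be applied to the same predictions. The sketch below rebuilds the data and model so that it runs on its own; `r2_score(y_test, y_pred)` is expected to agree with `model.score(X_test, y_test)`, since both compute the coefficient of determination.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data-generating steps as in the cells above
np.random.seed(0)
X = np.arange(1, 11).reshape(-1, 1)
y = 3 * np.arange(10) + np.random.normal(0, 1, 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # average squared prediction error
r2 = r2_score(y_test, y_pred)             # coefficient of determination

print(f"MSE: {mse}")
print(f"R2: {r2}")
```

With only two test points the $R^2$ value is a very noisy estimate, so it should be read as a demonstration of the API rather than as evidence about model quality.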


Exercise for next programming session: Try to create an implementation of Gradient Descent from scratch.

## Gradient Descent

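$w \leftarrow w_{o} - \eta \frac{\partial{L}}{\partial{w}}$

where $w_{o}$ is the current (old) weight vector, $\eta$ is the learning rate, and $L$ is the loss function. The gradient is evaluated at $w_{o}$.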

Steps for implementation:

-- Initialize the weights

-- Compute the gradient

-- Adjust the weights

-- Repeat steps 2 and 3 until a stopping criterion is met
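
One possible sketch of these steps for linear regression is shown below. Several choices are assumptions rather than anything fixed by the session: the loss is taken to be the mean squared error $L(w) = \frac{1}{n} || Xw - y ||_{2}^{2}$, the weights start at zero, the stopping criterion is a small gradient norm, and the values of `eta`, `tol`, and `max_iter` are picked by hand for this tiny dataset.

```python
import numpy as np

def gradient_descent(XD, y, eta=0.05, tol=1e-8, max_iter=10_000):
    """Minimize the mean squared error of XD @ w against y (illustrative sketch)."""
    n, d = XD.shape
    w = np.zeros(d)  # Step 1: initialize the weights
    for _ in range(max_iter):
        # Step 2: gradient of (1/n) * ||XD w - y||^2 with respect to w
        grad = (2.0 / n) * XD.T @ (XD @ w - y)
        # Stopping criterion (assumed): gradient is numerically zero
        if np.linalg.norm(grad) < tol:
            break
        # Step 3: move against the gradient
        w = w - eta * grad
    return w

# Data from the intro example: y = 1 + 3x
X = np.array([1, 2, 3, 4, 5])
y = np.array([4, 7, 10, 13, 16])
XD = np.c_[np.ones(X.shape[0]), X]  # column of ones for the intercept

w_gd = gradient_descent(XD, y)
print(w_gd)  # close to [1, 3], the closed-form solution
```

The learning rate matters: for this data a value much larger than about 0.08 makes the iterates diverge, while a very small one needs many more iterations to reach the same tolerance.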