<div style="width: 100%; padding: 20px;">
    <center>
        <img src="Images/nus_logo.png" width="200" style="margin-bottom: 30px;">
    </center>
    <div style="font-size: 45px; color:#002147; font-weight: bold; text-align: center; margin-bottom: 50px;">
        EE2211 Introduction to Machine Learning
    </div>
    <div style="font-size: 32px; color:#FF6F00; text-align: center; margin-bottom: 30px; font-weight: normal;">
        Optional Python Sessions (Week 8)
    </div>
    <hr style="border: none; border-top: 2px solid #002147; width: 80%; margin: 0 auto 30px auto;">
    <p style="text-align:right; font-size: 18px; font-weight: normal; margin-right: 20px;">
        <strong>Mr. ZHU, Zikun</strong><br>
        Department of Electrical and Computer Engineering
    </p>
</div>

## Plan for Today:
- We will go through this Jupyter notebook, which gives a very quick review of **NumPy** and Python basics, followed by a walkthrough of the demo code (which I have refactored) from Lectures 4 to 6.
- Feel free to **ask questions at any time**, even during the walkthrough!

## What You Can Ask:
You are welcome to ask about **anything related to Python programming** in the context of **EE2211**.

(Just note that there might be certain questions that I may not have immediate answers to... but I will do my best to help!)

# Please leave your feedback

Feedback link: https://forms.gle/eayf4Y2taDawMqwPA

![Screen Shot 2025-03-11 at 10.57.36 AM.png](attachment:af6cbc25-228c-44c3-b6e4-51ec01ff6ab4.png)

Contact me if you have any questions!

All the best for your midterms!

# **Quick Review on Numerical Computing with NumPy (<ins>Num</ins>erical <ins>Py</ins>thon)**

### **Why Don’t We Just Use Python Lists?**

Python lists are very flexible and can hold different types of data, making them useful for general programming. However, when it comes to numerical computing in machine learning, they have some limitations:

- **Memory Usage**: Lists in Python are not as memory-efficient as NumPy arrays, especially when working with large numbers of elements.
- **Speed**: Python lists are not optimized for numerical operations, especially when dealing with large datasets.
- **Lack of Vectorization**: With lists, you often need to write explicit loops to perform operations on elements, which can make code slower and harder to read.

### **How NumPy Addresses These Limitations**

NumPy provides an efficient solution with its **ndarray** (N-dimensional array) data structure and a variety of optimized functions which operate on this data structure:

- **Memory Efficiency**: An `ndarray` stores elements of a single data type in one contiguous block of memory, which is more compact than a Python list of separate Python objects.
- **Speed**: NumPy’s array operations are implemented in C, providing significant speed improvements.
- **Vectorized Operations**: With NumPy, you can perform element-wise operations directly on arrays without needing loops, making the code simpler and faster.

---

### **What Does NumPy Offer?**

1. **Data Structure**: 
   - The core data structure is the **ndarray**, which can represent vectors, matrices, and higher-dimensional arrays.
   - It comes with useful property accessors (no `()`) such as `.shape`, `.dtype`, and `.ndim`, which provide quick information about the array.

2. **Optimized Functions**: 
   - NumPy includes many built-in functions (with `()`) for performing operations on `ndarray` objects:
     - **Mathematical Operations**: `np.sum()`, `np.mean()`, `np.exp()`, `np.linalg.inv()`, etc.
     - **Random Number Generation**: `np.random.rand()`, `np.random.randn()`, etc.
   - These functions are optimized for speed and support **vectorized operations**, allowing you to perform computations across entire arrays without the need for explicit loops.

By using NumPy, we can handle large datasets more effectively, write cleaner code, and perform numerical operations much faster than using standard Python lists.
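
To make the vectorization point concrete, here is a minimal sketch comparing a plain list with an `ndarray`. The variable names (`values`, `doubled_list`, and so on) are illustrative and not part of the lecture code.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# With a Python list, element-wise doubling needs an explicit loop
# (written here as a list comprehension).
doubled_list = [v * 2 for v in values]

# Note: `*` on a list repeats the list instead of scaling its elements.
repeated_list = values * 2

# With an ndarray, the same operator is applied element-wise in one step.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)
print(repeated_list)
print(doubled_arr)
```

The list version and the array version give the same doubled values, while `values * 2` produces a list of ten elements.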

In [1]:
# First, let's import NumPy
import numpy as np

# Check the version of NumPy to ensure it's installed properly
np.__version__

'1.26.4'

### **Creating NumPy Arrays**

The core feature of NumPy is its powerful N-dimensional array object, called the **ndarray**. You can create an array using the `np.array()` function by passing in a Python list or tuple.

Let's see how to create 1D and 2D arrays:

In [5]:
# Creating a 1D array from a Python list
arr1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr1d)

# Creating a 2D array (a matrix) from a list of lists
arr2d = np.array([[1, 2, 3], 
                  [4, 5, 6]])
print("2D Array:\n", arr2d)

1D Array: [1 2 3 4 5]
2D Array:
 [[1 2 3]
 [4 5 6]]


### **Array Properties**

Each NumPy array comes with important properties, such as:
- `shape`: Returns the size of the array along each dimension, e.g. (rows, columns) for a 2D array
- `dtype`: Tells you the type of data stored (integers, floats, etc.)
- `size`: Number of elements in the array
- `ndim`: The number of dimensions of the array (1D, 2D, etc.)

Let's explore these properties:

In [7]:
# Exploring properties of the 2D array
print("Shape of arr2d:", arr2d.shape)
print("Data type of elements in arr2d:", arr2d.dtype)
print("Number of elements in arr2d:", arr2d.size)
print("Number of dimensions in arr2d:", arr2d.ndim)

Shape of arr2d: (2, 3)
Data type of elements in arr2d: int64
Number of elements in arr2d: 6
Number of dimensions in arr2d: 2


### **Array Indexing and Slicing**

Just like Python lists, you can access elements of a NumPy array using indexing. However, NumPy arrays allow more sophisticated slicing capabilities, especially with multi-dimensional arrays.

Let's see how we can index and slice arrays:

In [11]:
# Accessing individual elements
print("Element at index 0 of arr1d:", arr1d[0])
print("Element at row 0, column 1 of arr2d:", arr2d[0, 1])

# Slicing arrays
print("Elements from index 1 to 3 in arr1d:", arr1d[1:4])
print("Elements in the first two rows and last two columns of arr2d:\n", arr2d[:2, 1:])

Element at index 0 of arr1d: 1
Element at row 0, column 1 of arr2d: 2
Elements from index 1 to 3 in arr1d: [2 3 4]
Elements in the first two rows and last two columns of arr2d:
 [[2 3]
 [5 6]]


### **Basic Arithmetic Operations**

One of the powerful features of NumPy is **vectorized operations**. This means we can perform element-wise operations on arrays without having to loop through elements manually, making our code more efficient and readable.

Let's perform some basic arithmetic operations on arrays:

In [13]:
# Element-wise addition and multiplication (how would you do this with a Python list?)
arr_add = arr1d + 5
print("arr1d + 5 =", arr_add)

arr_mul = arr1d * 2
print("arr1d * 2 =", arr_mul)

# Element-wise addition of two arrays
arr_sum = arr1d + np.array([10, 20, 30, 40, 50])
print("arr1d + [10, 20, 30, 40, 50] =", arr_sum)

arr1d + 5 = [ 6  7  8  9 10]
arr1d * 2 = [ 2  4  6  8 10]
arr1d + [10, 20, 30, 40, 50] = [11 22 33 44 55]


### **Useful Mathematical Functions**

NumPy provides many useful functions for mathematical operations, such as `np.mean()`, `np.sum()`, `np.std()`, and more. These functions allow you to easily calculate statistics and perform aggregations over your arrays.

Here are some commonly used functions:

In [15]:
# Compute the mean, sum, and standard deviation of the array
mean_value = np.mean(arr1d)
sum_value = np.sum(arr1d)
std_value = np.std(arr1d)

print("Mean of arr1d:", mean_value)
print("Sum of arr1d:", sum_value)
print("Standard Deviation of arr1d:", std_value)

Mean of arr1d: 3.0
Sum of arr1d: 15
Standard Deviation of arr1d: 1.4142135623730951


### **Determinants**

The determinant of a matrix is a scalar value that is a function of the entries of a square matrix. It is used in many areas, such as calculating the inverse of a matrix or solving systems of linear equations. 

We can calculate the determinant of a matrix using `np.linalg.det()`.

In [17]:
# Determinant of a matrix
D = np.array([[1, 2], 
              [3, 4]])

det_D = np.linalg.det(D)
print("Determinant of matrix D:", det_D)

Determinant of matrix D: -2.0000000000000004


Looking at the output, we observe -2.0000000000000004, a consequence of floating-point precision in computers. <br>

Please be reassured that this behavior is common in numerical computing. <br>

It is not specific to NumPy or Python—it's a fundamental characteristic of how computers handle numbers.
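
Because of this rounding, exact `==` comparisons on floating-point results can be misleading. A minimal sketch of the usual workaround, assuming the same matrix `D` as in the cell above, is to compare with a tolerance using `np.isclose`:

```python
import numpy as np

D = np.array([[1, 2],
              [3, 4]])
det_D = np.linalg.det(D)

# Exact comparison may be False because of rounding in the last digits.
print(det_D == -2.0)

# Tolerance-based comparison treats tiny rounding differences as equal.
print(np.isclose(det_D, -2.0))

# For display purposes, rounding hides the floating-point noise.
print(round(det_D, 6))
```

For whole arrays, `np.allclose` plays the same role as `np.isclose` does for a single value.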

# Lecture 4
- Transpose and rank
- Product and inverse
- Even-determined system (m = d)
- Over-determined system (m > d)
- Under-determined system (m < d)


# Transpose and Rank in Linear Algebra

## Transpose of a Matrix

In linear algebra, the **transpose** of a matrix $X$ is denoted as $X^T$. Transposing a matrix means flipping it over its diagonal, which switches its rows and columns.

- If $X$ is an $m \times n$ matrix (with $m$ rows and $n$ columns), then $X^T$ will be an $n \times m$ matrix.
- For example, if $X = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}$, then:

$$
X^T = \begin{bmatrix} a & c & e \\ b & d & f \end{bmatrix}
$$

In **NumPy**, we can find the transpose of a matrix using `.T` or `.transpose()`:

- **`.T`** is a **property accessor**, meaning you do not use parentheses. This is because `.T` is treated like an attribute of the array object.
- **`.transpose()`** is a **method** (or function), which is why it requires parentheses `()`. However, `.T` is more commonly used for simple transpositions due to its brevity.

### Differences Between `.T` and `.transpose()`
- **Usage Simplicity**: `.T` is typically used for straightforward transpositions (e.g., flipping a 2D array). It is faster to write and commonly used for simple use cases.
- **Versatility**: `.transpose()` offers more flexibility because you can specify the axis order when working with arrays of more than two dimensions. `.T` takes no arguments and always reverses the order of all axes, which for a 2D array is the ordinary matrix transpose.
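
The difference only shows up beyond two dimensions. The following is a small sketch on a hypothetical 3D array `A` (not used elsewhere in this notebook):

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)   # a 3D array with shape (2, 3, 4)

# .T reverses the order of all axes: (2, 3, 4) -> (4, 3, 2)
print(A.T.shape)

# .transpose() with no arguments behaves the same as .T
print(A.transpose().shape)

# .transpose() with explicit axes can swap only the last two:
# (2, 3, 4) -> (2, 4, 3)
print(A.transpose(0, 2, 1).shape)
```

For the 2D matrices used in EE2211, `.T` and `.transpose()` give the same result.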

---

## Rank of a Matrix

The **rank** of a matrix $X$ is the maximum number of **linearly independent rows** or **columns** in $X$. It represents the dimension of the **column space** (or **row space**) of the matrix.

- It gives us insight into the matrix's dimensionality and is useful in solving systems of linear equations.
- In **NumPy**, we can find the rank of a matrix using `numpy.linalg.matrix_rank()`.

In [19]:
from numpy.linalg import matrix_rank

# Define a 3x3 matrix
X = np.array([[1, 4, 3], 
              [0, 4, 2], 
              [1, 8, 5]])

print("Matrix X:")
print(X)

# Compute the transpose of the matrix
X_transpose = X.T

print("\nTranspose of X (X^T):")
print(X_transpose)

# Compute the rank of the matrix
rank_X = matrix_rank(X)

print(f"\nRank of X: {rank_X}")

Matrix X:
[[1 4 3]
 [0 4 2]
 [1 8 5]]

Transpose of X (X^T):
[[1 0 1]
 [4 4 8]
 [3 2 5]]

Rank of X: 2


# Product and Inverse in Linear Algebra

## Basics: Vectors and Matrices

- A **vector** in NumPy is represented as a **1D array** and is typically written in **lowercase**. For example, a vector $y$ with 3 elements can be written as:
  $$
  y = \begin{bmatrix} 3 \\ 0.5 \\ 4 \end{bmatrix}
  $$

- A **matrix** in NumPy is represented as a **2D array** and is typically written in **uppercase**. For example, a matrix $X$ with 3 rows and 2 columns can be defined as:
  $$
  X = \begin{bmatrix} 1 & 4 \\ 0 & 4 \\ 3 & -2 \end{bmatrix}
  $$

---

## Using `@`, `np.dot()`, or `np.matmul()` for Multiplication

### Quick Recommendation: Just Use `@`

- Using `@` is more concise and improves readability compared to using `np.dot()` or `np.matmul()`.
```python
result = A @ B @ C @ D
result = np.dot(np.dot(np.dot(A, B), C), D)
result = np.matmul(np.matmul(np.matmul(A, B), C), D)
```
- If you would like to know the differences between these methods in detail, you can read more [here](https://blog.finxter.com/numpy-matmul-operator/).
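
As a quick check, the three spellings can be compared on small 2D matrices. The matrices `A`, `B`, and `C` below are made-up values for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[1, 0], [0, 1]])

r_at = A @ B @ C
r_dot = np.dot(np.dot(A, B), C)
r_matmul = np.matmul(np.matmul(A, B), C)

# For 2D arrays like these, all three give the same product.
print(np.array_equal(r_at, r_dot))
print(np.array_equal(r_at, r_matmul))
```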

---

## Types of Products

### Vector-Vector Product (Dot Product)

The **dot product** of two vectors $a$ and $b$, each of length $n$, is calculated as:

$$
a \cdot b = \sum_{i=1}^{n} a_i b_i
$$

- The result of the dot product is a **scalar** (a single number).
- **Example**: If $a = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$, then the dot product is calculated as:

$$
a^T \cdot b = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32
$$

- In practice, NumPy handles this automatically when using `a @ b` or `np.dot(a, b)` or `np.matmul(a, b)` for vectors.

### Matrix-Vector Product

The **matrix-vector product** involves multiplying a matrix $X$ of size $m \times n$ with a vector $y$ of size $n$.

$$
z = X \cdot y
$$

- The result is a **vector** of size $m$.
- **Example**: If $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$ and $y = \begin{bmatrix} 7 \\ 8 \end{bmatrix}$, then:

$$
z = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 8 \end{bmatrix} = \begin{bmatrix} 23 \\ 53 \\ 83 \end{bmatrix}
$$

### Vector-Matrix Product

The **vector-matrix product** involves multiplying a vector $y$ by a matrix $X$.

$$
z = y^T \cdot X
$$

- The result is a **vector** with the same number of columns as $X$.
- **Example**: If $y = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $X = \begin{bmatrix} 4 & 5 \\ 6 & 7 \\ 8 & 9 \end{bmatrix}$, then:

$$
z = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \cdot \begin{bmatrix} 4 & 5 \\ 6 & 7 \\ 8 & 9 \end{bmatrix} = \begin{bmatrix} 40 & 46 \end{bmatrix}
$$

### Matrix-Matrix Product

The **matrix-matrix product** involves multiplying two matrices $X$ of size $m \times n$ and $Q$ of size $n \times p$. The result is a new matrix of size $m \times p$:

$$
Z = X \cdot Q
$$

- The result is a **matrix** of size $m \times p$.
- **Example**: If $X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $Q = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$, then:

$$
Z = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \cdot \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
$$
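
All of the products above follow the same rule: the inner dimensions must match. A minimal sketch of what happens when they do and do not match, using the shapes from the examples above (the flag `mismatch_raised` is only there for illustration):

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # shape (3, 2)
y = np.array([7, 8])          # shape (2,)
a = np.array([1, 2, 3])       # shape (3,)

print((X @ y).shape)          # (3, 2) @ (2,) -> (3,)
print((a @ X).shape)          # (3,) @ (3, 2) -> (2,)

# X @ a fails: X has 2 columns but a has 3 elements.
try:
    X @ a
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```

Printing `.shape` before multiplying is often the fastest way to debug this kind of error.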

---

### Matrix Inverse

The **inverse** of a square matrix $X$ is a matrix $X^{-1}$ such that:

$$
X \cdot X^{-1} = I
$$

where $I$ is the **identity matrix**.

- Only **square matrices** (same number of rows and columns) that have **full rank** can have an inverse.
- The inverse can be calculated in NumPy using `np.linalg.inv()`.
- **Example**: If $X = \begin{bmatrix} 4 & 7 \\ 2 & 6 \end{bmatrix}$, then:

$$
X^{-1} = \begin{bmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{bmatrix}
$$

and

$$
X \cdot X^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
$$


In [27]:
# Define 1D vectors (a, b, y)
# NumPy interprets 1D vectors' orientations based on context
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
y = np.array([7, 8])

# Define a 3x2 matrix (X)
X = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Define a 3x2 matrix (C)
C = np.array([[4, 5],
              [6, 7],
              [8, 9]])

# Define a 2x2 matrix (P)
P = np.array([[1, 2],
              [3, 4]])

# Define a 2x2 matrix (Q)
Q = np.array([[5, 6],
              [7, 8]])

# Dot product (Vector-Vector Product)
dot_product = a @ b
print("Dot product of a and b:")
print(f"a @ b = {dot_product}")
print(f"Result shape: {dot_product.shape}\n")

# Matrix-Vector Product
matrix_vector_product = X @ y # A 1D array on the right side (X @ y) is treated as a column vector
print("Matrix-vector product X @ y:")
print(matrix_vector_product)
print(f"Result shape: {matrix_vector_product.shape}\n")

# Vector-Matrix Product
vector_matrix_product = a @ C # A 1D array on the left side (a @ C) is treated as a row vector
print("Vector-matrix product a @ C:")
print(vector_matrix_product)
print(f"Result shape: {vector_matrix_product.shape}\n")

# Matrix-Matrix Product
matrix_matrix_product = P @ Q
print("Matrix-matrix product P @ Q:")
print(matrix_matrix_product)
print(f"Result shape: {matrix_matrix_product.shape}\n")

Dot product of a and b:
a @ b = 32
Result shape: ()

Matrix-vector product X @ y:
[23 53 83]
Result shape: (3,)

Vector-matrix product a @ C:
[40 46]
Result shape: (2,)

Matrix-matrix product P @ Q:
[[19 22]
 [43 50]]
Result shape: (2, 2)



In [37]:
from numpy.linalg import inv, LinAlgError

# Example 1: Invertible Matrix
X_square = np.array([[4, 7], 
                     [2, 6]])
print("Example 1: Invertible Matrix X_square:")
print(X_square)

# Compute the inverse and verify the result
try:
    X_inv = inv(X_square) # Raises LinAlgError if the matrix is singular
    print("\nInverse of X_square:")
    print(X_inv) 
    print("\nVerification (X_square @ X_inv):")
    identity_matrix = X_square @ X_inv 
    print(identity_matrix)
except LinAlgError:
    print("Matrix X_square is not invertible.")

# Example 2: Singular Matrix (Non-Invertible)
X_singular = np.array([[2, 4], 
                       [1, 2]])
print("\nExample 2: Singular Matrix X_singular:")
print(X_singular)

# Attempt to compute the inverse of the singular matrix
try:
    X_singular_inv = inv(X_singular) # Raises LinAlgError because X_singular is singular
    print("\nInverse of X_singular:")
    print(X_singular_inv)
    print("\nVerification (X_singular @ X_singular_inv):")
    identity_matrix = X_singular @ X_singular_inv 
    print(identity_matrix)
except LinAlgError:
    print("Matrix X_singular is not invertible because it is singular.")

Example 1: Invertible Matrix X_square:
[[4 7]
 [2 6]]

Inverse of X_square:
[[ 0.6 -0.7]
 [-0.2  0.4]]

Verification (X_square @ X_inv):
[[ 1.00000000e+00 -1.11022302e-16]
 [ 1.11022302e-16  1.00000000e+00]]

Example 2: Singular Matrix X_singular:
[[2 4]
 [1 2]]
Matrix X_singular is not invertible because it is singular.


# Solving Systems of Linear Equations in Optimization Problems

A set of linear equations can have **no solution**, **one solution**, or **infinitely many solutions**, depending on the relationship between the number of **equations** ($m$) and **unknowns** ($d$). Consider the system:

$$
\mathbf{X} \mathbf{w} = \mathbf{y}
$$

Where:

- **$\mathbf{X}$** is an $m \times d$ matrix (features).
- **$\mathbf{w}$** is a $d \times 1$ vector (weights).
- **$\mathbf{y}$** is an $m \times 1$ vector (outputs).

The nature of the solutions depends on the shape of **$\mathbf{X}$**:

| Shape of $\mathbf{X}$ | Type of System      | Condition | Description                              |
|-----------------------|---------------------|-----------|------------------------------------------|
| $\mathbf{X}$ is Square ($m = d$) | Even-determined | $m = d$ | Equal number of equations and unknowns. |
| $\mathbf{X}$ is Tall ($m > d$)   | Over-determined | $m > d$ | More equations than unknowns.           |
| $\mathbf{X}$ is Wide ($m < d$)  | Under-determined | $m < d$ | Fewer equations than unknowns.          |

---

## 1. Square or Even-determined Systems ($m = d$)

- **Condition**: $m = d$ (Square matrix).
- **Solution**: If **$\mathbf{X}$** is **invertible** ($\text{Rank} = m \iff \text{Full rank} \iff \text{Invertible}$), the system has exactly **one solution**:

$$
\mathbf{w} = \mathbf{X}^{-1} \mathbf{y}
$$
- If **$\mathbf{X}$** is **not invertible**, the matrix **$\mathbf{X}^{-1}$** does not exist, and a unique solution cannot be found.
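
As a side note, NumPy also provides `np.linalg.solve`, which solves $\mathbf{X}\mathbf{w} = \mathbf{y}$ for a square, invertible $\mathbf{X}$ without forming the inverse explicitly. It is shown here only as a cross-check of the formula above, not as the method used in the lecture:

```python
import numpy as np

X = np.array([[1, 1],
              [1, -2]])
y = np.array([4, 1])

w_inv = np.linalg.inv(X) @ y        # formula from the notes: w = X^{-1} y
w_solve = np.linalg.solve(X, y)     # solves X w = y directly

print(w_inv)
print(w_solve)
print(np.allclose(w_inv, w_solve))
```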

## 2. Over-determined Systems ($m > d$)

- **Condition**: $m > d$ (More equations than unknowns).
- **Solution**: In general, an over-determined system has **no exact solution**.
    - **Exception**: If $\text{rank}(\mathbf{X}) = \text{rank}([\mathbf{X}, \mathbf{y}])$, then there is a solution.
- **Approximate Solution**: Use **least squares approximation** (which minimizes the error) when no exact solution exists:

$$
\mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$

- This formula can be used if $\mathbf{X}^T \mathbf{X}$ is invertible.

## 3. Under-determined Systems ($m < d$)

- **Condition**: $m < d$ (Fewer equations than unknowns).
- **Solution**: In general, an under-determined system has **infinitely many solutions**.
    - **Exception**: If $\text{rank}(\mathbf{X}) < \text{rank}([\mathbf{X}, \mathbf{y}])$, there is **no solution** (system is inconsistent).
- **General Solution**: Use the **minimum norm solution**:

$$
\mathbf{w} = \mathbf{X}^T (\mathbf{X} \mathbf{X}^T)^{-1} \mathbf{y}
$$

- This formula can be used if $\mathbf{X} \mathbf{X}^T$ is invertible.
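
Both formulas can be cross-checked against `np.linalg.pinv`, NumPy's Moore-Penrose pseudoinverse, which coincides with the left inverse when $\mathbf{X}$ has full column rank and with the right inverse when $\mathbf{X}$ has full row rank. The sketch below is only a sanity check under that assumption; the variable names are illustrative:

```python
import numpy as np

# Over-determined example (m > d): pinv matches the left-inverse formula
X_tall = np.array([[1, 1],
                   [1, -1],
                   [1, 0]])
y_tall = np.array([1, 0, 2])
w_left = np.linalg.inv(X_tall.T @ X_tall) @ X_tall.T @ y_tall
w_pinv_tall = np.linalg.pinv(X_tall) @ y_tall

# Under-determined example (m < d): pinv matches the right-inverse formula
X_wide = np.array([[1, 2, 3],
                   [1, -2, 3]])
y_wide = np.array([2, 1])
w_right = X_wide.T @ np.linalg.inv(X_wide @ X_wide.T) @ y_wide
w_pinv_wide = np.linalg.pinv(X_wide) @ y_wide

print(np.allclose(w_left, w_pinv_tall))
print(np.allclose(w_right, w_pinv_wide))
```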

In [41]:
# Define a square system (m = d)
X = np.array([[1, 1], # 2x2 matrix
              [1, -2]])
y = np.array([4, 1])

# Check if the system is full rank by checking if rank = m
if matrix_rank(X) == X.shape[0]:
    w = inv(X) @ y
    print("Solution for even-determined system (square):")
    print(w)
else:
    print("Matrix X is not invertible; no unique solution exists.")

Solution for even-determined system (square):
[3. 1.]


In [55]:
# Define an over-determined system (m > d)
X = np.array([[1, 1], 
              [1, -1], 
              [1, 0]])  # 3x2 matrix
y = np.array([1, 0, 2])

# Check if the system has an exact solution
rank_X = matrix_rank(X)

# np.column_stack([X, y]) appends y as an extra column, giving the augmented matrix [X, y]:
# [[1,  1, 1],
#  [1, -1, 0],
#  [1,  0, 2]]
rank_Xy = matrix_rank(np.column_stack([X, y]))

# Print appropriate message based on consistency
if rank_X == rank_Xy:
    print("The system has an exact solution.")
else:
    print("The system does not have an exact solution; using least squares approximation.")

# Both cases use the same formula: w = (X^T X)^{-1} X^T y
w = inv(X.T @ X) @ X.T @ y

print("Solution for over-determined system:")
print(w)

The system does not have an exact solution; using least squares approximation.
Solution for over-determined system:
[1.  0.5]


In [57]:
# Define an under-determined system (m < d)
X = np.array([[1, 2, 3], 
              [1, -2, 3]])  # 2x3 matrix
y = np.array([2, 1])

# Check if the system has a solution
rank_X = matrix_rank(X)
rank_Xy = matrix_rank(np.column_stack([X, y]))

# Determine if the solution can be found (system consistency)
# If rank_X < rank_Xy, the system is inconsistent (no solution).
# Else, there are infinitely many solutions
if rank_X < rank_Xy:
    print("The system is inconsistent; no solution exists.")
else:
    print("The system is consistent and has infinitely many solutions.")
    
    # Solve using the minimum norm solution
    w = X.T @ inv(X @ X.T) @ y
    print("Solution for the under-determined system (minimum norm):")
    print(w)

The system is consistent and has infinitely many solutions.
Solution for the under-determined system (minimum norm):
[0.15 0.25 0.45]


# Lecture 5
- Linear Regression (with One Output)
- Linear Regression with Multiple Outputs

# Linear Regression

## Problem Statement
Linear regression is a fundamental machine learning algorithm and often the first one introduced in many machine learning courses. It forms the basis for understanding more complex models and methods.

The goal of linear regression is to predict the unknown target variable $\mathbf{y}$ given the input features $\mathbf{x}$. This involves building a model using a collection of labeled training examples $\{ (\mathbf{x}_i, y_i) \}_{i=1}^m$, where:
- $m$ is the number of training examples.
- $\mathbf{x}_i$ is a $d$-dimensional feature vector (input).
- $y_i$ is the corresponding real-valued target (output).

Note: The above scenario assumes a single dependent variable. However, in practice, linear regression can be extended to handle various relationships between independent and dependent variables:
- **Single independent variable → Single dependent variable**.
- **Single independent variable → Multiple dependent variables**.
- **Multiple independent variables → Single dependent variable**.
- **Multiple independent variables → Multiple dependent variables**.

---

## Applications: Regression and Classification
Linear regression can be applied to both regression and classification problems:
- When $y_i$ is **continuous-valued**, it is a **regression problem**.
- When $y_i$ is **discrete-valued**, it is a **classification problem**.

---

## Building a Linear Regression Model

In linear regression, we aim to build a model $f_{\mathbf{w}}(\mathbf{x})$ that predicts the target variable $y$ using the input feature vector $\mathbf{x}$. The model is defined as:

$$
f_{\mathbf{w}}(\mathbf{x}) = \mathbf{x}^T \mathbf{w}
$$
where:
- $\mathbf{w}$ is a vector of parameters (weights).
- $\mathbf{x}$ is the input feature vector.

**Note**: Again, in this explanation, we are focusing on a scenario where there is **one independent variable** (feature) and **one dependent variable** (output).

### Absorbing the Bias Term

Often, we include a bias term $b$ in the linear regression model:
$$
f_{\mathbf{w}, b}(\mathbf{x}) = \mathbf{x}^T \mathbf{w} + b
$$

However, the bias term $b$ can be absorbed into the weight vector $\mathbf{w}$ by augmenting the feature matrix $\mathbf{X}$ with an additional column of ones. This allows us to rewrite the model as:
$$
f_{\mathbf{w}}(\mathbf{x}) = \tilde{\mathbf{x}}^T \tilde{\mathbf{w}}
$$
where:
- $\tilde{\mathbf{x}} = [1, \mathbf{x}^T]^T$ is the augmented feature vector (with an extra $1$).
- $\tilde{\mathbf{w}} = [b, \mathbf{w}^T]^T$ is the augmented weight vector (with an extra $b$).

<center>
    <img src="Images/linear_regression_bias.png" width="800" style="margin-bottom: 30px;">
</center>

By doing this, we simplify the notation and maintain the form of the matrix operations, making it easier to compute the weights later on.
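
In NumPy, absorbing the bias amounts to prepending a column of ones to the raw features. The sketch below uses made-up feature values and made-up parameters `w` and `b` to show that both forms of the model give the same predictions:

```python
import numpy as np

# Raw feature values: m = 4 samples, d = 1 feature (illustrative numbers)
x_raw = np.array([[-9.0],
                  [-7.0],
                  [1.0],
                  [5.0]])

# Prepend a column of ones so the first weight plays the role of the bias b
ones = np.ones((x_raw.shape[0], 1))
X_aug = np.hstack([ones, x_raw])
print(X_aug.shape)   # m x (d + 1)

# Hypothetical parameters, chosen only for this check
b, w = 0.5, np.array([2.0])
w_aug = np.concatenate([[b], w])

pred_with_bias = x_raw @ w + b     # x^T w + b
pred_augmented = X_aug @ w_aug     # augmented form
print(np.allclose(pred_with_bias, pred_augmented))
```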

### Final Model Representation

After absorbing the bias term, the model can be represented as:
$$
\mathbf{y} = \mathbf{X} \mathbf{w}
$$
where:
- $\mathbf{X}$ is the $m \times (d+1)$ feature matrix, with each row $\mathbf{x}_i$ augmented with a 1.
- $\mathbf{y}$ is the $m \times 1$ vector of target values.
- $\mathbf{w}$ is the $(d+1) \times 1$ vector of weights, including the bias term.

---

## Learning (Training) Stage: Optimizing the Linear Regression Model

After building the linear regression model, the next step is to **learn** or **train** the model by determining the best parameters that minimize the prediction error. This involves finding the optimal values for the model parameters, $\mathbf{w}^*$ (which includes $b^*$), by minimizing an **objective function**, often represented as a **cost function**.

### Objective Function: Mean Squared Error (MSE)

For linear regression, the most commonly used cost function is the **mean squared error (MSE)**, which measures how well the model's predictions match the true values. The cost function is defined as:
$$
\text{Cost}(\mathbf{w}) = \frac{1}{m} \sum_{i=1}^m \left( f_{\mathbf{w}}(\mathbf{x}_i) - y_i \right)^2
$$

Where:
- $m$ is the number of training examples.
- $f_{\mathbf{w}}(\mathbf{x}_i) = \mathbf{x}_i^T \mathbf{w}$ represents the predicted value for input $\mathbf{x}_i$.
- $y_i$ is the actual target value for the $i$-th example.

### Understanding Loss and Cost Functions
- **Loss function**: Measures the error for a single training example, i.e., the difference between the model's prediction $f_{\mathbf{w}, b}(\mathbf{x}_i)$ and the true target value $y_i$. In linear regression, this difference is squared to penalize larger errors.
- **Cost function**: Aggregates the loss over all training examples (for MSE, the average of the squared errors). It provides a single metric to quantify the model's performance across the entire training dataset. Minimizing this function helps find the best model parameters.

### Why Mean Squared Error (MSE)?
The **Mean Squared Error (MSE)** is a specific form of the cost function used for linear regression:
$$
\text{MSE} = \frac{1}{m} \sum_{i=1}^m \left( \mathbf{x}_i^T \mathbf{w} + b - y_i \right)^2
$$

- **MSE** measures the average squared difference between the predicted values $f_{\mathbf{w}, b}(\mathbf{x}_i)$ and the actual target values $y_i$.
- MSE is commonly used in linear regression because it is:
  - **Simple to compute**: The squared errors are straightforward to calculate.
  - **Differentiable**: Making it suitable for optimization techniques like gradient descent.
  - **Sensitive to larger errors**: Squaring the differences ensures that larger deviations between predictions and true values are penalized more.

By minimizing the MSE, our linear regression model learns the parameters $\mathbf{w}$ and $b$ that provide the best fit to the training data, with the aim that the model also generalizes well to unseen data.
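
The loss and cost definitions above translate directly into NumPy. This is a minimal sketch on a small dataset whose bias column is already included; the weights are obtained with the least squares formula from Lecture 4, and the perturbed weights `w_other` are a hypothetical alternative used only for comparison:

```python
import numpy as np

X = np.array([[1, -9], [1, -7], [1, -5], [1, 1], [1, 5], [1, 9]])
y = np.array([-6, -6, -4, -1, 1, 4])

# Least squares weights: w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y

y_pred = X @ w
squared_errors = (y_pred - y) ** 2     # loss for each training example
mse = squared_errors.mean()            # cost: average loss over all m examples

# Any other choice of weights should not give a smaller MSE.
w_other = w + np.array([0.1, -0.1])
mse_other = ((X @ w_other - y) ** 2).mean()

print("Per-example squared errors:", squared_errors)
print("MSE at w:", mse)
print("MSE at perturbed w:", mse_other)
```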

### Solving Linear Regression

After defining the linear regression model and the objective function, the next step is to compute the optimal parameters $\mathbf{w}$ that minimize the cost function, such as **Mean Squared Error (MSE)**. Depending on the nature of the system, we can use different approaches:

1. **Even-determined Systems ($m = d$)**:
   - When the number of training samples equals the number of features, the matrix $\mathbf{X}$ is square. In this case, using the normal equation formula:
    <div style="color:#FF6F00">
    $$
    \mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
    $$
    </div>
     is equivalent to directly solving the system of equations by computing the inverse of $\mathbf{X}$ itself (i.e., $\mathbf{w} = \mathbf{X}^{-1} \mathbf{y}$), provided that <span style="color: red;">$\mathbf{X}$ is invertible</span>. If $\mathbf{X}$ is not invertible, the normal equation cannot be used directly.

2. **Over-determined Systems ($m > d$)**:
   - If $\text{rank}(\mathbf{X}) = \text{rank}([\mathbf{X}, \mathbf{y}])$, a solution exists, and we can just use the same normal equation approach.
   - If the ranks do not match, there is no exact solution, but the above formula provides a **least squares solution**. This minimizes the sum of squared differences between the predicted values and the actual target values.
   - To compute the least squares solution, <span style="color: red;">the **left inverse** of $\mathbf{X}$ must exist</span>:
     $$
     \mathbf{X}^\dagger = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T
     $$
     <div style="color:#FF6F00">
     $$
         \mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
     $$
     </div>
   - This left inverse is computed using the inverse of $\mathbf{X}^T \mathbf{X}$ so it has to be invertible.

3. **Under-determined Systems ($m < d$)**:
   - If $\text{rank}(\mathbf{X}) < \text{rank}([\mathbf{X}, \mathbf{y}])$, there is no solution.
   - Otherwise, the system is consistent and has infinitely many solutions. In this case, we use the minimum norm solution.
   - To compute the minimum norm solution (in orange), <span style="color: red;">the **right inverse** of $\mathbf{X}$ must exist</span>:
     $$
     \mathbf{X}^\dagger = \mathbf{X}^T (\mathbf{X} \mathbf{X}^T)^{-1}
     $$
     <div style="color:#FF6F00">
     $$
     \mathbf{w} = \mathbf{X}^T (\mathbf{X} \mathbf{X}^T)^{-1} \mathbf{y}
     $$
    </div>
   - This right inverse is computed using the inverse of $\mathbf{X} \mathbf{X}^T$ so it has to be invertible.

---

## Prediction/Testing:
Once you have successfully computed $\mathbf{w}$ using one of the above methods, you can make predictions on new data $\mathbf{X}_{\text{new}}$ using:
<div style="color:#FF6F00">
$$
\hat{f}_{\mathbf{w}}(\mathbf{X}_{\text{new}}) = \mathbf{X}_{\text{new}} \hat{\mathbf{w}}
$$
</div>

This means that to predict the output for unseen input data $\mathbf{X}_{\text{new}}$, you just perform a matrix multiplication of $\mathbf{X}_{\text{new}}$ with the learned weights $\hat{\mathbf{w}}$.
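
A minimal sketch of the prediction step follows. The weights `w_hat` are hypothetical values chosen for illustration, not the result of any fit in this notebook; the key point is that new inputs must be augmented with a column of ones in the same way as the training data:

```python
import numpy as np

# Hypothetical learned weights [bias, slope] (illustrative values only)
w_hat = np.array([-1.5, 0.5])

# New raw inputs, one feature per sample
x_new = np.array([[-1.0],
                  [3.0]])

# Augment with a column of ones, matching the training feature matrix
X_new = np.hstack([np.ones((x_new.shape[0], 1)), x_new])

y_pred = X_new @ w_hat
print(y_pred)
```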



# Algorithm (Decision Tree) for Solving Linear Regression

To solve a linear regression problem, follow this decision tree based on the relationship between the number of training samples ($m$) and the number of features ($d$):

- **Start**
  - **Is $m = d$?**
    - **Yes**
      - **Is $\mathbf{X}$ invertible?**
        - **Yes**: <span style="color: red;">Use Normal Equation:  $\mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$  (In fact, when $\mathbf{X}$ is invertible, the inverse, left inverse, and right inverse all coincide, so you can multiply any of them by $\mathbf{y}$)</span> 
        - **No**: Cannot Solve
    - **No**: **Is $m > d$?**
      - **Yes**
        - **Is $\text{rank}(\mathbf{X}) = \text{rank}([\mathbf{X}, \mathbf{y}])$?**
          - **Yes**: Use Normal Equation: $\mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
          - **No**: **Does Left Inverse $\mathbf{X}^\dagger = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$ Exist?**
            - **Yes**: Use Least Squares Approximation: $\mathbf{w} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$
            - **No**: Cannot Solve
      - **No** (so $m < d$)
        - **Is $\text{rank}(\mathbf{X}) < \text{rank}([\mathbf{X}, \mathbf{y}])$?**
          - **Yes**: Cannot Solve
          - **No**: **Does Right Inverse $\mathbf{X}^\dagger = \mathbf{X}^T (\mathbf{X} \mathbf{X}^T)^{-1}$ Exist?** 
            - **Yes**: Use Minimum Norm Solution: $\mathbf{w} = \mathbf{X}^T (\mathbf{X} \mathbf{X}^T)^{-1} \mathbf{y}$  
            - **No**: Cannot Solve

- **End**
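
The decision tree can be sketched as one function. This is only an illustration: the function name `solve_linear_regression` and the returned method strings are my own choices, and "does the left/right inverse exist" is checked through the rank of $\mathbf{X}$ (full column rank or full row rank). One small deviation from the tree: for $m > d$ the sketch checks that the left inverse exists before using it in both branches.

```python
import numpy as np
from numpy.linalg import inv, matrix_rank

def solve_linear_regression(X, y):
    """Return (w, method) following the decision tree, or (None, reason)."""
    m, d = X.shape
    rank_X = matrix_rank(X)
    rank_Xy = matrix_rank(np.column_stack([X, y]))

    if m == d:
        if rank_X == d:  # square and full rank, so X is invertible
            return inv(X) @ y, "exact (inverse)"
        return None, "cannot solve: X is not invertible"

    if m > d:
        if rank_X < d:  # X^T X is singular, so the left inverse does not exist
            return None, "cannot solve: left inverse does not exist"
        w = inv(X.T @ X) @ X.T @ y
        method = "exact (left inverse)" if rank_X == rank_Xy else "least squares"
        return w, method

    # Remaining case: m < d
    if rank_X < rank_Xy:
        return None, "cannot solve: inconsistent system"
    if rank_X < m:  # X X^T is singular, so the right inverse does not exist
        return None, "cannot solve: right inverse does not exist"
    return X.T @ inv(X @ X.T) @ y, "minimum norm (right inverse)"

# Over-determined regression data from the first demo in this notebook
X_demo = np.array([[1, -9], [1, -7], [1, -5], [1, 1], [1, 5], [1, 9]], dtype=float)
y_demo = np.array([-6, -6, -4, -1, 1, 4], dtype=float)
w_demo, method_demo = solve_linear_regression(X_demo, y_demo)
print(method_demo, w_demo)

# A square but singular matrix (hypothetical) ends in "cannot solve"
w_bad, reason_bad = solve_linear_regression(np.array([[1.0, 2.0], [2.0, 4.0]]),
                                            np.array([1.0, 0.0]))
print(reason_bad)
```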

<center>
    <img src="Images/linear_regression_demo1_1.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/linear_regression_demo1_2.png" width="800" style="margin-bottom: 30px;">
</center>

In [59]:
from sklearn.metrics import mean_squared_error
# Single independent variable, single dependent variable
# Over-determined system, no exact solution, least squares approximation
# Task: regression

# Define the training data, remember to add bias term as the first column
X = np.array([[1, -9], 
              [1, -7], 
              [1, -5], 
              [1, 1], 
              [1, 5], 
              [1, 9]])
y = np.array([-6, -6, -4, -1, 1, 4])

# Check if the system has an exact solution
rank_X = matrix_rank(X)
rank_Xy = matrix_rank(np.column_stack([X, y]))

# Print appropriate message based on consistency
if rank_X == rank_Xy:
    print("The system has an exact solution.")
else:
    print("The system does not have an exact solution; using least squares approximation.")

# Compute the optimal weights
w = inv(X.T @ X) @ X.T @ y
print("Learned weights (w):")
print(w)

# Predicting for a new input
X_new = np.array([1, -1])  # Adding bias term as the first element
y_new = X_new @ w
print("\nPrediction for X_new:")
print(y_new)

# Compute the Mean Squared Error (MSE) for the training set
y_pred = X @ w
mse = mean_squared_error(y, y_pred)
print("\nMean Squared Error on the training set:")
print(mse)

# Calculate the squared error for each individual prediction
squared_errors = (y_pred - y) ** 2
print("\nSquared error for each data point:")
print(squared_errors)

The system does not have an exact solution; using least squares approximation.
Learned weights (w):
[-1.4375  0.5625]

Prediction for X_new:
-2.0

Mean Squared Error on the training set:
0.16666666666666666

Squared error for each data point:
[0.25     0.390625 0.0625   0.015625 0.140625 0.140625]
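
As a cross-check on the cell above, the same least squares problem can be solved with `numpy.linalg.lstsq`, which avoids forming an explicit inverse. This is an optional alternative and not what the lecture demo uses; the variable names are my own. For an over-determined full-rank system, `lstsq` also returns the sum of squared residuals, so dividing by the number of samples should reproduce the training MSE.

```python
import numpy as np

X = np.array([[1, -9], [1, -7], [1, -5], [1, 1], [1, 5], [1, 9]], dtype=float)
y = np.array([-6, -6, -4, -1, 1, 4], dtype=float)

# Explicit normal-equation solution, as in the demo
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Same problem solved by lstsq; residual_ss holds the sum of squared residuals
w_lstsq, residual_ss, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("Normal equation:", w_normal)
print("lstsq:          ", w_lstsq)
print("MSE from lstsq residuals:", residual_ss[0] / len(y))
```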


# Linear Regression with Multiple Outputs

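With multiple outputs, the target goes from one column (a single output vector $\mathbf{y}$) to $h$ columns (a target matrix $\mathbf{Y}$ with $h > 1$ outputs), and the learned weights become a $d \times h$ matrix with one column per output.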

<center>
    <img src="Images/linear_regression_multiple_outputs.png" width="800" style="margin-top: 10px; margin-bottom: 30px;">
</center>

Example:
<center>
    <img src="Images/linear_regression_demo2_1.png" width="800" style="margin-top: 10px; margin-bottom: 30px;">
</center>
<center>
    <img src="Images/linear_regression_demo2_2.png" width="800" style="margin-top: 10px; margin-bottom: 30px;">
</center>




In [60]:
# 3 input columns (a column of ones + 2 features), 2 dependent variables
# Over-determined system, no exact solution, least squares approximation
# Task: regression

# Define the feature matrix X and target matrix Y (m > d scenario)
# No need to add an offset column: the first column of X is already all ones
X = np.array([
    [1, 1, 1],
    [1, -1, 1],
    [1, 1, 3],
    [1, 1, 0]
])  # 4x3 matrix (4 samples, 3 columns)
Y = np.array([
    [1, 0],
    [0, 1],
    [2, -1],
    [-1, 3]
])  # 4x2 matrix (4 samples, 2 targets)

# Check if the system has an exact solution by comparing ranks
rank_X = matrix_rank(X)
rank_Xy = matrix_rank(np.column_stack([X, Y]))

# Print appropriate message based on consistency
if rank_X == rank_Xy:
    print("The system is consistent and has an exact solution.")
else:
    print("The system does not have an exact solution; using least squares approximation.")

# Solve using the least squares formula
w = inv(X.T @ X) @ X.T @ Y
print("\nEstimated weight matrix (w):")
print(w)

# Define new input data for testing and make predictions
# No need to add an offset column: the first column of Xnew is already all ones
Xnew = np.array([[1, 6, 8], 
                 [1, 0, -1]])  # 2x3 matrix (2 new samples)
Ynew = Xnew @ w
print("\nPredicted values for new inputs (Ynew):")
print(Ynew)

# Calculate and display the Mean Squared Error (MSE) for the training set predictions
Ytest = X @ w  # Predicted Y using the training data
mse = mean_squared_error(Y, Ytest)

print("\nMean Squared Error (MSE) between actual Y and predicted Ytest:")
print(mse)

The system does not have an exact solution; using least squares approximation.

Estimated weight matrix (w):
[[-0.75        2.25      ]
 [ 0.17857143  0.03571429]
 [ 0.92857143 -1.21428571]]

Predicted values for new inputs (Ynew):
[[ 7.75       -7.25      ]
 [-1.67857143  3.46428571]]

Mean Squared Error (MSE) between actual Y and predicted Ytest:
0.30357142857142855
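
One way to see what the multiple-output formula does: solving with the full matrix `Y` gives the same weights as solving each output column separately and stacking the results. The sketch below checks this on the demo data (the names `left_inverse`, `W_by_column` and `mse_all` are my own). It also computes the MSE with plain NumPy by averaging the squared errors over all samples and all outputs.

```python
import numpy as np
from numpy.linalg import inv

X = np.array([[1, 1, 1], [1, -1, 1], [1, 1, 3], [1, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [2, -1], [-1, 3]], dtype=float)

left_inverse = inv(X.T @ X) @ X.T

# All outputs at once: W has shape (d, h) = (3, 2)
W = left_inverse @ Y

# One output column at a time, then stacked side by side
W_by_column = np.column_stack([left_inverse @ Y[:, k] for k in range(Y.shape[1])])
print("Same weights:", np.allclose(W, W_by_column))

# MSE averaged over every entry of the 4x2 error matrix
mse_all = np.mean((Y - X @ W) ** 2)
print("MSE over all outputs:", mse_all)
```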


# Lecture 6
- Linear Regression for Classification
- Ridge Regression
- Polynomial Regression    
    

# Linear Regression for Classification

## Binary Classification

<center>
    <img src="Images/linear_regression_binary_classification.png" width="800" style="margin-top: 10px; margin-bottom: 30px;">
</center>

---

## Multi-Category Classification

<center>
    <img src="Images/linear_regression_multicategory_classification.png" width="800" style="margin-top: 10px; margin-bottom: 30px;">
</center>

<center>
    <img src="Images/linear_regression_demo4_1.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/linear_regression_demo4_2.png" width="800" style="margin-bottom: 30px;">
</center>


In [63]:
# 1 independent variable, 1 dependent variable
# Over-determined system, no exact solution, least squares approximation
# Task: classification

# Define the training data with a bias term as the first column (6x2 matrix)
X = np.array([
    [1, -9], 
    [1, -7], 
    [1, -5], 
    [1, 1], 
    [1, 5], 
    [1, 9]
])
y = np.array([-1, -1, -1, 1, 1, 1])  # Target values as a vector (6,)

# Check if the system has an exact solution
rank_X = matrix_rank(X)
rank_Xy = matrix_rank(np.column_stack([X, y]))

# Print appropriate message based on consistency
if rank_X == rank_Xy:
    print("The system has an exact solution.")
else:
    print("The system does not have an exact solution; using least squares approximation.")

# Compute the optimal weights
w = inv(X.T @ X) @ X.T @ y
print("\nLearned weights (w):")
print(w)

# Predicting for a new input
X_new = np.array([1, -2])  # Including bias term
y_new = X_new @ w
print("\nPrediction for X_new (continuous value):")
print(y_new)

# Classify the prediction using the sign function
y_class_new = np.sign(y_new)
print("\nPredicted class for X_new:")
print(y_class_new)

The system does not have an exact solution; using least squares approximation.

Learned weights (w):
[0.140625 0.140625]

Prediction for X_new (continuous value):
-0.140625

Predicted class for X_new:
-1.0
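
To get a feel for what the sign function is doing here, the sketch below recomputes the weights for the same training data, measures the training accuracy, and locates the decision boundary, i.e. the input value where $w_0 + w_1 x = 0$ (the names `accuracy` and `boundary` are my own). Note that `np.sign` returns 0 for an input that falls exactly on the boundary, so such a point would need a tie-breaking rule.

```python
import numpy as np
from numpy.linalg import inv

X = np.array([[1, -9], [1, -7], [1, -5], [1, 1], [1, 5], [1, 9]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)

w = inv(X.T @ X) @ X.T @ y

# Training accuracy: fraction of samples whose predicted sign matches the label
train_pred = np.sign(X @ w)
accuracy = np.mean(train_pred == y)
print("Training accuracy:", accuracy)

# Decision boundary: the x where w[0] + w[1] * x = 0
boundary = -w[0] / w[1]
print("Decision boundary at x =", boundary)
```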


<center>
    <img src="Images/linear_regression_demo5_1.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/linear_regression_demo5_2.png" width="800" style="margin-bottom: 30px;">
</center>

In [51]:
from sklearn.preprocessing import OneHotEncoder

# 2 independent var, 3 dependent var
# Over-determined system, no exact solution, least squares approximation
# Classification

# Define training data (4 samples, 3 features including bias term)
X = np.array([
    [1, 1, 1],  # Bias term, feature1, feature2
    [1, -1, 1],
    [1, 1, 3],
    [1, 1, 0]
])

# Define class labels (4 samples, single target variable)
y_class = np.array([1, 2, 1, 3])

# Manually encoded one-hot labels for demonstration
y_onehot_manual = np.array([
    [1, 0, 0],  # Class 1
    [0, 1, 0],  # Class 2
    [1, 0, 0],  # Class 1
    [0, 0, 1]   # Class 3
])
print("Manual one-hot encoding:")
print(y_onehot_manual)

# Automatically perform one-hot encoding using scikit-learn's OneHotEncoder
# First, y_class.reshape(-1, 1) converts your 1D array into a column vector with shape (4, 1), 
# as OneHotEncoder expects each feature in its own column.
# The encoder identifies all unique values in your data (1, 2, and 3) and creates a new binary column for each category.
# For each row in your original data, it places a 1 in the column corresponding to that category and 0s elsewhere.
print("\nOne-hot encoding using OneHotEncoder:")
onehot_encoder = OneHotEncoder(sparse_output=False)
y_onehot = onehot_encoder.fit_transform(y_class.reshape(-1, 1))
print(y_onehot)

# Check if the system has an exact solution
rank_X = matrix_rank(X)
rank_Xy = matrix_rank(np.column_stack([X, y_onehot]))

# Print appropriate message based on consistency
if rank_X == rank_Xy:
    print("\nThe system has an exact solution.")
else:
    print("\nThe system does not have an exact solution; using least squares approximation.")

# Solve for weight matrix W using least squares solution
print("\nEstimated weights (W):")
W = inv(X.T @ X) @ X.T @ y_onehot
print(W)

# Define test data for predictions
X_test = np.array([
    [1, 6, 8],  # Bias term, feature1, feature2
    [1, 0, -1]
])

# Predict class scores for the test data (these are raw regression outputs, not probabilities)
y_test_pred = X_test @ W
print("\nPredicted values for test data (before classification):")
print(y_test_pred)

# Convert predictions to class labels using argmax for one-hot encoding

# axis=1 means that np.argmax is applied across rows, so it finds the index of the highest value in each row
# np.argmax returns 0-based indices (0, 1, or 2), so we add 1 to map them back to class labels 1, 2, or 3
y_test_class = np.argmax(y_test_pred, axis=1) + 1

print("\nPredicted class labels for test data:")
print(y_test_class)

Manual one-hot encoding:
[[1 0 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]]

One-hot encoding using OneHotEncoder:
[[1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]]

The system does not have an exact solution; using least squares approximation.

Estimated weights (W):
[[ 3.60822483e-16  5.00000000e-01  5.00000000e-01]
 [ 2.85714286e-01 -5.00000000e-01  2.14285714e-01]
 [ 2.85714286e-01  2.77555756e-17 -2.85714286e-01]]

Predicted values for test data (before classification):
[[ 4.         -2.5        -0.5       ]
 [-0.28571429  0.5         0.78571429]]

Predicted class labels for test data:
[1 3]
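
If you want to avoid scikit-learn, one-hot encoding and decoding can also be sketched with NumPy alone. This is an alternative to `OneHotEncoder`, not what the demo uses. The `scores` matrix is a hypothetical example with one row per test sample and one column per class. Looking the label up in `classes` instead of adding 1 to the column index also works when the class labels are not 1, 2, 3.

```python
import numpy as np

y_class = np.array([1, 2, 1, 3])

# classes holds the sorted unique labels; idx gives each sample's column index
classes, idx = np.unique(y_class, return_inverse=True)
y_onehot = np.eye(len(classes))[idx]
print(y_onehot)

# Hypothetical score matrix: 2 test samples (rows), 3 classes (columns)
scores = np.array([[4.0, -2.5, -0.5],
                   [-0.3, 0.5, 0.8]])

# Row-wise argmax picks the best column; classes maps it back to a label
predicted = classes[np.argmax(scores, axis=1)]
print(predicted)
```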


# Ridge Regression

The primary motivation behind Ridge Regression is to **address overfitting** in linear regression models. Ridge regression adds a penalty on the size of the weights, which shrinks them toward zero and gives a simpler model that often performs better on new, unseen data.

<center>
    <img src="Images/ridge_regression.png" width="800" style="margin-top: 10px; margin-bottom: 30px;">
</center>
<center>
    <img src="Images/ridge_regression_primal.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/ridge_regression_dual.png" width="800" style="margin-bottom: 30px;">
</center>
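
A quick numerical sketch of the primal and dual forms, using random made-up data (the arrays `X`, `y` and the value of `lam` are assumptions for illustration). For any $\lambda > 0$ both forms give the same weights; the primal form inverts a $d \times d$ matrix and the dual form inverts an $m \times m$ matrix, so in practice you pick whichever is smaller. The loop at the end shows the norm of the weights shrinking as $\lambda$ grows.

```python
import numpy as np
from numpy.linalg import inv, norm

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # hypothetical data: m = 5 samples, d = 3 features
y = rng.normal(size=5)
lam = 0.1

m, d = X.shape
w_primal = inv(X.T @ X + lam * np.eye(d)) @ X.T @ y  # inverts a d x d matrix
w_dual = X.T @ inv(X @ X.T + lam * np.eye(m)) @ y    # inverts an m x m matrix
print("Primal and dual agree:", np.allclose(w_primal, w_dual))

# A larger lambda penalizes the weights more, so their norm shrinks
norms = []
for lam_k in [0.0, 0.1, 1.0, 10.0]:
    w_k = inv(X.T @ X + lam_k * np.eye(d)) @ X.T @ y
    norms.append(norm(w_k))
    print(f"lambda = {lam_k:5.1f}   ||w|| = {norms[-1]:.4f}")
```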

# Polynomial Regression

Polynomial regression is used when the relationship between the independent variable(s) and dependent variable(s) is non-linear. By adding polynomial terms (i.e., **higher-degree predictors**), polynomial regression allows the model to capture more complex patterns in the data.

<center>
    <img src="Images/polynomial_regression.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/polynomial_regression_regression_versus_classification.png" width="800" style="margin-bottom: 30px;">
</center>
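
To see what "adding polynomial terms" means in practice, here is a hand-written order-2 expansion for two input features. The helper `poly2_features` is a hypothetical name of my own; the column order $[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$ follows the order that scikit-learn's `PolynomialFeatures` uses for degree 2. The example data is a small XOR-like set of four points.

```python
import numpy as np

def poly2_features(X):
    """Order-2 polynomial expansion for a matrix with exactly two feature columns."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
P = poly2_features(X)

# 2 original features become 6 polynomial features
print(P.shape)
print(P)
```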


<center>
    <img src="Images/linear_regression_demo_6_1.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/linear_regression_demo_6_2.png" width="800" style="margin-bottom: 30px;">
</center>
<center>
    <img src="Images/linear_regression_demo_6_3.png" width="800" style="margin-bottom: 30px;">
</center>

In [55]:
# Note that PolynomialFeatures already includes a column of ones by default as part of the polynomial expansion
from sklearn.preprocessing import PolynomialFeatures

# In P: m = 4, d = 6
# Under-determined system
# Classification

X = np.array([
    [0, 0], 
    [1, 1], 
    [1, 0], 
    [0, 1]
]) 
y = np.array([-1, -1, 1, 1])

# Generate polynomial features of order 2 to introduce non-linear relationships
order = 2
poly = PolynomialFeatures(order)
P = poly.fit_transform(X)
print("Matrix P (Polynomial features):")
print(P)  # P has a higher dimensionality, making the system under-determined (m < d)

# Check if the system has a solution by comparing ranks
rank_P = matrix_rank(P)
rank_Py = matrix_rank(np.column_stack([P, y]))

# Determine if the solution can be found (system consistency)
# If rank_P < rank_Py, the system is inconsistent (no solution).
# Otherwise, it is consistent and has infinitely many solutions.
if rank_P < rank_Py:
    print("The system is inconsistent; no solution exists.")
else:
    print("The system is consistent and has infinitely many solutions.")
    
    # Solve using the minimum norm solution since it is under-determined (m < d).
    w = P.T @ inv(P @ P.T) @ y
    print("\nSolution for the under-determined system (minimum norm):")
    print(w)

# Testing the solution with new input data
print("\nTesting with new data points:")
Xnew = np.array([
    [0.1, 0.1], 
    [0.9, 0.9], 
    [0.1, 0.9], 
    [0.9, 0.1]
])
# Generate polynomial features for the new data
# Use transform (not fit_transform) so the already-fitted expansion is reused on new data
Pnew = poly.transform(Xnew)
# Predict using the learned weights
Ynew = Pnew @ w
print("\nPredictions for new data points:")
print(Ynew)

# Apply sign function to classify the predictions (as it is a classification task)
Ynew_class = np.sign(Ynew)
print("\nPredicted classes for new data points:")
print(Ynew_class)

Matrix P (Polynomial features):
[[1. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1.]
 [1. 1. 0. 1. 0. 0.]
 [1. 0. 1. 0. 0. 1.]]
The system is consistent and has infinitely many solutions.

Solution for the under-determined system (minimum norm):
[-1.  1.  1.  1. -4.  1.]

Testing with new data points:

Predictions for new data points:
[-0.82 -0.82  0.46  0.46]

Predicted classes for new data points:
[-1. -1.  1.  1.]


In [73]:
# Apply Ridge Regression

print("\nApplying Ridge Regression:")
lambda_ridge = 0.1  # Regularization parameter (adjust as needed)
reg_matrix = lambda_ridge * np.identity((P @ P.T).shape[0])  # Regularization term for the dual form
w_ridge = P.T @ inv(P @ P.T + reg_matrix) @ y
print("\nSolution with Ridge Regression (dual form: m < d):")
print(w_ridge)

# Predict using the learned weights from Ridge Regression
Ynew_ridge = Pnew @ w_ridge
print("\nPredictions for new data points using Ridge Regression:")
print(Ynew_ridge)

# Apply sign function to classify the predictions (as it is a classification task)
Ynew_class_ridge = np.sign(Ynew_ridge)
print("\nPredicted classes for new data points using Ridge Regression:")
print(Ynew_class_ridge)


Applying Ridge Regression:

Solution with Ridge Regression (dual form: m < d):
[-0.5570214   0.61565523  0.61565523  0.61565523 -2.64145412  0.61565523]

Predictions for new data points using Ridge Regression:
[-0.44799179 -0.59105834  0.32574025  0.32574025]

Predicted classes for new data points using Ridge Regression:
[-1. -1.  1.  1.]
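
It can help to see how the ridge solution relates to the minimum norm solution. The sketch below rebuilds the polynomial feature matrix `P` by hand and sweeps a few values of $\lambda$ (the list of values and the names `gaps` and `w_ridge_01` are my own choices). As $\lambda$ gets closer to 0, the dual-form ridge weights approach the minimum norm weights.

```python
import numpy as np
from numpy.linalg import inv, norm

# Order-2 polynomial features of the four training points, written out by hand
P = np.array([[1, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 1, 0, 0, 1]], dtype=float)
y = np.array([-1, -1, 1, 1], dtype=float)
m = P.shape[0]

w_min_norm = P.T @ inv(P @ P.T) @ y

# Ridge weights (dual form) at the lambda used in the demo
w_ridge_01 = P.T @ inv(P @ P.T + 0.1 * np.eye(m)) @ y

# Distance between ridge and minimum norm weights for decreasing lambda
gaps = []
for lam in [1.0, 0.1, 1e-3, 1e-6]:
    w_ridge = P.T @ inv(P @ P.T + lam * np.eye(m)) @ y
    gaps.append(norm(w_ridge - w_min_norm))
    print(f"lambda = {lam:g}   ||w_ridge - w_min_norm|| = {gaps[-1]:.6f}")
```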

