# Parallel Lines
Consider the following 2 equations of lines which are parallel to each other,
- $y = m_1 * x + c_1$.
- $y = m_2 * x + c_2$.

![parallel_lines.png](attachment:parallel_lines.png)

The relationship between $m_1$ and $m_2$,
- If the lines are parallel, they have the same slope; only their y-intercepts differ. Therefore, for parallel lines, $m_1 = m_2$.

# Perpendicular Lines
Consider the following 2 equations of lines which are perpendicular to each other,
- $y = m_1 * x + c_1$.
- $y = m_2 * x + c_2$.

![perpendicular_lines.png](attachment:perpendicular_lines.png)

The relationship between $m_1$ and $m_2$ is,
- $m_1 * m_2 = -1$.
- $m_1 = -\frac{1}{m_2}$

Therefore, the slopes of perpendicular lines are negative reciprocals of each other.
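The two slope relationships above can be checked numerically. The following is a minimal sketch; the helper names `are_parallel` and `are_perpendicular` are illustrative, and it assumes neither line is vertical (a vertical line has no finite slope).

```python
import math


def are_parallel(m1: float, m2: float) -> bool:
    """Non-vertical lines are parallel when their slopes are equal."""
    return math.isclose(m1, m2)


def are_perpendicular(m1: float, m2: float) -> bool:
    """Non-vertical lines are perpendicular when m1 * m2 = -1."""
    return math.isclose(m1 * m2, -1.0)


print(are_parallel(2.0, 2.0))        # True
print(are_perpendicular(2.0, -0.5))  # True
print(are_perpendicular(2.0, 0.5))   # False
```

`math.isclose` is used instead of `==` so that small floating-point errors in the slopes do not change the answer.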

# Half Spaces
If there are 2 features, a line can be drawn to divide the plane of the $x_1$ and $x_2$ features into 2 parts,

![half_space_1.png](attachment:half_space_1.png)

One part is above the line, and the other is below it. These parts are called half spaces. Whenever a classification boundary is created, the plane is divided into half spaces.

Consider a 3D space,

![half_space_2.png](attachment:half_space_2.png)

A 2D plane is needed to divide a space with 3 features. This plane divides the 3D space into 2 parts, one above and one below it. These two parts are the 2 half spaces.

![half_space_3.png](attachment:half_space_3.png)

# What Happens With More Than 2 Dimensions?
The equation of the line is given by,
- $w_1x_1 + w_2x_2 + w_0 = 0$.

The equation of a plane in 3D space is given by,
- $w_1x_1 + w_2x_2 + w_3x_3 + w_0 = 0$.

This plane is a 2D surface that divides the 3D space into 2 halves. Visualizing beyond 3D is not possible, but the equation generalizes by adding one term per dimension.

The equation of the dividing structure in 4D space is given by,
- $w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + w_0 = 0$.

This is a 3D structure that divides the 4D space into 2 halves.

In nD space, the dividing structure is given by,
- $w_1x_1 + w_2x_2 + w_3x_3 + ... + w_nx_n + w_0 = 0$.

The above is the equation of an (n - 1)D geometric structure which divides the nD space into 2 half spaces.

In more than 3 dimensions (4D, 5D, ...), this dividing structure is called a hyperplane. A line in 2D and a plane in 3D are the low-dimensional cases of the same idea.
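The general equation above can be evaluated for a point with any number of features. The following is a minimal sketch; the function name `hyperplane_value` and all weights and points are made up for illustration. A result of zero means the point lies on the dividing structure, and the sign of a non-zero result tells which of the 2 half spaces the point is in.

```python
import numpy as np


def hyperplane_value(w, w0, x):
    """Evaluate w_1*x_1 + ... + w_n*x_n + w_0 for a point x."""
    return float(np.dot(w, x) + w0)


# 2 features: the line x_1 + x_2 - 4 = 0 (illustrative weights)
print(hyperplane_value([1, 1], -4, [2, 2]))  # 0.0  -> on the line
print(hyperplane_value([1, 1], -4, [5, 5]))  # 6.0  -> one half space
print(hyperplane_value([1, 1], -4, [0, 1]))  # -3.0 -> the other half space

# 4 features: the same function works unchanged
print(hyperplane_value([1, 0, -1, 2], 3, [1, 1, 1, 1]))  # 5.0
```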

# The End Goal
Consider the following points,

| $x_1$ | $x_2$ | $x_3$ |
| :-: | :-: | :-: |
| 5 | 5 | 1 |
| 2 | 4 | 2 |
| 6 | 6 | 1 |
| 3 | 2 | 2 |
| 7 | 4 | 1 |
| 5 | 1 | 2 |

![classification_5.png](attachment:classification_5.png)

The goal is to build an ML model which can classify these points. A geometric structure (here, a line) is needed to divide them. Any point above the line belongs to class 2, and any point below the line belongs to class 1.

Is the line shown in the figure the only line, and the best line, that divides the points?
- No. Many lines can divide these points, and some of them may be better than the one shown. The lines differ only in their parameters ($w_1$, $w_2$ and $w_0$).

The end goal is to find the best values of $w_1$, $w_2$ and $w_0$. The best values are found by minimizing the loss function.

# Vectors
In Physics, a vector is a quantity with both magnitude and direction. Consider a point $x = (5, 5)$, which can be treated as a 2D vector drawn from the origin,

![vector_1.png](attachment:vector_1.png)

It is pointing somewhere between the $x_1$ and $x_2$ axes.

Consider another point $(3, 3)$. It has the same direction as $(5, 5)$, but a different length: $(3, 3)$ is shorter than $(5, 5)$.

This length is the magnitude of the vector, and the angle it makes with the axis is its direction. From Pythagoras' theorem, the magnitude of $(3, 3)$ is,
- $\text{hyp}^2 = \text{opp}^2 + \text{adj}^2$.
- $\text{magnitude} = \sqrt{3^2 + 3^2}$.
- $\text{magnitude} = 3\sqrt{2}$.

The magnitude is represented as,
- For $x_1 = (3, 3)$, $||x_1|| = 3\sqrt{2}$.
- For $x_2 = (5, 5)$, $||x_2|| = \sqrt{5^2 + 5^2} = 5\sqrt{2}$.
- For $x_3 = (7, 4)$, $||x_3|| = \sqrt{7^2 + 4^2}$.

The same formula can be extended to a 3D vector as well,
- For $x = (4, 5, 6)$, $||x|| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{77}$.

In terms of notation,
- The magnitude of a vector is written as $||x||$.
- A vector is written as $x$, $\bar{x}$ or $\overrightarrow{x}$.

Hence, magnitude can be defined as the distance (non-negative length) of the vector from the origin. 
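The magnitudes worked out above can be reproduced in Python. This is a minimal sketch: the helper `magnitude` is an illustrative name that applies the square-root-of-sum-of-squares formula directly, and `np.linalg.norm` is NumPy's built-in equivalent.

```python
import math

import numpy as np


def magnitude(v):
    """Length of a vector: square root of the sum of squared components."""
    return math.sqrt(sum(component ** 2 for component in v))


for v in [(3, 3), (5, 5), (7, 4), (4, 5, 6)]:
    # Both columns should agree for every vector.
    print(v, magnitude(v), np.linalg.norm(v))
```

The same function handles 2D and 3D vectors because the formula only sums over however many components the vector has.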

In ML, a vector is conventionally represented as a column vector,
- Row vector, $x = 
\begin{bmatrix}
1 & 2
\end{bmatrix}$.
- Column vector (ML convention), $x = 
\begin{bmatrix}
1 \\
2
\end{bmatrix}$.

If $x$ is a row vector, then $x^T$ is a column vector. An nD vector is represented as,
- $\bar{x} =
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
\vdots \\
x_n
\end{bmatrix}$.
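In NumPy, the row and column layouts only exist for 2-D arrays. The sketch below (variable names are illustrative) shows the shapes involved, and that a plain 1-D array is neither a row nor a column, so transposing it changes nothing.

```python
import numpy as np

row = np.array([[1, 2]])   # 2-D array with shape (1, 2): a row vector
col = row.T                # transpose gives shape (2, 1): a column vector
flat = np.array([1, 2])    # 1-D array with shape (2,)

print(row.shape)      # (1, 2)
print(col.shape)      # (2, 1)
print(flat.T.shape)   # (2,)  transposing a 1-D array is a no-op

# A 1-D array can be turned into an explicit column vector with reshape.
col_from_flat = flat.reshape(-1, 1)
print(col_from_flat.shape)  # (2, 1)
```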

# Norm Of A Vector
The norm of a vector measures its length, that is, its distance from the origin. Consider a vector,
- $\bar{x} =
\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
\vdots \\
x_n
\end{bmatrix}$.

The magnitude of this vector is given by,
- $||x|| = \sqrt{x_1^2 + x_2^2 + x_3^2 + ... + x_n^2}$.

This is called the L2 norm, or Euclidean norm (the Euclidean distance from the origin).
- $\therefore L_2 = \sqrt{x_1^2 + x_2^2 + x_3^2 + ... + x_n^2}$.

The L2 norm is the straight-line (shortest) distance from the origin to the point. Similarly, there is the L1 norm, which is given by,
- $L_1 = |x_1| + |x_2| + ... + |x_n|$.

The L1 norm is also called the Manhattan distance, because it adds up the distances travelled along each axis.

![l1_l2_norms.png](attachment:l1_l2_norms.png)

The same can be extended to the L3, L4, ..., Lp norms as well, using $L_p = (|x_1|^p + |x_2|^p + ... + |x_n|^p)^{1/p}$.
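The norms above can be computed with `np.linalg.norm`, whose `ord` argument selects which norm is used. This is a minimal sketch on a made-up vector $(3, -4)$, chosen so that the absolute values in the L1 norm matter.

```python
import numpy as np

x = np.array([3, -4])  # illustrative vector

l1 = np.linalg.norm(x, ord=1)  # |3| + |-4| = 7
l2 = np.linalg.norm(x, ord=2)  # sqrt(3^2 + (-4)^2) = 5
l3 = np.linalg.norm(x, ord=3)  # (|3|^3 + |-4|^3)^(1/3)

print(l1, l2, l3)
```

For this vector the L1 norm is the largest and the values shrink as the order grows.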

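The best-fit line is the one that is farthest away from the points of both classes (class 1 and class 2). Since a loss is something to be minimized, maximizing this distance corresponds to,
- $\therefore \text{Loss} = -\text{Distance}$.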

# Dot Product Of Vector
Consider 2 vectors, $\bar{x} = (1, 2)$, and $\bar{y} = (3, 4)$.

There are 2 types of products, dot and cross.

Dot product is the sum of the element-wise products. It is given by,
- $\bar{x} \cdot \bar{y} = x_1y_1 + x_2y_2$.
- $\bar{x} \cdot \bar{y} = 1*3 + 2*4 = 3 + 8 = 11$.

The result of a dot product is a scalar. Consider,
- $\bar{x} =
\begin{bmatrix}
1 \\
2
\end{bmatrix}
, \bar{y} =
\begin{bmatrix}
3 \\
4
\end{bmatrix}$.

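To perform a dot product as a matrix multiplication of two shapes,
- $(a \times b) \cdot (m \times n)$.

$b$ should be equal to $m$, and the size of the output would be $(a \times n)$.

Both $\bar{x}$ and $\bar{y}$ are $2 \times 1$, so the transpose of either $\bar{x}$ or $\bar{y}$ should be taken,
- $\bar{x}^T \cdot \bar{y} = 
\begin{bmatrix}
1 & 2
\end{bmatrix}_{1 \times 2} \cdot
\begin{bmatrix}
3 \\
4
\end{bmatrix}_{2 \times 1} = 1 * 3 + 2 * 4 = 11$.

Note: $\bar{x}^T \cdot \bar{y} \neq \bar{x} \cdot \bar{y}^T$. The first is a $1 \times 1$ result (a scalar), while the second is a $2 \times 2$ matrix.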

The Python implementation of the same is,

In [1]:
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4])

In [2]:
x.T.dot(y)  # .T is a no-op on a 1-D array, so this is the same as x.dot(y)

np.int64(11)

In [3]:
x.dot(y)

np.int64(11)

In [4]:
np.dot(x, y)

np.int64(11)

In [5]:
np.matmul(x, y)

np.int64(11)

Operations in `numpy` are vectorized: an operation is applied to whole arrays at once in optimized, compiled code instead of an explicit Python loop. This makes the computation much faster.
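The cells above use 1-D arrays, where the transpose has no effect. To see the difference between $\bar{x}^T \bar{y}$ and $\bar{x} \bar{y}^T$, the vectors have to be stored as explicit $2 \times 1$ column vectors. The sketch below assumes that 2-D layout and also compares the vectorized product against a plain Python loop.

```python
import numpy as np

x = np.array([[1], [2]])  # column vector, shape (2, 1)
y = np.array([[3], [4]])  # column vector, shape (2, 1)

inner = x.T @ y  # (1 x 2) times (2 x 1) -> shape (1, 1)
outer = x @ y.T  # (2 x 1) times (1 x 2) -> shape (2, 2)

print(inner)  # [[11]]
print(outer)  # [[3 4]
              #  [6 8]]

# The same sum of products written as an explicit (non-vectorized) loop.
loop_sum = 0
for a, b in zip(x.ravel(), y.ravel()):
    loop_sum += a * b
print(loop_sum)  # 11
```

`x.T @ y` gives a $1 \times 1$ array rather than a bare scalar; `inner.item()` extracts the number if needed.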

# Example Study: Bank Loan Approval System
Some important parameters used while approving the loan are,
- Income.
- CIBIL (credit score).
- Current loans.

To approve a loan, the equation can be written as,
- $w_1 * \text{income} + w_2 * \text{CIBIL} + w_3 * \text{current loan} \geq \text{Target}$.

The loan is approved if the LHS is greater than or equal to the target. If it is less than the target, the loan is rejected.

This equation looks like the equation of a line without the $w_0$ term. Moving the target to the LHS gives $w_0 = -\text{Target}$.

Therefore, in vector form, the LHS is $\begin{bmatrix}
w_1 \\
w_2 \\
w_3
\end{bmatrix}^T 
\cdot
\begin{bmatrix}
\text{income} \\
\text{CIBIL} \\
\text{current loan}
\end{bmatrix}$.

Essentially, $\text{weights}^T \cdot \text{features}$.

The target can be derived from,
- domain experience.
- or data.

Finding the $w_i$'s is the objective.
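The approval rule can be sketched as a dot product followed by a comparison. Every number below (weights, feature values and target) is made up for illustration, and `approve_loan` is a hypothetical helper; in practice the weights would be learned from data and the features would be scaled.

```python
import numpy as np


def approve_loan(features, weights, target):
    """Approve when weights . features is at least the target."""
    score = float(np.dot(weights, features))
    return score, score >= target


# Hypothetical weights for income, CIBIL and current loan.
weights = np.array([0.5, 0.25, -0.5])
# Hypothetical, already-scaled feature values for one applicant.
applicant = np.array([60.0, 80.0, 20.0])
target = 35.0  # hypothetical threshold

score, approved = approve_loan(applicant, weights, target)
print(score, approved)  # 40.0 True
```

The negative weight on the current loan is an assumption in this sketch, reflecting that existing debt would usually count against approval.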

# Angle Between 2 Vectors
Given the equation of a line, how is it decided which half space a data point lies in?

Consider 2 vectors,

![vector_2.png](attachment:vectors_2.png)

$\cos\theta = \frac{\bar{x}^T \cdot \bar{y}}{||x|| \cdot ||y||}$.

Where,
- $||x||$ and $||y||$ are the L2 norms of $\bar{x}$ and $\bar{y}$.

Range of $\cos\theta = [-1, 1]$.

- $\cos\theta$ is positive between $0^{\circ}$ and $90^{\circ}$, and again between $270^{\circ}$ and $360^{\circ}$.
- $\cos\theta$ is negative between $90^{\circ}$ and $270^{\circ}$.

Consider,

![vector_3.png](attachment:vector_3.png)

The angle between 2 vectors is taken as the smaller of the two angles between them, so it cannot be greater than $180^{\circ}$.
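The cosine formula above translates directly into NumPy. This is a minimal sketch with illustrative function names; it assumes neither vector is the zero vector, since that would mean dividing by zero.

```python
import numpy as np


def cos_angle(x, y):
    """cos(theta) = (x . y) / (||x|| * ||y||), for non-zero vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def angle_degrees(x, y):
    """Angle between x and y in degrees, always within [0, 180]."""
    # Clip guards against values like 1.0000000002 caused by rounding.
    c = np.clip(cos_angle(x, y), -1.0, 1.0)
    return float(np.degrees(np.arccos(c)))


print(angle_degrees([1, 0], [0, 1]))   # perpendicular vectors
print(angle_degrees([1, 0], [-1, 0]))  # opposite directions
print(cos_angle([1, 2], [3, 4]))       # 11 / (sqrt(5) * 5)
```

Because `np.arccos` only returns angles between $0$ and $\pi$ radians, the result never exceeds 180 degrees.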

### When is a point perpendicular to a line?
Consider,

$w_1x_1 + w_2x_2 = 0$.

![vector_4.png](attachment:vector_4.png)

If the point is on the same line, what would the angle between the point and the line be? $0^{\circ}$.

$\cos\theta = \frac{\bar{w}^T \cdot \bar{x}}{||w|| \cdot ||x||}$.

For a point $\bar{x}$ on the line, $\bar{w}^T \cdot \bar{x} = w_1x_1 + w_2x_2 = 0$, so $\cos\theta = 0$ and $\theta = 90^{\circ}$. Meaning, $\bar{w}$ and $\bar{x}$ are perpendicular to each other, i.e. $\bar{w}$ is perpendicular to the line.

For a new data point, to find if it is above or below the line, the angle between $\bar{w}$ and the point has to be found. If $\theta < 90^{\circ}$, the point lies on the side of the line that $\bar{w}$ points toward (below the line here); if $\theta > 90^{\circ}$, it lies on the opposite side; and if $\theta = 90^{\circ}$, the point is exactly on the line.
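The angle test can be sketched as follows for a line through the origin, $w_1x_1 + w_2x_2 = 0$. The function name, the tolerance and the example weights are illustrative. Whether "the side $\bar{w}$ points toward" appears above or below the line in a plot depends on the direction of $\bar{w}$, so the sketch reports the side relative to $\bar{w}$ rather than as above or below.

```python
import numpy as np


def side_of_line(w, x, tol=1e-9):
    """Classify a non-zero point x relative to the line w . x = 0."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    cos_theta = w @ x / (np.linalg.norm(w) * np.linalg.norm(x))
    if abs(cos_theta) <= tol:
        return "on the line"           # theta = 90 degrees
    if cos_theta > 0:
        return "same side as w"        # theta < 90 degrees
    return "opposite side from w"      # theta > 90 degrees


w = [1, 1]  # illustrative line: x_1 + x_2 = 0
print(side_of_line(w, [1, -1]))   # on the line
print(side_of_line(w, [2, 3]))    # same side as w
print(side_of_line(w, [-2, -1]))  # opposite side from w
```

Only the sign of $\cos\theta$ matters here, and that sign is the same as the sign of $\bar{w}^T \bar{x}$, because both norms are positive.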

# Unit Vector
A unit vector has a magnitude of 1. It is represented as $\hat{w}$.

Given $w =
\begin{bmatrix}
1 & 2 & 3
\end{bmatrix}$,

$\hat{w} = \frac{\bar{w}}{||w||} = \begin{bmatrix} \frac{1}{||w||} & \frac{2}{||w||} & \frac{3}{||w||} \end{bmatrix}$.

Since $||w|| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}$,

$\hat{w} = \begin{bmatrix} \frac{1}{\sqrt{14}} & \frac{2}{\sqrt{14}} & \frac{3}{\sqrt{14}} \end{bmatrix}$.

When the magnitude of $\hat{w}$ is calculated, it will be equal to 1.
- $\hat{w}$ has the same direction as $\bar{w}$, and it is shorter than $\bar{w}$ whenever $||w|| > 1$.
- also, $\bar{w} = ||w|| * \hat{w}$.

$\hat{w}$ acts as one unit of measurement along the direction of $\bar{w}$: $\bar{w}$ is $||w||$ such units long.
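The unit vector for $w = (1, 2, 3)$ can be verified numerically. This is a minimal sketch (variable names are illustrative) that normalizes $\bar{w}$, checks that the result has magnitude 1, and checks that scaling it back by $||w||$ recovers $\bar{w}$.

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])

norm_w = np.linalg.norm(w)  # sqrt(14)
w_hat = w / norm_w          # unit vector in the direction of w

print(norm_w)                          # about 3.7417
print(w_hat)                           # each component divided by sqrt(14)
print(np.linalg.norm(w_hat))           # 1.0 up to floating-point rounding
print(np.allclose(norm_w * w_hat, w))  # True
```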

# Vector Projection
Consider,

![vector_5.png](attachment:vector_5.png)

The shadow cast by $\bar{x}$ on $\bar{y}$ is called the vector projection of $\bar{x}$ on $\bar{y}$. Call this projection $\bar{p}$.

The angle between the projection and $\bar{y}$ will be $0^{\circ}$.

$\cos\theta = \frac{||p||}{||x||} \text{ (using trigonometry)}$.

$\cos\theta = \frac{\bar{x}^T \cdot \bar{y}}{||x|| \cdot ||y||} \text{ (using linear algebra)}$.

Equate the above 2 equations,

$\frac{||p||}{||x||} = \frac{\bar{x}^T \cdot \bar{y}}{||x|| \cdot ||y||}$

$||p|| = \frac{\bar{x}^T \cdot \bar{y}}{||y||}$

$\because \frac{\bar{y}}{||y||} = \hat{y}$

$||p|| = \bar{x}^T \cdot \hat{y}$
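The result above can be sketched in NumPy. The function names and example vectors are illustrative. The second function goes one step beyond the derivation: it multiplies the length by $\hat{y}$ to get the projection as a vector, which is a standard extension rather than something derived in the text.

```python
import numpy as np


def scalar_projection(x, y):
    """Length of the shadow of x on y: x . y_hat (y must be non-zero)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = y / np.linalg.norm(y)
    return float(x @ y_hat)


def vector_projection(x, y):
    """The shadow itself: the scalar projection times the unit vector y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = y / np.linalg.norm(y)
    return scalar_projection(x, y) * y_hat


x = [3, 4]
y = [5, 0]  # lies along the first axis, so the shadow of x is (3, 0)
print(scalar_projection(x, y))  # 3.0
print(vector_projection(x, y))  # [3. 0.]
```

When the angle between the vectors is more than 90 degrees, `scalar_projection` returns a negative number, meaning the shadow falls on the opposite side of the origin from $\bar{y}$.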

# Classification vs. Regression
Classification and regression are 2 types of supervised learning tasks in Machine Learning, with distinct objectives and characteristics.

### Classification
1. Objective:
    - To predict the categorical class labels of new, unseen instances (records or data points) based on past observations.
    - Example: Predicting whether an email is spam or ham (binary classification), or classifying images of digits into their respective numerical labels (multi-class classification).
2. Output:
    - Discrete and categorical. The model assigns instances to predefined classes or categories.
3. Examples of classification algorithms:
    - Logistic Regression.
    - Decision Tree.
    - SVM.
    - kNN.
    - Random Forest.
    - Neural Networks.
4. Evaluation metrics: Accuracy, Precision, Recall, F1-Score, Confusion Matrix, ROC-AUC.
5. Example applications:
    - Spam detection.
    - Image recognition.
    - Sentiment analysis.
    - Disease diagnosis (e.g., presence or absence).

### Regression
1. Objective:
    - To predict a continuous numerical value based on input features.
    - Example: Predicting house prices, stock prices, temperature, or the time it takes to complete a task.
2. Output: Continuous and numerical. The model predicts a value within a range.
3. Examples of regression algorithms:
    - Linear Regression.
    - Ridge and Lasso Regression.
    - Decision Trees.
    - SVM (Support Vector Regression).
    - Neural Networks.
4. Evaluation metrics: MSE, MAE, RMSE, R2-Score, etc.
5. Example applications:
    - House price prediction.
    - Stock price forecasting.
    - Temperature prediction.
    - Sales forecasting.

### Key differences
1. Output type:
    - Classification predicts categorical labels (classes).
    - Regression predicts continuous numerical values.
2. Evaluation metrics:
    - Different metrics are used to evaluate the performance of classification, and regression models.
3. Algorithms:
    - Although some algorithms can be used for both tasks, certain algorithms are specifically designed for either classification, or regression.
4. Example applications:
    - Applications and domains where classification or regression is more suitable depend on the nature of the prediction task.

The choice between classification and regression depends on the nature of the target variable. If the target variable is categorical, it is a classification problem. If it is a continuous numerical value, it is a regression problem.
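The difference in output type also shows up in how predictions are scored. The sketch below uses small made-up arrays of true and predicted values to compute accuracy for a classification task, and MSE and MAE for a regression task, using plain NumPy.

```python
import numpy as np

# Classification: labels are categories, so predictions are right or wrong.
y_true_cls = np.array([1, 0, 1, 1, 0])  # hypothetical labels
y_pred_cls = np.array([1, 0, 0, 1, 0])  # hypothetical predictions
accuracy = float(np.mean(y_true_cls == y_pred_cls))
print(accuracy)  # 0.8 (4 of 5 correct)

# Regression: targets are continuous, so the size of each error matters.
y_true_reg = np.array([200.0, 150.0, 320.0])  # hypothetical prices
y_pred_reg = np.array([210.0, 140.0, 300.0])  # hypothetical predictions
errors = y_true_reg - y_pred_reg
mse = float(np.mean(errors ** 2))     # mean squared error
mae = float(np.mean(np.abs(errors)))  # mean absolute error
print(mse, mae)  # 200.0 and about 13.33
```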