# Which Is The Best Line?
![line_11.png](attachment:line_11.png)

The metrics used to find the best line are,
1. Accuracy (if the accuracy is different for each line).
2. Distance (if the accuracy is the same for 2 or more lines).

Plot line vs. distance,

![line_12.png](attachment:line_12.png)

The value of distance is maximum at L2 (similarly, the minimum can also be found).

The slope of the curve at a maximum or a minimum is zero.
- Gain -> Maximum.
- Loss -> Minimum.

The gradient descent algorithm does exactly this.

It finds the parameters of the line, $w_1$, $w_2$, and $w_0$, such that the gain function is maximized or the loss function is minimized.

To achieve this, i.e., to find maxima and minima, calculus is used (derivatives in particular).

Optimization is the process of finding the best values of the parameters, i.e., the point where the gain is maximum or the loss is minimum.

# Grid Search
Gain function is given by,

$G = \frac{1}{n} \sum^n_{i = 1} \frac{\overrightarrow{w}^Tx_i + w_0}{||w||} * y_i$

Loss function is given by,

$\text{Loss} = -\text{Gain}$

$L = - \frac{1}{n} \sum^n_{i = 1} \frac{\overrightarrow{w}^Tx_i + w_0}{||w||} * y_i$

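The goal is to find the best values of $w$ and $w_0$, i.e., to optimize them so that $G$ is maximum.

Grid search does this by evaluating $G$ for every combination of candidate values of the parameters and keeping the best one.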
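The gain and loss formulas above can be sketched in NumPy as follows. The four-point dataset and the chosen line are hypothetical values picked only for illustration, and the function names `gain` and `loss` are not from the notes.

```python
import numpy as np

def gain(X, y, w, w0):
    """Average of (signed distance from the line) * label over all points."""
    distances = (X @ w + w0) / np.linalg.norm(w)
    return np.mean(distances * y)

def loss(X, y, w, w0):
    """The loss is the negative of the gain."""
    return -gain(X, y, w, w0)

# Hypothetical toy dataset: two points per class.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Hypothetical line: w = (1, 0), w0 = 0, i.e., the vertical axis.
w = np.array([1.0, 0.0])
w0 = 0.0

print(gain(X, y, w, w0))  # 2.5
print(loss(X, y, w, w0))  # -2.5
```

Every point here is on the correct side of the line, so each term in the sum is positive and the gain is positive.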

The problem with grid search is its time complexity. Assume that $w_1$, $w_2$, and $w_0$ each take 1000 values between a lower and an upper limit, written below as (lower limit, upper limit, number of values),
- $w_1 = (-10, 10, 1000)$
- $w_2 = (-10, 10, 1000)$
- $w_0 = (-10, 10, 1000)$

The number of combinations that have to be evaluated to find the best $w$ and $w_0$ is $10^3 \times 10^3 \times 10^3 = 10^9$.

That is a lot of combinations.
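A minimal sketch of grid search, assuming a much coarser grid (21 values per parameter instead of 1000) so that it runs quickly. The toy dataset and the helper name `gain` are hypothetical; the helper follows the formula for $G$ above.

```python
import itertools
import numpy as np

def gain(X, y, w, w0):
    """Average of (signed distance from the line) * label over all points."""
    return np.mean((X @ w + w0) / np.linalg.norm(w) * y)

# Hypothetical toy dataset: two points per class.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Assumption: 21 values per parameter instead of 1000.
grid = np.linspace(-10, 10, 21)

best_gain, best_params = -np.inf, None
n_evaluated = 0
for w1, w2, w0 in itertools.product(grid, grid, grid):
    if w1 == 0 and w2 == 0:
        continue  # ||w|| = 0 does not define a line
    n_evaluated += 1
    g = gain(X, y, np.array([w1, w2]), w0)
    if g > best_gain:
        best_gain, best_params = g, (w1, w2, w0)

print("combinations evaluated:", n_evaluated)
print("best (w1, w2, w0):", best_params, "gain:", best_gain)
print("combinations with 1000 values each:", 1000 ** 3)
```

Even this coarse grid needs thousands of evaluations of the gain. The count grows with the cube of the number of values per parameter, which is why the full grid of $10^9$ combinations is impractical.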

The alternatives to grid search are,
- Perceptron.
- Gradient Descent.

# Gradient Descent
It helps in finding the best (approximate) values of the parameters of a function.

It does this by finding the minima and maxima of the function. Calculus is used to find the minima and maxima, and it depends on slopes, tangents, and derivatives.
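A minimal sketch of the idea on a one-parameter function. The loss $(w - 3)^2$, the starting point, the learning rate, and the number of steps are all assumptions chosen for illustration; they are not taken from the notes.

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_derivative(w):
    """Derivative (slope) of the hypothetical loss."""
    return 2.0 * (w - 3.0)

w = 0.0               # assumed starting point
learning_rate = 0.1   # assumed step size

for step in range(100):
    # Move against the slope: downhill on the loss curve.
    w = w - learning_rate * loss_derivative(w)

print(w, loss(w))
```

The update stops changing $w$ once the slope is close to zero, which is exactly the condition for a minimum. Maximizing a gain works the same way with the sign of the update flipped.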

# Defining The Classification Problem Mathematically
### Dataset
$D : \{(x_i, y_i): x_i \in \Re^d, y_i \in \{-1, 1\}\}^n_{i = 1}$

Where,
- $x_i$ = Features.
- $y_i$ = Class labels.
- $\Re^d$ = $d$-dimensional real numbers ($d$ is also the size of the $w$ vector).

### Goal
Find a linear or non-linear function $f(x)$ such that,

$f(\bar{x}) = y_i'$. Where,
- $\bar{x}$ = New vector.
- $y_i'$ = Prediction.

If the model is good, then $y_i' = y_i$ (ideally).

Gain function is defined as,

$G(D, w, w_0)$

$G = \frac{1}{n} \sum^n_{i = 1} \frac{\overrightarrow{w}^Tx_i + w_0}{||w||} * y_i$

Writing the above in terms of a per-point function $g$,

$G = \frac{1}{n} \sum^n_{i = 1} g(x_i, y_i, \overrightarrow{w}, w_0)$

The $g$ in the above equation represents the signed distance of a single point from the line represented by $\overrightarrow{w}$ and $w_0$, multiplied by the label of the point. The sum over $i = 1$ to $n$ adds up these distances for all the points.

### Why summation and not multiplication?
- $\sum$ = Summation.
- $\prod$ = Multiplication (product).

If even one point lies on the line, its distance is zero, and hence the whole product is zero.

Hence, summation is preferred over multiplication. The log or the average is also used sometimes.
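A small numeric check of this argument, assuming four hypothetical distances where the third point lies exactly on the line:

```python
import numpy as np

# Hypothetical per-point distances; the third point lies on the line.
distances = np.array([2.0, 3.0, 0.0, 1.5])

total = np.sum(distances)     # still reflects the other three points
product = np.prod(distances)  # wiped out by the single zero
average = np.mean(distances)

print(total)    # 6.5
print(product)  # 0.0
print(average)  # 1.625
```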

$g(x_i, y_i, \overrightarrow{w}, w_0) = \frac{\overrightarrow{w}^Tx_i + w_0}{||w||} * y_i$.

# Distance Function
$g(x_i, y_i, \overrightarrow{w}, w_0) = \frac{\overrightarrow{w}^Tx_i + w_0}{||w||} * y_i$

The prediction is represented as $y_i'$ or $\hat{y}_i$. The hat here should not be confused with the notation for a unit vector.

$\overrightarrow{w}^*, w_0^* = \text{Optimal values of } \overrightarrow{w} \text{ and } w_0 \text{ (Gain function is maximum here)}$

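$\overrightarrow{w}^*, w_0^* = \operatorname*{arg\,max}_{(\overrightarrow{w}, w_0)} G(D, \overrightarrow{w}, w_0)$

Where,
- $\operatorname{arg\,max}$ = The argument (here, the values of $\overrightarrow{w}$ and $w_0$) at which the function is maximum, not the maximum value itself.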
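In NumPy, `np.argmax` returns the position of the largest element in an array, and the maximizing argument is the candidate stored at that position. A short sketch, assuming a hypothetical one-parameter function and a handful of candidate values:

```python
import numpy as np

# Hypothetical candidates and a hypothetical function with its maximum at 1.
candidates = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
values = -(candidates - 1.0) ** 2

best_index = np.argmax(values)          # position of the maximum
best_argument = candidates[best_index]  # the argument that maximizes
best_value = values[best_index]         # the maximum itself

print(best_index, best_argument, best_value)
```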

# Function
A function is a mapping between input and output, e.g., $f(x) = x^2$.

![function_1.png](attachment:function.png)

- Domain represents all the possible values that a function can take as input.
- Range represents all the possible values that a function can produce as output.

For the above example function,

- Domain, $D = (-\infty, \infty)$, i.e., $D = \Re$.
- Range, $R = [0, \infty)$, i.e., the non-negative real numbers.

### What is a valid function?
- One to one = valid.
- Many to one = valid.
- One to many = invalid.
- Many to many = invalid.

![function_2.png](attachment:function_2.png)

Now consider the following function, $f(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$

The plot for the above function looks like this,

![function_3.png](attachment:function_3.png)

The above function is called the ReLU function. It is very popular in Deep Learning.
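ReLU is commonly written as $\max(0, x)$. A minimal NumPy sketch, with hypothetical sample inputs:

```python
import numpy as np

def relu(x):
    """ReLU: returns x where x is positive, and 0 elsewhere."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # [0.  0.  0.  0.5 2. ]
```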

# Limits
Functions are further classified into 2 types,
- Continuous.
- Discontinuous.

Note: The curve of a continuous function can be drawn without lifting the pen.

### Step function
$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$

The plot for the above step function looks like,

![function_4.png](attachment:function_4.png)

It is evident that the step function is a discontinuous function. To define this mathematically, the knowledge of limits is required. There are 2 essential concepts in limits,
- Continuity.
- Differentiability.

### Continuity
To explain continuity, consider the following function, $f(x) = x^2$. If $x = 3$, then $f(x) = 9$.

Now consider the sequence of inputs, $2.9, 2.99, 2.999, \dots$, which approaches 3 from the left but never reaches it.

For these inputs, the values of $f(x)$ show a trend which approaches 9. This value is called the Left Hand Limit (LHL) of $f(x)$ at 3.

The mathematical representation is, $\lim\limits_{x \to 3^-} f(x) = 9$.

It is read as, when $x$ is slightly less than 3 and is increased gradually towards 3, $f(x)$ approaches 9.

Similarly, the Right Hand Limit (RHL) of $f(x)$ at 3 is written as, $\lim\limits_{x \to 3^+} f(x) = 9$.

The inputs look as follows, $3.1, 3.01, 3.001, 3.0001, \dots$, which approach 3 from the right.

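The two-sided limit exists at $x$ when $\text{LHL at } x = \text{RHL at } x$.

When, in addition, this common value is equal to the value of the function, i.e., $\text{LHL at } x = \text{RHL at } x = f(x)$, the function $f(x)$ is said to be continuous at $x$.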
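The two one-sided limits can be checked numerically by evaluating $f(x) = x^2$ on inputs that approach 3 from either side. This is only a sketch: the input lists are hypothetical samples, and a finite list suggests the limit rather than proving it.

```python
def f(x):
    return x ** 2

# Inputs approaching 3 from the left and from the right.
left_inputs = [2.9, 2.99, 2.999, 2.9999]
right_inputs = [3.1, 3.01, 3.001, 3.0001]

left_values = [f(x) for x in left_inputs]
right_values = [f(x) for x in right_inputs]

print(left_values)   # increases towards 9
print(right_values)  # decreases towards 9
print(f(3))          # 9
```

Both lists close in on 9, which is also the value of $f(3)$, so the function is continuous at $x = 3$.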

### Differentiability
For a function to be differentiable, it has to be continuous. What is meant by a derivative of a function? The derivative of a function, $f(x)$ is defined as,

$\frac{df(x)}{dx} = f'(x) = \lim\limits_{h \to 0}\frac{f(x + h) - f(x)}{h}$.

The above equation is called the "*ab-initio*" (first principles) method of finding the derivative of a function.

Consider the identity function, where the input is equal to output,

$f(x) = x$

$\frac{df(x)}{dx} = \lim\limits_{h \to 0}\frac{x + h - x}{h} = 1$

Therefore, $\frac{df(x)}{dx} = 1$.

Now consider, $f(x) = x^2$

$\frac{df(x)}{dx} = \lim\limits_{h \to 0} \frac{(x + h)^2 - x^2}{h} = \lim\limits_{h \to 0} \frac{2xh + h^2}{h} = \lim\limits_{h \to 0} (2x + h)$

Since $h$ is tending to 0,

$\frac{df(x)}{dx} = 2x$.
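The limit can be approximated numerically by evaluating the difference quotient for smaller and smaller $h$. In this sketch the helper names `derivative_estimate`, `identity`, and `square`, the point $x = 3$, and the values of $h$ are assumptions chosen for illustration.

```python
def derivative_estimate(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h for a small, non-zero h."""
    return (f(x + h) - f(x)) / h

def identity(x):
    return x

def square(x):
    return x ** 2

for h in [1.0, 0.1, 0.01, 0.001]:
    # For x^2 the quotient is 2x + h, so it approaches 2x = 6 as h shrinks.
    print(h, derivative_estimate(identity, 3.0, h), derivative_estimate(square, 3.0, h))
```

The estimate for $f(x) = x$ stays at about 1 for every $h$, while the estimate for $f(x) = x^2$ approaches $2x$.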

Consider the step function at $x = 0$. The LHL is $\lim\limits_{x \to 0^-} f(x) = 0$.

As $x$ approaches 0 from the left, the function approaches 0.

The RHL is $\lim\limits_{x \to 0^+} f(x) = 1$.

As $x$ approaches 0 from the right, the function approaches 1.

At $x = 0$, $f(x)_{x = 0} = f(0) = 1$

$\therefore \text{LHL} \neq \text{RHL}$

Therefore, the function is not continuous at $x = 0$, and hence it is not differentiable there.
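A short numeric check of the two one-sided limits of the step function, using hypothetical inputs close to 0 on either side:

```python
def step(x):
    """Step function: 1 for x >= 0, and 0 for x < 0."""
    return 1 if x >= 0 else 0

left_values = [step(x) for x in [-0.1, -0.01, -0.001]]
right_values = [step(x) for x in [0.1, 0.01, 0.001]]

print(left_values)   # [0, 0, 0]
print(right_values)  # [1, 1, 1]
print(step(0))       # 1
```

From the left the values stay at 0 and from the right they stay at 1, so the two one-sided limits differ at $x = 0$.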