# ML Crash Course
The goal of these guides is to introduce the reader to machine learning. We will go through several disciplines at a very high level to build intuition for what ML is, how it works, why it works, and, perhaps most importantly, when it doesn't work. This is by no means a comprehensive course in any of the disciplines that go into ML (e.g. linear algebra, convex optimization, multivariable calculus, statistics). In each section I will include examples of college courses and online resources that go into more depth, but these are only for readers who'd like to learn more about the field and are not mandatory for building the intuition we seek.

If you have any questions about (or corrections to!) anything in these guides, please reach out to me (@Rehan Durrani) on Slack!

## Multivariable Functions (Multivariable Calculus)
One of the fundamental concepts behind ML is the algebra and calculus of multivariate functions. As you can tell from the name, multivariate functions are functions of more than one variable. An example of a multivariate function is $f(x, y) = x^2 + y^2$, where $f$ is a function of **both** $x$ and $y$.

Multivariate functions are very common because real-world quantities often depend on several other variables. As an example, suppose we try to come up with a formula for how much it costs to produce a pallet of paper. It makes sense that this depends on multiple variables, like the cost of lumber, the cost of manpower, and the cost of the equipment. In this case, the estimated cost is a function of 3 variables.
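
As a small illustration, both examples above can be written as ordinary Python functions that take more than one argument. The cost formula here is hypothetical: the function name `pallet_cost`, its parameter names, and the plain sum are assumptions made only for illustration, not a real pricing model.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def pallet_cost(lumber_cost, labor_cost, equipment_cost):
    """Hypothetical cost model: assume the total is a plain sum of the inputs."""
    return lumber_cost + labor_cost + equipment_cost


print(f(1, 2))                  # 1^2 + 2^2 = 5
print(pallet_cost(40, 25, 10))  # 40 + 25 + 10 = 75
```

Changing any one argument changes the output, which is what it means for the result to depend on several variables at once.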

## Extending to higher dimensions (Multivariable Calculus)
The Cartesian plane we see most often (shown below) is 2-dimensional, since it has 2 axes.

<img src="cartesian.png" alt="Cartesian Plane" width="250" />

We can expand to higher dimensions (3, 4, 5, ...) by adding additional axes, each of which must be perpendicular to all of the others. Specifically, we can expand the Cartesian plane to 3 dimensions by adding the $z$-axis, as shown below.

<img src="cartesian3.png" alt="3D Cartesian Coordinate System" width="250"/>

In our previous example of a multivariate function, $f(x, y) = x^2 + y^2$, we would set $z = f(x, y)$ and plot the surface the same way we plotted curves in 2 dimensions - picking values for $x$ and $y$ and finding the corresponding value for $z$.
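
The "pick values for $x$ and $y$, then find $z$" procedure can be sketched in a few lines of Python. The grid values below are arbitrary choices for illustration; a plotting library would do the same thing with many more points.

```python
def f(x, y):
    return x**2 + y**2


# Pick a small grid of x and y values (chosen arbitrarily for illustration).
xs = [-1, 0, 1]
ys = [-1, 0, 1]

# Each (x, y) pair gives one z value, i.e. one point (x, y, z) in 3 dimensions.
points = [(x, y, f(x, y)) for x in xs for y in ys]

for x, y, z in points:
    print(f"x={x:2d}, y={y:2d} -> z={z}")
```

Every printed row is one point to place in the 3-dimensional coordinate system, with $z$ as the height above the $xy$-plane.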

The same construction works for 4, 5, or more dimensions, but these become tricky for humans to visualize (one hypothesis is that, because we live in 3 dimensions, our brains are not trained to picture what higher dimensions would look like).

## Gradients (Multivariable Calculus)
For functions of one variable, we are able to identify a derivative - an instantaneous rate of change. We often think of the derivative as the slope of the function at one point. We can extend derivatives to functions of multiple variables - with a few tweaks. The derivative of a function of multiple variables is referred to as its gradient; but what does rate of change mean in the context of multiple variables?

In higher dimensions, we find the rate of change with respect to one variable at a time. We are asking, "If I hold all of the other variables constant, how much does the function change when this one variable changes?" The technical term for this is a partial derivative, and it is found using the same differentiation rules as before - except that we pretend that all of the other variables are constants. As an example, say we are trying to find the partial derivative of $f(x, y) = xy + y^2x^2$ with respect to $x$. We pretend $y$ is a constant, and apply the normal rules of differentiation to get: $$\frac{\partial f}{\partial x} = y + 2y^2x$$
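
One way to sanity-check a partial derivative is to nudge only $x$ by a tiny amount while keeping $y$ fixed, and compare the resulting slope with the formula. The sketch below uses a central difference; the helper name `partial_x`, the step size `h`, and the test point are assumptions chosen for illustration.

```python
def f(x, y):
    return x * y + y**2 * x**2


def partial_x(func, x, y, h=1e-5):
    """Approximate df/dx at (x, y) with a central difference, holding y fixed."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


x0, y0 = 2.0, 3.0
numeric = partial_x(f, x0, y0)
analytic = y0 + 2 * y0**2 * x0  # the formula above: y + 2y^2x

print(numeric, analytic)  # the two values should agree closely
```

Note that `y` is never changed inside `partial_x`, which is exactly the "hold all of the other variables constant" idea.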
The fancy $\partial$ symbol is different from a normal $d$, and is used to represent that this is a *partial* derivative of $f$ since $f$ is a function of multiple variables. The symbol is often referred to as the partial operator.

We find the *gradient* of a function $f$ by finding its partial derivative with respect to each of its input variables. In other words, the gradient of $f(x, y) = xy + y^2x^2$ is $$\nabla f = \begin{bmatrix}\nabla_xf\\\nabla_yf\end{bmatrix} = \begin{bmatrix}\frac{\partial f}{\partial x}\\\frac{\partial f}{\partial y}\end{bmatrix} = \begin{bmatrix}y + 2y^2x\\x + 2yx^2\end{bmatrix}$$
where the $\nabla$ operator, called *nabla*, represents either a gradient (when there is no subscript), or a partial derivative with respect to the subscripted variable.
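
As a sketch, the gradient of this $f$ can be computed by collecting both partial derivatives into a list. The $y$ partial, $x + 2yx^2$, follows from the same rule as before with $x$ treated as the constant. The function names and the test point are assumptions for illustration, and `numeric_gradient` is only a finite-difference check, not how ML libraries compute gradients.

```python
def f(x, y):
    return x * y + y**2 * x**2


def gradient(x, y):
    """Analytic gradient of f as [df/dx, df/dy]."""
    df_dx = y + 2 * y**2 * x  # y treated as a constant
    df_dy = x + 2 * y * x**2  # x treated as a constant
    return [df_dx, df_dy]


def numeric_gradient(func, x, y, h=1e-5):
    """Finite-difference estimate: nudge one variable at a time."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return [df_dx, df_dy]


print(gradient(2.0, 3.0))             # [39.0, 26.0]
print(numeric_gradient(f, 2.0, 3.0))  # approximately the same two values
```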

The gradient is a vector, since it is composed of a collection of values. In the next section, we will dive into vectors and matrices.

## Vectors and Matrices (Fundamentals of Linear Algebra)

What is a vector? A vector is an ordered collection of multiple scalar (numerical) values. We can think of a vector as a way to represent a point in N-dimensional space. In 2 dimensions, a vector looks like our standard Cartesian coordinates - $$(3, 4) \rightarrow \begin{bmatrix}3\\4\end{bmatrix}$$ but this can extend to higher dimensions. As an example, a vector in 5 dimensions may look something like this: $$\begin{bmatrix}x_1\\x_2\\x_3\\x_4\\x_5\end{bmatrix}$$
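
In code, a simple way to hold a vector is a plain Python list, with one entry per dimension. This is only a minimal sketch: the variable names and the 5-dimensional values are arbitrary placeholders, and numerical libraries provide dedicated array types for real work.

```python
point_2d = [3, 4]                     # the vector for the point (3, 4)
point_5d = [1.0, 2.0, 3.0, 4.0, 5.0]  # arbitrary values standing in for x_1..x_5

print(len(point_2d))  # number of dimensions: 2
print(len(point_5d))  # number of dimensions: 5
print(point_5d[0])    # Python indexes from 0, so this entry plays the role of x_1
```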

I'm going to show my bias to my alma mater here and defer to these lecture notes: https://inst.eecs.berkeley.edu/~ee16a/fa20/lecture/Note1A.pdf. Please read these to get a better understanding of vectors, matrices, and the basics of linear algebra, and complete section 1.6.1 as the quiz for this lesson.