<img src="https://ucfai.org//course/sp19/programming-math-camp/banner.jpg">

<div class="col-12">
    <a class="btn btn-success btn-block" href="https://ucfai.org/signup">
        First Attendance? Sign Up!
    </a>
</div>

<div class="col-12">
    <h1> A Math Refresher and Intro to Scientific Python Programming </h1>
    <hr>
</div>

<div style="line-height: 2em;">
    <p>by: 
        <strong> John Muchovej</strong>
        (<a href="https://github.com/jmuchovej">@jmuchovej</a>)
     on 2019-02-07</p>
</div>

## Math Background

Hopefully, you've either taken or are taking Multivariate Calculus (Calc 3) and/or Matrix Theory & Linear Algebra (Matrix). If not, though, this should get you up to speed on all you need to know for the course.

Tonight we'll be covering:
- Calculus:
  - Derivatives
  - Partial Derivatives
  - Gradients
  - Chain Rule
- Linear Algebra:
  - Vectors
    - Addition
    - Multiplication
  - Matrices
    - Dot Product
    - Element-wise Multiplication (Hadamard Product)
    - Transposition

In [1]:
import numpy as np
import plotly.offline as py
import plotly.graph_objs as go

py.init_notebook_mode(connected=False)

from utils import *

Recall your earlier days of learning math, when we lived in a world of only straight lines. The slope of the line below never changes, so we could use a wonderful piece of math to figure it out.
$$y = mx + b$$
All we needed to do was determine $m$ and we were done. If we didn't have $m$, it was a tad more complex, but it boiled down to... pick two points on the line, say $(3, 3)$ and $(4, 4)$, then plug them into...
$$\frac{y_2 - y_1}{x_2 - x_1}$$ or, in our case more specifically &ndash; $$\frac{4 - 3}{4 - 3} = \frac{1}{1} = 1$$
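
As a quick check, the rise-over-run formula can be written as a small Python function. This is only a minimal sketch: the helper name `slope` is made up for illustration and is not part of `utils`.

```python
def slope(p1, p2):
    """Slope of the straight line through points p1 and p2 (rise over run)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# The two points picked above
m = slope((3, 3), (4, 4))
print(m)  # 1.0
```

Any other pair of distinct points on the same line would give the same value, which is exactly what makes straight lines so convenient.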

In [2]:
graph_x()

However, the problem here is that we can't model the world in strictly straight lines, so we need to upgrade to polynomials to have a better chance. **Take a look at the graph below.** While polynomials give us a better chance at modelling the world, we can't use our beautiful template equation from above to figure out their slope. We need a new way to describe the slope at any given point.

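Below is the graph of $f(x) = x^2$. Something you'll notice is that our $$\frac{y_2 - y_1}{x_2 - x_1}$$ equation no longer gives a single answer: $f(1) = 1$, $f(2) = 4$, and $f(3) = 9$, so the slope between $x = 1$ and $x = 2$ is $3$, but between $x = 2$ and $x = 3$ it is $5$. The change isn't constant! :o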

In [3]:
graph_x2()
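
To see that a single slope no longer describes the curve, here is a small sketch that applies the rise-over-run formula between neighbouring integer points of $f(x) = x^2$. The names `f` and `secant_slope` are illustrative only.

```python
def f(x):
    return x ** 2

def secant_slope(func, x1, x2):
    """Rise over run between two points on the graph of func."""
    return (func(x2) - func(x1)) / (x2 - x1)

# Slopes between consecutive integer points: 0->1, 1->2, 2->3, 3->4
slopes = [secant_slope(f, x, x + 1) for x in range(4)]
print(slopes)  # [1.0, 3.0, 5.0, 7.0]
```

Each pair of points gives a different answer, so the slope now depends on where we look.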

However, our good friend Isaac Newton comes to our rescue. :D There's a way we can determine the slope at any point on a line, curved or not. It's called the **derivative**.

Let's say we have an equation $f(x) = x^2 - x + 4$. We can apply the **power rule** to determine what the slope is at any given point on its graph.

Here's the end-result of the **power rule** applied to that equation: $f'(x) = 2x - 1$. Now let's decompose this. The general idea is to start with some $x$ raised to some power $k$, so... $x^k$. Next, we pull the power down in front and subtract one from the power, which means we end up with... $$k(x^{k-1})$$

So, if we apply this to each term of our equation above, we'll end up with $$f'(x) = 2 \cdot x^{2-1} - 1 \cdot x^{1-1} + 0 \cdot 4x^{0-1}$$ Whoa! Where'd that $x^0$ come from??

Oh, right. So, anything raised to the 0th-power is just $1$, right? This also means that our equation $f(x) = x^2 - x + 4$ can be written (exactly the same) as $$f(x) = x^2 - x^1 + 4x^0$$ Pulling the $0$ down from $4x^0$ multiplies that whole term by zero, so the constant vanishes, and since $x^{1-1} = x^0 = 1$ we're left with $f'(x) = 2x - 1$.
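
One way to sanity-check the power-rule result is to compare it against a numerical slope estimate. The sketch below uses a central difference with a small step `h`. The function names and the choice of `h` are assumptions made for illustration.

```python
def f(x):
    return x ** 2 - x + 4

def f_prime(x):
    # Power rule applied term by term: 2x - 1 (the constant 4 contributes 0)
    return 2 * x - 1

def numerical_slope(func, x, h=1e-5):
    """Central-difference estimate of the slope of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Print the exact and estimated slope side by side at a few points
for x in [-2.0, 0.0, 0.5, 3.0]:
    print(x, f_prime(x), round(numerical_slope(f, x), 6))
```

The two columns should agree up to floating-point noise, which suggests the power rule really is describing the slope of the curve.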

What if, though, we have something like this equation? $$f(x) = (3x^2+x-4)^3$$ Taking a look at it, the slope seems rather odd, no? 

In [4]:
graph_chain()

If you hover over the graph, you'll see that the slope definitely changes, but calculating it is a wee bit difficult because of that outer $(...)^3$. However, there's a way around this that works quite well. It's called the **chain rule**. So, let's walk through that with our equation: $f(x) = (3x^2+x-4)^3$

$$
\begin{align}
    f(x) &= (3x^2 + x - 4)^3 \\
    f'(x) &= (6x + 1) \cdot 3(3x^2 + x - 4)^2
\end{align}
$$

So, what's going on here is that we're taking two derivatives and multiplying them together.
1. We take the derivative of the inside of $f(x)$, so we take the derivative of $3x^2 + x - 4$, which is $6x + 1$
2. We then take the derivative of the outside of $f(x)$, which is that $(...)^3$ component, which gives us $3(...)^2$

Now, when we take the derivative of that outer section of the function, we leave the inside alone and carry it around with us. Multiplying the two pieces together leaves us with... $$f'(x) = (6x + 1) \cdot 3(3x^2 + x - 4)^2$$
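
The two steps can be mirrored in code. This is a rough sketch with made-up helper names: `f_prime` builds the derivative from the "inside" and "outside" pieces, and `numerical_slope` (a central-difference estimate) is only there to check the result.

```python
def f(x):
    return (3 * x ** 2 + x - 4) ** 3

def f_prime(x):
    inner = 3 * x ** 2 + x - 4   # the "inside", carried along unchanged
    d_inner = 6 * x + 1          # step 1: derivative of the inside
    d_outer = 3 * inner ** 2     # step 2: derivative of (...)^3, inside left alone
    return d_inner * d_outer

def numerical_slope(func, x, h=1e-6):
    """Central-difference estimate of the slope of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Compare the chain-rule answer with the numerical estimate
for x in [-1.0, 0.0, 2.0]:
    print(x, f_prime(x), numerical_slope(f, x))
```

For example, at $x = 2$ the inside is $10$, so the chain rule gives $13 \cdot 3 \cdot 10^2 = 3900$, and the numerical estimate should land very close to that.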

Awesome, now you know what derivatives are, so we can set out and conquer the world of machine learning, right?! Erm... well... no. We can't do that just yet because this "definition" of the derivative is only good for 2-dimensional data (one input $x$ and one output $y$), and it's very rare that we have such low-dimensional data.

> Erm, "dimensional?" – ahh, yes. So, let's imagine we have some housing data ([I got it here][kaggle-ames])... Look at the code-snippet, below. You'll see a slew of "Data Columns" like... `LotFrontage`, `LotArea`, `Street`, `Alley`, etc. In Machine Learning, we refer to each one of these as "features" of our data, and they correspond to a "dimension" as well.
>
> In this particular case, the housing data here has 80-odd dimensions.

[kaggle-ames]: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

In [5]:
import pandas as pd
ames = pd.read_csv("ames-train.csv")
print(f"Ames has {len(ames.columns)} columns")
ames.iloc[:5, :10]

Ames has 81 columns


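| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | | Reg | Lvl | AllPub |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | | Reg | Lvl | AllPub |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | | IR1 | Lvl | AllPub |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | | IR1 | Lvl | AllPub |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | | IR1 | Lvl | AllPub |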


We can't take the derivative we've just learned about over 80 dimensions. However, there's a more general form of the derivative, called the **partial derivative**. It works just like a normal derivative, except that you only determine the "rate of change" with respect to one variable at a time, treating the others as constants. For example, let's say you have the graph below:

In [6]:
graph_saddle()

So, while this is only a 3D graph, the ideas that follow extend to all dimensions. The key takeaway here, though, is that we can't take a normal derivative because there is more than one variable affecting the "rate of change". If this is unclear, do ping one of the coordinators on Discord and we'll definitely clarify that intuition.
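
Here is a rough numerical sketch of partial derivatives. It assumes the saddle is $f(x, y) = x^2 - y^2$, a common textbook saddle. The surface actually drawn by `graph_saddle()` may differ, and the helper names are made up for illustration.

```python
def f(x, y):
    # Assumed saddle surface; the function plotted by graph_saddle() may differ
    return x ** 2 - y ** 2

def partial_x(func, x, y, h=1e-5):
    """Rate of change of func along x, with y held fixed."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-5):
    """Rate of change of func along y, with x held fixed."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 1.0, 2.0
print(partial_x(f, x0, y0))  # close to 2 * x0 = 2
print(partial_y(f, x0, y0))  # close to -2 * y0 = -4
```

Treating $y$ as a constant and applying the power rule to $x^2 - y^2$ gives $2x$, and treating $x$ as a constant gives $-2y$, which is what the two estimates approximate.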

Since there's more than one variable affecting the "rate of change," we need a way to look at just one variable at a time while holding the others fixed, which is exactly what the partial derivative does.

In [None]:
import numpy as np
from random import randint

In [None]:
py_list = None
np_list = None
n_rowcols = 10000

In [None]:
%%time
# %%time (unlike %%timeit) runs the cell once and keeps py_list for later cells
py_list = [[0 for _ in range(n_rowcols)] for _ in range(n_rowcols)]

In [None]:
%%time
# A cell magic must be the first line of the cell
np_list = np.zeros((n_rowcols, n_rowcols))

In [None]:
py_list[0:5]

In [None]:
some_bool = True

In [None]:
isinstance(some_bool, bool)

In [None]:
empty_list = [10]

In [None]:
print(len(empty_list) == 0)

In [None]:
print(empty_list is not None)

In [None]:
if empty_list:
    print(empty_list)

In [None]:
# for (int i = 0; i < 100; i++)
for i in range(100):
    print(i)

In [None]:
# Infinite loop (not a real fork bomb): interrupt the kernel to stop it
while True:
    print("forkbomb")

In [None]:
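# `yield` is only valid inside a function; this generator hands out values
# one at a time instead of building the list [1,2,3,...] in memory.
def count_up(n=100000000000):
    for i in range(n):
        yield i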

In [None]:

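class OOP:
    def __init__(self):
        pass
    
    def _something(self, s):
        return None
    
    def _things(self, s):
        return None
    
    def __gt__(self, something_else):
        # Called for `self > something_else`; this placeholder returns None
        pass

# Instantiate only after the class has been defined
cls = OOP()
var1, var2 = OOP(), OOP()
var1 > var2  # dispatches to var1.__gt__(var2)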