# Machine Learning Introduction

### Steps

![ml-framework](./res/ml-framework.PNG)


### Learning Map

![ml-learning-map](./res/ml-learning-map.PNG)

### Supervised Learning

Learning from a `teacher`.

Requires a lot of training data to find the input/output relationship.

We call the function's output the `label`.


##### Regression

The output of the target function $f$ is a `scalar`.

* Given user action data, predict how much money the user will deposit.

##### Classification

The output of the target function $f$ is `binary` or `multi-class`.

* Is this a cat?

* Handwriting recognition.

##### Structured Learning

The output goes beyond a scalar or a class label: it is a structured object.

* Speech recognition (the output is a sentence).

* Face recognition (tell who this person is).


### Semi-Supervised Learning

Learning from a small amount of labelled data plus much more unlabelled data (related to the labelled data).

* Distinguish whether a picture shows a cat or a dog. All the training data are cats and dogs, but only a few of them are labelled.

### Transfer Learning

Learning from a small amount of labelled data plus much more unlabelled data (**NOT** related to the labelled data).

* Distinguish whether a picture shows a cat or a dog. The training data also contains other objects unrelated to this task (both labelled and unlabelled ones).


### Unsupervised Learning

The machine is given training data but is not told what the output should be.

* Input a word, and the machine tells you its meaning after reading a lot of documents.

### Reinforcement Learning

Learning from `critics`. The feedback says only whether the result was good or bad, not what the correct answer is.

* Training machine to play Super Mario.

### Demo

##### Q: We want to find the relationship between `Attack` and `Special Attack` of Water Pokemon.

The following code is only preparation and can be skipped.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def draw_heatmap(heatmap_df):
    
    indices = heatmap_df.index.values
    columns = list(map(lambda co: co[1], heatmap_df.columns.values))

    heatmap = np.array(heatmap_df.values)


    fig, ax = plt.subplots()
    
    im = ax.imshow(heatmap)
    
    # We want to show all ticks...
    ax.set_xticks(np.arange(len(columns)))
    ax.set_yticks(np.arange(len(indices)))
    # ... and label them with the respective list entries
    ax.set_xticklabels(columns)
    ax.set_yticklabels(indices)

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")


    ax.set_title("Heatmap")
    fig.tight_layout()
    plt.show()
    
plt.rcParams["figure.figsize"] = (10, 10)

First, load the data and visualize it.

In [None]:
pokemon_df = pd.read_csv('./res/pokemon.csv', na_values=['NA'])

water_pokemon_df = pokemon_df[pokemon_df['Type 1'] == 'Water'].dropna(how='any')

print(water_pokemon_df[['Name', 'Type 1', 'Attack', 'Sp. Atk']].head())

water_pokemon_df.plot.scatter('Attack', 'Sp. Atk')

plt.show()

##### Step 1: Find a set of functions
Redefine the question.

Let x = `Attack` and y = `Sp. Atk` (the same axes as the scatter plot), and find a function

$f(x) = ax + b$

to match this chart.

##### Step 2: Measure goodness of function

In other words, we need a function that measures how much error a candidate $f$ makes.

That is:

We want to define a `loss function`

$
L(f) = L(a, b) = \sum_{i=1}^{n} (\hat{y}^i - (ax^i + b))^2
$
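
As a minimal sketch of this loss function in code, the snippet below evaluates it for two candidate lines. The function name `squared_error_loss` and the toy points are illustrative assumptions, not part of the Pokemon data.

```python
import numpy as np

def squared_error_loss(a, b, x, y):
    """Sum of squared errors of the line a*x + b against the observed y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (a * x + b)
    return float(np.sum(residuals ** 2))

# Made-up toy points (not from pokemon.csv) lying exactly on y = 2x + 1
toy_x = [1.0, 2.0, 3.0]
toy_y = [3.0, 5.0, 7.0]

# The line y = 2x + 1 passes through every toy point
perfect_loss = squared_error_loss(2.0, 1.0, toy_x, toy_y)
# The line y = x misses every toy point
worse_loss = squared_error_loss(1.0, 0.0, toy_x, toy_y)
print(perfect_loss, worse_loss)
```

A smaller value means the line fits the points better, which is what Step 3 relies on.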


##### Step 3: Pick best one

We want to find the best function $f^*$, the one that minimizes the loss function.

That is:

$
f^* = \arg\min_{f} L(f)
$
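
To make the meaning of the minimizer concrete, here is a hedged sketch on made-up toy points. It uses `np.polyfit`, which computes the least-squares line directly; this is a different technique from the brute-force and gradient descent approaches in this notebook and serves only as a cross-check. The helper `loss` and the toy data are assumptions for illustration.

```python
import numpy as np

# Made-up toy points (not from pokemon.csv), roughly following y = 2x + 1
toy_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
toy_y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

def loss(a, b):
    """Sum of squared errors of the line a*x + b on the toy points."""
    return float(np.sum((toy_y - (a * toy_x + b)) ** 2))

# With deg=1, np.polyfit returns the least-squares [slope, intercept]
a_star, b_star = np.polyfit(toy_x, toy_y, deg=1)

# Nudging either parameter away from the minimizer should not lower the loss
neighbours = [loss(a_star + da, b_star + db)
              for da in (-0.1, 0.1) for db in (-0.1, 0.1)]
print(a_star, b_star, loss(a_star, b_star), min(neighbours))
```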

##### Brute Force Coding Example



In [None]:
train_data = water_pokemon_df[['Attack', 'Sp. Atk']].values

print(train_data[0:5])

# Candidate grid: a and b each take the values -100, -90, ..., 90
step = 10
ab_array = [(round(a, 2), round(b, 2)) for a in np.arange(-100, 100, step) for b in np.arange(-100, 100, step)]

loss_array = []

best_result = None
min_total_loss = float('inf')
for a, b in ab_array:
    total_loss = 0
    for (x, y) in train_data:
        total_loss += (y - (x*a + b))**2
        
    if total_loss < min_total_loss:
        min_total_loss = total_loss
        best_result = [a, b, total_loss]
    loss_array.append([a, b, total_loss])

loss_array_df = pd.DataFrame(np.array(loss_array))
loss_array_df.columns = ['a', 'b', 'loss']

print(loss_array_df.head())

heatmap_df = pd.pivot_table(loss_array_df, columns=['b'], index=['a'], values=['loss'])

print("Best result, a = {0}, b = {1}, loss = {2}".format(best_result[0], best_result[1], best_result[2]))

draw_heatmap(heatmap_df)

x_array = np.arange(160)
y_array = x_array*best_result[0] + best_result[1]
plt.scatter(x=train_data[:, 0], y=train_data[:, 1])
plt.plot(x_array, y_array, color='red')
plt.show()

In the above example we might be able to find the best function; however, there are some issues.

* Brute force takes time.
* The search grid might not cover the range that contains the best function.
    
##### Gradient Descent

The idea is that instead of trying values of a and b blindly, you repeatedly move from the current point to a nearby point with a lower loss, which can speed up the search considerably.
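
A minimal one-parameter sketch of this idea, assuming the loss is simply $a^2$ (the same curve plotted in the next cell) and starting from a = 7.5. The names `loss_gradient` and `learning_rate`, and the step size 0.1, are illustrative assumptions.

```python
def loss(a):
    """Toy loss with its minimum at a = 0."""
    return a ** 2

def loss_gradient(a):
    """Derivative of a**2 with respect to a."""
    return 2 * a

a = 7.5              # starting point
learning_rate = 0.1  # assumed step size
history = [a]
for _ in range(50):
    # Move against the gradient, i.e. downhill on the loss curve
    a = a - learning_rate * loss_gradient(a)
    history.append(a)

print(history[:4])
print(a, loss(a))
```

Each update moves `a` closer to 0, so the loss shrinks at every step without evaluating a grid of candidates.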
    


In [None]:
x_array = np.arange(-10, 10, 0.1)


# y = x^2
plt.plot(x_array, x_array**2)
plt.annotate('Current argument', arrowprops=dict(facecolor='black'), xy=(7.5, 7.5**2), xytext=(7.5-5, 7.5**2+5))
plt.xlabel('a')
plt.ylabel('loss')
plt.show()
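
For the linear model and the squared-error loss from Step 2, the direction of "less loss" is given by the partial derivatives of $L(a, b)$. Written out (the symbol $\eta$ for the learning rate is notation assumed here):

$
\frac{\partial L}{\partial a} = -2 \sum_{i=1}^{n} (\hat{y}^i - (ax^i + b)) x^i, \quad
\frac{\partial L}{\partial b} = -2 \sum_{i=1}^{n} (\hat{y}^i - (ax^i + b))
$

Each iteration then updates both parameters against their gradients:

$
a \leftarrow a - \eta \frac{\partial L}{\partial a}, \quad
b \leftarrow b - \eta \frac{\partial L}{\partial b}
$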

##### Gradient Descent Coding example

In [None]:
a = 0
b = 0
l_rate = 0.000001
iteration = 200000
for i in range(iteration):
    
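    # Accumulate the partial derivatives of the loss over all training points
    b_grad = 0.0
    a_grad = 0.0
    for (x, y) in train_data:
        b_grad = b_grad - 2.0*(y - b - a*x)*1.0
        a_grad = a_grad - 2.0*(y - b - a*x)*x
    
    # Step against the gradient, scaled by the learning rate
    b = b - l_rate*b_grad
    a = a - l_rate*a_grad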
    
    # Report progress every 10000 iterations; the loss is only needed for printing
    if i % 10000 == 0:
        total_loss = 0
        for (x, y) in train_data:
            total_loss += (y - (x*a + b))**2
        print("Iteration {0}: a = {1}, b = {2}, loss = {3}".format(i, a, b, total_loss))
        
x_array = np.arange(160)
y_array = x_array*a + b
plt.scatter(x=train_data[:, 0], y=train_data[:, 1])
plt.plot(x_array, y_array, color='red')
plt.show()
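
The per-sample loops above can also be expressed with NumPy array operations. The following is a sketch under assumptions: the function name `gradient_descent`, the toy data, and the values `l_rate=0.01` and `iterations=5000` are chosen for illustration and are not tuned for the Pokemon data.

```python
import numpy as np

def gradient_descent(x, y, l_rate, iterations):
    """Fit y ~ a*x + b by gradient descent on the sum of squared errors."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        residuals = y - (a * x + b)
        # Same gradients as the loop version, summed over all points at once
        a_grad = -2.0 * np.sum(residuals * x)
        b_grad = -2.0 * np.sum(residuals)
        a -= l_rate * a_grad
        b -= l_rate * b_grad
    return a, b

# Made-up toy points (not from pokemon.csv) lying exactly on y = 2x + 1
toy_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
toy_y = 2.0 * toy_x + 1.0

a_fit, b_fit = gradient_descent(toy_x, toy_y, l_rate=0.01, iterations=5000)
print(a_fit, b_fit)
```

The learning rate has to suit the scale of the data: the toy values here are small, so a larger step works than the `0.000001` used for the Pokemon stats.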