# NOTE: WIP, visit me later!

# Outline
+ [Setup](#Environment-setup)
+ **Training**: adjust W & b
    + Initialization
        + [Dataset & DataLoader](#Dataset)
        + Weight Initialization
            + Zeros
                + the derivative of the loss function with respect to every w is the same, so all weights stay equal and the network is no better than a linear model [read me](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
            + Normal distribution
                + Vanishing gradients
                + Exploding gradients
                    + "This may result in oscillating around the minima or even overshooting the optimum again and again and the model will never learn" [see](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
    + **Epoch**: one full pass over the training data in the training phase
    + Forward Propagation
        + [Model](#Models)
            + **in_features**: size of W
            + **out_features**: classes
            + Linear Classifiers: returns positive & negative values
            + Logistic Regression: returns [0 - 1] values
            + Threshold function: returns either 0 or 1 
            + [Linear Regression](#Linear-Regression)
        + Activation functions:
            + Tanh
                + zero centered [-1, 1]
                + **Xavier initialization** [see](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
                + Vanishing gradient
            + Sigmoid
                + [0, 1]
                + Initialization
                + Vanishing gradient
                + **Binary** classification
            + ReLU
                + [0, ∞)
                + "With RELU(z) vanishing gradients are generally not a problem as the gradient is 0 for negative (and zero) inputs and 1 for positive inputs." [read me](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
            + Softmax
                + **Multi-class** classification
                
    + **Loss/Cost**: the difference between the predicted values,
        $\hat{y}$, and the true labels, $y$
        + [Derivative](#Derivative)
        + [Mean Square Error](#Mean-Square-Error)
        + [Cross Entropy](#Cross-Entropy)
            + "This criterion expects a class index (0 to C-1) as the target for each value" [PyTorch](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss)
        + [Binary Cross Entropy](#Binary-Cross-Entropy)
    + **Backward propagation**:    
        + **Optimization**: updates W & b in the Backward propagation
            + [Adam optimizer](#Adam)
            + Gradient Descent Optimization
                + Batch Gradient Descent
                + Mini-Batch Gradient Descent (the usual PyTorch setup: a `DataLoader` with `batch_size` > 1)
                + Stochastic Gradient Descent
                    + Updates W & b using one sample at a time
                    + Sudden increases may occur
                    + May not be accurate
                    + Good for big data
+ **Validation**: adjust the hyper-parameters; learning rate & batch size
    + **Early Stopping**: "Stop training when a monitored quantity has stopped improving" [[tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping)]

+ Tools & Libraries
    + **Visualization**
        + Pandas
        + Matplotlib
    + [Numpy](#NumPy)
+ Cheatsheets
    + [ml-cheatsheet.readthedocs.io](https://ml-cheatsheet.readthedocs.io)

# Environment setup

## Install
+ Install [Anaconda](https://www.anaconda.com/download/#linux)
+ Add `~/anaconda3/bin` to your `PATH` (it contains the `conda` executable binary)
+ Install [PyTorch](https://pytorch.org/): `conda install pytorch torchvision -c pytorch`

## Online tools
+ [Google Colaboratory](https://colab.research.google.com) [[open me!](https://colab.research.google.com/github/yoga1290/cheatsheets/blob/master/PyTorch.ipynb)]

# Gradient Descent Optimization

+ [PUML](https://raw.githubusercontent.com/yoga1290/cheatsheets/master/gradient-descent.puml)
![Gradient Descent](https://github.com/yoga1290/cheatsheets/raw/master/gradient-descent.png)
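
A minimal sketch of plain (batch) gradient descent written by hand with autograd. The noiseless line `y = x - 1`, the starting values of `w` and `b`, the learning rate and the epoch count are illustrative assumptions, not taken from the diagram.

```python
import torch

# Toy data for y = 1 * x - 1 (no noise, so the exact answer is known)
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = 1 * x - 1

# Parameters to learn, deliberately started far from the solution
w = torch.tensor(-10.0, requires_grad=True)
b = torch.tensor(5.0, requires_grad=True)
lr = 0.1  # learning rate (hyper-parameter)

for epoch in range(200):
    yhat = w * x + b                      # forward propagation
    loss = torch.mean((yhat - y) ** 2)    # MSE loss over the whole dataset
    loss.backward()                       # backward propagation: fills w.grad, b.grad
    with torch.no_grad():                 # the update itself is not tracked by autograd
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                        # gradients accumulate, so reset them
    b.grad.zero_()

print(w.item(), b.item(), loss.item())
```

Every epoch uses all samples at once, which is the "Batch Gradient Descent" variant from the outline; slicing `x` and `y` into chunks inside the loop would turn it into the mini-batch variant.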


# Dataset

In [None]:
from torch.utils.data import Dataset, DataLoader
from torch import arange, randn

# https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel#dataset
class MyDataset(Dataset):
    # Constructor
    def __init__(self):
        self.x = arange(-3, 3, 0.1).view(-1, 1)
        self.f = 1 * self.x - 1
        self.y = self.f + 0.1 * randn(self.x.size())
        self.len = self.x.shape[0]
        
    # Getter
    def __getitem__(self,index):    
        return self.x[index],self.y[index]
    
    # Get Length
    def __len__(self):
        return self.len

    
params = {'batch_size': 64,
          'shuffle': True,
          'num_workers': 6}
# https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
dataLoader = DataLoader(MyDataset(), **params)

# for X, y in dataLoader
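
The commented loop above can be sketched as follows. `ToyDataset` is a hypothetical, shortened copy of `MyDataset`; `batch_size=16` and `num_workers=0` are assumptions chosen so that several batches are produced and no worker processes are needed.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):  # hypothetical name, mirrors MyDataset
    def __init__(self):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.y = 1 * self.x - 1 + 0.1 * torch.randn(self.x.size())

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.x.shape[0]

dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

batch_sizes = []
for X, y in loader:
    # X and y are stacked samples with shape (batch, 1); the last batch may be smaller
    batch_sizes.append(X.shape[0])
    assert X.shape == y.shape

print(batch_sizes)
```

Note that with the `batch_size` of 64 used above and a dataset this small, the loader would yield a single batch holding every sample.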

## Prebuilt dataset

### MNIST

In [None]:
import torchvision.transforms as transforms
import torchvision.datasets as dsets

dataset = dsets.MNIST(
    root = './data2', 
    train = False, 
    download = True, 
    transform = transforms.ToTensor()
)

In [None]:
import matplotlib.pyplot as plt

def show_data(data_sample, shape = (28, 28)):
    plt.imshow(data_sample[0].numpy().reshape(shape), cmap='gray')
    # int() handles both a plain int label (current torchvision) and a 0-dim tensor
    plt.title('y = ' + str(int(data_sample[1])))

show_data(dataset[0])

## Torchvision Transforms

+ Compose
+ CenterCrop
+ ToTensor

In [None]:
from torchvision.transforms import Compose
from torchvision.transforms import CenterCrop
from torchvision.transforms import ToTensor
import torchvision.datasets as dsets

croptensor_data_transform = Compose([
    CenterCrop(20),
    ToTensor()
])

dataset = dsets.MNIST(root = './data', train = False, download = True, transform = croptensor_data_transform)
print("The shape of the first element in the first tuple: ", dataset[0][0].shape)

# Models

## torch.nn.[Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)

## Linear Regression

In [None]:
from torch.nn import Module, Linear, Dropout
from torch.nn.functional import relu

# Customize Linear Regression Class

class linear_regression(Module):
    
    # Constructor
    # in_features = len(W)
    def __init__(self, in_features, out_features):
        
        # Inherit from parent
        super(linear_regression, self).__init__()
        self.linear = Linear(in_features, out_features, bias=True) #TODO
    
    # Prediction function
    # https://pytorch.org/docs/stable/nn.html#torch.nn.Module.forward
    def forward(self, x):
        yhat = self.linear(x)
        return yhat
    
# list( linear_regression(5,1).parameters() ) # W[5], b

class Net(Module):
    def __init__(self,in_size,n_hidden,out_size,p=0):

        # Inherit from parent
        super(Net, self).__init__()
        self.drop = Dropout(p=p)
        self.linear1 = Linear(in_size,n_hidden)
        self.linear2 = Linear(n_hidden,n_hidden)
        self.linear3 = Linear(n_hidden,out_size)

    # Prediction function
    # https://pytorch.org/docs/stable/nn.html#torch.nn.Module.forward
    def forward(self,x):
        x= relu(self.linear1(x))
        x= self.drop(x)
        x= relu(self.linear2(x))
        x= self.drop(x)
        x= self.linear3(x)
        return x
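
The outline lists zeros, normal-distribution and Xavier initialization. A hedged sketch of how these could be applied with `torch.nn.init` to a single `Linear` layer; the layer size (100 inputs, 50 outputs) is an arbitrary assumption.

```python
import torch
from torch.nn import Linear, init

torch.manual_seed(0)
layer = Linear(100, 50)   # weight has shape (50, 100)

# Zeros: every weight is identical, so every weight receives the same gradient
init.zeros_(layer.weight)
all_zero = bool((layer.weight == 0).all())

# Normal distribution: mean 0, std 1 (large values can saturate tanh/sigmoid)
init.normal_(layer.weight, mean=0.0, std=1.0)
normal_std = layer.weight.std().item()

# Xavier (uniform): samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
init.xavier_uniform_(layer.weight)
xavier_bound = (6.0 / (100 + 50)) ** 0.5
xavier_max = layer.weight.abs().max().item()

print(all_zero, normal_std, xavier_bound, xavier_max)
```

The same calls can be applied to `linear1`, `linear2` and `linear3` of `Net` inside its constructor.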

## Sequential

In [None]:
from torch.nn import Sequential, Linear, Sigmoid

model = Sequential( Linear(2,1), Sigmoid() )
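
A small usage sketch for the model above: `Linear` followed by `Sigmoid` is logistic regression, so the output lies in (0, 1), and a threshold turns it into 0 or 1. The input values and the 0.5 threshold are illustrative assumptions.

```python
import torch
from torch.nn import Sequential, Linear, Sigmoid

torch.manual_seed(0)
model = Sequential(Linear(2, 1), Sigmoid())

X = torch.tensor([[1.0, 2.0], [-3.0, 0.5], [0.0, 0.0]])  # 3 samples, 2 features
yhat = model(X)                  # probabilities, shape (3, 1)
labels = (yhat > 0.5).float()    # threshold function: either 0 or 1

print(yhat)
print(labels)
```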

### state_dict(), load_state_dict(dict), save & load

In [None]:
from torch import save
from torch import load
from torch.nn import Module, Linear

save({"a": 123}, 'tmp.pt')
tdict = load('tmp.pt')
print(tdict)

model = Linear(5, 1)
save(model.state_dict(), 'model.pt')
model.load_state_dict( load('model.pt') )

print(model.state_dict())

## Activation functions

### ReLU

In [None]:
from torch import linspace
from torch.nn.functional import relu

x = linspace(-3, 3, 100, requires_grad = True)
Y = relu(x)
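
To compare the activation functions from the outline, here is a minimal sketch that checks their output ranges and the gradient at a large input. The input range and the probe value 10.0 are arbitrary assumptions.

```python
import torch

x = torch.linspace(-10, 10, 101)

tanh_y = torch.tanh(x)       # zero centered, values in [-1, 1]
sig_y = torch.sigmoid(x)     # values in [0, 1]
relu_y = torch.relu(x)       # values in [0, inf): negatives are clipped to 0

# Gradient at a large positive input: sigmoid saturates, ReLU does not
z = torch.tensor(10.0, requires_grad=True)
torch.sigmoid(z).backward()
sigmoid_grad = z.grad.item()     # s * (1 - s), tiny for large |z|

z2 = torch.tensor(10.0, requires_grad=True)
torch.relu(z2).backward()
relu_grad = z2.grad.item()       # 1 for positive inputs

print(sigmoid_grad, relu_grad)
```

The tiny sigmoid gradient is the vanishing-gradient effect noted in the outline; the ReLU gradient stays at 1 for any positive input.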

# Cost/Loss

Measuring the difference between the predicted values (**Y^**) and the actual labels (**Y**)

### Mean Square Error

+ [torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')](https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss)

In [3]:
from torch.nn import MSELoss

criterion = MSELoss()

# equivalent to:
from torch import mean

def criterion(yhat, y):
    return mean((yhat - y) ** 2)
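
A quick check that the two definitions agree, using made-up predictions and labels:

```python
import torch
from torch.nn import MSELoss

yhat = torch.tensor([2.5, 0.0, 2.0])   # illustrative predictions
y = torch.tensor([3.0, -0.5, 2.0])     # illustrative labels

builtin = MSELoss()(yhat, y)
manual = torch.mean((yhat - y) ** 2)   # (0.25 + 0.25 + 0) / 3

print(builtin.item(), manual.item())
```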

### Binary Cross Entropy

In [2]:
from torch.nn import BCELoss

criterion = BCELoss()

# equivalent to:
from torch import mean
from torch import log

def criterion(yhat, y):
    return -1 * mean(y * log(yhat) + (1-y) * log(1 - yhat))
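
A sketch that compares `BCELoss` with the manual formula above. `BCELoss` expects probabilities, so the made-up logits go through a sigmoid first; `BCEWithLogitsLoss` is not used elsewhere in this notebook and is shown only as an assumption-free alternative that takes raw logits directly.

```python
import torch
from torch.nn import BCELoss, BCEWithLogitsLoss

logits = torch.tensor([2.0, -1.0, 0.5])   # illustrative raw model outputs
y = torch.tensor([1.0, 0.0, 1.0])         # binary labels as floats
yhat = torch.sigmoid(logits)              # probabilities in (0, 1)

builtin = BCELoss()(yhat, y)
manual = -torch.mean(y * torch.log(yhat) + (1 - y) * torch.log(1 - yhat))
from_logits = BCEWithLogitsLoss()(logits, y)   # sigmoid + BCE in one step

print(builtin.item(), manual.item(), from_logits.item())
```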


### Cross Entropy

In [None]:
from torch.nn import CrossEntropyLoss
# https://pytorch.org/docs/stable/nn.html#crossentropyloss

criterion = CrossEntropyLoss()
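
A minimal sketch of how `CrossEntropyLoss` is called: the input is raw scores (logits) of shape (N, C) and the target holds class indices 0 to C-1. The logits below are made up. The loss matches `log_softmax` followed by `nll_loss`, so no explicit softmax layer is needed before it.

```python
import torch
from torch.nn import CrossEntropyLoss
from torch.nn.functional import log_softmax, nll_loss, softmax

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # 2 samples, C = 3 classes
target = torch.tensor([0, 2])               # class indices, not one-hot vectors

loss = CrossEntropyLoss()(logits, target)
manual = nll_loss(log_softmax(logits, dim=1), target)

probs = softmax(logits, dim=1)              # each row sums to 1
pred = probs.argmax(dim=1)                  # predicted class per sample

print(loss.item(), manual.item(), pred)
```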


# Optimizers

## Adam

In [6]:
from torch.nn import Linear
from torch.optim import Adam

model = Linear(5, 1)  # placeholder model so the cell runs on its own
opt = Adam(model.parameters(), lr=0.01)

# Train

In [8]:
# model.train(True)  # sets model.training = True (model.eval() sets it back to False)
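
A hedged sketch of a full training loop that ties the pieces above together (DataLoader, model, criterion, optimizer). The toy data, `batch_size=16`, `lr=0.05` and 100 epochs are assumptions picked so that the example converges quickly.

```python
import torch
from torch.nn import Linear, MSELoss
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Noisy samples of y = 1 * x - 1, as in the Dataset section
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = 1 * x - 1 + 0.1 * torch.randn(x.size())
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = Linear(1, 1)
criterion = MSELoss()
opt = Adam(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(100):
    model.train()                        # training mode (matters for Dropout etc.)
    total = 0.0
    for X_batch, y_batch in loader:
        opt.zero_grad()                  # clear gradients from the previous step
        yhat = model(X_batch)            # forward propagation
        loss = criterion(yhat, y_batch)  # loss/cost
        loss.backward()                  # backward propagation
        opt.step()                       # update W & b
        total += loss.item() * X_batch.size(0)
    epoch_losses.append(total / len(loader.dataset))

model.eval()                             # switch to evaluation mode afterwards
print(model.weight.item(), model.bias.item(), epoch_losses[-1])
```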

# NumPy

#### [linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linspace.html)
+ Return evenly spaced numbers over a specified interval.

In [None]:
from torch import arange
from numpy import linspace

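print( linspace(-2, 2 ,5) )         # third argument is the number of points: [-2. -1.  0.  1.  2.]
print( arange(-2, 2 ,5).numpy() )   # third argument is the step size: [-2]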

#### [array([]).T](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.ndarray.T.html)

In [None]:
from numpy import array

x = array([[1,2,3], [4, 5, 6]])
print(x)
print(x.T)

In [3]:
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.meshgrid.html
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.c_.html
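
The two links above point to `numpy.meshgrid` and `numpy.c_`. A minimal sketch of how they are commonly combined to build a grid of 2-D points (for example to plot a decision boundary); the grid values are arbitrary.

```python
import numpy as np

xs = np.linspace(-1, 1, 3)     # 3 values along the first axis
ys = np.linspace(0, 1, 2)      # 2 values along the second axis

# meshgrid repeats xs along rows and ys along columns; both results have shape (2, 3)
xx, yy = np.meshgrid(xs, ys)

# c_ stacks 1-D arrays as columns: one (x, y) pair per row, shape (6, 2)
grid = np.c_[xx.ravel(), yy.ravel()]

print(xx.shape, yy.shape, grid.shape)
print(grid)
```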

# PyTorch
#### torch.tensor(data, [requires_grad=True, dtype=torch.int8|uint8|int16/short|half|float|int|double|long, device='cuda:0'])
+ .zeros()
+ .ones()
+ .pow(2)
+ .sum()
+ .ndimension()
+ .numpy()
+ .shape
+ .dtype
+ [begin_row **\:** end_row **\,** begin_column **\:** end_column]

In [None]:
from torch import ones
from torch import zeros

print(zeros((2,)))
print(ones((2,2)).numpy().shape)
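
A short sketch exercising the tensor members listed above on a small made-up 2 x 3 tensor:

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float)

n_dims = t.ndimension()          # number of dimensions: 2
shape = tuple(t.shape)           # (2, 3)
dtype = t.dtype                  # torch.float32
sum_sq = t.pow(2).sum().item()   # 1 + 4 + 9 + 16 + 25 + 36 = 91.0
block = t[0:2, 1:3]              # rows 0..1, columns 1..2
as_numpy = t.numpy()             # NumPy array sharing the same memory

print(n_dims, shape, dtype, sum_sq)
print(block)
```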

+ [arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False).view()](https://pytorch.org/docs/stable/torch.html#torch.arange)

+ [reshape(input, shape)](https://pytorch.org/docs/stable/torch.html#torch.reshape)

In [None]:
from torch import arange
from torch import reshape

print( arange(-2, 2, 1) ) # 1 Row

print( arange(-2, 2, 1).view(-1, 1) ) # 1 Column
print( reshape(arange(-2, 2, 1), (-1, 1)) ) # same

### Save/Load dict

In [None]:
from torch import save
from torch import load

save({"a": 123}, 'tmp.pt')
tdict = load('tmp.pt')
print(tdict)

## Derivative

### Partial derivative with respect to u and v

In [None]:
import torch
import matplotlib.pyplot as plt
import torch.nn.functional as F

# Calculate f(u, v) = v * u + u^2 at u = 1, v = 2
# Expected: df/du = v + 2u = 4, df/dv = u = 1

u = torch.tensor(1.0,requires_grad=True)
v = torch.tensor(2.0,requires_grad=True)
f = u * v + u ** 2

f.backward()
print("The result of v * u + u^2: ", f)
print("The partial derivative with respect to u: ", u.grad)
print("The partial derivative with respect to v: ", v.grad)

## Calculate the derivative with multiple values

In [None]:
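x = torch.linspace(-10, 10, 10, requires_grad = True)
Y = x ** 2             # element-wise values, shape (10,)
y = torch.sum(x ** 2)  # scalar, so backward() needs no arguments

y.backward()
print("dy/dx = 2x: ", x.grad)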

# [scikit-learn](http://www.scikit-learn.org)

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target  # features (150, 4) and class labels (150,)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# TensorFlow

In [None]:
# source ~/venv/bin/activate 
# see https://www.tensorflow.org/install/pip

import tensorflow as tf
