[![install](https://anaconda.org/yoga1290/pytorch/badges/installer/ipynb.svg)](https://anaconda.org/yoga1290/pytorch/notebook) [![downloads](https://anaconda.org/yoga1290/pytorch/badges/downloads.svg)](https://anaconda.org/yoga1290/pytorch/notebook) [![Anaconda Cloud](https://anaconda.org/yoga1290/pytorch/badges/version.svg)](https://anaconda.org/yoga1290/pytorch/notebook)
# NOTE: WIP, visit me later!

# Outline
+ [Setup](#Environment-setup)
+ **Training**: adjust W & b
    + Initialization
        + [Dataset & DataLoader](#Dataset)
        + **Epoch**: one full pass over the training data in the training phase
    + Forward Propagation
        + [Model](#Models)
            + **in_features**: size of W
            + **out_features**: classes
            + Linear Classifiers: returns positive & negative values
            + Logistic Regression: returns [0 - 1] values
            + Threshold function: returns either 0 or 1 
            + [Linear Regression](#Linear-Regression)
            + Autoencoders
                + Shallow Neural Networks
                + Uses deterministic approach
                + Dimensionality Reduction
                + **RBM: Restricted Boltzmann Machine**
                    + Shallow Neural Networks (2 layers)
                    + feature extraction
                    + Uses a stochastic approach
                    + Generative model
                        + "We can do both supervise and unsupervised tasks with generative models" [[ref](https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/ML0120EN/ML0120EN-4.1-Review-RBMMNIST.ipynb)]
                    + 2 Phase model [[ref](https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/ML0120EN/ML0120EN-4.1-Review-RBMMNIST.ipynb)]:
                        + Forward pass: $p(h|v) = sigmoid(X \otimes W + hb)$
                        + Backward Pass/Reconstruction: $p(v|h) = sigmoid(h0 \otimes transpose(W) + vb)$
                    + **DBN: Deep Belief Network**
                        + unsupervised feature extraction by a stack of RBMs
                        + Addresses back-propagation problems: local minima & vanishing gradients
            + **Recurrent Model/RNN**
                + May use time-window to store state
                + Vanishing/exploding gradients may occur
                + **LSTM: Long Short-Term Memory networks**
                    + Keep/Read/Write gates
                    + mitigates the vanishing/exploding gradient problem
                    + Unfolded LSTM
                    + Stacked LSTM
                    
            + **Weight Initialization**
                + Zeros
                    + the derivative of the loss function is the same for every w, so every unit learns the same thing; similar to a linear model [read me](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
                + **Xavier initialization** [see](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
                    + Tanh
                + He/Kaiming uniform initialization
                    + ReLU
                + Uniform distribution
                + Normal distribution
                    + Vanishing gradients
                    + Exploding gradients
                        + "This may result in oscillating around the minima or even overshooting the optimum again and again and the model will never learn" [see](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
        + **Normalization**:
            + "Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information" [see](https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/normalize-data#module-overview)
            + Batch Normalization
                + "batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation." [see](https://towardsdatascience.com/batch-normalization-in-neural-networks-1ac91516821c#b87b)
                + Remove Dropout
                + Reduce Internal Covariate Shift
                + Increase learning rate
                + Bias is not necessary
                + USE `model.eval()` before prediction ($\hat{y}$)
        + **Regularization** [[ref](https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c)]
            + L1 Regularization
                + Loss += $\lambda \sum | \beta | $
                + "shrinks the less important feature’s coefficient to zero"
                + "works well for feature selection in case we have a huge number of features"
                + Lasso Regression
                + LASSO: Least Absolute Shrinkage and Selection Operator
            + L2 Regularization
                + Ridge Regression
                + Loss += $ \lambda \sum \beta ^2 $
            + Dropout
                + USE `model.train()` before training
                + USE `model.eval()` before prediction ($\hat{y}$)
        + **Activation functions**:
            + Tanh
                + zero centered [-1, 1]
                + **Xavier initialization** [see](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
                + Vanishing gradient
            + Sigmoid
                + [0, 1]
                + Initialization
                + Vanishing gradient
                + **Binary** classification
            + ReLU
                + [0, ∞)
                + "With RELU(z) vanishing gradients are generally not a problem as the gradient is 0 for negative (and zero) inputs and 1 for positive inputs." [read me](https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94)
            + Softmax
                + **Multi-class** classification
                
    + **Loss/Cost**: the difference between the predicted values,
        $\hat{y}$, and the true labels, $y$
        + [Derivative](#Derivative)
        + [Mean Square Error](#Mean-Square-Error)
        + [Binary Cross Entropy](#Binary-Cross-Entropy)
        + [Cross Entropy](#Cross-Entropy)
            + **Multi-class** "This criterion expects a class index (0 to C-1) as the target for each value" [PyTorch](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss)
    + **Backward propagation**:    
        + **Optimization**: updates W & b in the Backward propagation
            + [Adam optimizer](#Adam)
            + Gradient Descent Optimization
                + Batch Gradient Descent
                + Mini-Batch Gradient Descent (PyTorch's **default**)
                + Stochastic Gradient Descent
                    + Updates the parameters using one sample at a time
                    + Sudden increases may occur
                    + May not be accurate
                    + Good for big data
+ **Validation**: adjust the hyper-parameters; learning rate & batch size
    + **Early Stopping**:
        + "Stop training when a monitored quantity has stopped improving" [[tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping)]
        + or validation error just got worse

+ Tools & Libraries
    + **Visualization**
        + Pandas
        + Matplotlib
    + [Numpy](#NumPy)
+ Cheatsheets
    + [ml-cheatsheet.readthedocs.io](https://ml-cheatsheet.readthedocs.io)
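
The two RBM phases listed in the outline can be sketched in a few lines of PyTorch. This is only a minimal illustration, not the referenced lab's implementation: the layer sizes, the random `W`, and the helper names `forward_pass` and `backward_pass` are assumptions, and $\otimes$ is read here as a matrix product.

```python
import torch

torch.manual_seed(0)

n_visible, n_hidden = 6, 3                  # assumed sizes, for illustration only
W = 0.1 * torch.randn(n_visible, n_hidden)  # shared weights
hb = torch.zeros(n_hidden)                  # hidden bias
vb = torch.zeros(n_visible)                 # visible bias

def forward_pass(v):
    # p(h|v) = sigmoid(v W + hb), then sample binary hidden states from it
    p_h = torch.sigmoid(v @ W + hb)
    return p_h, torch.bernoulli(p_h)

def backward_pass(h):
    # p(v|h) = sigmoid(h W^T + vb): reconstruction of the visible layer
    return torch.sigmoid(h @ W.t() + vb)

v0 = torch.tensor([[1., 0., 1., 1., 0., 0.]])  # one binary input vector
p_h0, h0 = forward_pass(v0)
v1 = backward_pass(h0)
print(p_h0.shape, v1.shape)
```

Sampling the hidden states with `torch.bernoulli` is what makes the approach stochastic, in contrast to the deterministic autoencoder.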

# Environment setup

## Install
+ Install [Anaconda](https://www.anaconda.com/download/#linux)
+ Add `~/anaconda3/bin` (where the `conda` executable binary lives) to your `PATH`
+ Install [PyTorch](https://pytorch.org/): `conda install pytorch torchvision -c pytorch`

## Online tools

### [Google CoLaboratory](https://colab.research.google.com) [[open me!](https://colab.research.google.com/github/yoga1290/cheatsheets/blob/master/PyTorch.ipynb)]

Install PyTorch on [Google CoLaboratory](https://colab.research.google.com/notebooks/snippets/importing_libraries.ipynb#scrollTo=RHXKNvj8ROgq)

In [8]:
# https://colab.research.google.com/notebooks/snippets/importing_libraries.ipynb#scrollTo=RHXKNvj8ROgq
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
import torch



# Gradient Descent Optimization

+ [PUML](https://raw.githubusercontent.com/yoga1290/cheatsheets/master/gradient-descent.puml)
![Gradient Descent](https://github.com/yoga1290/cheatsheets/raw/master/gradient-descent.png)
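
The batch, mini-batch and stochastic variants from the outline differ only in how many samples feed each update. Below is a minimal sketch of the mini-batch case on a noise-free line; the hyper-parameters (`lr`, `batch_size`, `epochs`) are assumed values, not recommendations.

```python
import torch

torch.manual_seed(0)

# Noise-free samples of the line y = 1 * x - 1
x = torch.linspace(-3, 3, 60).view(-1, 1)
y = 1 * x - 1

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr, batch_size, epochs = 0.05, 20, 100      # assumed hyper-parameters

for epoch in range(epochs):
    order = torch.randperm(x.shape[0])      # reshuffle the samples every epoch
    for start in range(0, x.shape[0], batch_size):
        idx = order[start:start + batch_size]
        loss = torch.mean((x[idx] * w + b - y[idx]) ** 2)  # MSE on one mini-batch
        loss.backward()
        with torch.no_grad():               # plain gradient-descent update
            w -= lr * w.grad
            b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())
```

Setting `batch_size` to the dataset size gives batch gradient descent, and setting it to 1 gives stochastic gradient descent.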


# Dataset

In [None]:
from torch.utils.data import Dataset, DataLoader
from torch import arange, randn

# https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel#dataset
class MyDataset(Dataset):
    # Constructor
    def __init__(self):
        self.x = arange(-3, 3, 0.1).view(-1, 1)
        self.f = 1 * self.x - 1
        self.y = self.f + 0.1 * randn(self.x.size())
        self.len = self.x.shape[0]
        
    # Getter
    def __getitem__(self,index):    
        return self.x[index],self.y[index]
    
    # Get Length
    def __len__(self):
        return self.len

    
params = {'batch_size': 64,
          'shuffle': True,
          'num_workers': 6}
# https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
dataLoader = DataLoader(MyDataset(), **params)

# for X, y in dataLoader
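
Iterating over a `DataLoader` yields one batch per step. The sketch below uses `TensorDataset` as a stand-in for `MyDataset` so that it is self-contained; the 60 samples and `batch_size=16` are assumed values chosen to show that the last batch can be smaller.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MyDataset: 60 (x, y) pairs on the line y = x - 1
x = torch.linspace(-3, 3, 60).view(-1, 1)
y = x - 1
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

batch_sizes = []
for X, Y in loader:                  # one pass over the loader = one epoch
    batch_sizes.append(X.shape[0])

print(len(loader), batch_sizes)      # number of batches, size of each batch
```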

## Prebuilt dataset

### MNIST

In [None]:
import torchvision.transforms as transforms
import torchvision.datasets as dsets

dataset = dsets.MNIST(
    root = './data2', 
    train = False, 
    download = True, 
    transform = transforms.ToTensor()
)

In [None]:
import matplotlib.pylab as plt

def show_data(data_sample, shape = (28, 28)):
    plt.imshow(data_sample[0].numpy().reshape(shape), cmap='gray')
    plt.title('y = ' + str(int(data_sample[1])))  # label may be an int or a 0-d tensor

show_data(dataset[0])

## Torchvision Transforms

+ Compose
+ CenterCrop
+ ToTensor

In [None]:
from torchvision.transforms import Compose
from torchvision.transforms import CenterCrop
from torchvision.transforms import ToTensor
import torchvision.datasets as dsets

croptensor_data_transform = Compose([
    CenterCrop(20),
    ToTensor()
])

# set train = False for validation
dataset = dsets.MNIST(root = './data', train = False, download = True, transform = croptensor_data_transform)
print("The shape of the first element in the first tuple: ", dataset[0][0].shape)

# Models

![source: https://youtu.be/zJSY2C9xzoU](https://github.com/yoga1290/cheatsheets/raw/4346bd63cf490d978cee156497db3b64f87fc37e/resources/nn-3.3.2-zJSY2C9xzoU.png)
![source: https://youtu.be/zJSY2C9xzoU](https://github.com/yoga1290/cheatsheets/raw/4346bd63cf490d978cee156497db3b64f87fc37e/resources/nn-3.3.2-2-zJSY2C9xzoU.png)
![source: https://youtu.be/xR4Ian1UIGM](https://github.com/yoga1290/cheatsheets/raw/4346bd63cf490d978cee156497db3b64f87fc37e/resources/nn2-xR4Ian1UIGM.png)
> sources [[1](https://youtu.be/zJSY2C9xzoU)] [[2](https://youtu.be/zJSY2C9xzoU)] [[3](https://youtu.be/xR4Ian1UIGM)]

## torch.nn.[Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)

### Linear Regression

In [None]:
from torch.nn import Module, Linear, Dropout
from torch.nn.functional import relu

# Customize Linear Regression Class
class linear_regression(Module):
    def __init__(self, in_features, out_features):
        # Inherit from parent
        super(linear_regression, self).__init__()
        self.linear = Linear(in_features, out_features, bias=True) #TODO
    def forward(self, x):
        yhat = self.linear(x)
        return yhat

### Neural Network

In [None]:
# torch.nn.init.kaiming_uniform_(linear.weight,nonlinearity='relu')
# https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/DL0110EN/5.1.2.He_Initialization.ipynb
# https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/DL0110EN/5.3.1BachNorm.ipynb

from torch.nn import Module, ModuleList, Linear, Dropout, BatchNorm1d
from torch.nn.init import kaiming_uniform_, xavier_uniform_
from torch.nn.functional import relu, tanh

class Net(Module):
    # Constructor
    # in_features = len(W)
    def __init__(self,Layers, p=0):
        # Inherit from parent
        super(Net,self).__init__()
        self.hidden = ModuleList()
        self.drop = Dropout(p=p)

        for input_size,output_size in zip(Layers,Layers[1:]):
            linear = Linear(input_size,output_size)
            
            # Uniform initialization
            #linear.weight.data.uniform_(0, 1)
            
            # He/Kaiming uniform initialization
            #kaiming_uniform_(linear.weight, nonlinearity='relu')
            
            # Xavier initialization
            #xavier_uniform_(linear.weight)
            
            self.hidden.append( linear )
            
            # Batch Normalization
            # self.hidden.append( BatchNorm1d(output_size) )
            
            # Dropout
            # self.hidden.append( Dropout(p=p) )

    # Prediction function
    # https://pytorch.org/docs/stable/nn.html#torch.nn.Module.forward
    def forward(self,x):
        L=len(self.hidden)
        for (l, linear_transform)  in zip(range(L),self.hidden):
            if l<L-1:
                x = relu(linear_transform (x))
                #x = tanh(linear_transform (x))
                #x = self.drop(x)
            # last layer
            else:
                x =linear_transform (x)
        
        return x #yhat
    
    #def activation(self,x):
    #    return x
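
The commented-out initializers in `Net` draw weights from a uniform range whose bound depends on the layer size. A small sketch, assuming a 100-to-50 layer, that compares the sampled weights with the bounds documented for `xavier_uniform_` (gain 1) and `kaiming_uniform_` (ReLU gain):

```python
import math
import torch
from torch.nn import Linear
from torch.nn.init import kaiming_uniform_, xavier_uniform_

torch.manual_seed(0)
fan_in, fan_out = 100, 50             # assumed layer sizes

xavier = Linear(fan_in, fan_out)
xavier_uniform_(xavier.weight)        # U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
xavier_bound = math.sqrt(6 / (fan_in + fan_out))

he = Linear(fan_in, fan_out)
kaiming_uniform_(he.weight, nonlinearity='relu')  # U(-a, a) with a = sqrt(6 / fan_in)
he_bound = math.sqrt(6 / fan_in)

print(xavier.weight.abs().max().item(), xavier_bound)
print(he.weight.abs().max().item(), he_bound)
```

The He bound is the wider of the two, which compensates for ReLU zeroing out negative pre-activations.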

## Sequential

In [None]:
from torch.nn import Sequential, Linear, Sigmoid

model = Sequential( Linear(2,1), Sigmoid() )
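
The outline's advice to call `model.train()` before training and `model.eval()` before prediction matters as soon as a model contains Dropout or BatchNorm. A minimal sketch with an assumed toy `Sequential` model:

```python
import torch
from torch.nn import Dropout, Linear, Sequential

torch.manual_seed(0)
model = Sequential(Linear(4, 4), Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()                         # training mode: Dropout randomly zeroes activations
train_out = model(x)

model.eval()                          # evaluation mode: Dropout is a no-op
with torch.no_grad():                 # no gradients needed for prediction
    eval_out_a, eval_out_b = model(x), model(x)
    linear_only = model[0](x)         # output of the Linear layer alone

print(model.training, torch.equal(eval_out_a, eval_out_b))
```

In evaluation mode the output is deterministic and equal to the output of the `Linear` layer alone.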

### state_dict(), load_state_dict(dict), save & load

In [None]:
from torch import save
from torch import load
from torch.nn import Module, Linear

save({"a": 123}, 'tmp.pt')
tdict = load('tmp.pt')
print(tdict)

model = Linear(5, 1)
save(model.state_dict(), 'model.pt')
model.load_state_dict( load('model.pt') )

print(model.state_dict())

## Activation functions

### Relu

In [None]:
from torch import linspace, tensor
from torch.nn.functional import relu

x = linspace(-3, 3, 100, requires_grad = True)
Y = relu(x)

z = tensor([[1,0,-1],[2,0,-2],[1,0,-1]])
relu(z)
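
The output ranges listed in the outline can be checked numerically. A short sketch on an assumed input grid from -5 to 5:

```python
import torch
from torch.nn.functional import relu, softmax

z = torch.linspace(-5, 5, 11)

print(torch.tanh(z).min().item(), torch.tanh(z).max().item())        # stays inside (-1, 1)
print(torch.sigmoid(z).min().item(), torch.sigmoid(z).max().item())  # stays inside (0, 1)
print(relu(z).min().item(), relu(z).max().item())                    # 0 up to the largest input
print(softmax(z, dim=0).sum().item())                                # sums to 1 up to rounding
```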

# Criterion/Cost/Loss

Measures the difference between the predicted values (**Y^**) and the actual labels (**Y**)

### Mean Square Error

+ [torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')](https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss)

In [None]:
from torch.nn import MSELoss

criterion = MSELoss()

# equivalent to:
from torch import mean

def criterion(yhat, y):
    return mean((yhat - y) ** 2)

### Binary Cross Entropy

In [None]:
from torch.nn import BCELoss

criterion = BCELoss()

# equivalent to:
from torch import mean
from torch import log

def criterion(yhat, y):
    return -1 * mean(y * log(yhat) + (1-y) * log(1 - yhat))


### Cross Entropy

In [None]:
from torch.nn import CrossEntropyLoss
# https://pytorch.org/docs/stable/nn.html#crossentropyloss

criterion = CrossEntropyLoss()
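
`CrossEntropyLoss` takes raw scores (logits) and class indices, so the model should not apply softmax itself. A minimal sketch with assumed random logits, showing that it matches log-softmax followed by negative log-likelihood:

```python
import torch
from torch.nn import CrossEntropyLoss
from torch.nn.functional import log_softmax, nll_loss

torch.manual_seed(0)
logits = torch.randn(4, 3)            # 4 samples, 3 classes: raw scores, no softmax applied
target = torch.tensor([0, 2, 1, 2])   # class indices in 0..C-1, not one-hot vectors

loss = CrossEntropyLoss()(logits, target)
manual = nll_loss(log_softmax(logits, dim=1), target)

print(loss.item(), manual.item())
```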


# Optimizers

## Adam

In [None]:
from torch.optim import Adam

opt = Adam(model.parameters(), lr=0.01)
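
The L1 and L2 penalties from the outline can be added to the loss by hand, or, for an L2-style penalty, passed to the optimizer as `weight_decay`. A minimal sketch; the weights, `lam`, and the single data point are assumed values picked so the numbers are easy to verify.

```python
import torch
from torch.nn import Linear, MSELoss
from torch.optim import Adam

model = Linear(2, 1)
with torch.no_grad():                      # fixed weights so the penalties are easy to check
    model.weight.copy_(torch.tensor([[1.0, -2.0]]))
    model.bias.zero_()

lam = 0.01                                 # assumed regularization strength (lambda)
x, y = torch.ones(1, 2), torch.zeros(1, 1)
data_loss = MSELoss()(model(x), y)         # prediction is 1 - 2 = -1, so the loss is 1

l1_penalty = model.weight.abs().sum()      # sum |beta| = 3
l2_penalty = model.weight.pow(2).sum()     # sum beta^2 = 5
loss_l1 = data_loss + lam * l1_penalty     # L1 (Lasso-style) loss
loss_l2 = data_loss + lam * l2_penalty     # L2 (Ridge-style) loss

# weight_decay applies an L2-style penalty inside the optimizer update instead
opt = Adam(model.parameters(), lr=0.01, weight_decay=lam)
print(loss_l1.item(), loss_l2.item())
```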

# Train

In [None]:
# model.train(True) # sets model.training = True

# https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/DL0110EN/5.1.1Xaviermist1layer.ipynb
import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
from torch.optim import SGD
from torch.nn import CrossEntropyLoss
from torch.utils.data import DataLoader

def train(model,criterion, train_loader,validation_loader, optimizer, epochs=100):
    i=0
    useful_stuff={'training_loss':[],'validation_accuracy':[]}  
    
    #n_epochs
    for epoch in range(epochs):
        for i,(x, y) in enumerate(train_loader):

            #clear gradient 
            optimizer.zero_grad()
            #make a prediction logits 
            z=model(x.view(-1,28*28))
            # calculate loss 
            loss=criterion(z,y)
    
            # calculate gradients of parameters 
            loss.backward()
            # update parameters 
            optimizer.step()
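            useful_stuff['training_loss'].append(loss.item())
        correct=0
        # evaluation mode: disables Dropout and uses BatchNorm running statistics
        model.eval()
        with torch.no_grad():
            for x, y in validation_loader:
                #perform a prediction on the validation data
                yhat= model(x.view(-1,28*28))

                _,label=torch.max(yhat,1)
                correct+=(label==y).sum().item()
        # back to training mode for the next epoch
        model.train()

        accuracy=100*(correct/len(validation_loader.dataset))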
   
        useful_stuff['validation_accuracy'].append(accuracy)
    
    return useful_stuff

train_dataset=dsets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
validation_dataset=dsets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())
train_loader= DataLoader(dataset=train_dataset,batch_size=2000,shuffle=True)
validation_loader= DataLoader(dataset=validation_dataset,batch_size=5000,shuffle=False)

# hyper-parameters (example values, adjust as needed); define them before use
layers=[28*28, 100, 10] # input size, hidden units, classes
learning_rate=0.01
epochs=10

criterion= CrossEntropyLoss()
model= Net(layers)
optimizer= SGD(model.parameters(),lr=learning_rate) # momentum=0.4)

training_results=train(model,criterion, train_loader,validation_loader, optimizer, epochs=epochs)
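
Early stopping, listed under Validation in the outline, can be layered on top of a loop like `train` by watching the validation metric after each epoch. The sketch below is one possible rule, not a library API: `should_stop` is a hypothetical helper, `patience` borrows its name from the `tf.keras` callback, and the loss values are made up.

```python
def should_stop(validation_losses, patience=3):
    """Return True when the last `patience` epochs brought no new best loss."""
    if len(validation_losses) <= patience:
        return False
    best_before = min(validation_losses[:-patience])
    return min(validation_losses[-patience:]) >= best_before

# Made-up validation losses: improvement stalls after the fourth epoch
history = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58]
for epoch in range(1, len(history) + 1):
    if should_stop(history[:epoch], patience=3):
        print("stop after epoch", epoch)
        break
```

A larger `patience` tolerates longer plateaus before stopping.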

# [Convolution](https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d)

<img src = "https://ibm.box.com/shared/static/wq8wbqhm4824y1oxpdbol55q645gykg9.gif" width = 500, align = "center">

> source: [cognitiveclass.ai](https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/DL0110EN/6.1.1What%20is%20Convolution.ipynb)

+ [MaxPool2d](https://pytorch.org/docs/stable/nn.html#torch.nn.MaxPool2d)
+ kernels per output channel = **in_channels**
+ 1 bias per output channel

+ [Algorithmia](https://blog.algorithmia.com/convolutional-neural-nets-in-pytorch/)

In [None]:
import torch
from torch.nn import Conv2d
from torch import tensor

conv1 = Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=3)
conv1.state_dict()['weight'][0][0] = tensor([[1.0,1.0],[1.0,1.0]])
conv1.state_dict()['bias'][0] = 0.0
conv1.state_dict()

# example input (assumed): one 5x5 single-channel image with a vertical line
image1 = torch.zeros(1, 1, 5, 5)
image1[0, 0, :, 2] = 1

z1 = conv1(image1)

print("z1:",z1)
print("z1:",z1.shape[2:4])
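
The spatial size of a convolution output follows from the input size, kernel size, stride and padding. A small sketch of that formula for one dimension, assuming dilation 1; `conv_output_size` is a hypothetical helper name and the input sizes are assumed.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    # floor((n + 2 * padding - kernel_size) / stride) + 1, with dilation = 1
    return (n + 2 * padding - kernel_size) // stride + 1

# kernel_size=2 and stride=3 as in conv1, applied to an assumed 5x5 input
print(conv_output_size(5, kernel_size=2, stride=3))
# padding of 2 with a 5x5 kernel keeps an assumed 28x28 input at the same size
print(conv_output_size(28, kernel_size=5, padding=2))
```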

# NumPy

#### [linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linspace.html)
+ Return evenly spaced numbers over a specified interval.

In [None]:
from torch import arange
from numpy import linspace

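print( linspace(-2, 2 ,5) ) # third argument is the number of points: 5 values from -2 to 2
print( arange(-2, 2 ,5).numpy() ) # third argument is the step: only -2 fits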

#### [array([]).T](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.ndarray.T.html)

In [None]:
from numpy import array

x = array([[1,2,3], [4, 5, 6]])
print(x)
print(x.T)

In [None]:
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.meshgrid.html
# https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.c_.html

# PyTorch
#### torch.tensor( , [requires_grad=True, dtype=torch.int8|uint8|int16/short|half|float|int|double|long, device=cuda0])
+ .zeros()
+ .ones()
+ .pow(2)
+ .sum()
+ .ndimension()
+ .numpy()
+ .shape
+ .dtype
+ [begin_row **\:** end_row **\,** begin_column **\:** end_column]

In [None]:
from torch import ones
from torch import zeros

print(zeros((2,)))
print(ones((2,2)).numpy().shape)

+ [arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False).view()](https://pytorch.org/docs/stable/torch.html#torch.arange)

+ [reshape(input, shape)](https://pytorch.org/docs/stable/torch.html#torch.reshape)

In [None]:
from torch import arange
from torch import reshape

print( arange(-2, 2, 1) ) # 1 Row

print( arange(-2, 2, 1).view(-1, 1) ) # 1 Column
print( reshape(arange(-2, 2, 1), (-1, 1)) ) # same

### Save/Load dict

In [None]:
from torch import save
from torch import load

save({"a": 123}, 'tmp.pt')
tdict = load('tmp.pt')
print(tdict)

## Derivative

### Partial derivative with respect to u and v

In [None]:
import torch
import matplotlib.pylab as plt
import torch.nn.functional as F

# Calculate f(u, v) = v * u + u^2 at u = 1, v = 2

u = torch.tensor(1.0,requires_grad=True)
v = torch.tensor(2.0,requires_grad=True)
f = u * v + u ** 2

f.backward()
print("The result of v * u + u^2: ", f)
print("The partial derivative with respect to u: ", u.grad)
print("The partial derivative with respect to v: ", v.grad)
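
The autograd results can be checked by hand. With $f(u, v) = uv + u^2$ evaluated at $u = 1$, $v = 2$:

$$f = 1 \cdot 2 + 1^2 = 3, \qquad \frac{\partial f}{\partial u} = v + 2u = 4, \qquad \frac{\partial f}{\partial v} = u = 1$$

so `u.grad` should hold 4 and `v.grad` should hold 1.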

## Calculate the derivative with multiple values

In [None]:
x = torch.linspace(-10, 10, 10, requires_grad = True)
Y = x ** 2
y = torch.sum(x ** 2)
y.backward() # backward() needs a scalar; each x.grad entry is 2 * x
print(x.grad)

# [scikit](http://www.scikit-learn.org)

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target # features and class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Convolution

+ NumPy
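    + np.convolve(x, h, "valid|same|full")
        + valid: no padding; only positions where the two signals fully overlap
        + same: zero padding so the output has the length of the longer input
        + full: full zero padding; output length is len(x) + len(h) - 1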
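
The three `np.convolve` modes differ only in how much of the padded result they return. A short sketch on assumed toy inputs:

```python
import numpy as np

x = [1, 2, 3]
h = [0, 1, 0.5]

full = np.convolve(x, h, "full")     # length len(x) + len(h) - 1
same = np.convolve(x, h, "same")     # length max(len(x), len(h)), the centre of "full"
valid = np.convolve(x, h, "valid")   # only the fully overlapping positions

print(full, same, valid)
```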
        

+ PyTorch

```python
    import torch
    from torch.nn import Conv2d
    from torch import tensor

    conv1 = Conv2d(in_channels=1, out_channels=1,kernel_size=2,stride=3)
    conv1.state_dict()['weight'][0][0] = tensor([[1.0,1.0],[1.0,1.0]])
    conv1.state_dict()['bias'][0] = 0.0
    conv1.state_dict()

    image1 = torch.zeros(1, 1, 5, 5) # example input (assumed): one 5x5 image
    z1 = conv1(image1)
```

+ TensorFlow

```python
    import tensorflow as tf  # TensorFlow 1.x API
    input = tf.Variable(tf.random_normal([1, 10, 10, 1]))
    filter = tf.Variable(tf.random_normal([3, 3, 1, 1]))
    op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
    op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')
```

+ SciPy

```python
    import numpy as np
    from scipy import signal
    image = np.random.randn(10, 10)   # example input (assumed)
    kernel = np.random.randn(3, 3)    # example kernel (assumed)
    op = signal.convolve2d(image, kernel, mode='valid')  # no padding: 8x8 output
    op2 = signal.convolve2d(image, kernel, mode='same')  # zero padding: 10x10 output
```

# TensorFlow


You have two basic options when using TensorFlow to run your code [[ref](https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/ML0120EN/ML0120EN-2.2-Review-CNN-MNIST-Dataset.ipynb)]:
- [Build graphs and run session] Do all the set-up and THEN execute a session to evaluate tensors and run operations (ops)
- [Interactive session] Write your code and run it on the fly.


In [2]:
# https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/ML0120EN/ML0120EN-1.4-Review-LogisticRegressionwithTensorFlow.ipynb

import tensorflow as tf

# Create a graph
graph1 = tf.Graph()

# Variables must be initialized
init_op = tf.global_variables_initializer()

v = tf.Variable(0)
update = tf.assign(v, v+1)

# Placeholder can feed data outside of a graph
ph = tf.placeholder(tf.float32)

# Loss
a = tf.Variable(20.0)
b = tf.Variable(30.2)
y = a * train_x + b
loss = tf.reduce_mean(tf.square(y - train_y))
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.05)

# train
train = optimizer.minimize(loss)

with graph1.as_default():
    constant = tf.constant([2], name = 'constant_a') # returns a tf.Tensor object


# Interactive session
sess = tf.InteractiveSession()
sess.close()

# Session for graph1: it can only run tensors that live in graph1
sess = tf.Session(graph = graph1)
result = sess.run(constant)
print(result)
sess.close()

# Session + Graph
with tf.Session(graph = graph1) as sess:
    # if the graph holds variables, run an initializer created in that graph first
    result = sess.run(constant)
    print(result)

# Initialization
numFeatures = trainX.shape[1]
numLabels = trainY.shape[1]
# Placeholders
# 'None' means TensorFlow shouldn't expect a fixed number in that dimension
X = tf.placeholder(tf.float32, [None, numFeatures])
yGold = tf.placeholder(tf.float32, [None, numLabels]) # This will be our correct answers matrix for 3 classes.
weights = tf.Variable(tf.random_normal([numFeatures,numLabels],
                                       mean=0,
                                       stddev=0.01,
                                       name="weights"))
bias = tf.Variable(tf.random_normal([1,numLabels],
                                    mean=0,
                                    stddev=0.01,
                                    name="bias"))

# Three-component breakdown of the Logistic Regression equation.
# Note that these feed into each other.
apply_weights_OP = tf.matmul(X, weights, name="apply_weights")
add_bias_OP = tf.add(apply_weights_OP, bias, name="add_bias") 
activation_OP = tf.nn.sigmoid(add_bias_OP, name="activation")

#Defining our cost function - Squared Mean Error
cost_OP = tf.nn.l2_loss(activation_OP-yGold, name="squared_error_cost")

# Number of Epochs in our training
numEpochs = 700
# Defining our learning rate iterations (decay)
learningRate = tf.train.exponential_decay(learning_rate=0.0008,
                                          global_step= 1,
                                          decay_steps=trainX.shape[0],
                                          decay_rate= 0.95,
                                          staircase=True)

#Defining our Gradient Descent
training_OP = tf.train.GradientDescentOptimizer(learningRate).minimize(cost_OP)


## Stacked LSTM

In [None]:
# https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/ML0120EN/ML0120EN-3.1-Reveiw-LSTM-basics.ipynb
# https://labs.cognitiveclass.ai/tools/jupyterlab/lab/tree/labs/ML0120EN/ML0120EN-3.2-Review-LSTM-LanguageModelling.ipynb
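import tensorflow as tf  # TensorFlow 1.x API
from tensorflow import placeholder
from tensorflow.nn import dynamic_rnn
from tensorflow.contrib.rnn import LSTMCell, BasicLSTMCell, MultiRNNCell

LSTM_CELL_SIZE = 4 #hidden nodes
input_dim = 6 # assumed feature size, for illustration

# placeholder shape: [batch size, time steps, features]
data = placeholder(tf.float32, [None, None, input_dim])

stacked_lstm = MultiRNNCell([
                    BasicLSTMCell(LSTM_CELL_SIZE, forget_bias=0.0),
                    LSTMCell(LSTM_CELL_SIZE),    
                    LSTMCell(LSTM_CELL_SIZE) ])
output, state = dynamic_rnn(stacked_lstm, data, dtype=tf.float32)

# assumed sample batch: 2 sequences, 3 time steps, 6 features
sample_input = [[[1,2,3,4,3,2], [1,2,1,1,1,2], [1,2,2,2,2,2]],
                [[1,2,3,4,3,2], [3,2,2,1,1,2], [0,0,0,0,3,2]]]

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(output, feed_dict={data: sample_input})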
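
For comparison, a stacked LSTM can also be sketched in PyTorch, where `num_layers` stacks the layers. This is an analogue rather than a translation of the cell above: all layers share one hidden size, and `input_size=6` and the random input are assumed values.

```python
import torch
from torch.nn import LSTM

torch.manual_seed(0)
# 3 stacked layers with 4 hidden units each; input_size=6 is an assumed feature count
lstm = LSTM(input_size=6, hidden_size=4, num_layers=3, batch_first=True)

x = torch.randn(2, 3, 6)              # [batch, time steps, features]
output, (h_n, c_n) = lstm(x)

print(output.shape)   # top layer's hidden state at every time step
print(h_n.shape)      # final hidden state of each layer
```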

In [None]:
import torch
z = torch.tensor([[1,0,-1],[2,0,-2],[1,0,-1]])
torch.nn.functional.relu(z)

In [None]:
import tensorflow as tf