

## Introduction to Deep Learning Frameworks

---








### Why use deep learning frameworks?

- Scales machine learning code
- Computes gradients automatically
- Standardizes machine learning applications for sharing
- A "zoo" of deep learning frameworks is available, with different advantages, paradigms, levels of abstraction, and programming languages
- Interfaces with GPUs for parallel processing

**What is TensorFlow?**
- Open-source software library for numerical computation using data flow graphs
- Originally developed by the Google Brain team to conduct machine learning research
- TensorFlow is both an interface for expressing machine learning algorithms and an implementation for executing such algorithms

Big idea: express a numeric computation as a graph
- graph nodes are operations which have any number of inputs and outputs
- graph edges are tensors which flow between nodes
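To make the idea concrete, here is a minimal pure-Python sketch of a dataflow graph. It is not how TensorFlow is implemented; the `Node`, `placeholder`, `op` and `run` names are hypothetical. Each node is an operation, the values passed along edges play the role of tensors, and a placeholder-like node gets its value from a feed dictionary only when the graph is run.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """A graph node: an operation plus its input edges (other nodes)."""

    def __init__(self, name: str, fn: Optional[Callable[..., float]] = None,
                 inputs: Optional[List["Node"]] = None):
        self.name = name
        self.fn = fn                  # None marks a placeholder-like input node
        self.inputs = inputs or []


def placeholder(name: str) -> Node:
    # No stored value: it must be supplied through the feed dict at run time
    return Node(name)


def op(name: str, fn: Callable[..., float], *inputs: Node) -> Node:
    return Node(name, fn, list(inputs))


def run(node: Node, feed: Dict[str, float]) -> float:
    """Evaluate `node` by recursively evaluating the nodes that feed into it."""
    if node.fn is None:
        if node.name not in feed:
            raise KeyError(f"placeholder {node.name!r} was not fed")
        return feed[node.name]
    args = [run(parent, feed) for parent in node.inputs]
    return node.fn(*args)


# Build the graph once: out = relu(a * b + 1)
a = placeholder("a")
b = placeholder("b")
prod = op("mul", lambda u, v: u * v, a, b)
shifted = op("add_one", lambda u: u + 1.0, prod)
out = op("relu", lambda u: max(0.0, u), shifted)

# Run the same graph with different inputs, without rebuilding it
print(run(out, {"a": 2.0, "b": 3.0}))   # 7.0
print(run(out, {"a": -2.0, "b": 3.0}))  # 0.0
```

Separating graph construction from execution is the design choice that lets a framework analyse the whole computation (for example to derive gradients or place work on a GPU) before any data flows through it.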

### Tensors
Tensors are the primary data structure that TensorFlow operates on in the computational graph.
Main ways to create tensors in TensorFlow:
1. **Fixed tensors:**
   - Create a zero-filled tensor: `zero_tsr = tf.zeros([row_dim, col_dim])`
   - Create a one-filled tensor: `ones_tsr = tf.ones([row_dim, col_dim])`
   - Create a constant-filled tensor: `filled_tsr = tf.fill([row_dim, col_dim], 42)`
   - Create a tensor out of an existing constant: `constant_tsr = tf.constant([1,2,3])`

2. **Tensors with the same shape as an existing tensor:**
   - `zeros_similar = tf.zeros_like(constant_tsr)`
   - `ones_similar = tf.ones_like(constant_tsr)`
  
3. **Sequence tensors** (tensors containing evenly spaced values over a defined interval):
   - `linear_tsr = tf.linspace(start=0.0, stop=1.0, num=3)` (3 values from 0 to 1, both ends included)
   - `integer_seq_tsr = tf.range(start=6, limit=15, delta=3)` (the limit is excluded: 6, 9, 12)
4. **Random tensors:**
   - uniform distribution: `randunif_tsr = tf.random_uniform([row_dim, col_dim], minval=0, maxval=1)`
   - normal distribution: `randnorm_tsr = tf.random_normal([row_dim, col_dim], mean=0.0, stddev=1.0)`
   - truncated normal distribution: `tf.truncated_normal()` draws normal values and re-draws any that fall more than two standard deviations from the specified mean: `truncnorm_tsr = tf.truncated_normal([row_dim, col_dim], mean=0.0, stddev=1.0)`
5. **Randomly shuffled tensors** (randomizing the entries of an array along its first dimension):
   - `shuffled_output = tf.random_shuffle(input_tensor)`
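The sequence and truncated-normal constructors above can be easier to reason about with a plain-Python reference. This is a sketch, not TensorFlow code: the helper names `linspace`, `int_range` and `truncated_normal` are hypothetical, and the re-drawing loop mirrors the documented behaviour of `tf.truncated_normal()` rather than its actual implementation.

```python
import random
from typing import List


def linspace(start: float, stop: float, num: int) -> List[float]:
    """`num` evenly spaced values from start to stop, both ends included."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def int_range(start: int, limit: int, delta: int) -> List[int]:
    """Starts at `start`, steps by `delta`, and stops before `limit`."""
    return list(range(start, limit, delta))


def truncated_normal(n: int, mean: float = 0.0, stddev: float = 1.0,
                     seed: int = 0) -> List[float]:
    """Normal draws, re-drawing any value more than 2 stddevs from the mean."""
    rng = random.Random(seed)
    out: List[float] = []
    while len(out) < n:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            out.append(x)
    return out


print(linspace(0.0, 1.0, 3))   # [0.0, 0.5, 1.0]
print(int_range(6, 15, 3))     # [6, 9, 12]
samples = truncated_normal(1000)
print(all(abs(s) <= 2.0 for s in samples))  # True
```

Truncating at two standard deviations is why `tf.truncated_normal()` is a common choice for initializing weights: it never produces the rare extreme values a plain normal draw can.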


### Placeholders and Variables
**Variables** are the *parameters* of the algorithm; TensorFlow keeps track of how to change them to optimize the algorithm.

**Placeholders** are objects that let you feed in data of a specific type and shape when the graph is run, such as the input data or the expected outcome of a computation; the results of the computational graph depend on the values fed into them.

1. Variables:
   - declare a variable with `tf.Variable(tensor)`, e.g. `my_var = tf.Variable(tf.zeros([2,3]))`
   - initialize the variables inside a session before using them: `sess = tf.Session()`, `initialize_op = tf.global_variables_initializer()`, then `sess.run(initialize_op)`

2. Placeholders:
   - declare the data type and the shape of the tensor: `x = tf.placeholder(tf.float32, shape=[2,2])`
   - feed the data through the `feed_dict` argument of `sess.run()`: `y = tf.identity(x)` (identity operation), `x_vals = np.random.rand(2,2)` (with `import numpy as np`), then `sess.run(y, feed_dict={x: x_vals})`

### Operations

1. Standard arithmetic operations on tensors: `tf.add()`, `tf.subtract()`, `tf.multiply()`, and `tf.div()`.
2. `tf.abs()`: absolute value of one input tensor
3. `tf.ceil()`: element-wise ceiling function
4. `tf.maximum()`: element-wise maximum of two tensors
....

### Activation Functions

Activation functions are non-linear operations that act on tensors    

1. ReLU = max(0,x):     
```tf.nn.relu([-3., 3., 10.])```
2. ReLU6 = min(max(0,x),6):     
```tf.nn.relu6([-3., 3., 10.])```
3. Sigmoid function = 1/(1+exp(-x)) :    
```tf.nn.sigmoid([-1., 0., 1.])```
4. tanh = (exp(x) - exp(-x))/(exp(x) + exp(-x)):      
```tf.nn.tanh([-1., 0., 1.])```
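The formulas above can be checked with a plain-Python reference that applies each activation element-wise to the same inputs as the `tf.nn` calls. This is a sketch, not TensorFlow code, and the function names are our own; the values should match what the `tf.nn` ops produce, up to float32 rounding.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def relu6(x: float) -> float:
    # ReLU capped at 6
    return min(max(0.0, x), 6.0)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    # Same formula as above; math.tanh(x) is the numerically safer choice
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


print([relu(v) for v in [-3.0, 3.0, 10.0]])              # [0.0, 3.0, 10.0]
print([relu6(v) for v in [-3.0, 3.0, 10.0]])             # [0.0, 3.0, 6.0]
print([round(sigmoid(v), 4) for v in [-1.0, 0.0, 1.0]])  # [0.2689, 0.5, 0.7311]
print([round(tanh(v), 4) for v in [-1.0, 0.0, 1.0]])     # [-0.7616, 0.0, 0.7616]
```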


### Loss Functions

#### 1. Loss Functions for regression: 
  - **L2 norm** (Euclidean Loss function): square of the distance to the target      
  ```l2_y_vals = tf.square(target - x_vals)```
  ```l2_y_out = sess.run(l2_y_vals)```
  
  - **L1 norm** (Absolute loss function): instead of squaring the difference, we take the absolute value    
  ```l1_y_vals = tf.abs(target - x_vals)```
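A plain-Python sketch (not TensorFlow code) of the two regression losses, mirroring the element-wise `tf.square` and `tf.abs` expressions above. The predictions `x_vals` and the target of 0 are hypothetical values chosen for illustration.

```python
from typing import List


def l2_loss(target: float, preds: List[float]) -> List[float]:
    # Squared distance of each prediction to the target
    return [(target - p) ** 2 for p in preds]


def l1_loss(target: float, preds: List[float]) -> List[float]:
    # Absolute distance of each prediction to the target
    return [abs(target - p) for p in preds]


x_vals = [-1.0, 0.0, 0.5, 2.0]   # hypothetical predictions
target = 0.0

print(l2_loss(target, x_vals))  # [1.0, 0.0, 0.25, 4.0]
print(l1_loss(target, x_vals))  # [1.0, 0.0, 0.5, 2.0]
```

The L2 loss grows quadratically, so it penalizes predictions far from the target (outliers) much more than the L1 loss, which grows linearly.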
  
#### 2. Loss functions for classification:     
  - **Hinge loss**: computes a loss for two target classes, labelled 1 and -1: `hinge_y_vals = tf.maximum(0., 1. - tf.multiply(target, x_vals))`

  - **Cross-entropy loss** (logistic loss): measures the distance from the actual class (0 or 1) to the predicted value, which is usually a real number between 0 and 1: `xentropy_y_vals = - tf.multiply(target, tf.log(x_vals)) - tf.multiply((1. - target), tf.log(1. - x_vals))`
  
  - **Sigmoid cross-entropy loss**: transforms `x_vals` with the sigmoid function before computing the cross-entropy loss: `xentropy_sigmoid_y_vals = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=x_vals)`

  - **Weighted cross-entropy loss**: weighted version of the sigmoid cross-entropy loss, with a weight on the positive target: `weight = tf.constant(0.5)`, then `xentropy_weighted_y_vals = tf.nn.weighted_cross_entropy_with_logits(targets, x_vals, weight)`

  - **Softmax cross-entropy loss**: measures a loss on unnormalized outputs when there is only one target category instead of multiple. The function transforms the outputs into a probability distribution via the softmax function and then computes the loss against a true probability distribution: `unscaled_logits = tf.constant([[1., -3., 10.]])`, `target_dist = tf.constant([[0.1, 0.02, 0.88]])`, then `softmax_xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=target_dist, logits=unscaled_logits)`
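The classification losses can be sketched in plain Python (not TensorFlow code; the function names are our own) to see what each op computes element-wise. `logits` plays the role of `x_vals` above and `target` of `target`/`targets`. The `*_with_logits` versions use the numerically stable rearrangements given in the TensorFlow documentation rather than taking `log(sigmoid(x))` directly.

```python
import math
from typing import List


def hinge_loss(target: float, logits: List[float]) -> List[float]:
    # target is +1 or -1; zero loss once target * logit >= 1
    return [max(0.0, 1.0 - target * x) for x in logits]


def cross_entropy(target: float, probs: List[float]) -> List[float]:
    # probs must lie strictly between 0 and 1
    return [-target * math.log(p) - (1.0 - target) * math.log(1.0 - p)
            for p in probs]


def sigmoid_xent_with_logits(target: float, logits: List[float]) -> List[float]:
    # Stable form of cross_entropy(target, sigmoid(x))
    return [max(x, 0.0) - x * target + math.log1p(math.exp(-abs(x)))
            for x in logits]


def weighted_xent_with_logits(target: float, logits: List[float],
                              pos_weight: float) -> List[float]:
    # Like the sigmoid version, but the positive-class term is scaled by pos_weight
    scale = 1.0 + (pos_weight - 1.0) * target
    return [(1.0 - target) * x
            + scale * (math.log1p(math.exp(-abs(x))) + max(-x, 0.0))
            for x in logits]


def softmax_xent_with_logits(logits: List[float],
                             target_dist: List[float]) -> float:
    # -sum(t * log_softmax(logits)); subtracting the max logit avoids overflow
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target_dist))


print(hinge_loss(1.0, [-1.0, 0.0, 1.0, 2.0]))                  # [2.0, 1.0, 0.0, 0.0]
print(round(sigmoid_xent_with_logits(1.0, [0.0])[0], 4))        # 0.6931
print(round(weighted_xent_with_logits(1.0, [0.0], 0.5)[0], 4))  # 0.3466
print(round(softmax_xent_with_logits([1.0, -3.0, 10.0],
                                     [0.1, 0.02, 0.88]), 4))    # 1.1601
```

With `pos_weight=1.0` the weighted loss reduces to the plain sigmoid cross-entropy, and a weight below 1 (such as the 0.5 above) shrinks the loss on positive targets.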
  
  

### Implementing Backpropagation

1. Create the data.
2. Initialize placeholders and variables.
3. Create a loss function.
4. Define an optimization algorithm.
5. Iterate across random data samples to update the variables.

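```python
# my_output: the model's prediction; y_target: placeholder holding the target value
loss = tf.square(my_output - y_target)
my_opt = tf.train.GradientDescentOptimizer(learning_rate=0.02)
# minimize() computes the gradients of the loss and updates the variables
train_step = my_opt.minimize(loss)
```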
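What one run of `train_step` does can be sketched in plain Python for a one-parameter model. The model `my_output = A * x`, the data (inputs around 1.0, all targets 10.0) and the number of steps are assumptions for illustration. The gradient is derived by hand here, whereas TensorFlow derives it automatically from the graph.

```python
import random

# Hypothetical data (an assumption for illustration): inputs drawn around 1.0
# and every target equal to 10.0, so the best weight A is close to 10.
rng = random.Random(42)
x_vals = [rng.gauss(1.0, 0.1) for _ in range(100)]
y_vals = [10.0] * 100

A = rng.gauss(0.0, 1.0)   # the single trainable "variable"
learning_rate = 0.02      # same rate as the GradientDescentOptimizer above
losses = []

for _ in range(200):
    i = rng.randrange(len(x_vals))               # pick one random sample
    x, y_target = x_vals[i], y_vals[i]
    my_output = A * x                            # model prediction
    losses.append((my_output - y_target) ** 2)   # squared loss, as above
    grad = 2.0 * (my_output - y_target) * x      # d(loss)/dA, derived by hand
    A -= learning_rate * grad                    # gradient-descent update

print(f"A = {A:.2f}")
```

Updating on one random sample per step is stochastic gradient descent, so `A` settles near 10 but keeps fluctuating slightly with each sample.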

Other optimization algorithms in `tf.train`:    
  - MomentumOptimizer()
  - AdagradOptimizer()
  - AdadeltaOptimizer()
  - AdamOptimizer()    
  ....

Example: we want to compute the ReLU activation `h = ReLU(xW + b)` of a fully connected layer.

![ReLU activation](https://cdn-images-1.medium.com/max/1600/1*G9MXGOM2jWl3SOXuqRlV3A.png)

Three principal types of node in TensorFlow:    
- **Variables** are stateful nodes which output their current value **(W, b)**.   
- **Placeholders** are nodes whose value is fed in at execution time. We don't give them initial values; we only assign a data type and a tensor shape, so the graph knows what to compute even though it has no stored values yet **(x)**.     
- **Mathematical operations**, e.g. **MatMul, Add, ReLU**

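```python
# create graph - backbone of our model
import tensorflow as tf
import numpy as np

# Variables (parameters): bias b and weight matrix W
b = tf.Variable(tf.zeros((100,)))
W = tf.Variable(tf.random_uniform((784, 100), -1, 1))

# Placeholder for a batch of 100 inputs with 784 features each
x = tf.placeholder(tf.float32, (100, 784))

# Operations: h = ReLU(xW + b)
h = tf.nn.relu(tf.matmul(x, W) + b)
```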

```python
# create a session and feed numerical input to our graph
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(h, {x: np.random.rand(100, 784)})
```

```
array([[  0.        ,   9.38900852,   8.5437212 , ...,   0.        ,
          2.11395288,  11.79295635],
       [  0.        ,  11.15999031,   7.56981277, ...,   0.        ,
          6.30800915,  12.28312874],
       [  0.        ,   8.55609322,   5.16542864, ...,   0.        ,
          5.78558874,  10.00546741],
       ..., 
       [  0.        ,   7.19157362,   9.9484663 , ...,   5.70030022,
          3.15153456,   0.        ],
       [  0.        ,   8.64635563,   2.45155334, ...,   3.20237422,
          0.19523752,   6.6025753 ],
       [  0.        ,  17.83238029,   4.84830666, ...,   0.        ,
          0.        ,   9.50206089]], dtype=float32)
```

```python
# create loss function
```
