# sigmoid function

$sigmoid(x) = \frac{1}{1+e^{-x}}$ <br>
It is widely used in machine learning (for example, logistic regression) and in deep learning.

In [5]:
import numpy as np

# vector X
x=np.array([1,2,3])

x can be either a real number, a vector, or a matrix. 
$$ \text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1} $$

#### sigmoid 

In [7]:
def sigmoid(x):
    s=1/(1+np.exp(-x))
    return s


sigmoid(x)

array([0.73105858, 0.88079708, 0.95257413])
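Because `np.exp` operates elementwise, the same function should also accept a real number or a matrix, as stated above. A minimal sketch to illustrate this; the inputs `0.0` and a 2x3 matrix of zeros are arbitrary values chosen for the example:

```python
import numpy as np

def sigmoid(x):
    # np.exp works elementwise on scalars, vectors, and matrices alike
    return 1 / (1 + np.exp(-x))

scalar_out = sigmoid(0.0)               # a single real number
matrix_out = sigmoid(np.zeros((2, 3)))  # a 2x3 matrix, output keeps the shape

print(scalar_out)        # 0.5
print(matrix_out.shape)  # (2, 3)
```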

## sigmoid gradient 

The gradient of the sigmoid with respect to its input x is needed to optimize loss functions using backpropagation.<br>

Formula: <br><br> $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$
<br>
The computation takes two steps:
1. Compute s = sigmoid(x).
2. Compute $\sigma'(x) = s(1-s)$.
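
For reference, formula (2) can be derived from the definition $\sigma(x) = \frac{1}{1+e^{-x}}$ with the chain rule. A short sketch of the steps, using the identity $1 - \sigma(x) = \frac{e^{-x}}{1+e^{-x}}$:

$$\sigma'(x) = \frac{d}{dx}\left(1+e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,(1 - \sigma(x))$$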

In [8]:
def sigmoid_derivative(x):
    """
    Compute the gradient of the sigmoid function with respect to its input x.
    """
    
    s=1/(1+np.exp(-x))
    ds=s*(1-s)
    return ds

In [9]:
sigmoid_derivative(x)

array([0.19661193, 0.10499359, 0.04517666])
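
One way to sanity-check the analytic gradient is to compare it with a finite-difference approximation. The sketch below is only an illustration: the step size `eps` and the tolerance are arbitrary assumptions, and the central difference is a standard numerical check rather than part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-5  # step size, an arbitrary small value chosen for this sketch

# Central difference approximates the derivative elementwise
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

# True when the two gradients agree within the tolerance
print(np.allclose(numeric, analytic, atol=1e-8))
```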

## reshape (unroll) the image matrix into a 1d-vector 

In [10]:
# This is a 3 by 3 by 2 array; typical images have shape (num_px_x, num_px_y, 3), where 3 represents the RGB values
image = np.array([[[ 0.67826139,  0.29380381],
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])


In [12]:
image.shape

(3, 3, 2)

In [18]:
def reshape_image(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    Returns:
    v -- a row vector of shape (1, length*height*depth)
    """
    
    # Unpack the three dimensions, then flatten into a single row
    length, height, depth = image.shape
    v = image.reshape(1, length * height * depth)
    
    return v

In [19]:
reshape_image(image)

array([[0.67826139, 0.29380381, 0.90714982, 0.52835647, 0.4215251 ,
        0.45017551, 0.92814219, 0.96677647, 0.85304703, 0.52351845,
        0.19981397, 0.27417313, 0.60659855, 0.00533165, 0.10820313,
        0.49978937, 0.34144279, 0.94630077]])
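
As an alternative to multiplying the dimensions by hand, `reshape` accepts `-1` for one dimension and infers its size. A minimal sketch, assuming a stand-in array built with `np.arange` in place of the image above:

```python
import numpy as np

# Stand-in for the 3x3x2 image above (values are placeholders)
image = np.arange(18, dtype=float).reshape(3, 3, 2)

row = image.reshape(1, -1)  # -1 lets NumPy infer the length: shape (1, 18)
col = image.reshape(-1, 1)  # column vector instead: shape (18, 1)

print(row.shape, col.shape)  # (1, 18) (18, 1)
```

Whether a row or a column vector is wanted depends on how the rest of the model lays out its data; both contain the same 18 values in the same order.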

## Normalizing rows 

Normalizing the data often leads to better performance because gradient descent converges faster after normalization.<br>
<br>Here, normalization means changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).

For example, if $$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$ and $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ Note that x has shape (2, 3) while its norm has shape (2, 1), and the division still works: NumPy stretches the norm across the columns of x. This is called broadcasting.


The function `normalizeRows()` normalizes the rows of a matrix.
After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

In [8]:
import numpy as np
x=np.array([[0,3,4], [2,6,4]])
x

array([[0, 3, 4],
       [2, 6, 4]])

In [11]:
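def normalizeRows(x):
    
    # Euclidean norm of each row, kept as a column of shape (rows, 1)
    x_norm = np.linalg.norm(x,axis=1,keepdims=True)
    # Broadcasting divides every element of a row by that row's norm
    x = x/x_norm
    return x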

In [12]:
normalizeRows(x)

array([[0.        , 0.6       , 0.8       ],
       [0.26726124, 0.80178373, 0.53452248]])
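
To confirm that each row now has unit length, the norm of every normalized row can be recomputed. A minimal sketch of that check; `row_lengths` is a name introduced here for illustration:

```python
import numpy as np

def normalizeRows(x):
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # shape (rows, 1)
    return x / x_norm                                   # broadcast across columns

x = np.array([[0, 3, 4], [2, 6, 4]])
x_normalized = normalizeRows(x)

# Each entry should be 1 up to floating-point rounding
row_lengths = np.linalg.norm(x_normalized, axis=1)
print(row_lengths)
```

Note that a row containing only zeros has norm 0, so this version would divide by zero and produce `nan` values for that row.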

# Softmax 

Softmax is a normalizing function used when the algorithm needs to classify two or more classes.

- $ \text{for } x \in \mathbb{R}^{1\times n} \text{,     } softmax(x) = softmax(\begin{bmatrix}
    x_1  &&
    x_2 &&
    ...  &&
    x_n  
\end{bmatrix}) = \begin{bmatrix}
     \frac{e^{x_1}}{\sum_{j}e^{x_j}}  &&
    \frac{e^{x_2}}{\sum_{j}e^{x_j}}  &&
    ...  &&
    \frac{e^{x_n}}{\sum_{j}e^{x_j}} 
\end{bmatrix} $ 

- $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{,  $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$  $$softmax(x) = softmax\begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
    softmax\text{(first row of x)}  \\
    softmax\text{(second row of x)} \\
    ...  \\
    softmax\text{(last row of x)} \\
\end{pmatrix} $$

In [13]:
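def softmax(x):

    # Exponentiate every element
    x_exp = np.exp(x)
    # Sum each row, kept as a column of shape (m, 1) for broadcasting
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Divide each row by its own sum
    s = x_exp/x_sum

    return s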

In [22]:
x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0 ,0]])

a=softmax(x)
print(a,'\n')
print('shape of x: \t\t{0}\nshape of softmax(x): \t{1}'.format(x.shape, a.shape))

[[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]] 

shape of x: 		(2, 5)
shape of softmax(x): 	(2, 5)
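
For large inputs, `np.exp` can overflow. A common workaround, not used in the notebook above, is to subtract each row's maximum before exponentiating, which leaves the result unchanged because the common factor cancels in the ratio. A minimal sketch; the name `softmax_stable` is hypothetical:

```python
import numpy as np

def softmax_stable(x):
    # Hypothetical variant: shift each row so its largest entry is 0
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])

a = softmax_stable(x)
row_sums = a.sum(axis=1)  # each row of a softmax output should sum to 1

# Inputs this large would overflow np.exp without the shift
big = softmax_stable(np.array([[1000.0, 1000.0]]))

print(row_sums)
print(big)
```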


## L1 and L2 loss functions
Loss functions are used to evaluate the performance of the model: they measure the difference between the predictions (ŷ) and the true values (y).

#### L1 loss:
$$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$

In [36]:
def L1(yhat, y):
    # Sum of absolute differences, computed with vectorized NumPy calls
    loss = np.sum(np.abs(y - yhat))
    return loss

In [39]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print('loss: {:.2f}'.format(L1(yhat, y)))

loss: 1.10


#### L2 loss 
$$\begin{align*} & L_2(\hat{y},y) = \sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$

In [40]:
def L2(yhat, y):
    loss = np.dot((y-yhat),(y-yhat))
    return loss

In [41]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print('loss: {:.2f}'.format(L2(yhat, y)))

loss: 0.43
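
Both losses can also be expressed through vector norms of the difference `y - yhat`: the L1 loss is its 1-norm and the L2 loss is its squared 2-norm. A minimal sketch using `np.linalg.norm` as a cross-check of the values above; the names `diff`, `l1` and `l2` are introduced here for illustration:

```python
import numpy as np

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
diff = y - yhat

l1 = np.linalg.norm(diff, ord=1)       # sum of absolute values
l2 = np.linalg.norm(diff, ord=2) ** 2  # squared Euclidean norm

print('L1: {:.2f}, L2: {:.2f}'.format(l1, l2))  # L1: 1.10, L2: 0.43
```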


*Based on the **Neural Networks and Deep Learning** course on Coursera.*