## Python Basics with NumPy

In [1]:
import numpy as np
import math

#####  Build a function that returns the sigmoid of a real number x.

##### Use math.exp(x) for the exponential function. 

In [22]:
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

z = np.random.randint(1,5)
a = sigmoid(z)
print(f"Input: {z}")
print(f"Output: {a}")

z = 3 
print(f"\nInput: {z}")
a = sigmoid(z)
print(f"Output: {a}")

Input: 4
Output: 0.9820137900379085

Input: 3
Output: 0.9525741268224334


We rarely use the `math` library in deep learning, because its functions accept only scalars, while deep learning mostly operates on vectors and matrices. For example, passing a vector or matrix to the sigmoid function above raises a TypeError, since `math.exp()` accepts only a single real number.
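The block below shows the difference on a small NumPy array. `math.exp` rejects the array, while `np.exp` applies the exponential to each element and keeps the input's shape.

```python
import math
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# math.exp expects a single real number; a 3-element array cannot be
# converted to one, so a TypeError is raised
try:
    math.exp(x)
except TypeError as err:
    print("math.exp failed:", err)

# np.exp is applied element-wise and returns an array of the same shape
print(np.exp(x))
```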

##### Implement the sigmoid function using numpy.

In [27]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.random.randint(1,5)
a = sigmoid(z)
print(f"Input: {z}")
print(f"Output: {a}")

z = 3 
print(f"\nInput: {z}")
a = sigmoid(z)
print(f"Output: {a}")

z = np.random.randint(1, 9, (2,2))
print(f"\nInput: \n{z}")
a = sigmoid(z)
print(f"Output: \n{a}")

Input: 2
Output: 0.8807970779778823

Input: 3
Output: 0.9525741268224334

Input: 
[[2 7]
 [2 8]]
Output: 
[[0.88079708 0.99908895]
 [0.88079708 0.99966465]]


##### Implement the function sigmoid_grad() to compute the gradient of the sigmoid function with respect to its input x. 

The formula is: $\text{sigmoid\_derivative}(x) = \sigma'(x) = \sigma(x)\,(1 - \sigma(x))$
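The formula follows from differentiating $\sigma(x) = \frac{1}{1 + e^{-x}}$ directly and then using $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$:

$$
\begin{aligned}
\sigma'(x) &= \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} \\
           &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
$$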

In [29]:
def sigmoid_grad(x):
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds

x = 3
ds = sigmoid_grad(x)

print(f"Input: {x}")
print(f"Output: {ds}")

Input: 3
Output: 0.045176659730912


In [30]:
x = np.array([1, 2, 3])
print(f"Input: \n{x}")
ds = sigmoid_grad(x)
print(f"\nOutput: \n{ds}")

Input: 
[1 2 3]

Output: 
[0.19661193 0.10499359 0.04517666]
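One way to sanity-check `sigmoid_grad` is to compare it with a central finite-difference approximation $\frac{\sigma(x+h) - \sigma(x-h)}{2h}$. This is a minimal sketch: the step `h = 1e-5` is an assumed value, not a tuned one, and the block redefines `sigmoid` and `sigmoid_grad` so it runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
h = 1e-5  # small step (assumed value)

# Central difference: slope of the secant through x - h and x + h
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_grad(x)

print(analytic)
print(numeric)
# The gap should be tiny: central-difference error shrinks like h**2
print(np.max(np.abs(analytic - numeric)))
```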


##### Implement image2vector() that takes an input of shape (length, height, 3) and returns a vector of shape (length*height*3, 1).

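For example, to reshape an array `v` of shape (a, b, c) into a vector of shape (a*b, c), you would write `v = v.reshape((v.shape[0] * v.shape[1], v.shape[2]))`. `image2vector()` applies the same idea to all three dimensions, producing a single column: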

In [12]:
def image2vector(image):
    # Flatten all three dimensions into one column with length*height*depth rows.
    # For a (3, 3, 3) image that is 3 * 3 * 3 = 27 values, so the result has shape (27, 1).
    # The second dimension is 1 so the result is a column vector, not a 1-D array.
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    return v

image = np.random.randint(1, 9, (3, 3, 3)) 
print(f"Input:\n {image}")
print(f"\n Input Shape: \n {image.shape}")

v = image2vector(image)
print(f"\nOutput: \n {v}")
print(f"\nOutput Shape: {v.shape}")

Input:
 [[[4 7 1]
  [8 2 7]
  [4 2 1]]

 [[7 3 8]
  [5 4 1]
  [1 6 1]]

 [[4 1 6]
  [1 7 8]
  [5 4 8]]]

 Input Shape: 
 (3, 3, 3)

Output: 
 [[4]
 [7]
 [1]
 [8]
 [2]
 [7]
 [4]
 [2]
 [1]
 [7]
 [3]
 [8]
 [5]
 [4]
 [1]
 [1]
 [6]
 [1]
 [4]
 [1]
 [6]
 [1]
 [7]
 [8]
 [5]
 [4]
 [8]]

Output Shape: (27, 1)
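Passing `-1` to `reshape` lets NumPy infer that dimension from the total number of elements, so `image.reshape(-1, 1)` gives the same column without spelling out the product. Both calls use NumPy's default row-major (C) order, which is why the output above lists the first row of the first 3×3 slice, then its second row, and so on. The sketch below uses `np.arange` as a deterministic stand-in for a random image.

```python
import numpy as np

image = np.arange(27).reshape(3, 3, 3)  # deterministic stand-in for a 3x3x3 image

# Explicit product of the dimensions, as in image2vector
v_explicit = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)

# -1 asks NumPy to infer this dimension (27 here)
v_inferred = image.reshape(-1, 1)

print(v_inferred.shape)                        # (27, 1)
print(np.array_equal(v_explicit, v_inferred))  # True

# Row-major order: the last axis varies fastest
print(v_inferred[:4].ravel())                  # [0 1 2 3]
```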


Another common technique in machine learning and deep learning is to normalize the data. This often leads to better performance because gradient descent converges faster after normalization. Here, normalization means changing $x$ to $\frac{x}{\|x\|}$ (dividing each row vector of $x$ by its norm).

For example, if

$x = \begin{pmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{pmatrix}$

then

$\|x\| = \texttt{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{pmatrix} 5 \\ \sqrt{56} \end{pmatrix}$

and

$x_{\text{normalized}} = \frac{x}{\|x\|} = \begin{pmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{pmatrix}$

Note that `x` has shape (2, 3) while `‖x‖` has shape (2, 1), yet the division works: NumPy stretches the single column of norms across all three columns of `x`. This is called broadcasting.
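The worked example can be reproduced directly. The block also shows why `keepdims=True` matters: without it the norms come back as a 1-D array of shape (2,), which NumPy aligns with the last axis of `x` (length 3), so the shapes are incompatible and the division raises a ValueError.

```python
import numpy as np

x = np.array([[0, 3, 4],
              [2, 6, 4]], dtype=float)

# keepdims=True keeps the reduced axis, so the norms have shape (2, 1)
norms = np.linalg.norm(x, axis=1, keepdims=True)
print(norms.shape)  # (2, 1)

# (2, 3) / (2, 1): the column of norms is broadcast across all 3 columns
x_normalized = x / norms
print(x_normalized)

# Without keepdims the norms have shape (2,), which is matched against
# the last axis of x (length 3); 2 != 3, so broadcasting fails
flat_norms = np.linalg.norm(x, axis=1)
try:
    x / flat_norms
except ValueError as err:
    print("ValueError:", err)
```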

Exercise: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

In [38]:
def normalizeRows(x):
    X_norm = np.linalg.norm(x, axis = 1, keepdims = True)
    X_normalized = x / X_norm
    print(f"Norm: \n{X_norm} \n")
    return X_normalized

x = np.array([[1, 2, 3], [4, 5, 6]])
x_normalized = normalizeRows(x)

print(f"Normalized:\n{x_normalized}")

Norm: 
[[3.74165739]
 [8.77496439]] 

Normalized:
[[0.26726124 0.53452248 0.80178373]
 [0.45584231 0.56980288 0.68376346]]
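`normalizeRows` as written divides by zero when a row is all zeros, which produces `nan` (with a runtime warning). A minimal sketch of one common guard follows, assuming a small floor `eps` on the norm is acceptable for your data. `normalize_rows_safe` and `eps` are hypothetical names, not part of the exercise.

```python
import numpy as np

def normalize_rows_safe(x, eps=1e-12):
    # Hypothetical variant of normalizeRows: clamp each row norm to at
    # least eps so an all-zero row becomes 0 / eps = 0 instead of 0 / 0 = nan
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

x = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 0]], dtype=float)
x_normalized = normalize_rows_safe(x)

print(x_normalized)
# Non-zero rows now have Euclidean length 1; the zero row stays zero
print(np.linalg.norm(x_normalized, axis=1))
```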


A very important concept to understand in numpy is "broadcasting". It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official broadcasting documentation.

Exercise: Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization.

Instructions:

$softmax(x) = softmax\begin{bmatrix}
  x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
  x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
  \vdots & \vdots & \vdots & \ddots & \vdots \\
  x_{m1} & x_{m2} & x_{m3} & \dots  & x_{mn}
\end{bmatrix} = \begin{bmatrix}
  \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
  \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
  \vdots & \vdots & \vdots & \ddots & \vdots \\
  \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
  softmax\text{(first row of x)}  \\
  softmax\text{(second row of x)} \\
  ...  \\
  softmax\text{(last row of x)} \\
\end{pmatrix}$

In [53]:
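def softmax(z):
    # Exponentiate once, then divide each row by its own sum;
    # keepdims=True keeps the row sums as a column so they broadcast across each row
    z_exp = np.exp(z)
    a = z_exp / np.sum(z_exp, axis=1, keepdims=True)
    return a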

z = np.array([[9, 2, 5, 0, 0], [7, 5, 0, 0, 0]])
a = softmax(z)

print(a)

[[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]
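In float64, `np.exp` overflows to `inf` for inputs above roughly 709, and `inf / inf` yields `nan`. A common remedy, sketched here as an alternative rather than as the exercise's required solution, subtracts each row's maximum before exponentiating. Because $\frac{e^{x_{ij} - c}}{\sum_k e^{x_{ik} - c}} = \frac{e^{x_{ij}}}{\sum_k e^{x_{ik}}}$ for any constant $c$, the result is unchanged while every exponent is at most 0. `softmax_stable` is a hypothetical name.

```python
import numpy as np

def softmax_stable(z):
    # Shift each row so its largest entry is 0; exp of values <= 0 cannot overflow
    z_shifted = z - np.max(z, axis=1, keepdims=True)
    z_exp = np.exp(z_shifted)
    return z_exp / np.sum(z_exp, axis=1, keepdims=True)

z = np.array([[9, 2, 5, 0, 0], [7, 5, 0, 0, 0]])
print(softmax_stable(z))    # same values as the softmax output above

big = np.array([[1000.0, 1001.0, 1002.0]])
print(softmax_stable(big))  # finite probabilities; np.exp(big) alone would overflow
```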


Exercise: Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful.

Reminder:

The loss is used to evaluate the performance of your model. The bigger your loss is, the more your predictions ($\hat{y}$) differ from the true values ($y$). In deep learning, you use optimization algorithms like gradient descent to train your model and minimize the cost.

L1 loss is defined as

$L_1(\hat{y}, y) = \sum_{i=0}^m|y^{(i)} - \hat{y}^{(i)}| $

In [59]:
def l1_loss(y, yhat):
    l1 = np.sum(np.abs(y - yhat))
    return l1

yhat = np.array([.9, .2, .1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

l1 = l1_loss(y, yhat)

print(l1)

1.1
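To make the vectorization concrete, the sketch below compares `l1_loss` with an explicit Python loop that computes the same sum one term at a time. `l1_loss_loop` is a hypothetical reference implementation for comparison only; the vectorized version does the subtraction, absolute value, and sum inside NumPy's compiled routines rather than in the Python interpreter, which is usually much faster on large arrays.

```python
import numpy as np

def l1_loss(y, yhat):
    return np.sum(np.abs(y - yhat))

def l1_loss_loop(y, yhat):
    # Same formula, one term at a time
    total = 0.0
    for yi, yhati in zip(y, yhat):
        total += abs(yi - yhati)
    return total

yhat = np.array([.9, .2, .1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

print(l1_loss(y, yhat))
print(l1_loss_loop(y, yhat))
```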


Exercise: Implement the numpy vectorized version of the L2 loss.

$L_2(\hat{y},y) = \sum_{i=0}^m(y^{(i)} - \hat{y}^{(i)})^2 $

In [61]:
def l2_loss(y, yhat):
    l2 = np.sum((y - yhat) ** 2)
    return l2

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

l2 = l2_loss(y, yhat)

print(l2)

0.43
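Since $\sum_i d_i^2$ is the dot product of the difference vector $d = y - \hat{y}$ with itself, the L2 loss can also be written with `np.dot`. This is an alternative formulation, not a correction; for 1-D inputs both give the same value.

```python
import numpy as np

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

d = y - yhat             # element-wise differences
l2_sum = np.sum(d ** 2)  # as in l2_loss
l2_dot = np.dot(d, d)    # d . d = sum of squared differences

print(l2_sum, l2_dot)
```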
