<div style="font-size: 200%; font-weight: bold; color: maroon;">401_Intro_to_numpy</div>


## Some introduction to numpy

numpy (most frequently imported as `np`) is the core array and linear algebra library for Python.

To benefit from vectorization, which is the basis of numpy's computational advantage, you first have to define a numpy object: an array (a vector, a matrix, or a higher-dimensional array).

Once you have defined an array, say `x`, you can call vectorized functions on it using the `np.` syntax.

For instance:

- `np.exp(x)` works for any np.array `x` and applies the exponential function to every element

In summary, numpy has efficient built-in functions for computing with arrays and matrices; it is fast because it vectorizes the computations.
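
As a minimal sketch of what vectorization means here, the snippet below applies `np.exp` to a small array and compares the result with an explicit loop using `math.exp`. The array values are arbitrary and chosen only for illustration.

```python
import math
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # illustrative values

# Vectorized: one call applies exp to every element
vectorized = np.exp(x)

# Equivalent explicit loop over the elements
looped = np.array([math.exp(v) for v in x])

print(vectorized)
print(np.allclose(vectorized, looped))  # True
```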


The following is adapted from Andrew Ng's great Deep Learning course on Coursera.

## Important numpy links / reference material 

numpy official user guide (good, but long):  https://numpy.org/doc/stable/numpy-user.pdf

one simple / quick numpy cheatsheet : https://s3.amazonaws.com/dq-blog-files/numpy-cheat-sheet.pdf

yet another (longer) cheatsheet: http://datacamp-community-prod.s3.amazonaws.com/ba1fe95a-8b70-4d2f-95b0-bc954e9071b0

# Vectorization


In machine learning or deep learning, you deal with very large datasets. Hence, a computationally inefficient function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product.

In [2]:
import time
import random
import numpy as np

x1 = [random.random() for e in range(10**4)]  # remember list comprehension
x2 = [random.random() for e in range(10**4)]

#print(x1)
#print(x2)


In [3]:
print(len(x1))
n1 = np.array(x1)
n1.shape  # shape is only available on numpy objects

10000


(10000,)

In [7]:
matriz = np.array([[1., 0., 0.],
                   [0., 1., 2.]])
matriz.dtype   # float64 elements; matriz.shape is (2, 3): axis 0 -> 2, axis 1 -> 3


dtype('float64')

In [11]:
unos = np.ones((2,3,4), dtype=int)
unos

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]])

In [12]:
ran = np.random.rand(2, 3, 4)
ran

array([[[0.74675572, 0.97685082, 0.84379403, 0.41979389],
        [0.83380888, 0.7183372 , 0.32266541, 0.89531003],
        [0.99267648, 0.96775906, 0.90376308, 0.15291189]],

       [[0.10825803, 0.11139605, 0.64200599, 0.37628683],
        [0.76221891, 0.06103522, 0.3091854 , 0.9930198 ],
        [0.9249055 , 0.33290977, 0.99249007, 0.53997838]]])

In [13]:
ran[0,2,1]

0.967759055607709

In [18]:
data1 = np.array([[1, 2], 
                 [3, 4], 
                 [5, 6]])
data1

array([[1, 2],
       [3, 4],
       [5, 6]])

In [20]:
data1.max()

6
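
The `max` call above reduces over the whole array. As a small sketch, the same method accepts an `axis` argument to reduce along one dimension only; `data1` is redefined here so the snippet runs on its own.

```python
import numpy as np

data1 = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])

print(data1.max())        # 6, maximum over all elements
print(data1.max(axis=0))  # [5 6], maximum of each column
print(data1.max(axis=1))  # [2 4 6], maximum of each row
```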

For the next example, we will carry out basic matrix computations **without** numpy's vectorized operations, that is, using classical for loops.

Later we will do the same using vectorization, i.e. numpy.

NOTE:

- `np.zeros((x, y))` produces a matrix of zeros with dimensions x, y (the shape is passed as a tuple)

- `np.random.rand(x, y)` creates a matrix of random floats, uniform in [0, 1), with dimensions x, y
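
The two calls take their dimensions differently, which is easy to trip over. A short sketch with arbitrary dimensions 2 and 3:

```python
import numpy as np

zeros = np.zeros((2, 3))      # shape passed as a single tuple
rand = np.random.rand(2, 3)   # dimensions passed as separate arguments

print(zeros.shape)  # (2, 3)
print(rand.shape)   # (2, 3)
print(rand.min() >= 0.0 and rand.max() < 1.0)  # True: samples lie in [0, 1)
```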

In [21]:
### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print ("dot = " + str(dot))
print ("\n ----- Computation time dot product (for) = " + str(1000*(toc - tic)) + "ms")


### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
#print ("outer = " + str(outer))
print ("\n ----- Computation time outer prod (for)= " + str(1000*(toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
#print ("elementwise multiplication = " + str(mul))
print ("\n ----- Computation time elementwise (for)= " + str(1000*(toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
#print ("gdot = " + str(gdot))
print ("\n ----- Computation time dot prod general (for) = " + str(1000*(toc - tic)) + "ms")

dot = 2515.956269714853

 ----- Computation time dot product (for) = 15.256158000001463ms

 ----- Computation time outer prod (for)= 48474.208836000005ms

 ----- Computation time elementwise (for)= 4.002757999998607ms

 ----- Computation time dot prod general (for) = 25.204635999998004ms


# Exercise

Look up the numpy functions for:

1. dot product of vectors

2. outer product of vectors

3. elementwise multiplication

4. general dot product of W (previously generated) and x1


In [22]:
### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
prodescalar = np.dot(x1, x2)
toc = time.process_time()
print ("\n ----- Computation time dot prod (np) = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outernp = np.outer(x1, x2)
toc = time.process_time()
print ("\n ----- Computation time outer (np) = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
elementwise = np.multiply(x1, x2)
toc = time.process_time()
print ("\n ----- Computation time elementwise (np) = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
general = np.dot(W, x1)
toc = time.process_time()
print ("\n ----- Computation time dot prod general (np) = " + str(1000*(toc - tic)) + "ms")


 ----- Computation time dot prod (np) = 5.199277999999197ms

 ----- Computation time outer (np) = 1203.406729000001ms

 ----- Computation time elementwise (np) = 3.1477799999990452ms

 ----- Computation time dot prod general (np) = 1.5353810000036106ms


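As you may have noticed, the vectorized implementation is much cleaner and more efficient: in the run above, the outer product took about 48 s with for loops versus about 1.2 s with numpy. For bigger vectors/matrices, pure-Python loops quickly become **impractical without numpy**.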
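
One detail worth keeping in mind: `x1` and `x2` above are plain Python lists, so each numpy call first converts them to arrays, and that conversion is part of the measured time. Below is a rough sketch of the dot-product comparison with the conversion done up front. It uses `time.perf_counter` (wall-clock time) instead of `time.process_time`, which is a choice made for this sketch; timings vary by machine, so none are shown.

```python
import random
import time
import numpy as np

x1 = [random.random() for _ in range(10**4)]
x2 = [random.random() for _ in range(10**4)]

# Convert once, outside the timed region
n1 = np.array(x1)
n2 = np.array(x2)

# Pure-Python loop
tic = time.perf_counter()
dot_loop = 0.0
for xi, xj in zip(x1, x2):
    dot_loop += xi * xj
toc = time.perf_counter()
print("loop:  " + str(1000 * (toc - tic)) + " ms")

# Vectorized call on arrays that already exist
tic = time.perf_counter()
dot_np = np.dot(n1, n2)
toc = time.perf_counter()
print("numpy: " + str(1000 * (toc - tic)) + " ms")

# Both approaches agree up to floating-point rounding
print(np.isclose(dot_loop, dot_np))
```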

**Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication. This is different from `np.multiply()` and the `*` operator (which is equivalent to `.*` in Matlab/Octave), which perform an element-wise multiplication.
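
To make the distinction concrete, here is a small sketch on two 2x2 matrices with made-up values. It also checks that, for 2-D arrays, the `@` operator gives the same result as `np.dot`.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

print(np.dot(A, B))       # matrix product
# [[ 70 100]
#  [150 220]]

print(np.multiply(A, B))  # element-wise product
# [[ 10  40]
#  [ 90 160]]

print(np.array_equal(A * B, np.multiply(A, B)))  # True
print(np.array_equal(A @ B, np.dot(A, B)))       # True
```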

# Loss and cost functions for a binary target
## e.g. logistic regression, or any model linking x and y through the sigmoid function

**Reminder**:

- The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$). 
- In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost.
- For binary targets, the loss is defined as:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})\tag{2}$$
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})\tag{3}$$

The cost is then computed by averaging the loss over all training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$

- that is, the mean of the per-example losses
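
As an illustrative sketch of equations (1) to (3) and the cost, the snippet below evaluates them on made-up values. The helper `sigmoid`, the weights `w`, the bias `b` and the data `X` are all assumptions introduced for this example; they do not come from the notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: 2 features, m = 3 examples stored as columns
X = np.array([[0.5, -1.0,  2.0],
              [1.5,  0.0, -0.5]])
y = np.array([1, 0, 1])
w = np.array([0.2, -0.4])   # hypothetical weights
b = 0.1                     # hypothetical bias

z = np.dot(w, X) + b                           # equation (1), all examples at once
a = sigmoid(z)                                 # equation (2)
L = -y * np.log(a) - (1 - y) * np.log(1 - a)   # equation (3), one loss per example
J = np.mean(L)                                 # cost: mean of the per-example losses

print(L.shape)  # (3,)
print(J)
```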


In [23]:
# ERROR FUNCTION

def loss(a, y):
    """
    Arguments:
    a -- vector of size m (predicted probabilities, i.e. yhat)
    y -- vector of size m (true labels, 0 or 1)
    
    Returns:
    loss -- vector of size m with the value of the loss function defined above **for each target element**
    """
    
    loss = -y * np.log(a) - (1-y) * (np.log(1-a))              # element-wise loss, one value per example
    
    return loss

In [24]:
a = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("loss = " + str(loss(a,y)))

loss = [0.10536052 0.22314355 0.10536052 0.91629073 0.10536052]


**Expected Output**:

<table style="width:20%">
     <tr> 
       <td> <b>loss</b> </td> 
       <td> [0.10536052 0.22314355 0.10536052 0.91629073 0.10536052] </td> 
     </tr>
</table>
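
One practical caveat: if a prediction is exactly 0 or 1, `np.log` returns `-inf` and the loss can become `inf` or `nan`. A common workaround, sketched here as an assumption and not as part of the original notebook, is to clip the predictions into a slightly narrower interval first. The function name `safe_loss` and the value of `eps` are illustrative choices.

```python
import numpy as np

def safe_loss(a, y, eps=1e-12):
    """Per-example binary cross-entropy with predictions clipped away from 0 and 1.

    safe_loss and eps are hypothetical names/values used only in this sketch.
    """
    a = np.clip(a, eps, 1 - eps)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

a = np.array([1.0, 0.0, 0.9])   # includes the two problematic extremes
y = np.array([1, 0, 1])
print(np.all(np.isfinite(safe_loss(a, y))))  # True
```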

# Exercise you must solve

## Write the cost function

**Exercise**: 

Once you have computed the loss, you can easily compute the cost with numpy: the cost is simply the average of the loss over all elements of the target vector y.

The cost can be computed by averaging the loss over all training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$


In [25]:
# FUNCTION: compute cost

def cost(a, y):
    """
    Computes the cost function by averaging the loss over all training examples.
    
    Arguments:
    a -- A numpy vector of predicted probabilities
    y -- A numpy vector of true labels, same size as a
    
    You must obtain
    m -- the size of y (use len(y))
    
    Return:
    cost -- Your computed cost.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)
    ### HINT : Define first m as a property of the input object y
    m = len(y)
    cost = 1/m * np.sum(loss(a, y))
    
    ### END CODE HERE ###
    
    return cost

In [26]:
a = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("loss = " + str(loss(a,y)))
print("cost = " + str(cost(a,y)))

loss = [0.10536052 0.22314355 0.10536052 0.91629073 0.10536052]
cost = 0.29110316603236874


In [None]:
# FUNCTION: compute cost with the loss computed inline, in the same line of code

def cost(a, y):
    """
    Computes the cost function by averaging the loss over all training examples.
    
    Arguments:
    a -- A numpy vector of predicted probabilities
    y -- A numpy vector of true labels, same size as a
    
    You must obtain
    m -- the size of y (use len(y))
    
    Return:
    cost -- Your computed cost.
    """
    
    ### START CODE HERE ### (≈ 2 lines of code)

    
    ### END CODE HERE ###
    
    return cost

In [None]:
print("cost = " + str(cost(a, y)))
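
One possible way to write the single-expression version, offered as a sketch and not as the reference solution, is to let `np.mean` handle both the sum and the division by `m`. It is named `cost_inline` here (a hypothetical name) so that it does not overwrite `cost`.

```python
import numpy as np

def cost_inline(a, y):
    """Mean binary cross-entropy, with the loss computed inline."""
    return np.mean(-y * np.log(a) - (1 - y) * np.log(1 - a))

a = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("cost = " + str(cost_inline(a, y)))
```

With the same `a` and `y` as above, this should agree with the value printed by the two-line `cost` function.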


# What to remember
- Vectorization is very important in deep learning. It provides computational efficiency and clarity.
- You have reviewed the loss and cost functions for binary targets.
- You are familiar with several numpy functions such as np.sum, np.dot, np.outer, np.multiply, np.log, etc.