# Lab: Python Basics with Numpy

This assignment gives you a brief introduction on how to code some useful functions in Python 3. 

**After this assignment you will:**
- Be able to use iPython Notebooks (create notebook files with the extension .ipynb)
- Be able to use numpy functions and numpy matrix/vector operations such as np.sum, np.dot, np.multiply, np.maximum, np.exp, np.log, and np.reshape.
- Understand the concept of "broadcasting"
- Be able to vectorize code and compute L1 and L2 loss functions

## About iPython Notebooks ##

Project Jupyter was proposed in 2014 by Fernando Pérez. 
Its name refers to the three core programming languages it supports: Julia, Python and R. 
It is open-source software, built on open standards, and language agnostic. 
It provides a web-based interactive computational environment for creating notebook documents. 
A notebook is an ordered list of cells which can contain code, text (Markdown, with embedded HTML and LaTeX), maths and plots. 

iPython Notebooks are interactive coding environments embedded in a webpage. You will be using iPython notebooks in this course. After writing your code, you can run the cell by clicking on "Run" in the upper bar of the notebook. 

To undo text entry press "CTRL+Z". 

**Exercise**: Set test to `"Hello World"` in the cell below. Add a new cell and print test. 

In [3]:
test = "Hello World"


In [5]:
print("test: "+ test)

test: Hello World


**Expected output**:
test: Hello World

## 1 - Building basic functions with numpy ##

Numpy is the main package for scientific computing in Python. It is maintained by a large community (www.numpy.org). In this exercise you will learn several numpy functions such as np.exp, np.log, np.reshape.

### 1.1 - sigmoid function, np.exp() ###

To refer to a function belonging to a specific package you could call it using package_name.function().

Before using np.exp(), you will use math.exp() to implement the sigmoid function. You will then see why np.exp() is preferable to math.exp().

**Note**:
$sigmoid(x) = \frac{1}{1+e^{-x}}$ is also known as the logistic function. It is a non-linear function used both in Machine Learning (e.g. Logistic Regression method) and Deep Learning (DL).

<img src="images/Sigmoid.png" style="width:400px;height:200px;">

**Exercise**: Complete the function *basic_sigmoid(x)* that returns the sigmoid of a real number x. Use **math.exp(x)** for the exponential function.

In [None]:
import math

def basic_sigmoid(x):
    """
    Compute sigmoid of x.

    Arguments:
    x -- a scalar value

    Return:
    s -- sigmoid(x)
    """
    

    s= ?
   
    
    return s

In [None]:
# Call function basic_sigmoid to compute sigmoid of 3. 
?

**Expected Output**: 0.9525741268224334


The inputs of the functions in "math" library are scalar real numbers. 
In Machine Learning (ML) we mostly use matrices and vectors. This is why numpy is more useful. 

In [None]:
# Call function basic_sigmoid to compute the sigmoid of vector x. 
# It will not work, why ?
x = [1, 2, 3]

#?

If $ x = (x_1, x_2, ..., x_n)$ is a row vector then $np.exp(x)$ will apply the exponential function to every element of x. 
The output will also be a vector: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$
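
The difference between the two libraries can be seen on any element-wise function. The sketch below uses the square root instead of the exponential so that the exercise cells stay open; the data and variable names are illustrative only.

```python
import math
import numpy as np

values = [1.0, 4.0, 9.0]           # plain Python list (illustrative data)

# math functions accept a single real number, not a list
try:
    math.sqrt(values)
    math_error = None
except TypeError as err:
    math_error = err
print("math.sqrt on a list fails:", math_error)

# numpy functions are applied to every element and return an array
roots = np.sqrt(np.array(values))
print(roots)                        # [1. 2. 3.]
```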


In [2]:
# Access numpy functions by writing np.function_name() instead of numpy.function_name()
import numpy as np  

# Use np.array() to create the array [1, 2, 3].
#x = ?

# Apply np.exp() to x and print the result.

np.exp?

If x is a vector (array), arithmetic operations (e.g. $s = x + 3$; $s = \frac{1}{x}$) will output s as a vector of the same size as x.

In [4]:
# Use np.array() to create the array [1, 2, 3].
x = ?

#Print the result of operation (x+3)

#Print the result of operation (1/x)



[4 5 6]
[1.         0.5        0.33333333]


If you need more info on a numpy function, look at [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html). 

You can also write `np.exp?` to get quick access to the documentation.

The data structures used in numpy to represent vectors and matrices are called numpy arrays. 
$$ \text{For } x \in \mathbb{R}^n \text{,     } sigmoid(x) = sigmoid\begin{pmatrix}
    x_1  \\
    x_2  \\
    ...  \\
    x_n  \\
\end{pmatrix} = \begin{pmatrix}
    \frac{1}{1+e^{-x_1}}  \\
    \frac{1}{1+e^{-x_2}}  \\
    ...  \\
    \frac{1}{1+e^{-x_n}}  \\
\end{pmatrix}\tag{1} $$

**Exercise**: Complete *sigmoid* function using numpy. 
x can now be a scalar, vector, matrix. 

In [None]:
def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments: x - a scalar or numpy array of any size

    Return:  s - sigmoid(x)
    """
    
    s = ?

    return s

In [None]:
#create the array [1, 2, 3] 
x = ?
# Call function sigmoid to compute the sigmoid of array x.
?


**Expected Output**:  array([ 0.73105858,  0.88079708,  0.95257413])


### 1.2 - Sigmoid gradient

Gradients are very important to train ML models.

**Exercise**: Implement function *sigmoid_derivative()* to compute the gradient of the sigmoid function with respect to its input x. 
The formula is: $$sigmoid\_derivative(x) = sigmoid(x) (1 - sigmoid(x))\tag{2}$$
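
For reference, formula (2) follows from the definition $sigmoid(x) = \frac{1}{1+e^{-x}}$ by the chain rule. A short derivation, shown as a sketch rather than as part of the exercise:

$$\frac{d}{dx}\,sigmoid(x) = \frac{d}{dx}\left(1+e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} = \frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}} = sigmoid(x)\,(1 - sigmoid(x))$$

The last step uses $\frac{e^{-x}}{1+e^{-x}} = 1 - \frac{1}{1+e^{-x}}$.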


In [None]:
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of 
    the sigmoid function with respect to x.
     
    Arguments:  x - a scalar number or numpy array
    Return:    ds - the computed gradient.
    """
        
    ds = ?
        
    return ds

In [None]:
#create the array [1, 2, 3] 
x = ?
# Compute the gradient of the sigmoid at each element of x
?

**Expected Output**: [ 0.19661193  0.10499359  0.04517666]

### 1.3 - Reshaping arrays ###

Two useful numpy functions are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 
- X.shape is used to get the shape (dimension) of a matrix/vector X. 
- X.reshape(...) is used to reshape X into some other dimension. 
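
As a minimal illustration of these two tools, the sketch below reshapes a small made-up array (not the image from the exercise). Passing `-1` for one dimension asks numpy to infer its length from the number of elements.

```python
import numpy as np

m = np.arange(6)                  # 1-D array [0 1 2 3 4 5], shape (6,)
grid = m.reshape(2, 3)            # 2 rows, 3 columns
column = grid.reshape(-1, 1)      # -1 lets numpy infer the length: shape (6, 1)

print(m.shape, grid.shape, column.shape)   # (6,) (2, 3) (6, 1)
```

Reshaping never changes the number of elements, only how they are arranged.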

For example, in computer science, an RGB image is represented as a 3D array of shape $(length, height, depth = 3)$. However, when the image is an input of an ML algorithm you may need to convert it to a column vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a single column.

<img src="images/image2vector_kiank.png" style="width:500px;height:300px;">

**Exercise**: Implement `image2vector()` that takes an input of shape (length, height, depth) and returns a vector of shape (length\*height\*depth, 1). 

- Extract the image dimensions with `image.shape`.

In [None]:
def image2vector(image):
    """
    Argument: image, a numpy array of shape (length, height, depth)
    
    Returns:    v, a column vector of shape (length*height*depth, 1)
    """
  
    v = ?
    
    return v

In [None]:

image = np.array([[[ 0.67826139,  0.29380381],  
        [ 0.90714982,  0.52835647],
        [ 0.4215251 ,  0.45017551]],

       [[ 0.92814219,  0.96677647],
        [ 0.85304703,  0.52351845],
        [ 0.19981397,  0.27417313]],

       [[ 0.60659855,  0.00533165],
        [ 0.10820313,  0.49978937],
        [ 0.34144279,  0.94630077]]])

#Compute the shape (dimension) of image.

#Call function image2vector and check how the shape of image changed
?


**Expected Output**: column vector with shape (18, 1)

### 1.4 - Normalizing rows and Broadcasting

A useful technique in ML is data normalization. Optimization algorithms often converge faster after normalization. There are different types of normalization; here we apply $ \frac{x}{\| x\|} $ (divide each row of x by its norm 2).

For example, if $$x = 
\begin{bmatrix}
    0 & 3 & 4 \\
    2 & 6 & 4 \\
\end{bmatrix}\tag{3}$$ then $$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
    5 \\
    \sqrt{56} \\
\end{bmatrix}\tag{4} $$

The norm 2 ($\| x\|$) of each row is computed as: 

$\sqrt{0^2+3^2+4^2} = 5$ (norm of row 1)

$\sqrt{2^2+6^2+4^2} = \sqrt{56}$ (norm of row 2)

then

$$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
    0 & \frac{3}{5} & \frac{4}{5} \\
    \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
\end{bmatrix}\tag{5}$$ 

The norm of each row of $x\_normalized $ is 1, known also as unit length.  
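
The row norms of the example matrix in (3) can be checked numerically. This sketch only computes the norms and compares the result with and without `keepdims=True`; the variable names `row_norms` and `flat_norms` are illustrative.

```python
import numpy as np

x = np.array([[0, 3, 4],
              [2, 6, 4]])

row_norms = np.linalg.norm(x, axis=1, keepdims=True)   # one norm per row, kept as a column
flat_norms = np.linalg.norm(x, axis=1)                 # same values, rank 1 result

print(row_norms.shape)    # (2, 1)
print(flat_norms.shape)   # (2,)
print(np.allclose(row_norms, [[5.0], [np.sqrt(56)]]))  # True
```

Keeping the result as a (2, 1) column is what makes it line up with the rows of x in later element-wise operations.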

**Exercise**: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length. 

**Broadcasting**

An important concept in numpy is "**broadcasting**". It is very useful for performing math operations between arrays of different shapes. 

You can divide matrices of different sizes: with x.shape=(2,3) and x_norm.shape=(2,1), **x/x_norm** works fine due to **numpy broadcasting**.

Numpy behaves as if x_norm were copied 3 times, once for each column of x, and then divides element-wise. 

For more info on broadcasting, read the official [broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).

**General Principle of Broadcasting**: 

If A is an (m,n) matrix and B is a (1,n) matrix, numpy behaves as if B were copied m times and applies the element-wise operation (+, -, *, /) between A and the stretched B. 
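
A small sketch of this principle, using made-up arrays `A`, `row` and `col` (addition and multiplication are used here so that the division in the exercise is left to you):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])         # shape (2, 3)
row = np.array([[10.0, 20.0, 30.0]])    # shape (1, 3)
col = np.array([[2.0],
                [3.0]])                 # shape (2, 1)

print(A + row)   # row is reused for each of the 2 rows of A
# [[11. 22. 33.]
#  [14. 25. 36.]]
print(A * col)   # col is reused for each of the 3 columns of A
# [[ 2.  4.  6.]
#  [12. 15. 18.]]
```

In both cases the result has the shape of A, (2, 3), and no explicit copy of the smaller array is written by hand.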



In [None]:
def normalizeRows(x):
    """
    Normalizes each row of matrix x to have unit length.
    
    Argument:
    x -- A numpy matrix of shape (n, m)
    
    Returns:
    x_normalized -- Normalized (by row) numpy matrix. 
    """
    
    # Compute x_norm as the norm 2 of each row of x. 
    # Use np.linalg.norm(..., axis = ..., keepdims = True)
    # In numpy, axis=0 runs down the rows (one result per column) 
    # and axis=1 runs across the columns (one result per row)
    
    x_norm = ?
    
    # Divide x by its norm to get the normalized (by row) matrix
    x_normalized = ?
   
    return x_normalized

In [None]:
x = np.array([
    [0, 3, 4],
    [1, 6, 4]])

#Apply normalizeRows to x
?

**Expected Output**: 

 [[ 0.          0.6         0.8       ]
 
 [ 0.13736056  0.82416338  0.54944226]]

    
   


## 2 - Vectorization ##

**WHENEVER POSSIBLE AVOID EXPLICIT FOR-LOOPs !!!**

In ML, we deal with large datasets. Hence, a computationally inefficient function can result in a model that takes ages to run. 
A vectorized implementation is usually much more efficient than an explicit loop. 

**Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication (for two 1-D arrays it returns their inner product). This is different from `np.multiply()` and the `*` operator, which perform an element-wise multiplication (equivalent to `.*` in Matlab/Octave). 
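
The distinction is easiest to see on tiny inputs. The arrays `u`, `v` and `M` below are made up for illustration.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
M = np.array([[1, 2],
              [3, 4]])

print(np.dot(u, v))        # 32, i.e. 1*4 + 2*5 + 3*6 (a single number)
print(np.multiply(u, v))   # [ 4 10 18], element-wise, same as u * v
print(np.dot(M, M))        # matrix product, rows [7, 10] and [15, 22]
print(M * M)               # element-wise product, rows [1, 4] and [9, 16]
```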

In [None]:
# What is the difference between the following two implementations
# of the dot product?

import time
# generate vector a with 1,000,000 random values
a = np.random.rand(1000000)
# What is the shape of a?
?
# generate vector b with the same shape as a
b = ?

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.perf_counter()
# call np.dot to compute the dot product of a and b
c = ?
toc = time.perf_counter()
print("vectorized = " + str(1000*(toc-tic)) + "ms")
print(c)

### FOR LOOP DOT PRODUCT OF VECTORS ###
c = 0
tic = time.perf_counter()
for i in range(1000000):
    c += a[i]*b[i]
toc = time.perf_counter()
print("for loop = " + str(1000*(toc-tic)) + "ms")
print(c)


## 3 - Implement L1 and L2 loss functions ##

**Exercise**: Implement the numpy vectorized version of the L1 loss function. Use abs(x) to compute the absolute value of x. 

$$L_1(\hat{y}, y) = \sum_{i=1}^m|y^{(i)} - \hat{y}^{(i)}|\tag{6}$$

In [None]:
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted values/labels)
    y -- vector of size m (true value/labels)
    
    Returns: loss - value of L1 loss function defined above
    """
    
    loss = ?
    
    return loss

In [None]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

#Call function L1 to compute L1 loss 

?

**Expected Output**:    *L1* = 1.1 
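
The expected value can be checked by hand from the definition of the L1 loss, using the five pairs $(y^{(i)}, \hat{y}^{(i)})$ from the test cell:

$$L_1 = |1-0.9| + |0-0.2| + |0-0.1| + |1-0.4| + |1-0.9| = 0.1 + 0.2 + 0.1 + 0.6 + 0.1 = 1.1$$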

**Exercise**: Implement the numpy vectorized version of the L2 loss function:

$$L_2(\hat{y},y) = \sum_{i=1}^m(y^{(i)} - \hat{y}^{(i)})^2\tag{7}$$
 

In [None]:
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted value/labels)
    y -- vector of size m (true value/labels)
    
    Returns: loss - value of L2 loss function defined above
    """
    
    loss = ?

    return loss

In [None]:
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])

#Call function L2 to compute L2 loss 
?

**Expected Output**: *L2* = 0.43
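
As with the L1 loss, the expected value can be checked by hand; the differences $y^{(i)} - \hat{y}^{(i)}$ are the same as before and are now squared:

$$L_2 = 0.1^2 + 0.2^2 + 0.1^2 + 0.6^2 + 0.1^2 = 0.01 + 0.04 + 0.01 + 0.36 + 0.01 = 0.43$$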

### Note on python/numpy vectors

Broadcasting has both strengths and weaknesses. Its strength is that it lets you get a lot done in a single line of code. Its weakness is that it can introduce subtle, strange-looking bugs. For example, if you add a column vector to a row vector, you might expect an error message. Instead, you get back a matrix. 
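
The column-plus-row case can be reproduced in a few lines. The arrays below are made up for illustration, and the final `assert` is one possible way to make shape assumptions explicit rather than a required pattern.

```python
import numpy as np

col = np.array([[1], [2], [3]])     # shape (3, 1)
row = np.array([[10, 20]])          # shape (1, 2)

total = col + row                   # no error: both are stretched to shape (3, 2)
print(total.shape)                  # (3, 2)
print(total)
# [[11 21]
#  [12 22]
#  [13 23]]

# an explicit shape check turns a silent surprise into an immediate failure
assert total.shape == (3, 2)
```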

Avoid data structures whose shape is (n,), i.e. rank 1 arrays. 
Instead, every time you create an array, make it explicitly a column or a row vector.  
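
One reason for this advice is that a rank 1 array and an explicit column vector behave differently under transpose and `np.dot`. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])       # rank 1 array, shape (3,)
print(a.T.shape)                    # (3,)  transpose changes nothing
print(np.dot(a, a.T))               # 14.0  a single number, not a matrix

c = a.reshape(3, 1)                 # explicit column vector, shape (3, 1)
print(c.T.shape)                    # (1, 3)
print(np.dot(c, c.T).shape)         # (3, 3)  outer product
```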

Check this with the code below. 

In [None]:
a = np.random.randn(5)  # rank 1 array 

# Compute the shape of a
?

# Print a and a_transpose (a.T) and see there is no difference between them
?


#You can reshape rank 1 array to have two dimensions
#a=a.reshape((5,1))
#print(a.shape)

#or better use 
#b = np.random.randn(5,1) 
#print(b.shape)
