## Python and NumPy - Vectorization

NumPy is a library that extends Python with n-dimensional arrays and linear algebra routines for operating on numeric data.

<a name="toc_40015_3"></a>
# Vectors
<a name="toc_40015_3.1"></a>
<img align="right" src="./images/C1_W2_Lab04_Vectors.PNG" style="width:340px;" >Vectors are ordered arrays of numbers. The elements of a vector are all the same type; a vector cannot, for example, contain both letters and numbers. The number of elements in the array is often referred to as the *dimension* (or *rank*). The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to $n$. In computer science, indexing typically runs from 0 to $n-1$.


<a name="toc_40015_3.2"></a>
## NumPy Arrays

NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Note that "dimension" is used in two senses: above, it was the number of elements in the vector; for a NumPy array, dimension (or rank) refers to the number of indexes needed to address an element.

 - 1-D array, shape (n,): n elements indexed [0] through [n-1]
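
The two senses of "dimension" can be told apart by inspecting an array's attributes. The sketch below is illustrative only (the names `v` and `m` are not from the lab): `ndim` reports the number of indexes, while `shape` and `size` report how many elements there are.

```python
import numpy as np

# A vector with 4 elements: "dimension 4" in the math sense,
# but a 1-D array (one index) in the NumPy sense.
v = np.zeros(4)
print(v.ndim, v.shape, v.size)   # 1 (4,) 4

# A 2-D array needs two indexes to reach a single element.
m = np.zeros((3, 4))
print(m.ndim, m.shape, m.size)   # 2 (3, 4) 12
```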
 

In [15]:
import numpy as np 
import time

In [16]:
a = np.zeros(4)
print(f"a = {a} , shape of a = {np.shape(a)} , data type of a = {a.dtype}")

a = np.zeros((4,))  # same result, with the shape given as a tuple
print(f"a = {a} , shape of a = {np.shape(a)} , data type of a = {a.dtype}")

a = np.random.random_sample(4)
print(f"a = {a} , shape of a = {np.shape(a)} , data type of a = {a.dtype}")

a = np.random.rand(4)
print(f"a = {a} , shape of a = {np.shape(a)} , data type of a = {a.dtype}")

a = [0. 0. 0. 0.] , shape of a = (4,) , data type of a = float64
a = [0. 0. 0. 0.] , shape of a = (4,) , data type of a = float64
a = [0.44236513 0.04997798 0.77390955 0.93782363] , shape of a = (4,) , data type of a = float64
a = [0.5792328  0.53516563 0.80204309 0.24814448] , shape of a = (4,) , data type of a = float64


Some data creation routines do not take a shape tuple:

In [17]:
a = np.arange(4, 10, 2) # start , stop, step
print(a)

a = np.arange(4,) # up to 4 , increment by 1 (default)
print(a)

[4 6 8]
[0 1 2 3]
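
Values can also be specified by hand with `np.array`, which likewise takes no shape tuple. A minimal sketch, with example values chosen here for illustration:

```python
import numpy as np

# Build vectors from explicit values; the dtype is inferred from the values.
a = np.array([5, 4, 3, 2])
b = np.array([5., 4, 3, 2])   # one float makes the whole array floating point

print(a, a.shape, a.dtype)    # an integer dtype (exact width depends on the platform)
print(b, b.shape, b.dtype)    # dtype is float64
```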


# Operations on Vectors

## Indexing:

In [18]:
a = np.arange(10)
print(a)

print(a[2].shape)
print(a[2]) # Accessing an element returns a scalar

print(a[-1])

try :
    c = a[10]
    print(c)
except Exception as e:
    print("The error message you'll see is:")
    print(e)

[0 1 2 3 4 5 6 7 8 9]
()
2
9
The error message you'll see is:
index 10 is out of bounds for axis 0 with size 10


## Slicing:

In [19]:
a = np.arange(10)
print(f"a             {a}")
# access 5 consecutive elements (start:stop:step)

c = a[2:7:1]
print(f"a[2:7:1] =    {c}")

# access 3 elements separated by two 
c = a[2:7:2]
print(f"a[2:7:2] =    {c}")

# access all elements index 3 and above
c = a[3:]
print(f"a[3:] =       {c}")

# access all elements below index 3 (index 3 itself is excluded)
c = a[:3]
print(f"a[:3] =       {c}")

# access all elements
c = a[:]
print(f"a[:] =        {c}")

a             [0 1 2 3 4 5 6 7 8 9]
a[2:7:1] =    [2 3 4 5 6]
a[2:7:2] =    [2 4 6]
a[3:] =       [3 4 5 6 7 8 9]
a[:3] =       [0 1 2]
a[:] =        [0 1 2 3 4 5 6 7 8 9]
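
One way to read `start:stop:step` is as the same sequence of indices that `range(start, stop, step)` produces, so `stop` is never included. A small sketch of that equivalence (the names `sliced` and `explicit` are illustrative):

```python
import numpy as np

a = np.arange(10)
start, stop, step = 2, 7, 2

sliced = a[start:stop:step]
# The same selection written with explicit indices: stop is excluded.
explicit = np.array([a[i] for i in range(start, stop, step)])

print(sliced, explicit)
print(np.array_equal(sliced, explicit))  # True
print(a[:3])                             # [0 1 2]: index 3 is not included
```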


# Single Vector Operations

In [20]:
a = np.array([1, 2, 3, 4])
print(f"a =             {a}")

# negate elements of a
b = -a
print(f"b = -a :        {b}")

# sum all elements of a, returns scalar (single value)
b = np.sum(a)
print(f"b = np.sum(a) : {b}")

b = np.mean(a)
print(f"b = np.mean(a) : {b}")

b = a**2
print(f"b = a**2       : {b}")

a =             [1 2 3 4]
b = -a :        [-1 -2 -3 -4]
b = np.sum(a) : 10
b = np.mean(a) : 2.5
b = a**2       : [ 1  4  9 16]


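## Vector element-wise operations
Most of the NumPy arithmetic, logical, and comparison operations apply to vectors as well. These operators work element by element. For example, adding two vectors of the same size produces a vector $\mathbf{c}$ with
$$ c_i = a_i + b_i, \quad i = 0, \dots, n-1 $$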

In [21]:
a = np.array([1, 2, 3, 4])
b = np.array([-1, -2, 3, 4])
print(a + b)

c = np.array([1, 2])
try:
    d = a + c # c is incompatible with a and b due to different sizes
except Exception as e:
    print("The error message is:", e)

[0 0 6 8]
The error message is: operands could not be broadcast together with shapes (4,) (2,) 


In [22]:
a = np.array([1, 2, 3, 4])
b = 5 * a
print(b)

[ 5 10 15 20]
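
To make the element-by-element behaviour concrete, the sketch below compares an explicit loop with the vectorized operators. The loop and the names `c_loop`, `c_vec`, and `scaled` are illustrative additions, not part of the lab.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, -2, 3, 4])

# Explicit loop: c_i = a_i + b_i for every index i.
c_loop = np.zeros(a.shape[0], dtype=a.dtype)
for i in range(a.shape[0]):
    c_loop[i] = a[i] + b[i]

# Vectorized equivalents: one expression, no Python-level loop.
c_vec = a + b
scaled = 5 * a   # the scalar multiplies every element

print(c_loop, c_vec, scaled)
print(np.array_equal(c_loop, c_vec))  # True
```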


## Vector dot product

<img src="./images/C1_W2_Lab04_dot_notrans.gif" width=600> 
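
The dot product multiplies two vectors of the same size element by element and sums the results into a single scalar, $\mathbf{a} \cdot \mathbf{b} = \sum_{i=0}^{n-1} a_i b_i$. A short sketch with small example values (chosen here for illustration) shows the two steps next to `np.dot`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

elementwise = a * b           # [-1, 8, 9, 8]
manual = np.sum(elementwise)  # 24
builtin = np.dot(a, b)        # 24

print(elementwise, manual, builtin)
```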

## Comparing the performance of our own my_dot() function vs np.dot()

In [23]:
def my_dot(a, b):
    result = 0
    for i in range(a.shape[0]):
        result += a[i] * b[i]
    return result

In [24]:
a = np.random.random_sample(5)
b = np.random.random_sample(5)
print(f"my_dot({a}, {b}) : \n Dot product: ", my_dot(a, b))

my_dot([0.59096694 0.32950282 0.98797985 0.86846315 0.16452144], [0.80277814 0.02503441 0.89766393 0.41783454 0.53671084]) : 
 Dot product:  1.8207124743135992


In [25]:
a = np.random.random_sample(5)
b = np.random.random_sample(5)
c = np.dot(a, b) # scalar

print(f"Dot product: {c}, np.dot(a, b).shape = {c.shape}")
# c is a scalar, so its shape is the empty tuple ()

Dot product: 1.004621042645282, np.dot(a, b).shape = ()


In [53]:
np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del(a)
del(b)  #remove these big arrays from memory

np.dot(a, b) =  2501072.5817
Vectorized version duration: 6.5253 ms 
my_dot(a, b) =  2501072.5817
loop version duration: 3733.8207 ms 
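
Single `time.time()` measurements like the ones above can vary from run to run. As an alternative sketch, the standard-library `timeit` module repeats each call so the best time can be reported. The array size of 100,000 and the use of `np.random.default_rng` are choices made here to keep the loop short; they are not values from the lab.

```python
import timeit
import numpy as np

def my_dot(a, b):
    """Loop-based dot product, same logic as the version defined earlier."""
    result = 0
    for i in range(a.shape[0]):
        result += a[i] * b[i]
    return result

rng = np.random.default_rng(1)
a = rng.random(100_000)   # assumption: much smaller than the arrays used above
b = rng.random(100_000)

# Best of several repeats, converted to milliseconds per call.
t_vec = min(timeit.repeat(lambda: np.dot(a, b), number=10, repeat=3)) / 10 * 1000
t_loop = min(timeit.repeat(lambda: my_dot(a, b), number=1, repeat=3)) * 1000

print(f"np.dot : {t_vec:.4f} ms per call")
print(f"my_dot : {t_loop:.4f} ms per call")
print(np.isclose(np.dot(a, b), my_dot(a, b)))  # both versions agree on the value
```

The absolute numbers depend on the machine; only the relative gap between the two versions is of interest.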


# Matrices

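Matrices are 2-D arrays whose elements are all of the same type. A matrix is denoted with a bold capital letter such as $\mathbf{X}$. Here, $m$ is the number of rows and $n$ is the number of columns, so an element is referenced with a two-dimensional index: one index for the row and one for the column. As with vectors, code indexes rows from 0 to $m-1$ and columns from 0 to $n-1$.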

<figure>
    <center> <img src="./images/C1_W2_Lab04_Matrices.PNG"  alt='missing'  width=900></center>
    <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>
</figure>
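
The notation can be tied to code with a small example. This is a sketch with made-up values; `X`, `m`, and `n` follow the notation above but the matrix itself is hypothetical.

```python
import numpy as np

# A matrix with m = 2 rows and n = 3 columns.
X = np.array([[1, 2, 3],
              [4, 5, 6]])
m, n = X.shape
print(m, n)              # 2 3

# First index selects the row, second the column, both starting at 0.
print(X[0, 0])           # 1, top-left element
print(X[m - 1, n - 1])   # 6, bottom-right element
```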

## Matrix Creation

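We can use NumPy to create matrices. The same routines that created 1-D vectors create 2-D arrays when given a shape tuple such as `(rows, columns)`. In printed output, NumPy uses brackets to denote each dimension (rank).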

In [54]:
a = np.zeros((5, 5)) # np.zeros((rowNum, colNum))
print(a)
print("Dimension:", np.shape(a))

print()

a = np.zeros((2, 1)) 
print(a)
print("Dimension:", np.shape(a))

print()

a = np.random.random_sample((1, 1)) # only one element (1 row and 1 column)
print(a)
print("Dimension:", np.shape(a))

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
Dimension: (5, 5)

[[0.]
 [0.]]
Dimension: (2, 1)

[[0.44236513]]
Dimension: (1, 1)


## Operations on Matrices

### Indexing

Matrices take two indices, in the form `[row, column]`, to address a single element.
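
A brief sketch of what each form of indexing returns, using a small hand-written matrix (the values are illustrative):

```python
import numpy as np

a = np.array([[0, 1],
              [2, 3],
              [4, 5]])

print(a[2, 0])             # 4: row index 2, column index 0
print(a[2][0])             # 4: same element, selected in two steps
print(a[2], a[2].shape)    # [4 5] (2,): one index returns a whole row as a 1-D vector
```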

### To turn an array into a matrix:

In [68]:
a = np.arange(6)
print(a)

print()

a = np.reshape(a, (-1, 2))
# -1 tells NumPy to infer the number of rows from the array size; 2 is the number of columns
print(f"matrix:\n  {a}, \nshape: {np.shape(a)}")

print(a[1][0]) # element at second row, first column
print(a[2, 0]) # element at index 2 , index 0 (3rd row, first column)

[0 1 2 3 4 5]

matrix:
  [[0 1]
 [2 3]
 [4 5]], 
shape: (3, 2)
2
4
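
The `-1` placeholder can stand in for either dimension, as long as the total number of elements divides evenly. A minimal sketch, reusing the same 6-element array:

```python
import numpy as np

a = np.arange(6)

print(a.reshape(-1, 2).shape)   # (3, 2): rows inferred from 6 elements / 2 columns
print(a.reshape(2, -1).shape)   # (2, 3): columns inferred from 6 elements / 2 rows

# 6 elements cannot be split into rows of 4, so NumPy raises a ValueError.
try:
    a.reshape(-1, 4)
except ValueError as e:
    print("The error message is:", e)
```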


### Matrix Slicing

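Slicing creates an array of indices using a set of three values (`start:stop:step`). For a matrix, each dimension can be sliced independently, for example `a[row, start:stop:step]`.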



In [75]:
a = np.arange(20).reshape(-1, 10) # reshapes array into matrix with 10 columns and as many
                                    # rows needed
print(a, "\n")
# access 5 consecutive elements of one row (start:stop:step)
print(a[0, 2:7:1]) # 5 consecutive elements in the first row
print(a[1, 2:7:1]) # 5 consecutive elements in the second row

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]] 

[2 3 4 5 6]
[12 13 14 15 16]


In [81]:
# accessing all elements
print(a[:,:])
print(a[:,:].shape)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]
(2, 10)
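
A few further slices on the same matrix, as a sketch of how row and column selections combine. The specific slices are illustrative choices, not from the lab.

```python
import numpy as np

a = np.arange(20).reshape(-1, 10)

print(a[:, 2:7:2])       # columns 2, 4, 6 of every row: [[2 4 6], [12 14 16]]
print(a[1, :])           # whole second row, as a 1-D vector of shape (10,)
print(a[:, 3].shape)     # (2,): a single column index also returns a 1-D vector
print(a[:, 3:4].shape)   # (2, 1): a slice keeps the column as a 2-D matrix
```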
