# NumPy

## Useful information

[PyPI](https://pypi.org/project/numpy/)  
[Website](https://numpy.org/)  
[Documentation](https://numpy.org/doc/stable/reference)  
[Quickstart](https://numpy.org/doc/stable/user/quickstart.html)  
[Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)
  
<img src="https://techscript24.com/wp-content/uploads/2020/10/86498201-a8bd8680-bd39-11ea-9d08-66b610a8dc01.png" width=200 height=200 />

## Table of Contents

  * [Introduction](#introduction)
  * [The Basics of NumPy Arrays](#the-basics-of-numpy-arrays)
    + [NumPy for Matlab users](#numpy-for-matlab-users)
    + [The ndarray data structure](#the-ndarray-data-structure)
    + [Datatypes](#datatypes)
    + [Shape and Dimensions](#shape-and-dimensions)
    + [Initial placeholders](#initial-placeholders)
  * [Array indexing](#array-indexing)
    + [Subsetting](#subsetting)
    + [Slicing](#slicing)
    + [Boolean indexing](#boolean-indexing)
  * [Array mathematics](#array-mathematics)
    + [Arithmetic operations](#arithmetic-operations)
      - [Addition](#addition)
      - [Subtraction](#subtraction)
      - [Multiplication](#multiplication)
      - [Division](#division)
      - [Squareroot](#squareroot)
    + [Product operations](#product-operations)
      - [Inner product](#inner-product)
      - [Cross product](#cross-product)
    + [Aggregations](#aggregations)
    + [Boolean Logic](#boolean-logic)
  * [Array manipulation](#array-manipulation)
    + [Reshaping arrays](#reshaping-arrays)
    + [Adding/removing elements](#adding-removing-elements)
    + [Combining arrays](#combining-arrays)
      - [Stacking](#stacking)
      - [Tiling](#tiling)
    + [Splitting arrays](#splitting-arrays)
  * [Broadcasting](#broadcasting)
    + [Rules](#rules)
  * [Exercises](#exercises)
  * [Challenge](#challenge)

## Introduction

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays, in a faster and more compact way than Python lists.

To start using it, import it in Python. By convention, it is imported under the alias `np`:

In [1]:
import numpy as np

## The Basics of NumPy Arrays 

### NumPy for Matlab users

MATLAB and NumPy have a lot in common, but there are also many differences. NumPy was created to do numerical and scientific computing in the most natural way with Python, not to be a MATLAB clone.

However, if you are already familiar with MATLAB matrix structures and methods, it can be useful to draw parallels between the two. We advise you to consult the key differences and similarities between MATLAB and NumPy [here](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html).

### The ndarray data structure

At the core of NumPy is the `ndarray` (*n*-dimensional array) data structure. These arrays are strided views on memory.

In contrast to Python's built-in list data structure, these arrays are homogeneously typed: all elements of a single array must be of the same type.

In [2]:
a = np.array([1, 2, 3])   # Create a NumPy array

print(type(a))           
print(a)  

<class 'numpy.ndarray'>
[1 2 3]
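
Because every element must share one type, NumPy converts mixed input to a common type. The short sketch below illustrates this, and also peeks at the `strides` attribute, which records how many bytes to step in memory to move one position along each axis; this is what "strided view" means in practice. The stride values in the comments assume 8-byte (`int64`) elements in the default row-major layout.

```python
import numpy as np

# Mixed int/float input is upcast so that all elements share one dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)           # float64

# A row-major 2x3 array of 8-byte integers: 24 bytes to the next row, 8 to the next column
grid = np.zeros((2, 3), dtype=np.int64)
print(grid.strides)          # (24, 8)

# Taking every other column creates a view with a larger column stride, without copying data
every_other = grid[:, ::2]
print(every_other.strides)   # (24, 16)
```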


### Datatypes

As mentioned above, a NumPy array is a grid of elements that all have the same type. To support this, NumPy provides a large set of numeric datatypes that you can use to construct arrays.

NumPy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype.

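Since NumPy's core is written in C, these types map to C numeric types. You can check them all [here](https://numpy.org/devdocs/user/basics.types.html). Note that the default integer type is platform-dependent: the `int32` in the output below comes from a platform whose C `long` is 32 bits (such as Windows with NumPy versions before 2.0), while most Linux and macOS builds default to `int64`.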

In [3]:
import numpy as np

x = np.array([1, 2])                   # Let numpy choose the datatype
print(x.dtype)         

x = np.array([1.0, 2.0])               # Let numpy choose the datatype
print(x.dtype)             

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                        

int32
float64
int64
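
An existing array can also be converted to another datatype with the `astype` method, which returns a new array rather than modifying the original. A minimal example; note that converting floats to integers truncates toward zero rather than rounding:

```python
import numpy as np

x = np.array([1.7, 2.2, -3.9])

# astype returns a new array; float -> int conversion truncates toward zero
ints = x.astype(np.int64)
print(ints)                               # [ 1  2 -3]
print(x.dtype, ints.dtype)                # float64 int64

# Each dtype has a fixed size in bytes
print(np.dtype(np.int8).itemsize)         # 1
print(np.dtype(np.float64).itemsize)      # 8
```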


### Shape and Dimensions

The shape of an array is a tuple of integers giving the size of the array along each dimension. The number of dimensions of an array is called its rank.

<img src="https://fgnt.github.io/python_crashkurs_doc/_images/numpy_array_t.png" width=400/>

In [4]:
a = np.array([1, 2, 3, 4])   # Rank 1 array   

print(a.shape)
 
b = np.array([               # Rank 2 array  
    [5.2, 3.0, 4.5], 
    [9.1, 0.1, 0.3]
])    

print(b.shape)

(4,)
(2, 3)
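
Besides `shape`, arrays expose `ndim` (the rank, in the sense of "number of dimensions", not the linear-algebra rank of a matrix) and `size` (the total number of elements, i.e. the product of the shape entries). A quick illustration using the same array `b`:

```python
import numpy as np

b = np.array([
    [5.2, 3.0, 4.5],
    [9.1, 0.1, 0.3]
])

print(b.ndim)   # 2  (the rank, equal to len(b.shape))
print(b.size)   # 6  (total number of elements: 2 * 3)
print(len(b))   # 2  (length of the first axis only)
```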


A minimum number of dimensions can also be specified when creating an array, using the `ndmin` argument. NumPy prepends axes of length 1 as needed.

In [5]:
a = np.array([1, 2, 3, 4], ndmin=5)

print(a)
print(a.shape)

[[[[[1 2 3 4]]]]]
(1, 1, 1, 1, 4)


### Initial placeholders

Often we need an array of a given size before its contents are known. Rather than filling it manually, we can create it already filled with placeholder values:

In [6]:
np.zeros([3, 4])           # Creates an array filled with zeros
np.ones([3, 4])            # Creates an array filled with ones
np.arange(10, 26, 5)       # Creates an array of evenly spaced values (start, stop, step); stop is excluded
np.linspace(0, 2, 9)       # Creates an array of evenly spaced values (start, stop, num); stop is included
np.full((2, 2), 7)         # Creates an array filled with a constant value
np.random.random((2, 2))   # Creates an array filled with random values in [0, 1)
np.empty((3, 2));          # Creates an uninitialized array (its contents are arbitrary)
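
Since only the last expression of a cell is displayed, it can help to print a few of these explicitly. The sketch below shows the different endpoint rules of `arange` and `linspace`, the default float dtype of `zeros`, and `zeros_like`, one of the `*_like` helpers that copy the shape and dtype of an existing array:

```python
import numpy as np

print(np.arange(10, 26, 5))      # [10 15 20 25]: the stop value 26 is excluded
print(np.linspace(0, 2, 9))      # 9 values from 0 to 2, both endpoints included
print(np.zeros([3, 4]).dtype)    # float64 is the default dtype for zeros/ones
print(np.full((2, 2), 7))        # a 2x2 array of 7s

b = np.array([[1, 2, 3], [4, 5, 6]])
z = np.zeros_like(b)             # same shape and dtype as b, filled with zeros
print(z.shape, z.dtype == b.dtype)   # (2, 3) True
```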

## Array indexing

Similar to Python lists, NumPy arrays can be sliced and indexed. 

In [7]:
a = np.array([1, 2, 3])

b = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

### Subsetting

In a one-dimensional array, the *i*-th value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists. Negative indices count from the end of the array.

In [8]:
a[2]

3

In [9]:
a[-1]

3

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices.

In [10]:
b[1,2]

6
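
A few variations on the same idea, using the array `b` defined above. The comma form `b[1, 2]` and the chained form `b[1][2]` give the same value, but the chained form first builds the intermediate row `b[1]`:

```python
import numpy as np

b = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(b[1, 2])     # 6
print(b[1][2])     # 6 as well, but b[1] is built first as an intermediate row
print(b[-1, -1])   # 6: negative indices count from the end of each axis
print(b[0])        # [1 2 3]: a single index selects a whole row
```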

### Slicing

The NumPy slicing syntax follows that of standard Python lists. To access a slice of an array `x`, use:

`x[start:stop:step]`

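Here `start` defaults to 0, `stop` (which is excluded) defaults to the size of the dimension, and `step` defaults to 1. For multidimensional arrays, slices for the different dimensions are separated by commas; any trailing dimensions that are left out are taken in full.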

In [11]:
a[0:2]  

array([1, 2])

In [12]:
b[0:2, 1]

array([2, 5])
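
Slices can be combined freely across dimensions. A small sketch on the same 2x3 array, showing an open-ended slice, an omitted trailing axis, and a step:

```python
import numpy as np

b = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(b[:, 1:])    # all rows, columns 1 onwards
print(b[1:])       # same as b[1:, :]: omitted trailing axes are taken in full
print(b[:, ::2])   # all rows, every other column
```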

A potentially confusing case is when the step value is negative. In this case, the defaults for `start` and `stop` are swapped. This becomes a convenient way to reverse an array.

In [13]:
a[::-1]

array([3, 2, 1])

Be careful when modifying an array: basic indexing and slicing return *views* that share memory with the original array, not copies.

In [14]:
c = b[1]

print(b)

[[1 2 3]
 [4 5 6]]


In [15]:
c[0] = 10

print(b)

[[ 1  2  3]
 [10  5  6]]


Here `c` is a view of the second row of `b`, so changing `c` also changes `b`: both refer to the same memory.

To avoid this situation, we can create a copy of an array using the `copy` method.

In [16]:
c = b[1].copy()

c[0] = 4

print(b)

[[ 1  2  3]
 [10  5  6]]
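
To check whether two arrays overlap, `np.shares_memory` can be used, and a view keeps a reference to the array that owns the data in its `base` attribute. A minimal sketch on a fresh copy of `b`:

```python
import numpy as np

b = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

view = b[1]          # basic indexing: a view onto b's memory
copy = b[1].copy()   # an independent copy

print(np.shares_memory(b, view))   # True
print(np.shares_memory(b, copy))   # False
print(view.base is b)              # True: the view keeps a reference to its owner
```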


### Boolean indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition.

In [17]:
a[a < 2]

array([1])

In [18]:
b[b % 2 == 0]

array([ 2, 10,  6])
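
Conditions can be combined with the elementwise operators `&` and `|` (Python's `and`/`or` do not work on arrays), and each comparison needs its own parentheses. Unlike slicing, boolean indexing returns a copy, but a mask can also appear on the left-hand side of an assignment to modify the array in place:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & and | (not `and`/`or`); the parentheses are required
mask = (a > 1) & (a < 5)
print(a[mask])        # [2 3 4]

# Boolean indexing returns a copy, so modifying the result leaves `a` untouched
selected = a[mask]
selected[0] = 100
print(a[1])           # 2

# A mask can also be used on the left-hand side to assign in place
a[a % 2 == 0] = 0
print(a)              # [1 0 3 0 5 0]
```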

## Array mathematics

### Arithmetic operations

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the NumPy module:

In [19]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

print(x)
print(y)

[[1. 2.]
 [3. 4.]]
[[5. 6.]
 [7. 8.]]


#### Addition

In [20]:
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


#### Subtraction

In [21]:
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


#### Multiplication

In [22]:
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


#### Division

In [23]:
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


#### Squareroot

In [24]:
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]
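
Other operators and universal functions (ufuncs) follow the same elementwise pattern. A few further examples on the same `x`, including a scalar operand, which is the simplest case of the broadcasting mechanism described later:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)

print(x ** 2)          # elementwise power, same as np.power(x, 2)
print(np.exp(x))       # elementwise exponential
print(x // 3, x % 3)   # elementwise floor division and remainder
print(x + 10)          # a scalar operand is applied to every element
```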


### Product operations

In [25]:
x = np.array([[1, 2],[3, 4]])
y = np.array([[5, 6],[7, 8]])

print(x)
print(y)

[[1 2]
 [3 4]]
[[5 6]
 [7 8]]


In [26]:
v = np.array([9, 10])
w = np.array([11, 12])

print(v)
print(w)

[ 9 10]
[11 12]


#### Inner product

The inner product of two vectors produces a scalar.

In [27]:
print(v.dot(w))
print(np.dot(v, w))

219
219


The product of a matrix and a vector produces a vector.

In [28]:
print(x.dot(v))
print(np.dot(x, v))

[29 67]
[29 67]


The product of two matrices produces a matrix.

In [29]:
print(x.dot(y))
print(np.dot(x, y))

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]


#### Cross product

The cross product of two 3-element vectors is a vector perpendicular to both of them.

In [30]:
x = [1, 2, 3]
y = [4, 5, 6]

print(np.cross(x, y))

[-3  6 -3]
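
The perpendicularity can be checked with the inner product from the previous section: a vector perpendicular to another has a zero dot product with it.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

n = np.cross(x, y)                   # [-3  6 -3]
print(np.dot(n, x), np.dot(n, y))    # 0 0: n is perpendicular to both x and y
```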


### Aggregations

Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question.

NumPy has fast built-in aggregation functions for working on arrays to help with this process.

`np.prod` - Compute product of elements  
`np.mean` - Compute mean of elements  
`np.std` - Compute standard deviation  
`np.var` - Compute variance   
`np.min` / `np.max` - Find minimum/maximum value  
`np.argmin` / `np.argmax` - Find index of minimum/maximum value  
`np.median` - Compute median of elements  
`np.percentile` - Compute rank-based statistics of elements  
`np.any` - Evaluate whether any elements are true  
`np.all` - Evaluate whether all elements are true  
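
By default these functions aggregate over every element, but most of them accept an `axis` argument that aggregates along a single dimension instead. A short sketch (it also uses `np.sum`, which is not in the list above but follows the same pattern):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(m))          # 21: aggregate over all elements
print(m.sum(axis=0))      # [5 7 9]: collapse the rows, one result per column
print(m.sum(axis=1))      # [ 6 15]: collapse the columns, one result per row
print(np.mean(m))         # 3.5
print(np.argmax(m))       # 5: index into the flattened array
print(np.percentile(m, 50) == np.median(m))   # True: the median is the 50th percentile
```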

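### Boolean Logic

Comparison operators work elementwise and return boolean arrays. `np.where` (called with a single argument, in which case it is equivalent to `np.nonzero`) and `np.argwhere` then give the indices of the `True` elements: the first two as one index array per dimension, the last as one row per element.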

In [45]:
x = np.array([[1, 2, 3], [4, 5, 6]])
x < 3

array([[ True,  True, False],
       [False, False, False]])

In [40]:
np.where(x < 3)

(array([0, 0], dtype=int32), array([0, 1], dtype=int32))

In [43]:
np.nonzero(x < 3)

(array([0, 0], dtype=int32), array([0, 1], dtype=int32))

In [42]:
np.argwhere(x < 3)

array([[0, 0],
       [0, 1]], dtype=int32)
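
With three arguments, `np.where(condition, a, b)` instead builds a new array, taking elements from `a` where the condition holds and from `b` elsewhere. Boolean arrays also combine naturally with the `np.any`/`np.all` aggregations listed earlier:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# Keep the elements of x that are smaller than 3, replace the rest with 0
print(np.where(x < 3, x, 0))

print(np.any(x > 5), np.all(x > 0))   # True True
print(np.count_nonzero(x < 3))        # 2: number of True elements
```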

## Array manipulation

In [None]:
i = np.array([[1, 2, 3], [1, 2, 3]])
i

### Reshaping arrays

To obtain the transpose of a matrix, we can use the `transpose` function or the `T` attribute.

In [None]:
np.transpose(i)
i.T

To flatten an array we can use the `ravel` method.

In [None]:
i.ravel()

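We can also give the array a new shape without changing its data, as long as the total number of elements stays the same. One dimension can be given as `-1`, in which case NumPy infers it from the others.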

In [None]:
i.reshape(3, -1)    # Shape (3, 2): the -1 is inferred from the 6 elements
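
A common pitfall is to expect `reshape` to behave like a transpose. Both calls below give shape `(3, 2)`, but `reshape` reads the elements in row-major order while `T` swaps rows and columns:

```python
import numpy as np

i = np.array([[1, 2, 3], [1, 2, 3]])

r = i.reshape(3, -1)   # -1 is inferred as 2, since 6 elements / 3 rows = 2
t = i.T                # transpose: rows become columns

print(r.shape, t.shape)   # (3, 2) (3, 2)
print(r)                  # rows [1 2], [3 1], [2 3]: elements taken in row-major order
print(t)                  # rows [1 1], [2 2], [3 3]
print(i.ravel())          # [1 2 3 1 2 3]
```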

### Adding/removing elements

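To add, remove or insert elements, NumPy provides the functions `append`, `insert` and `delete`, named after the corresponding Python list operations. Unlike list methods, they do not modify the array in place: each returns a new array. When no `axis` is given, the input is flattened first, which is why the results below are one-dimensional.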

In [None]:
j = np.array([4, 5, 6])

np.append(i, j)  # Append items to an array

In [None]:
np.insert(i, 3, 5)   # Insert items in an array

In [None]:
np.delete(i, np.array(1))   # Delete items from an array
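
To keep the two-dimensional structure, pass an `axis`. A sketch using the same `i` and `j`; note that `np.append` with an axis requires the appended values to have the same number of dimensions, so `j` is reshaped into a one-row matrix first:

```python
import numpy as np

i = np.array([[1, 2, 3], [1, 2, 3]])
j = np.array([4, 5, 6])

appended = np.append(i, j.reshape(1, 3), axis=0)   # add j as a new row
inserted = np.insert(i, 1, 0, axis=1)              # insert a column of zeros at position 1
deleted = np.delete(i, 0, axis=0)                  # remove the first row

print(appended)
print(inserted)
print(deleted)
print(i.shape)    # (2, 3): the original array is unchanged
```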

### Combining arrays

To concatenate any number of arrays together, use the `concatenate` function.

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = np.array([4, 4, 4])

np.concatenate([x, y, z])

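Multidimensional arrays can be concatenated too, as long as they have the same number of dimensions and matching sizes along every axis except the one being joined. By default they are joined along the first axis (`axis=0`).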

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

np.concatenate([grid, grid])
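
Passing `axis=1` joins the arrays side by side instead. Comparing the two resulting shapes:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

rows = np.concatenate([grid, grid])           # axis=0 (default): shape (4, 3)
cols = np.concatenate([grid, grid], axis=1)   # axis=1: shape (2, 6)

print(rows.shape, cols.shape)   # (4, 3) (2, 6)
print(cols)
```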

#### Stacking

For working with arrays of mixed dimensions, it can be clearer to use the `vstack` (vertical, stacking rows) and `hstack` (horizontal, stacking columns) functions.

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Vertically stack the arrays 
np.vstack([x, grid])

In [None]:
y = np.array([[99],
              [99]])

# Horizontally stack the arrays 
np.hstack([grid, y])
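
A related function, `np.stack`, differs in that it joins arrays of identical shape along a *new* axis rather than an existing one. The sketch below compares the resulting shapes:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
y = np.array([[99],
              [99]])

print(np.vstack([x, grid]).shape)       # (3, 3): x is treated as one extra row
print(np.hstack([grid, y]).shape)       # (2, 4): y adds one extra column

s0 = np.stack([x, x])                   # new first axis: shape (2, 3)
s1 = np.stack([x, x], axis=1)           # new last axis: shape (3, 2)
print(s0.shape, s1.shape)
```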

#### Tiling

To stack an array with copies of itself, the `tile` function is more practical than stacking the copies by hand. Its second argument gives the number of repetitions along each axis.

In [None]:
a = np.array([0, 1, 2])
a

In [None]:
np.tile(a, 2)

In [None]:
np.tile(a, (2, 2))

In [None]:
np.tile(a, (2, 1, 2))
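
The shapes of the three results above can be checked directly. When the repetition tuple is longer than the array's number of dimensions, `tile` first prepends axes of length 1 to the array. For contrast, `np.repeat` repeats each *element* rather than the whole array:

```python
import numpy as np

a = np.array([0, 1, 2])

print(np.tile(a, 2))                 # [0 1 2 0 1 2]: the whole array is repeated
print(np.tile(a, (2, 2)).shape)      # (2, 6): a is treated as shape (1, 3)
print(np.tile(a, (2, 1, 2)).shape)   # (2, 1, 6): a is treated as shape (1, 1, 3)
print(np.repeat(a, 2))               # [0 0 1 1 2 2]: each element is repeated
```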

### Splitting arrays

The opposite of concatenation is splitting, which is implemented by the functions `split`, `hsplit`, and `vsplit`. For each of these, we can pass a list of indices giving the split points.

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])

print(x1, x2, x3)

In [None]:
grid = np.arange(16).reshape((4, 4))
print(grid)

In [None]:
upper, lower = np.vsplit(grid, [2])

print(upper)
print(lower)

In [None]:
left, right = np.hsplit(grid, [2])

print(left)
print(right)
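
Instead of a list of split points, `split` also accepts an integer number of equal sections, and raises a `ValueError` if the array cannot be divided evenly. `np.array_split` relaxes this and allows sections of unequal size. A small sketch:

```python
import numpy as np

x = np.arange(8)

parts = np.split(x, 4)             # four arrays of length 2
print(parts)

try:
    np.split(x, 3)                 # 8 elements cannot be divided into 3 equal parts
except ValueError as err:
    print("ValueError:", err)

uneven = np.array_split(x, 3)      # allowed: section sizes 3, 3, 2
print([len(p) for p in uneven])    # [3, 3, 2]
```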

## Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes (sizes) when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix.

In [None]:
# We will add the vector v to each row of the matrix x, storing the result in the matrix y

x = np.array([
    [1, 2, 3], 
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an uninitialized matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

print(y)

This works; however, when the matrix `x` is very large, an explicit loop in Python can be slow. Note that adding the vector `v` to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:

In [None]:
# We will add the vector v to each row of the matrix x, storing the result in the matrix y

vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other

print(vv)                

In [None]:
y = x + vv  # Add x and vv elementwise

print(y)

NumPy broadcasting allows us to perform this computation without actually creating multiple copies of `v`. 

In [None]:
y = x + v  # Add v to each row of x using broadcasting
print(y)

### Rules

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
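
The rules can be traced on a few shapes. In the sketch below, a `(3,)` vector is padded to `(1, 3)` and stretched over a `(4, 3)` matrix; a column `(3, 1)` and a row `(4,)` are both stretched into a `(3, 4)` result; and a `(4,)` vector fails against `(4, 3)` because the trailing sizes 4 and 3 differ. Adding an axis with `np.newaxis` turns that vector into a `(4, 1)` column, which is compatible:

```python
import numpy as np

x = np.ones((4, 3))
v = np.array([1, 0, 1])            # shape (3,) -> treated as (1, 3) by rule 1
print((x + v).shape)               # (4, 3)

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,) -> treated as (1, 4)
print((col + row).shape)           # (3, 4): both arrays are stretched
print(col + row)

w = np.array([1, 2, 3, 4])         # shape (4,) -> (1, 4), incompatible with (4, 3)
try:
    x + w
except ValueError as err:
    print("ValueError:", err)      # operands could not be broadcast together

# As a (4, 1) column, w[i] is added to every element of row i of x
print((x + w[:, np.newaxis]).shape)   # (4, 3)
```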

## Exercises

Here are some exercises to practice NumPy syntax. Almost all of them can be solved in a single line.

You can find the corresponding solutions [here](Solutions.ipynb).

1. Create a vector of length 10 with values evenly distributed between 5 and 50.
2. Create an array of shape (3, 2) filled with the number 5.
3. Convert the Fahrenheit temperatures stored in `[0, 12, 45, 34, 99]` to degrees Celsius.
4. Get the indices of the elements greater than 10 in `[[0, 10, 20], [20, 30, 40]]`.
5. Change the shape of `[1, 2, 3, 4, 5, 6]` to (3, 2) and select its first two rows.
6. Calculate the inner product of `[4, 5]` and `[7, 10]`.
7. Calculate the normalized vector perpendicular to `[1, 5, 3]` and `[8, 4, 2]`.
8. Subtract `[1, 0, 3]` from each row of the array `[[0, 0, 0], [1, 2, 3], [4, 5, 6]]` using broadcasting.

## Challenge

1. Suppose you have the following vectors `[1, 2, 3]` and `[8, 0, 4]`.
2. Calculate a vector perpendicular to them.
3. Rotate it by 45° around the X axis.
4. Scale it by 150% in the Z axis.
5. Translate it by 10 units in the Y axis.
6. Find the largest coordinate value of the resulting vector.
7. Create a vector containing that many evenly spaced values between 0 and 100, and sum the first half of them.