<a href="https://colab.research.google.com/github/vinnydavies/DPIP/blob/main/python_week6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Working with numerical data using NumPy
This week we are covering [NumPy](https://numpy.org/). Python lists and tuples are not efficient ways of storing and working with numerical data. The Python library NumPy provides array data structures that can be used in a similar way to vectors and matrices in R.

## Importing NumPy
By convention, NumPy is imported as `np`


In [None]:
import numpy as np

The reason for this is simply readability. This is a widely adopted convention, so it is best to do the same. You may also notice some people (e.g. me) referring to 'NumPy arrays' as 'np arrays' and similar.

## Simple NumPy Example
You can set up a NumPy array really easily, e.g. here is a vector


In [None]:
vector = np.array([1,2,3])
vector

array([1, 2, 3])

and here is a matrix

In [None]:
matrix = np.array([[1,2,3],[4,5,6]])
matrix

array([[1, 2, 3],
       [4, 5, 6]])

or you can even create a NumPy array straight from an existing list, which can be useful sometimes

In [None]:
my_list = [1,2,3]  # not named `list`, which would shadow Python's built-in type
nparray = np.array(my_list)
nparray

array([1, 2, 3])

There are also functions to create specific styles of NumPy arrays, e.g. `np.zeros`, `np.ones`, or in this case `np.arange`:

In [None]:
sequence = np.arange(10,20,2)
sequence

array([10, 12, 14, 16, 18])
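
As a small sketch of the other constructors mentioned above, here are `np.zeros` and `np.ones` alongside `np.arange`. The variable names are just for illustration.

```python
import numpy as np

# Shapes are passed as a tuple (rows, columns); a single integer gives a vector.
zeros = np.zeros((2, 3))      # 2x3 array of 0.0 (floats by default)
ones = np.ones(4)             # length-4 vector of 1.0
evens = np.arange(0, 10, 2)   # start, stop (not included), step

print(zeros.shape)
print(ones)
print(evens)
```

Note that, just like slices, `np.arange` does not include the stop value.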

You can find the shape of a NumPy array, similarly to `dim` in R

In [None]:
matrix.shape

(2, 3)

and you can rearrange the matrix as well, remembering that, unlike R, NumPy stores its data in row-major format by default, i.e. the elements are read row by row as `(1,2,3,4,5,6)`

In [None]:
reshaped_matrix = matrix.reshape(3,2)
reshaped_matrix

array([[1, 2],
       [3, 4],
       [5, 6]])
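
If you want the column-major filling that R uses, `reshape` accepts an `order` argument. The sketch below compares the two orders on the numbers 1 to 6; the variable names are my own.

```python
import numpy as np

values = np.arange(1, 7)  # 1, 2, 3, 4, 5, 6

# Default order="C" (row-major): fill along each row first.
c_order = values.reshape(2, 3)

# order="F" (column-major): fill down each column first,
# which matches what matrix(1:6, nrow=2) gives in R.
f_order = values.reshape(2, 3, order="F")

print(c_order)
print(f_order)
```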

## Combining NumPy Arrays
We can combine NumPy arrays in a similar way to how we would in R, but we have to be a little careful as Python doesn't necessarily act the same way as R. To start with we will look at what Python does when we have axes of the same length; later we will look at Broadcasting, which is what happens when things don't match up!

Let's combine two example matrices to start. Here we can use `np.concatenate` to firstly do the equivalent of `rbind`

In [28]:
new_matrix = np.array([[7,8,9],[10,11,12]])
joined_matrices = np.concatenate((matrix, new_matrix), 0)
joined_matrices

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

and we can also do the equivalent of `cbind`

In [26]:
joined_matrices_v2 = np.concatenate((matrix, new_matrix), 1)
joined_matrices_v2

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

We can also join things along a new axis, e.g. to combine 2 vectors, using `np.stack`

In [27]:
np.stack((vector, vector))

array([[1, 2, 3],
       [1, 2, 3]])
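
By default `np.stack` puts the new axis first, so each vector becomes a row. As a quick sketch, passing `axis=1` makes each vector a column instead (the names `as_rows` and `as_cols` are just for illustration).

```python
import numpy as np

vector = np.array([1, 2, 3])

as_rows = np.stack((vector, vector), axis=0)  # shape (2, 3): one vector per row
as_cols = np.stack((vector, vector), axis=1)  # shape (3, 2): one vector per column

print(as_rows.shape)
print(as_cols)
```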

## Accessing and Slicing
Remembering that Python is zero-based, we can access elements, rows and columns of NumPy arrays, just as we would in R. Let's remind ourselves of our joined matrix to start

In [34]:
joined_matrices

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

We can then extract an element

In [35]:
joined_matrices[1,0]

4

or we can extract a row

In [36]:
joined_matrices[1] # this would also work: joined_matrices[1,:]

array([4, 5, 6])

or a column

In [37]:
joined_matrices[:,1]

array([ 2,  5,  8, 11])

We can also select multiples at once. Be careful though, NumPy can return the results in different shapes: slicing with a range such as `1:2` keeps the dimension, whereas a single index drops it. Remember also that when writing ranges in Python as I have below, the end value is not included.

In [39]:
joined_matrices[:,1:2]

array([[ 2],
       [ 5],
       [ 8],
       [11]])
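
To make the shape difference concrete, here is a small sketch comparing a single index with a length-one slice on the same column. I rebuild the joined matrix with `np.arange` so the example runs on its own.

```python
import numpy as np

# Same values as joined_matrices above: 1..12 in a 4x3 layout.
joined_matrices = np.arange(1, 13).reshape(4, 3)

col_1d = joined_matrices[:, 1]     # single index: the column axis is dropped
col_2d = joined_matrices[:, 1:2]   # slice: the column axis is kept

print(col_1d.shape)
print(col_2d.shape)
```

The values are the same in both cases; only the number of dimensions differs, which matters later when we combine arrays.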

I can also select both columns and rows at once

In [41]:
joined_matrices[0:2,1:3]

array([[2, 3],
       [5, 6]])

## Calculations and Statistical Functions
NumPy arrays act more as you would expect (coming from R) when you do calculations with them, as arithmetic is applied element-wise. For instance (although note how the result changes from integers to floats when you divide)

In [44]:
vector = np.array([1,2,3])
print(vector + vector)
vector / vector


[2 4 6]


array([1., 1., 1.])
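
As a small sketch of the type change, we can check the `dtype` of each result. I also include floor division with `//`, which keeps integers, for comparison.

```python
import numpy as np

vector = np.array([1, 2, 3])

true_div = vector / vector     # true division always gives floats
floor_div = vector // vector   # floor division keeps the integer type

print(vector.dtype)
print(true_div.dtype)
print(floor_div.dtype)
```

The exact integer type printed (e.g. `int64`) can depend on your platform and NumPy version.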

We can also use basic statistical functions, e.g. `np.mean`, `np.max`, `np.sum`

In [45]:
np.mean(vector)

2.0
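
These functions also take an `axis` argument when you want a summary per row or per column of a matrix, rather than over everything. A minimal sketch:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(matrix)               # over all elements
col_means = np.mean(matrix, axis=0)  # collapse the rows: one mean per column
row_max = np.max(matrix, axis=1)     # collapse the columns: one max per row

print(total)
print(col_means)
print(row_max)
```

A useful way to remember it is that `axis` is the dimension that disappears from the result.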

## Broadcasting
Broadcasting is what happens when we try to perform arithmetic on NumPy arrays that are not of the same size. I would in general urge caution with this kind of operation, as I would with recycling, the equivalent of broadcasting, in R. If you do use it then I highly recommend checking it thoroughly and adding detailed comments.

However, here are some examples:

In [47]:
x1 = np.array([1,2,3])
x2 = np.array([[4,5,6],[7,8,9]])
print(x1.shape)
print(x2.shape)
x1 + x2

(3,)
(2, 3)


array([[ 5,  7,  9],
       [ 8, 10, 12]])

In [49]:
x3 = np.array([[1],
               [2]])
print(x3.shape)
print(x2.shape)
x3 + x2

(2, 1)
(2, 3)


array([[ 5,  6,  7],
       [ 9, 10, 11]])
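
Broadcasting does not always succeed. Shapes are compared from the last axis backwards, and each pair of sizes must either match or be 1. The sketch below (with a made-up vector `bad`) shows a case that fails and one way to fix it by adding an axis.

```python
import numpy as np

x2 = np.array([[4, 5, 6], [7, 8, 9]])  # shape (2, 3)
bad = np.array([1, 2])                 # shape (2,): last axis is 2, not 3 or 1

try:
    x2 + bad
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print(err)

# Turning the vector into a (2, 1) column makes the shapes compatible.
fixed = x2 + bad[:, np.newaxis]
print(fixed)
```

Unlike recycling in R, NumPy raises an error here rather than silently reusing values, which is one reason to check shapes before combining arrays.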

## Linear Algebra

Finally, we can do some linear algebra. Here we simply multiply a matrix by a diagonal matrix with 2 on the diagonal. Generally this would be a very inefficient way of doing this: for $n \times n$ matrices, standard matrix multiplication has complexity $O(n^3)$

In [55]:
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
d = 2*np.eye(3)
x@d

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.],
       [14., 16., 18.]])

The alternative to this would be `x*2`, which would be $O(n^2)$. We should also be careful not to confuse matrix multiplication using `@` with `*`, which does element-wise multiplication

In [56]:
x*d

array([[ 2.,  0.,  0.],
       [ 0., 10.,  0.],
       [ 0.,  0., 18.]])
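
As a quick check of the points above, this sketch confirms that `x @ d` and `x * 2` give the same values, while `x * d` does not. The variable names `same_as_scaling` and `elementwise` are my own.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
d = 2 * np.eye(3)

# Matrix product with the diagonal matrix equals scaling every element by 2.
same_as_scaling = np.array_equal(x @ d, x * 2)

# Element-wise product keeps only the diagonal of x (doubled),
# because d is zero everywhere else.
elementwise = x * d

print(same_as_scaling)
print(np.array_equal(elementwise, x @ d))
```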