# NumPy & Plotting
---
Written by Liam Thorne for SWiCS & WiE Python Data-Analysis Sessions (2023)

Python is quick to write but, compared with compiled languages such as C, it is slow to execute. For performant numerical operations, an external module for Python called [NumPy (Numerical Python)](https://numpy.org/) is used. To make use of external libraries, we first need to download them. To do this, we use [pip (Package Installer for Python)](https://pypi.org/project/pip/). pip downloads from the [Python Package Index](https://pypi.org/), which indexes a massive number of useful libraries for Python, including NumPy. Other package managers such as [Conda](https://docs.conda.io/en/latest/) exist, but pip is the most common and simplest to use. In your command line you can enter the following command:

```bash
pip install numpy
```

Fortunately, we can run this directly from this notebook. We will also install all of the other packages we will need for the notebook here.

`Note:` The `!` prefix is a notebook feature. It does not work in normal Python files.

In [1]:
# Normally you would write this line without the ! in your command line but we can do it in a notebook
!pip install numpy matplotlib

Defaulting to user installation because normal site-packages is not writeable
Collecting numpy
  Downloading numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Installing collected packages: numpy
Successfully installed numpy-1.24.1


NumPy can now be imported into our program and used. When using functions from external packages, we use the `package.function` notation:

```python
# Imports all functions from numpy
import numpy
numpy.array([1, 2, 3])
```

You can also specify which functions to import from a package. You then don't need to use the `package.function` notation:

```python
# Imports the array function
from numpy import array
array([1, 2, 3])
```

You can also change the name of the package to make typing/reading more convenient while still protecting against multiple packages implementing a function with the same name (e.g. `numpy.array` and `other_package.array`). This is the convention for importing numpy:

```python
# imports all functions under the np namespace
import numpy as np
np.array([1, 2, 3])
```
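
To see why the namespace matters, here is a small sketch using `sqrt`, which exists both in Python's built-in `math` module and in NumPy. The choice of `sqrt` and the variable names are only an illustration and are not part of the original session:

```python
import math
import numpy as np

values = [1.0, 4.0, 9.0]

# np.sqrt works element-wise on a whole sequence
np_roots = np.sqrt(values)
print(np_roots)    # [1. 2. 3.]

# math.sqrt only accepts a single number, so we loop
math_roots = [math.sqrt(v) for v in values]
print(math_roots)  # [1.0, 2.0, 3.0]
```

Because each call is prefixed with its module name, it is always clear which `sqrt` is being used.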

We can now import our packages:

In [4]:
import numpy as np
import matplotlib.pyplot as plt
from time import perf_counter

## Why use NumPy?

How much faster is NumPy, and why is it worth it? A Python list can hold elements of any type, so the interpreter has to check the type of every element before applying an operation, which is slow. A Python list also doesn't store its values in a single block of memory: it stores `pointers` to objects that can be scattered across memory. A NumPy array stores values of one type in one contiguous block and runs its loops in compiled code, which can often process several elements at once using vectorised CPU instructions. The cell below multiplies two arrays together element by element, first in pure Python and then in NumPy.

In [10]:
N = 10_000_000

a = b = range(N)
c = d = np.arange(N)

# Python version
start = perf_counter()
python = [a_i * b_i for a_i, b_i in zip(a, b)]
print(perf_counter() - start)

# NumPy version
start = perf_counter()
numpy_result = c * d
print(perf_counter() - start)

0.49244863300009456
0.06864063599959991
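
The difference in storage can be inspected directly. The following is a minimal sketch (the variable names are illustrative) showing that a list can mix types, while a NumPy array has a single `dtype` and a fixed number of bytes per element:

```python
import numpy as np

py_list = [1, 2.5, "three"]                 # a list can mix types freely
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'str']

arr = np.arange(5, dtype=np.int64)   # every element is a 64-bit integer
print(arr.dtype)                     # int64
print(arr.itemsize)                  # 8 bytes per element
print(arr.nbytes)                    # 40 bytes in total (5 * 8)
print(arr.flags["C_CONTIGUOUS"])     # True: one contiguous block of memory

# Mixing ints and floats makes NumPy pick one common type for all elements
mixed = np.array([1, 2.5])
print(mixed.dtype)                   # float64
```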


## Question 1 - ND Array

NumPy implements an N-dimensional array (`ndarray`) in which every element has the same data type, which is what allows fast numerical operations. We can initialise a NumPy array with random integers using [np.random.randint](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html).

Initialise a NumPy array with shape (5, 10) and with values in the range 0-4. Call the array `a`.

In [13]:
# Answer Here
a = np.random.randint(5, size=(5, 10))
a

array([[1, 0, 0, 0, 4, 1, 2, 2, 2, 3],
       [3, 4, 4, 1, 3, 2, 2, 4, 3, 0],
       [0, 1, 0, 3, 3, 3, 2, 4, 4, 4],
       [3, 2, 3, 2, 2, 2, 1, 0, 0, 3],
       [2, 2, 2, 2, 1, 0, 3, 4, 3, 1]])
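
As an aside, the NumPy documentation recommends the newer `Generator` interface for new code. A hedged sketch of the same task using `np.random.default_rng` follows. The seed value `0` and the name `a_new` are arbitrary choices for illustration, not part of the original exercise:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeding makes the result reproducible

# As with randint, the upper bound (5) is exclusive, so values are 0-4
a_new = rng.integers(0, 5, size=(5, 10))

print(a_new.shape)                          # (5, 10)
print(a_new.min() >= 0, a_new.max() <= 4)   # True True
```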

## Indexing and Axes

NumPy arrays are indexed differently from regular Python lists: Python's slice notation is extended so that one index or slice can be given per axis, separated by commas. For example, the following shows the difference between Python and NumPy when getting every element of the second column:

In [31]:
a = [[i for i in range(5)] for _ in range(5)]
b = np.array(a)
print(b)

# Python
print([a_i[1] for a_i in a])

# NumPy
print(b[:, 1])

[[0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]]
[1, 1, 1, 1, 1]
[1 1 1 1 1]
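
The same comma-separated notation handles rows, sub-blocks and steps. A minimal sketch, using a hypothetical 5x5 array called `grid` whose entries are all different so the selections are easy to follow:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)   # values 0..24 laid out in 5 rows

row = grid[1, :]          # the second row
block = grid[1:3, 2:4]    # rows 1-2, columns 2-3
corner = grid[::2, ::2]   # every other row and every other column

print(row)     # [5 6 7 8 9]
print(block)   # 2x2 array containing 7, 8, 12, 13
print(corner)  # 3x3 array containing 0, 2, 4, 10, 12, 14, 20, 22, 24
```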


It's important to note the order of the indices. The first index selects along axis 0 (rows) and the second along axis 1 (columns). A list or NumPy array of integers can also be used as an index into a NumPy array, which Python lists don't support:

In [40]:
a = list(range(50))
b = np.arange(50)

indices = [1, 20, 34, 48]

# Python
print([a[i] for i in indices])

# NumPy
print(b[indices])

[1, 20, 34, 48]
[ 1 20 34 48]
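
Indexing with arrays goes a little further than the cell above. The sketch below (names such as `mask` and `grid` are illustrative) shows an integer array, a boolean array, and a pair of index lists used on a 2D array:

```python
import numpy as np

b = np.arange(50)

# A NumPy array of integers works as an index, just like a list
idx = np.array([1, 20, 34, 48])
picked = b[idx]
print(picked)      # [ 1 20 34 48]

# A boolean array of the same shape selects the entries where it is True
mask = b % 10 == 0
multiples = b[mask]
print(multiples)   # [ 0 10 20 30 40]

# In 2D, one index list per axis selects individual (row, column) pairs
grid = b.reshape(5, 10)
pairs = grid[[0, 4], [1, 9]]
print(pairs)       # [ 1 49]
```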


## Axes

By default, reductions such as `np.sum` are applied over every element of the array, e.g.

$$ \sum
\begin{bmatrix}
1 & 2 & 3 \\ 
4 & 5 & 6 \\ 
7 & 8 & 9 \\ 
\end{bmatrix}
= 45
$$

Using the `axis` argument, you can instead perform the operation along a specific axis, e.g. a sum over axis 0 adds the rows together, giving one total per column:

$$ \sum_{\text{axis}=0}
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9 \\
\end{bmatrix}
=
\begin{bmatrix}
\sum \begin{pmatrix}1 \\ 4 \\ 7\end{pmatrix} &
\sum \begin{pmatrix}2 \\ 5 \\ 8\end{pmatrix} &
\sum \begin{pmatrix}3 \\ 6 \\ 9\end{pmatrix}
\end{bmatrix}
=
\begin{bmatrix}
12 & 15 & 18
\end{bmatrix}
$$

In [51]:
a = np.random.randint(3, size=(5, 5))

print(a)
print(np.sum(a))
print(np.sum(a, axis=0))
print(np.sum(a, axis=1))

[[2 2 0 1 2]
 [1 1 0 2 0]
 [1 1 0 1 0]
 [1 0 2 1 2]
 [2 0 2 0 0]]
24
[7 4 4 5 4]
[7 4 3 6 4]
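
For a result that can be checked by hand, here is a short sketch using the 3x3 matrix from the equations above. It also shows that `axis` works the same way for other reductions, and what the `keepdims` argument does. The variable names are illustrative:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.sum(m))                 # 45
col_totals = np.sum(m, axis=0)
print(col_totals)                # [12 15 18]
row_totals = np.sum(m, axis=1)
print(row_totals)                # [ 6 15 24]

# The axis argument works the same way for other reductions
print(np.max(m, axis=0))         # [7 8 9]
print(np.mean(m, axis=1))        # [2. 5. 8.]

# keepdims=True keeps the reduced axis with length 1
kept = np.sum(m, axis=1, keepdims=True)
print(kept.shape)                # (3, 1)
```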


## Question - Axis Multiplication

Create a random array of size (5, 5) with values between 0 and 4, then calculate the sum over axis 0 and the sum over axis 1, and multiply the two resulting arrays together element-wise. Then multiply the result by the sum of the complete array (no axis specified).

In [53]:
# Answer Here
a = np.random.randint(5, size=(5, 5))
b = np.sum(a, axis=0)
c = np.sum(a, axis=1)

print(b * c * np.sum(a))

[4186 3680 1104 2576 9016]
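
Since the random array changes on every run, the arithmetic is easier to verify on a fixed input. The following sketch repeats the same steps on a hypothetical 3x3 array with the values 1 to 9:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)   # rows: [1 2 3], [4 5 6], [7 8 9]

col_sums = np.sum(m, axis=0)         # [12 15 18]
row_sums = np.sum(m, axis=1)         # [ 6 15 24]

product = col_sums * row_sums        # element-wise: [ 72 225 432]
result = product * np.sum(m)         # every entry multiplied by 45

print(result)                        # [ 3240 10125 19440]
```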


## Array Modification

Often, we need to filter arrays based on the values of entries. For this we can use [np.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html)

In [None]:
# Random integers in the range 0-9 (range and shape assumed for illustration)
a = np.random.randint(10, size=(5, 5))
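
A minimal sketch of how `np.where` can be used, on a small hypothetical array called `temps` (the data and names are assumptions for illustration). The three-argument form picks between two values per element, the one-argument form returns indices, and a boolean mask is shown for comparison:

```python
import numpy as np

temps = np.array([3, -1, 7, -4, 5])   # hypothetical example data

# Three-argument form: where the condition is True take 0, otherwise keep the value
clipped = np.where(temps < 0, 0, temps)
print(clipped)        # [3 0 7 0 5]

# One-argument form: returns the indices where the condition is True
negative_idx = np.where(temps < 0)[0]
print(negative_idx)   # [1 3]

# A boolean mask keeps only the matching entries
positives = temps[temps > 0]
print(positives)      # [3 7 5]
```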

## Question - Filtering

Filter the array so that values smaller than 5 are set to 0 and values of 5 or larger are kept the same:

$$
f(x) =
\begin{cases}
0 & \text{if } x < 5 \\
x & \text{if } x \geq 5
\end{cases}
$$