# Python for (open) Neuroscience

_Lecture 1.0_ - Introduction to `numpy`

Luigi Petrucco

Jean-Charles Mariani

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec/blob/main/lectures/Lecture1.0_Numpy.ipynb)

## A note on searching info

Google every doubt that you have! [Stack Overflow](https://stackoverflow.com) is a great source of information.

For simple doubts and introductory explanations on `numpy` ChatGPT is an excellent source! 

## Working with libraries (/packages/modules)

We can import external libraries using `import`:

In [None]:
import numpy
numpy.array

We can give aliases to the library we import for the sake of brevity:

In [None]:
import numpy as np
np.array

We can also import specific functions (or classes) from a library with this syntax:

In [None]:
from numpy import array
array

## The `numpy` library

High performance number crunching with Python 

C-compiled libraries make it very efficient

In [None]:
N_ELEMENTS = 1000000

In [None]:
%%timeit
a_list = list(range(N_ELEMENTS))
mean = sum(a_list) / len(a_list)  # mean

mean_subtracted = [element - mean for element in a_list]  # subtract mean

In [None]:
%%timeit
an_array = np.arange(N_ELEMENTS)
mean = np.mean(an_array)  # mean

mean_subtracted = an_array - mean  # subtract mean

### A small note on credit & citations

Many forget to do so, but remember: there are scientists behind many open-source tools, and citations are the way we can reward them in academic currency!

    Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2

Let's import the library!

In [None]:
import numpy as np

## `np.ndarray`

Data type representing N-dimensional arrays

Workhorse of scientific computing!

A note on language: when we say `array` we do not imply any number of dimensions:
- <span style="color:indianred">vectors</span> will be 1D arrays
- <span style="color:indianred">matrices</span> will be 2D arrays
- for n>2, I will use <span style="color:indianred">n-dimensional matrix</span>, or <span style="color:indianred">stack</span> (if we talk about imaging data/other kind of stacked data)

Related note: there is a `np.matrix` class in `numpy`, but you **should not** use it; it is no longer recommended and it might be removed in the future!

`np.ndarray` is a powerful data storing structure:

- it addresses memory allocation efficiency issues

- it gives powerful indexing functionalities

- it implements algebraic operations on vectors and matrices

## Creating arrays

### Initialize empty arrays

We can create a simple array of zeros using the `np.zeros` function, and passing it:


A single integer n to create a 1D vector of length n:

In [None]:
np.zeros(3)  # we pass a single integer for a 1D array

A tuple of integers giving the size along each dimension, to create arrays with more dimensions:

In [None]:
np.zeros((3,2))  # we pass the tuple (3, 2) to have a 3 x 2 matrix

The `np.ones` function works in the same way but creates a matrix of ones:

In [None]:
np.ones((3,2))

We can initialize a matrix with arbitrary values using `np.full`:

In [None]:
np.full((2,3), np.nan)

In numpy arrays, **all elements must be of the same type**! (This is important to make arrays efficient)

By default, `np.ndarray`s will be initialized with `float` values:

In [None]:
np.ones((2,3))

To specify the data type of our array, we can pass the `dtype` argument (for data type). For example, we can make it `int`:

In [None]:
np.ones((2,3), dtype=int)

To change the data type of an existing array, we can use the `.astype(new_type)` method:

In [None]:
my_arr = np.ones((2,3))
my_arr.astype(int)
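
A small sketch of what the conversion does to the values, assuming a made-up array `float_arr` (not part of the lecture data): converting floats to `int` drops the fractional part, and the method returns a new array rather than modifying the original one.

```python
import numpy as np

float_arr = np.array([1.7, -1.7, 2.0])  # illustrative values
int_arr = float_arr.astype(int)         # returns a new array

print(int_arr)          # fractional parts are dropped: values are 1, -1, 2
print(float_arr.dtype)  # the original array is still float64
```

If rounding is what we want, we can call `np.round()` on the array before converting it.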

We can also use some special data type from numpy, for memory saving purposes:
 - `np.uint8` (numbers from 0 to 255)
 - `np.int8` (numbers from -128 to 127)
 - `np.uint16` (numbers from 0 to 65535)
 - `np.int16` (numbers from -32768 to 32767)
 
The number (8 or 16) represents the number of **bits** used for every entry in the array!

In [None]:
import sys
a_float_array = np.ones((200, 300))  # default dtype is float64: 64 bits per entry

sys.getsizeof(a_float_array)  # we ask for the size of the array in memory

In [None]:
a_uint8_array = np.ones((200, 300), dtype=np.uint8)

sys.getsizeof(a_uint8_array)  # the np.uint8 type is much more efficient!
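
As a complement to `sys.getsizeof`, the sketch below uses the `.itemsize` and `.nbytes` attributes of arrays, which report the bytes per entry and the bytes of the data buffer (without the small overhead of the Python object). The `sizes` dictionary is just an illustrative name.

```python
import numpy as np

shape = (200, 300)  # same shape as in the cells above

# Bytes used by the data buffer for a few dtypes
sizes = {}
for dtype in (np.float64, np.int16, np.uint8):
    arr = np.ones(shape, dtype=dtype)
    sizes[arr.dtype.name] = arr.nbytes  # nbytes = size * itemsize
    print(arr.dtype.name, arr.itemsize, "bytes per entry,", arr.nbytes, "bytes in total")
```

With 8 bits per byte, a `uint8` entry takes 1 byte and a `float64` entry takes 8 bytes, so the same matrix is 8 times smaller as `uint8`.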

We can also make arrays of text! In this case, the dtype will be `'<U[n]'` (U for Unicode, and n will be the number of characters in the longest entry in the matrix):

In [None]:
txt = np.full((3,2), "some text")

txt  # look at the dtype! the number will be the number of characters:

### Useful attributes of `np.ndarray` objects

Being objects, `np.ndarray`s have attributes that can be useful to check out their properties.

### `.shape`

The `.shape` attribute gives us the shape (the number of elements along each dimension of the array):

In [None]:
my_array = np.zeros((4,3,5))

my_array.shape

### `.dtype`

The `.dtype` attribute gives the type of the elements in the array:

In [None]:
my_array = np.full((4,3,5), "a stri")
my_array.dtype

### `.size`

Do not confuse `.shape` with `.size`! `.size` gives the total number of elements in the whole matrix:

In [None]:
my_array = np.full((4,3,5), 3)
my_array.size
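
A quick sketch of how these attributes relate to each other, together with the related `.ndim` attribute (the number of dimensions): `.size` is the product of the entries of `.shape`.

```python
import numpy as np

my_array = np.zeros((4, 3, 5))

print(my_array.ndim)   # number of dimensions: 3
print(my_array.shape)  # elements along each dimension: (4, 3, 5)
print(my_array.size)   # total number of elements: 4 * 3 * 5 = 60

# .size is always the product of the shape
print(my_array.size == np.prod(my_array.shape))
```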

### Convert lists to arrays

One way of creating an array is to convert an existing list into an array with the `np.array()` function:

In [None]:
my_list = [1,2,3,4]
np.array(my_list)

Converting lists of lists will add more dimensions:

In [None]:
my_list = [[1,2,3,4], [2,3,4,5]]
np.array(my_list)

### Ordered sequences

We can create ordered sequences of numbers using `np.arange()`:

In [None]:
np.arange(10)  # numbers from 0 to 9

We can specify start, end, and step of the sequence (start and step are optional, and the end value is excluded):

In [None]:
np.arange(1, 10, 2)  # numbers from 1 up to (but not including) 10 in steps of 2

Alternatively, we can use `np.linspace()` to generate `num` equally spaced numbers in a specified range:

In [None]:
np.linspace(0, 10, num=5)
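
A short side-by-side sketch of the main difference between the two functions: `np.arange` takes a step and excludes the end value, while `np.linspace` takes a number of points and (by default) includes it. The variable names are only illustrative.

```python
import numpy as np

steps = np.arange(0, 10, 2.5)        # step of 2.5, end value excluded
points = np.linspace(0, 10, num=5)   # 5 points, end value included

print(steps)   # 0, 2.5, 5, 7.5
print(points)  # 0, 2.5, 5, 7.5, 10
```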

### Random arrays

We can use the `np.random` module to create random arrays. For example:

In [None]:
np.random.randint(0, 10, (3,2))  # we pass min (included), max (excluded), and desired shape of the random array
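
For reproducible results, one option is the `Generator` interface that `numpy` provides through `np.random.default_rng()`. The sketch below assumes an arbitrary seed of 42; a generator created again with the same seed produces the same numbers.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

random_ints = rng.integers(0, 10, size=(3, 2))      # low included, high excluded
random_floats = rng.normal(loc=0, scale=1, size=5)  # normally distributed values

# A second generator with the same seed reproduces the same integers
rng_again = np.random.default_rng(seed=42)
same_ints = rng_again.integers(0, 10, size=(3, 2))

print(np.array_equal(random_ints, same_ints))
```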

(Practicals 1.0.0)

## Visualize arrays and matrices

We can visualize arrays and matrices using the `matplotlib.pyplot` library.

In [None]:
from matplotlib import pyplot as plt  # code you'll write many times in Python...

We can plot 1D arrays (or lists!) using `plt.plot()`:

In [None]:
random_vect = np.random.randint(0, 100, 200)
plt.plot(random_vect)

We can visualize 2D matrices with `plt.matshow()`

In [None]:
random_mat = np.random.randint(0, 255, (100, 200))

plt.matshow(random_mat)
plt.colorbar()

There's much more to the matplotlib library; we'll discover more things as we go!

## Indexing arrays

We can index ("slice") arrays as we did with lists.

For a 1D array:

In [None]:
my_vect = np.arange(0, 10)
print(my_vect[:5])  # first 5
print(my_vect[-3:])  # last 3
print(my_vect[:6:2])  # first 6, one every two

But! With `np.ndarray`s we have more flexibility than with lists!

### Indexing by index numbers

We can index passing an array (or a list) of the index values that we want to retrieve!

In [None]:
my_vect = np.random.normal(0, 10, (10))
print(my_vect)
my_vect[[1,2,6]]

### Boolean indexing

Alternatively, we can use arrays of boolean values (with the same shape as the array we are indexing):

In [None]:
my_vect = np.random.normal(0, 10, (10))
thr = 5


my_vect[my_vect > thr]  # will select only values above 5
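
A small sketch of what happens under the hood, using fixed illustrative values instead of random ones: the comparison first produces a boolean array (the "mask"), and masks can be combined with `&` (and), `|` (or) and `~` (not). The parentheses around each condition are required.

```python
import numpy as np

my_vect = np.array([-12.0, 3.0, 7.5, 0.0, 15.0, -4.0])  # illustrative values

mask = my_vect > 5
print(mask)  # one boolean per element

in_range = my_vect[(my_vect > -5) & (my_vect < 10)]   # both conditions true
outside = my_vect[(my_vect < -5) | (my_vect > 10)]    # at least one condition true
not_above = my_vect[~mask]                             # mask inverted

print(in_range, outside, not_above)
```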

### Indexing for multiple dimensions

We can index over multiple dimensions specifying **comma-separated** indexes along each dimension:

In [None]:
my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

my_mat[2, :]

We can use boolean selectors for an axis, as long as the dimension matches:

In [None]:
selector = np.array([True, False, False])
my_mat[selector, :]

In this way, we can select (and modify) all the entries of the matrix that satisfy a boolean condition:

In [None]:
my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

my_mat[my_mat > 10] = 100

my_mat

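Note that reading values with a boolean mask (_e.g._, `my_mat[my_mat > 10]`) returns a flat 1D array, so the shape of the output differs from the shape of the original matrix!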
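
A sketch of the shape issue: selecting with a full boolean mask returns a 1D array of the matching values. If we want to keep the matrix shape, one option (not covered above) is `np.where`, which builds a new array choosing between two values for each entry.

```python
import numpy as np

my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

selected = my_mat[my_mat > 10]  # 1D array with only the matching values
print(selected.shape)           # (5,)

# np.where(condition, value_if_true, value_if_false) keeps the original shape
replaced = np.where(my_mat > 10, 100, my_mat)
print(replaced.shape)           # (3, 5)
print(my_mat[2, 0])             # the original matrix is untouched: 11
```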

### Mind singleton dimensions!

Arrays can have singleton dimensions - _i.e._ dimensions along which there is a single entry:

In [None]:
my_arr = np.zeros(4)  # this is a 1D array

print(f"{my_arr}; shape: {my_arr.shape}")

In [None]:
my_arr = np.zeros((1,4))  # this is a 2D array with a singleton dimension!
print(f"{my_arr}; shape: {my_arr.shape}")

Sometimes, it can be useful to quickly add a singleton dimension

In [None]:
my_arr = np.zeros(4)  # 1D array
my_arr = my_arr[None,:]  # With None indexing over one dimension, we artificially add a singleton dimension:

print(f"{my_arr}; shape: {my_arr.shape}")
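
A short sketch of the same trick with the more explicit `np.newaxis` (which is just an alias for `None`), and of the opposite operation, `.squeeze()`, which removes singleton dimensions.

```python
import numpy as np

my_arr = np.zeros(4)          # shape (4,)

row = my_arr[np.newaxis, :]   # shape (1, 4): singleton first dimension
col = my_arr[:, np.newaxis]   # shape (4, 1): singleton last dimension
back = row.squeeze()          # shape (4,): singleton dimensions removed

print(row.shape, col.shape, back.shape)
print(np.newaxis is None)     # True: the two spellings are interchangeable
```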

## Array views

Basic slicing operations return views on the original arrays, NOT COPIES! Changing values in the slice will also alter the original array! (Indexing with lists of indexes or with boolean arrays returns copies instead.)

In [None]:
my_mat = np.array([[1,   2,  3,  4, 5],
                   [6,   7,  8,  9, 10],
                   [11, 12, 13, 14, 15]])

a_slice = my_mat[2, :]

a_slice[0] = 100

my_mat
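
A sketch of how to check whether we are holding a view or a copy, using `np.shares_memory()`, and of how to get an independent copy with `.copy()`. It also shows a caveat worth knowing: indexing with a list of indexes gives a copy, not a view.

```python
import numpy as np

my_mat = np.arange(1, 16).reshape(3, 5)  # same values as the matrix above

a_view = my_mat[2, :]          # basic slicing: a view
a_copy = my_mat[2, :].copy()   # explicit, independent copy
fancy = my_mat[[0, 2], :]      # indexing with a list of indexes: a copy

print(np.shares_memory(my_mat, a_view))  # True
print(np.shares_memory(my_mat, a_copy))  # False
print(np.shares_memory(my_mat, fancy))   # False

a_copy[0] = -1       # modifying the copy...
print(my_mat[2, 0])  # ...leaves the original untouched: 11
```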

(Practicals 1.0.1)

## Transforming and combining arrays

### `.T`

We can have a transposed view of a matrix with the `.T` attribute (this will reverse the dimensions order if `n_dims>2`):

In [None]:
m = np.ones((3,2))
m_t = m.T  # equivalent to m.transpose()

print(m.shape, m_t.shape)
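
A sketch for arrays with more than two dimensions, using a hypothetical `stack` of shape `(2, 3, 4)`: `.T` reverses the order of all dimensions, while `np.transpose()` accepts an explicit order if we only want to swap some of them.

```python
import numpy as np

stack = np.zeros((2, 3, 4))

print(stack.T.shape)                         # all dimensions reversed: (4, 3, 2)
print(np.transpose(stack, (1, 0, 2)).shape)  # only the first two swapped: (3, 2, 4)
print(np.shares_memory(stack, stack.T))      # True: the transpose is a view
```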

### `.flatten()`

We can flatten all values of an N-dimensional array into a 1D array with the `.flatten()` method. This will make a copy of the array!

In [None]:
m = np.ones((3,2,1))
m_flat = m.flatten()

print(m.shape, m_flat.shape)
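
A sketch that verifies the copy behavior, and compares it with the related `.ravel()` method, which returns a view whenever possible (as it does for the simple array used here).

```python
import numpy as np

m = np.arange(6).reshape(3, 2)

m_flat = m.flatten()  # always a copy
m_flat[0] = 99        # modifying the copy...
print(m[0, 0])        # ...does not touch the original: still 0

m_rav = m.ravel()     # a view when possible
print(np.shares_memory(m, m_flat))  # False
print(np.shares_memory(m, m_rav))   # True
```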

### `np.concatenate()`

We can concatenate arrays along any dimension by putting them in a list and passing the list to the `np.concatenate()` function:

In [None]:
arr_list = [np.zeros(3), np.ones(3)]

np.concatenate(arr_list) 

By default, we concatenate over the first dimension:

In [None]:
arr_list = [np.zeros((3,2)), np.ones((3,2))]

np.concatenate(arr_list)  # if ndims > 1 by default we concatenate over the first dimension

but we can pass an `axis` argument to change the default behavior:

In [None]:
arr_list = [np.zeros((3,2)), np.ones((3,2))]

np.concatenate(arr_list, axis=1) 

### `np.stack()`

We can pile up arrays over a new dimension with `np.stack()`:

In [None]:
arr_list = [np.zeros(3), np.ones(3)]

np.stack(arr_list)
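
A sketch comparing the output shapes of the two functions on the same list: `np.concatenate()` joins along an existing dimension, while `np.stack()` creates a new one, whose position can be chosen with `axis`.

```python
import numpy as np

arr_list = [np.zeros(3), np.ones(3)]

concatenated = np.concatenate(arr_list)   # joined along the existing dimension
stacked = np.stack(arr_list)              # new first dimension
stacked_last = np.stack(arr_list, axis=1) # new dimension in second position

print(concatenated.shape)  # (6,)
print(stacked.shape)       # (2, 3)
print(stacked_last.shape)  # (3, 2)
```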

## Array operations

We obviously want to do some operations with those arrays!

### Operations with numbers

Operations with arrays are **by default element-wise**!

Sum / subtraction / multiplication / division apply to individual entries of the array:

In [None]:
np.ones(3) + 1

In [None]:
my_arr = np.ones((4,3))
my_arr[0, :] *= 100
my_arr

Exponentiation also works element-wise:

In [None]:
np.array((1,2,3))**3

### Operations between arrays

`numpy` works element-wise also when operating between arrays:

In [None]:
arr_1 = np.array([[1,2],
                  [3,4]])

arr_2 = np.array([[0,0],
                  [0,2]])

arr_1 * arr_2

In [None]:
arr_1 ** arr_2
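
Since `*` is element-wise, it is not the matrix product from linear algebra. As a brief sketch of the difference, the matrix product is available with the `@` operator:

```python
import numpy as np

arr_1 = np.array([[1, 2],
                  [3, 4]])

arr_2 = np.array([[0, 0],
                  [0, 2]])

elementwise = arr_1 * arr_2  # each entry multiplied by the matching entry
matrix_prod = arr_1 @ arr_2  # rows-by-columns matrix product

print(elementwise)  # [[0, 0], [0, 8]]
print(matrix_prod)  # [[0, 4], [0, 8]]
```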

Therefore, we normally expect arrays of matching shapes, or we get a `ValueError`!

In [None]:
np.ones((2, 3)) * np.ones((4, 5))

### Broadcasting

`numpy` has a smart way of dealing with some scenarios of non-matching dimensions, and we should use it!

It can be a bit tricky at the beginning, but it is very important: we can write very efficient and readable code with it!

In [None]:
# Assume we have a matrix of data:
a = np.array([[ 0.0,  0.0,  0.0],
               [10.0, 10.0, 10.0],
               [20.0, 20.0, 20.0],
               [30.0, 30.0, 30.0]])

In [None]:
b = np.array([1.0, 2.0, 3.0])  # we want to add an offset to each column
a + b

### What is happening?

NumPy automatically infers the missing values to create arrays of matching shape, on which it can then operate element-wise!

![Alt Text](https://numpy.org/doc/stable/_images/broadcasting_2.png)
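
One way to make the "stretching" in the figure explicit is `np.broadcast_to()`, which returns the stretched version of an array as a read-only view. This is only a sketch to visualize the idea; in normal code we would simply write `a + b`.

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])

b_stretched = np.broadcast_to(b, a.shape)  # b repeated over the 4 rows
print(b_stretched)

# Adding the stretched array gives the same result as broadcasting
print(np.array_equal(a + b, a + b_stretched))
```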

## How does broadcasting work

When operating on two arrays, NumPy compares their shapes. It starts **with the trailing** (i.e. rightmost) dimension and works its way left. Two dimensions are compatible when:

 - they are equal, or
 - one of them is 1.

In our case:

In [None]:
print(f"shape a: {a.shape}")
print(f"shape b: {b.shape}")

The shape of `b` matches the shape of `a` over the last dimension, so `b` is propagated (repeated) over the remaining dimension.
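
The compatibility rule above can be sketched in a few lines of plain Python. The function `broadcast_shape` below is a hypothetical helper written for illustration, not a `numpy` function; missing leading dimensions are treated as having size 1.

```python
from itertools import zip_longest

import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of broadcasting two shapes."""
    result = []
    # Compare dimensions starting from the trailing (rightmost) one;
    # a missing dimension counts as 1
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions: {dim_a} and {dim_b}")
    return tuple(reversed(result))


print(broadcast_shape((4, 3), (3,)))          # (4, 3)
print((np.ones((4, 3)) + np.ones(3)).shape)   # numpy agrees: (4, 3)
print(broadcast_shape((5, 4, 1), (4, 3)))     # (5, 4, 3)
```

Calling the helper with shapes whose trailing dimensions differ and are not 1, such as `(5, 4, 2)` and `(4, 3)`, raises a `ValueError`, just as `numpy` does.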

For example, this operation will not work, because the trailing dimensions (2 and 3) are neither equal nor 1:

In [None]:
a = np.ones((5,4,2))

b = np.ones((4,3))

a+b
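
A common case where shapes do not match at first is adding one value per row instead of one per column. The sketch below, with a made-up `row_offsets` vector, shows the error and how a singleton dimension makes the shapes compatible.

```python
import numpy as np

a = np.zeros((4, 3))
row_offsets = np.array([1.0, 2.0, 3.0, 4.0])  # one value per row, shape (4,)

try:
    a + row_offsets  # trailing dimensions are 3 and 4: not compatible
except ValueError as err:
    print("ValueError:", err)

# Adding a singleton dimension gives shape (4, 1), compatible with (4, 3)
result = a + row_offsets[:, None]
print(result.shape)  # (4, 3)
print(result)
```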