# Numpy

Numpy is a library for fast operations on arrays, which are data structures that can represent vectors, matrices, or higher-dimensional tensors (don't confuse them with the concept of array in other programming languages).

Numpy arrays can be easily created from lists, <i>as long as all the elements can be converted to a common data type</i> (for example, a list mixing integers and floats becomes an array of floats).
To turn a list into an array, you can use `np.array()` or `np.asarray()`.

In [None]:
import numpy as np

## 1D Arrays (a.k.a. vectors)

In [None]:
x1 = [3.14, 2.71, 42] # this is a list
x1 = np.array(x1) # this is a Numpy array

x2 = np.asarray([1729, 1.41, 1.62]) # this is also a Numpy array
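The lists above mix floats and integers, so it can help to check which data type the resulting array ends up with. The sketch below also illustrates one practical difference between `np.array()` and `np.asarray()` when the input is already an array. The variable names (`mixed`, `arr`, `copied`, `same`) are only illustrative.

```python
import numpy as np

mixed = [3.14, 2.71, 42]          # two floats and one integer
arr = np.array(mixed)
print(arr.dtype)                  # float64: the integer 42 was converted to 42.0

# np.array copies its input by default, while np.asarray returns the
# input itself when it is already an array and no dtype change is requested
copied = np.array(arr)
same = np.asarray(arr)
print(copied is arr, same is arr)  # False True
```

In short, `np.asarray()` avoids an unnecessary copy when the data is already a Numpy array.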

### Slicing

In [None]:
print(x2[1:])

### Reversing

In [None]:
print(x1[::-1])

### Operations between 1D arrays

In [None]:
print("Element-wise sum:", x1 + x2)
print("Element-wise product:", x1 * x2)
print("Dot product:", x1 @ x2) # not element-wise: it returns a single number

### Operations with scalars

In [None]:
print("Sum by a scalar:", x1 + 2)
print("Product by a scalar:", x1 * 2)
print("Power by a scalar:", x1 ** 2)

### Reduction operations

In [None]:
print("Sum of all elements of an array:", x1.sum())
print("Product of all elements of an array:", x1.prod())

<b>IMPORTANT</b>: Be careful not to confuse lists with Numpy arrays.

In [None]:
print('If you "sum" two lists, this is what will happen:', [3.14, 2.71, 42] + [1729, 1.41, 1.62]) # concatenation
print('If you "multiply" an integer by a list, this is what will happen:', [3.14, 2.71, 42]*2) # repetition

## 2D Arrays (a.k.a. matrices)

In [None]:
x = [
    [3.14, 2.71, 42],
    [17.29, 1.41, 1.62]
]

x = np.asarray(x)

print("Shape of a 2D array:", x.shape)
print("First dimension:", x.shape[0])
print("Second dimension:", x.shape[1])

### Matrix slicing

In [None]:
print("First row:")
print(x[0])

In [None]:
print("Last two columns:")
print(x[:, -2:])

### Matrix transposition

In [None]:
y = x.T # transposition

print("Shape of transposed array:", y.shape)
print("Transposed array:")
print(y)

### Reduction operations

In [None]:
print("Sum of all elements of a 2D array:", x.sum())
print("Product of all elements of a 2D array:", x.prod())

In [None]:
print("Sum along the first axis (one value per column):", x.sum(axis=0))
print("Product along the second axis (one value per row):", x.prod(axis=1))
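A reduction along an axis removes that axis from the result. The following minimal sketch rebuilds the same $2\times 3$ matrix and prints the shape of each reduction. The names `col_sums` and `row_sums` are only illustrative.

```python
import numpy as np

x = np.asarray([[3.14, 2.71, 42], [17.29, 1.41, 1.62]])  # shape (2, 3)

col_sums = x.sum(axis=0)   # collapses the rows: one value per column
row_sums = x.sum(axis=1)   # collapses the columns: one value per row

print(col_sums.shape)      # (3,)
print(row_sums.shape)      # (2,)
```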

### Notable Numpy arrays

- <b>Range vector</b>: 1D array covering a range of integer numbers. It is characterized by three main parameters: `start`, `stop`, and `step`.

In [None]:
x = np.arange(5) # [0, 1, 2, 3, 4]
y = np.arange(2, 5) # [2, 3, 4]
z = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]

print(x)
print(y)
print(z)

- <b>Identity matrix</b>: $n\times n$ matrix with ones on the diagonal, and zeros elsewhere. 
You need to specify the dimension ($n$). 

In [None]:
x = np.eye(3)
print(x)

- <b>Array of ones</b>: Array of all ones. You need to specify the shape. If you pass an integer $n$, the shape will be $(n,)$.

Example: $2\times 4$ matrix of ones.

In [None]:
x = np.ones((2, 4)) 
print(x)

- <b>Array of zeros</b>: Array of all zeros. You need to specify the shape. If you pass an integer $n$, the shape will be $(n,)$.

Example: vector of $4$ zeros.

In [None]:
y = np.zeros(4)
print(y)

### Concatenating arrays

Numpy allows you to concatenate arrays along one axis, as long as they have the same size along all the other axes.

This will be helpful in two main cases:

1) You have multiple data sources, and you want to join them into a single dataset. In this case, you typically concatenate along the first axis (`axis=0`).

2) You want to add one extra column to your data. In this case, you typically concatenate along the last axis (`axis=-1`).

Here is an example of the second case.

In [None]:
x = np.ones((4, 2)) 
y = np.zeros((4, 1))

z = np.concatenate([x, y], axis=-1)
print(f"Concatenated array of shape {z.shape}")
print(z)
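The first case (joining multiple data sources) can be sketched in the same way. The two arrays below are hypothetical stand-ins for real data sources: they have a different number of rows but the same number of columns, so they can be stacked along `axis=0`.

```python
import numpy as np

# two hypothetical data sources with the same number of columns
source_a = np.ones((4, 2))
source_b = np.zeros((3, 2))

dataset = np.concatenate([source_a, source_b], axis=0)
print(dataset.shape)  # (7, 2)
```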

### Reshaping arrays

You can change the shape of an array, as long as it is consistent with its size (i.e., the total number of elements).

This operation is called "reshaping".

For example, you can reshape an array with shape $(8,)$ into one with shape $(2, 4)$. 

In [None]:
x = np.arange(8)
x = x.reshape((2, 4))

print(x)
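As a small illustration of the size constraint, the sketch below shows a reshape that fails because the requested shape holds a different number of elements. It also uses `-1`, which asks Numpy to infer one dimension from the total size. The names `inferred` and `reshape_failed` are only illustrative.

```python
import numpy as np

x = np.arange(8)

# -1 asks Numpy to infer that dimension from the total size (8 / 2 = 4)
inferred = x.reshape((2, -1))
print(inferred.shape)  # (2, 4)

# 8 elements cannot fill a 3x3 array (9 elements), so this raises a ValueError
reshape_failed = False
try:
    x.reshape((3, 3))
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)
```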

## 3D Arrays and Beyond

In many deep learning applications you will have to deal with datasets having dimensionality 3 or higher.

Examples:
- Time series datasets (e.g., sensor data) have shape `(n_datapoints, sequence_len, n_features)`.
- Image datasets have shape `(n_datapoints, height, width)` for greyscale images, and `(n_datapoints, height, width, n_channels)` for color images.

In [None]:
x = np.arange(8*10*3)
x = x.reshape(8, 10, 3)

print("Shape of x:", x.shape)

### 2D vs 3D+ matrix multiplication

2D matrix multiplication is straightforward: given a matrix $x$ of shape $(m, k)$ and a matrix $y$ of shape $(k, n)$, they get multiplied row-by-column.

The result is a matrix of shape $(m, n)$.

In [None]:
x = np.arange(12).reshape((4, 3))
y = np.arange(15).reshape((3, 5))
z = x @ y

print("Shape of x:", x.shape)
print("Shape of y:", y.shape)
print("Shape of x @ y:", z.shape)
print("Result:")
print(z)

However, what does it mean to multiply two 3D or 4D arrays? 
In Numpy, this operation is done by treating multidimensional arrays as collections of 2D arrays.
In other words, it is like doing several matrix multiplications in parallel.
Multidimensional arrays are multiplied along the last two axes: the last axis of the first array must match the second-to-last axis of the second array, while the leading axes must be equal (or compatible under broadcasting).

For example, suppose you have an array $x$ of shape $(8, 4, 3)$ and another array $y$ of shape $(8, 3, 5)$.
If you multiply them, it will be like multiplying $8$ different matrix pairs of shapes $(4, 3)$ and $(3, 5)$.

In [None]:
x = np.arange(8*12).reshape((8, 4, 3))
y = np.arange(8*15).reshape((8, 3, 5))
z = x @ y

print("Shape of x:", x.shape)
print("Shape of y:", y.shape)
print("Shape of z:", z.shape)
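To check the "several matrix multiplications in parallel" interpretation, the sketch below multiplies each of the $8$ matrix pairs separately and compares the stacked result with `x @ y`. The name `z_loop` is only illustrative.

```python
import numpy as np

x = np.arange(8*12).reshape((8, 4, 3))
y = np.arange(8*15).reshape((8, 3, 5))
z = x @ y

# multiply each of the 8 matrix pairs separately, then stack the results
z_loop = np.stack([x[i] @ y[i] for i in range(8)])

print(z.shape)                    # (8, 4, 5)
print(np.array_equal(z, z_loop))  # True
```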

## Random number generation with Numpy

In [None]:
# seed for pseudo-random number generation (for reproducibility) 
np.random.seed(42)

In [None]:
# Generate random integers uniformly between low (inclusive) and high (exclusive)
x_int = np.random.randint(low=-2, high=10, size=20)
print(x_int)

In [None]:
# Generate float numbers uniformly in the interval [a, b)
a = 3
b = 10
num_points = 10000

x_unif = (b-a)*np.random.rand(num_points) + a  


In [None]:
# generate normally-distributed data with mean mu=3.14 and std sigma=2.71
mu = 3.14
sigma = 2.71
num_points = 10000

x_norm = sigma*np.random.randn(num_points) + mu


In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.figure()
plt.hist(x_unif, bins=10)
plt.draw()

In [None]:
plt.figure()
plt.hist(x_norm, bins=20, label='data')
plt.vlines([mu-sigma, mu+sigma], 0, 2000, linestyles='--', colors=['green', 'green'], label='68% CI')
plt.vlines([mu-2*sigma, mu+2*sigma], 0, 2000, linestyles='--', colors=['orange', 'orange'], label='95% CI')
plt.vlines([mu-3*sigma, mu+3*sigma], 0, 2000, linestyles='--', colors=['red', 'red'], label='99.7% CI')
plt.legend()
plt.draw()
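For normally distributed data, roughly 68%, 95%, and 99.7% of the samples fall within one, two, and three standard deviations of the mean. The sketch below regenerates the data with the same seed and parameters and measures these fractions empirically. The dictionary `fractions` is only illustrative, and the printed values will be close to, but not exactly, the theoretical ones.

```python
import numpy as np

np.random.seed(42)
mu, sigma = 3.14, 2.71
x_norm = sigma * np.random.randn(10000) + mu

fractions = {}
for k in (1, 2, 3):
    # boolean mask of the samples within k standard deviations of the mean
    inside = np.abs(x_norm - mu) <= k * sigma
    fractions[k] = inside.mean()
    print(f"Fraction within {k} sigma: {fractions[k]:.3f}")
```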