# Numerical Python: `NumPy`

## Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

# 1. Introduction to NumPy

NumPy, which stands for "Numerical Python", is a foundational library in the Python ecosystem. It provides a powerful array object and a large collection of mathematical functions for numerical computation. At its heart lies the `ndarray` (n-dimensional array), a versatile data structure for storing and operating on large datasets efficiently. NumPy is particularly well suited for mathematical tasks like linear algebra, statistical analysis, and matrix manipulation, which makes it an essential tool for anyone working with data analysis, scientific computing, or machine learning in Python. Getting acquainted with NumPy will greatly improve your ability to handle and analyze numerical data, bridging the gap between basic programming and more advanced computational tasks.

## What is NumPy?

- A library that supports large, multi-dimensional **numerical** arrays and matrices, and a large collection of functions for these objects
- NumPy is a cornerstone in the scientific Python community, and most other scientific packages rely on NumPy to work
- Most types of data we will ever encounter can be represented as numerical arrays and matrices, and that includes
    * Collections of documents
    * Collections of images and videos
    * Collections of sound clips
- NumPy arrays provide `list`-like functionality, but for numerical work they are much faster and more memory-efficient than a regular `list`

## Importing NumPy

In [None]:
import numpy
numpy.__version__

In [None]:
# The standard convention is to import NumPy under the alias np

import numpy as np

In [None]:
# We can then access the library's functionality through this alias.
# For example, we can create a NumPy array from a regular list:

x = np.array([1, 2, 3])
print(x)

In [None]:
# We now have a new type of container

type(x)

## Speed test: NumPy vs. Python list

In [None]:
import numpy as np

python_list = list(range(10_000_000)) # [0, 1, 2, ..., 9 999 999]
numpy_array = np.arange(10_000_000)   # np.array([0, 1, 2, ..., 9 999 999])

- Let's multiply all the numbers by 5 and check how long it takes

In [None]:
%timeit python_list5 = [x * 5 for x in python_list]

In [None]:
%timeit numpy_array5 = numpy_array*5
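
Speed is not the only point: the vectorized expression computes exactly the same numbers as the list comprehension. A minimal check on a much smaller range (the size 10 is an arbitrary choice so the output stays readable):

```python
import numpy as np

# Small versions of the list and array used in the speed test
python_list = list(range(10))
numpy_array = np.arange(10)

# Loop-style multiplication on the list vs. vectorized multiplication on the array
python_list5 = [x * 5 for x in python_list]
numpy_array5 = numpy_array * 5

print(python_list5)
print(numpy_array5)
# np.array_equal compares shapes and values element by element
print(np.array_equal(numpy_array5, python_list5))  # True
```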

# 2. Working with NumPy arrays

## The NumPy Array Object (`ndarray`)

- A multidimensional container of items of the **same type**
- An `ndarray` can be accessed and modified by indexing and slicing, just like a regular list
- The `ndarray` object comes with many built-in methods and attributes
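
A few of these attributes describe the array itself. A quick sketch with a small made-up $2 \times 3$ array (the exact name of the integer type depends on the platform, typically `int64`):

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

print(x.ndim)   # number of dimensions: 2
print(x.shape)  # size along each dimension: (2, 3)
print(x.size)   # total number of elements: 6
print(x.dtype)  # the one element type shared by all items (an integer type here)
```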

## Let us create a one-dimensional NumPy array

- Make sure all the values are numeric

In [None]:
# A one-dimensional array

numpy_array = np.array([-1, 4, -3, 10, 7, 3.0, 8, 10])
print(numpy_array)
numpy_array.dtype # the element type: the single float 3.0 makes every element a float

In [None]:
# It is possible to mix different types, but NumPy then converts every element
# to one common type (here strings), and we lose most of the advantages of NumPy

numpy_array_with_different_types = np.array([-1.5, 4.0, 'j', 10, 7.99, 3, 8, 'hello!'])
print(numpy_array_with_different_types)
numpy_array_with_different_types.dtype

In [None]:
# np.genfromtxt can convert an array of strings into an array of floats:
# entries that are not numbers become NaN, and sort() places the NaNs last

numpy_array_numerics = np.genfromtxt(numpy_array_with_different_types)
numpy_array_numerics.sort()
print(numpy_array_numerics)
numpy_array_numerics.dtype
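
`np.genfromtxt` is mainly meant for reading text files, so the cell above relies on it treating each string as one line of text. A more explicit alternative is sketched below with a hypothetical helper, `to_float_or_nan`, that converts each entry with Python's `float()` and falls back to `np.nan` when the conversion fails:

```python
import numpy as np

def to_float_or_nan(value):
    '''Hypothetical helper: convert value to a float, or return NaN if that is impossible.'''
    try:
        return float(value)
    except ValueError:
        return np.nan

# The mixed array from above: NumPy stores every element as a string
mixed = np.array([-1.5, 4.0, 'j', 10, 7.99, 3, 8, 'hello!'])

# Convert element by element; 'j' and 'hello!' become NaN
numerics = np.array([to_float_or_nan(v) for v in mixed])
print(numerics)
print(numerics.dtype)  # float64
```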


## The dimensions of the `ndarray`

In [None]:
# What are the dimensions of this ndarray?

numpy_array = np.array([-1.5, 4.0, -3, 10, 7.99, 3, 8, 9])
print(numpy_array.shape)

In [None]:
# With two sets of brackets we instead get a two-dimensional array (one row, eight columns)

numpy_array_alt = np.array([[-1.5, 4.0, -3, 10, 7.99, 3, 8, 9]])
print(numpy_array_alt.shape)
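
The two arrays hold the same numbers but have a different number of dimensions, which the `ndim` attribute makes explicit. A short comparison:

```python
import numpy as np

v = np.array([-1.5, 4.0, -3, 10, 7.99, 3, 8, 9])      # one set of brackets
row = np.array([[-1.5, 4.0, -3, 10, 7.99, 3, 8, 9]])  # two sets of brackets

print(v.ndim, v.shape)      # 1 (8,)
print(row.ndim, row.shape)  # 2 (1, 8)

# Indexing the single row of the 2-D array gives back the 1-D array
print(np.array_equal(row[0], v))  # True
```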

## Two-dimensional arrays

- For a two-dimensional array, the shape is given as the tuple `(rows, columns)`

An example of a $3 \times 4$ matrix:

|        | column 0 | column 1  | column 2  | column 3  |
|---     |---       |---|---|---|
| row 0  | 10       | 12        | 14        | 16        |
| row 1  | 10.5     | 12.5      | 14.5      | 16.5      |
| row 2  | 11       | 13        | 15        | 17        | 

## Let's create this $3 \times 4$ matrix in NumPy

In [None]:
three_times_four = np.array([[10, 12, 14, 16],
                             [10.5, 12.5, 14.5, 16.5],
                             [11, 13, 15, 17]])

In [None]:
print(three_times_four)

In [None]:
# What is the shape of the array?

three_times_four.shape

In [None]:
# The type of the elements of the array

three_times_four.dtype

## We could have started with a one-dimensional array

In [None]:
a = np.array([10, 12, 14, 16, 10.5, 12.5, 14.5, 16.5, 11, 13, 15, 17])
a.shape

In [None]:
# And we can set the dimensions as a tuple

a.shape = (3, 4)  # try replacing one of the numbers with -1: NumPy then infers that dimension

In [None]:
a

In [None]:
a.shape
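
Following up on the comment above: one entry of the shape may be `-1`, and NumPy then infers that dimension from the total number of elements. A sketch using the `reshape` method, which returns an array with the new shape instead of changing `a` in place:

```python
import numpy as np

a = np.array([10, 12, 14, 16, 10.5, 12.5, 14.5, 16.5, 11, 13, 15, 17])

b = a.reshape(3, -1)  # 12 elements / 3 rows -> NumPy infers 4 columns
c = a.reshape(-1, 6)  # 12 elements / 6 columns -> NumPy infers 2 rows
print(b.shape)  # (3, 4)
print(c.shape)  # (2, 6)

# The inferred dimension must divide evenly: 12 elements cannot fill 5 rows
try:
    a.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)
```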

## Accessing the elements of `a`

In [None]:
# Accessing the first (0th) row

a[0]

In [None]:
# Accessing the last column

a[:, 3] # all rows of the last column (index 3; a[:, -1] gives the same)

In [None]:
# Accessing the element in row 1 and column 2, counting from 0 (the value 14.5)

a[1, 2]
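
Negative indices count from the end along each dimension, just as with lists. A short sketch on the same $3 \times 4$ array:

```python
import numpy as np

a = np.array([10, 12, 14, 16, 10.5, 12.5, 14.5, 16.5, 11, 13, 15, 17]).reshape(3, 4)

last_col = a[:, -1]    # same as a[:, 3]
last_item = a[-1, -1]  # last row, last column: 17.0
print(last_col)
print(last_item)

# a[1, 2] and the list-style chained form a[1][2] pick the same element
print(a[1, 2] == a[1][2])  # True
```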

## The `ndarray` is mutable and we can change the values  

In [None]:
a[2, 2] = 100  # the previous value was 15

In [None]:
a
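
Because all elements share one type, an assigned value is converted to the array's `dtype`. Our `a` holds floats, so 100 is stored as `100.0`; in an integer array, a float is silently truncated. A small sketch with made-up arrays:

```python
import numpy as np

ints = np.array([1, 2, 3])  # integer dtype
ints[0] = 2.7               # converted to the integer type: the decimals are dropped
print(ints)                 # [2 2 3]

floats = np.array([1.5, 2.5])  # float dtype
floats[0] = 100                # stored as the float 100.0
print(floats)
```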

## Slicing and indexing work much like with regular lists, with one index or slice per dimension

In [None]:
# Select the last two rows and the last two columns

a[1:3, 2:4] # Remember the endpoint in the slice is non-inclusive

In [None]:
# Alternatively
a[1:, 2:]
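
One important difference from lists: slicing a list creates a new list, but slicing a NumPy array returns a *view* that shares its data with the original array. Changing the view therefore changes the original; use `.copy()` when an independent array is needed. A small sketch with a made-up array built by `np.arange(12).reshape(3, 4)`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

block = a[1:, 2:]  # a view: it shares memory with a
block[0, 0] = -1   # this also changes a[1, 2]
print(a[1, 2])     # -1

independent = a[1:, 2:].copy()  # an explicit copy
independent[0, 0] = 999         # a is not affected
print(a[1, 2])                  # still -1
```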

# 3. NumPy methods

## Some functions for creating special arrays

In [None]:
# A range of numbers (similar to the built-in range() function, but returns a NumPy array)

np.arange(10)

In [None]:
# A matrix of zeros (4 rows and 4 columns)

np.zeros((4,4))

In [None]:
# A matrix of ones (2 rows and 2 columns)

np.ones((2,2))

In [None]:
# The identity matrix

np.eye(4,4)
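
A few other creation functions are often handy: `np.arange` also accepts a start and a step, `np.linspace` gives a fixed number of evenly spaced points (endpoint included), and `np.full` fills an array with a chosen value. The arguments below are arbitrary examples:

```python
import numpy as np

evens = np.arange(0, 10, 2)   # start, stop (excluded), step
grid = np.linspace(0, 1, 5)   # 5 evenly spaced numbers from 0 to 1
sevens = np.full((2, 3), 7)   # a 2 x 3 array where every element is 7

print(evens)   # [0 2 4 6 8]
print(grid)
print(sevens)
```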

## Arithmetic operations with NumPy

In [None]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
a.shape = 4, 3

In [None]:
print(a)

### Multiplying the elements (element-wise multiplication)

In [None]:
a * a

### Adding the elements (element-wise addition)

In [None]:
a + a 

### Raise all elements to the power of 3

In [None]:
a ** 3
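
The same operators also combine an array with a single number, or with a smaller array whose shape fits (NumPy calls this *broadcasting*). Note that `a * a` is element-wise; it is not matrix multiplication. A sketch on the same $4 \times 3$ array, with a made-up row of three numbers:

```python
import numpy as np

a = np.arange(1, 13).reshape(4, 3)  # the same 4 x 3 array as above

print(a * 2)  # every element doubled
print(a / 2)  # true division always gives floats

row = np.array([10, 20, 30])
# row has 3 elements, matching the 3 columns of a, so it is added to each of the 4 rows
print(a + row)
```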

## Some more array functions 

#### Make a matrix one-dimensional with the `ravel` method

In [None]:
A = np.array([[2, 3, 4],
              [14, 10, 6]])
A.shape

In [None]:
A_flattened = A.ravel()

In [None]:
# ravel() returns a new one-dimensional array and leaves the shape of A unchanged

A
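
`ravel()` leaves the shape of `A` unchanged, but the array it returns is a *view* of `A` whenever possible, so writing into it changes `A`. The related `flatten()` method always returns an independent copy. A short sketch:

```python
import numpy as np

A = np.array([[2, 3, 4],
              [14, 10, 6]])

A_view = A.ravel()    # a view of A when possible (as here)
A_copy = A.flatten()  # always a copy

A_view[0] = 99  # writing through the view changes A
A_copy[1] = -5  # writing to the copy leaves A unchanged
print(A[0, 0])  # 99
print(A[0, 1])  # 3
```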

## Random numbers with NumPy

In [None]:
# 10 random integers from 0 up to (but not including) 1000

rand_ints = np.random.randint(1000, size=10)
print(rand_ints)

In [None]:
# A 3 x 3 array of random samples from a uniform distribution over [0, 1)

rand_floats = np.random.rand(3,3)
print(rand_floats)

## Drawing samples from the "standard normal" distribution

In [None]:
x = np.random.randn(1000, 1)  # 1000 draws arranged in one column, shape (1000, 1)
print(x[:5])  # show only the first five draws

In [None]:
# Calculate the mean of the numbers (should be close to 0)

np.mean(x)

In [None]:
# Calculate the standard deviation of the numbers (should be close to 1)

np.std(x)
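
The cells above use NumPy's older global random functions. NumPy's documentation recommends the newer `Generator` interface, created with `np.random.default_rng`; passing a seed (42 below is an arbitrary choice) makes the draws reproducible. A sketch of the same kinds of samples with that interface:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

ints = rng.integers(1000, size=10)       # like np.random.randint(1000, size=10)
floats = rng.random((3, 3))              # like np.random.rand(3, 3)
x = rng.standard_normal(size=(1000, 1))  # like np.random.randn(1000, 1)

# Close to 0 and 1, the mean and standard deviation of the standard normal
print(x.mean(), x.std())

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(ints, rng_again.integers(1000, size=10)))  # True
```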

## We can also draw random elements with `np.random.choice`

In [None]:
def drawRandomCard():
    '''
    This function returns a random card from a deck of 52 cards.
    '''
    card_value = np.random.choice(
        ['J', 'Q', 'K', 'A'] + [str(i+2) for i in range(9)]
    )

    card_suit = np.random.choice(
        ['Hearts', 'Diamonds', 'Clubs', 'Spades']
    )

    return card_suit + '-' + card_value

In [None]:
drawRandomCard()
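
`drawRandomCard` draws the value and the suit independently, so two calls can return the same card, as if each card were put back into the deck. To deal several *different* cards, one option is to build the full deck and pass `replace=False` to `np.random.choice`; `deal_hand` below is a hypothetical helper for illustration:

```python
import numpy as np

def deal_hand(n_cards=5):
    '''Hypothetical helper: deal n_cards distinct cards from a 52-card deck.'''
    values = ['J', 'Q', 'K', 'A'] + [str(i + 2) for i in range(9)]  # 13 values
    suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
    deck = [suit + '-' + value for suit in suits for value in values]  # 4 * 13 = 52 cards
    # replace=False samples without replacement, so no card appears twice
    return np.random.choice(deck, size=n_cards, replace=False)

hand = deal_hand()
print(hand)
```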

## NumPy array methods

In [None]:
rand_values = np.random.randn(10)  # 10 draws from the standard normal distribution

In [None]:
# Sorting the array in place

rand_values.sort()
print(rand_values)
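
The `.sort()` method above sorts the array in place and returns `None`. When the original order should be kept, `np.sort` returns a sorted copy instead, and `np.argsort` returns the indices that would sort the array. A sketch with a small made-up array:

```python
import numpy as np

values = np.array([3.2, -1.0, 0.5])

sorted_copy = np.sort(values)  # a new, sorted array
print(values)                  # the original order is unchanged
print(sorted_copy)

order = np.argsort(values)  # indices that would sort values: [1 2 0]
print(order)

values.sort()  # sorts in place and returns None
print(values)
```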

## Working with NaNs

In [None]:
rand_values = np.random.randn(10)
rand_values[3] = np.nan  # mark one value as missing
rand_values

In [None]:
# NaN is a special floating-point value, so the element is still a float
type(rand_values[3])

In [None]:
# Arithmetic involving NaN returns NaN

rand_values[3]*3

In [None]:
# We can convert the NaNs to zeros

np.nan_to_num(rand_values)
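
Since `np.nan == np.nan` is `False`, NaNs are found with `np.isnan` rather than `==`. NumPy also has NaN-aware versions of common statistics, such as `np.nanmean`, that skip missing values. A small sketch with a hand-made array:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)           # False: NaN is not equal to anything, not even itself
print(np.isnan(values))           # [False  True False]
print(np.mean(values))            # nan: the NaN propagates
print(np.nanmean(values))         # 2.0: the NaN is ignored
print(values[~np.isnan(values)])  # keep only the non-NaN values
```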

# 4. Linear algebra with NumPy

## Let's have a brief look at some methods for linear algebra

- NumPy also provides tools for linear algebra, for example in its `np.linalg` module. Most of this is beyond the scope of this course, but let's have a quick look at some more advanced features.

- Let us define two square $(2 \times 2)$ matrices

In [None]:
# The reshape method returns the array with a new shape;
# here it gives the same result as setting A.shape = (2, 2)

A = np.array([1, 2, 3, 4]).reshape((2, 2))
B = np.array([2, 4, 6, 8]).reshape((2, 2))

In [None]:
print('The A matrix:')
print(A)
print('')
print('The B matrix:')
print(B)

### The transpose of a matrix
- The transpose of a matrix is found by turning its rows into columns (equivalently, its columns into rows).

In [None]:
A

In [None]:
A.T
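
For a non-square matrix (the `M` below is a made-up example), transposing swaps the number of rows and columns, and element $(i, j)$ of the transpose is element $(j, i)$ of the original:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(M.shape, M.T.shape)    # (2, 3) (3, 2)
print(M.T)                   # the rows of M are now the columns
print(M.T[2, 0] == M[0, 2])  # True
```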

### Matrix multiplication (dot product)
- Matrix multiplication is a row-by-column operation: to get the entry $C_{ij}$ of $C = AB$, the entries in the $i$th row of $A$ are multiplied by the corresponding entries in the $j$th column of $B$, and the products are added, $C_{ij} = \sum_k A_{ik} B_{kj}$.

$A = \left[\begin{matrix}
 A_{11} & A_{12} \\
 A_{21} & A_{22}
\end{matrix}\right]= \left[\begin{matrix}
 1 & 2 \\
 3 & 4
\end{matrix}\right]\;\;$ and $\;\;B = \left[\begin{matrix}
 B_{11} & B_{12} \\
 B_{21} & B_{22}
\end{matrix}\right]= \left[\begin{matrix}
 2 & 4 \\
 6 & 8
\end{matrix}\right]$

We want to compute

$C = AB = \left[\begin{matrix}
 C_{11} & C_{12} \\
 C_{21} & C_{22}
\end{matrix}\right]$

Let's compute $C$

$C_{11} = (A_{11}*B_{11}) + (A_{12}*B_{21}) = (1*2) + (2*6) = 14$

$C_{12} = (A_{11}*B_{12}) + (A_{12}*B_{22}) = (1*4) + (2*8) = 20$

$C_{21} = (A_{21}*B_{11}) + (A_{22}*B_{21}) = (3*2) + (4*6) = 30$

$C_{22} = (A_{21}*B_{12}) + (A_{22}*B_{22}) = (3*4) + (4*8) = 44$
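
The four calculations above all follow the same rule, $C_{ij} = \sum_k A_{ik} B_{kj}$. As an illustration only, the sketch below writes that rule as explicit Python loops in a hypothetical helper `matmul_loops`; in practice `np.dot` or `@` computes the same result much faster:

```python
import numpy as np

def matmul_loops(A, B):
    '''Hypothetical helper: matrix product computed with the row-by-column rule.'''
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    if n_inner != n_inner_b:
        raise ValueError("the number of columns in A must equal the number of rows in B")
    C = np.zeros((n_rows, n_cols))
    for i in range(n_rows):          # row i of A
        for j in range(n_cols):      # column j of B
            for k in range(n_inner):
                C[i, j] += A[i, k] * B[k, j]  # multiply matching entries and add them up
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 4], [6, 8]])
print(matmul_loops(A, B))  # entries 14, 20, 30, 44 (as floats, since C starts as np.zeros)
```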

### Let's calculate the whole C-matrix using NumPy:

In [None]:
A = np.array([1, 2, 3, 4]).reshape((2,2)) 
B = np.array([2, 4, 6, 8]).reshape((2,2)) 

In [None]:
# There are several ways of computing the matrix product; the first uses np.dot

np.dot(A, B)

In [None]:
# Alternative syntax 2

A.dot(B)

In [None]:
# Alternative syntax 3

A @ B
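
`A @ B` is the matrix product and should not be confused with `A * B`, which multiplies matching elements, as in the arithmetic examples earlier. A side-by-side comparison with the same two matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 4], [6, 8]])

print(A * B)  # element-wise products: 1*2, 2*4, 3*6, 4*8
print(A @ B)  # matrix product: the C matrix computed by hand above
```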

### The matrix inverse 

- If $B$ is a square matrix and $B^{-1}$ is its inverse, then the product of the two matrices is the identity matrix: $BB^{-1} = B^{-1}B = I$.

In [None]:
invB = np.linalg.inv(B)

In [None]:
invB

In [None]:
# The product of B and invB is the identity matrix (up to floating-point rounding)

np.dot(B, invB)
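
Because the inverse is computed in floating point, `np.dot(B, invB)` may show tiny rounding errors instead of exact zeros and ones, so `np.allclose` is a safer way to compare it with `np.eye(2)`. Not every square matrix has an inverse: it exists only when the determinant (`np.linalg.det`) is non-zero, and `np.linalg.inv` raises `np.linalg.LinAlgError` for a singular matrix. A short sketch (the matrix `S` is a made-up example):

```python
import numpy as np

B = np.array([[2, 4], [6, 8]])
invB = np.linalg.inv(B)

# Compare with the identity matrix up to floating-point tolerance
print(np.allclose(B @ invB, np.eye(2)))  # True
print(np.linalg.det(B))  # about -8 (2*8 - 4*6): non-zero, so B is invertible

S = np.array([[1, 2], [2, 4]])  # the second row is twice the first
print(np.linalg.det(S))  # zero, so S has no inverse

is_singular = False
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    is_singular = True
print("S is singular:", is_singular)  # True
```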