# Numerical Python: `NumPy`

## Programming and Data Management (EDI 3400)

### *Vegard H. Larsen (Department of Data Science and Analytics)*

# 1. Introduction to NumPy

NumPy, short for "Numerical Python", is a foundational library in the Python ecosystem that provides a powerful array object and a large collection of mathematical functions for numerical computation. At its heart lies the `ndarray`, or n-dimensional array, a versatile data structure that supports efficient storage of and operations on large datasets. NumPy is particularly well suited to tasks such as linear algebra, statistical analysis, and matrix manipulation, which makes it an essential tool for data analysis, scientific computing, and machine learning in Python. Getting acquainted with NumPy will greatly improve your ability to handle and analyze numerical data, bridging the gap between basic programming and more advanced computational tasks.

## What is NumPy?

- A library that supports large, multi-dimensional **numerical** arrays and matrices, and a large collection of functions for these objects
- NumPy is a cornerstone in the scientific Python community, and most other scientific packages rely on NumPy to work
- Most types of data we will ever encounter can be represented as numerical arrays and matrices, and that includes
    * Collections of documents
    * Collections of images and videos
    * Collections of sound clips
- NumPy arrays provide `list`-like functionality, but for numerical work they are much faster and more memory-efficient than a regular `list`

## Importing NumPy

In [13]:
import numpy
numpy.__version__

'1.23.5'

In [14]:
# The standard convention is to import NumPy under the alias np

import numpy as np

In [15]:
# Then we can access the functionality of the library through this alias.
# We can create a NumPy array from a regular list as follows:

x = np.array([1, 2, 3])
print(x)

[1 2 3]


In [16]:
# We now have a new type of container

type(x)

numpy.ndarray

## Speed test: NumPy vs. Python list

In [17]:
import numpy as np

python_list = list(range(10_000_000)) # [0, 1, 2, ..., 9 999 999]
numpy_array = np.arange(10_000_000)   # np.array([0, 1, 2, ..., 9 999 999])

- Let's multiply all the numbers by 5 and check how long it takes

In [18]:
%timeit python_list5 = [x * 5 for x in python_list]

217 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
%timeit numpy_array5 = numpy_array*5

8.9 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
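The `%timeit` magic only works in IPython and Jupyter. Outside a notebook, a similar comparison can be sketched with the standard-library `timeit` module. The array size `n` below is an assumption chosen to keep the run short, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000  # smaller than in the cells above so the sketch runs quickly
python_list = list(range(n))
numpy_array = np.arange(n)

# Run each approach three times and keep the fastest run
list_time = min(timeit.repeat(lambda: [x * 5 for x in python_list], number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_array * 5, number=1, repeat=3))

print(f"list comprehension: {list_time * 1000:.1f} ms")
print(f"NumPy array:        {numpy_time * 1000:.1f} ms")

# Both approaches produce the same numbers
same = np.array_equal(np.array([x * 5 for x in python_list]), numpy_array * 5)
print(same)
```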


# 2. Working with NumPy arrays

## The NumPy Array Object (`ndarray`)

- A multidimensional container of items of the **same type**
- An ndarray can be accessed and modified by indexing or slicing, just like a regular list
- We can use several built-in methods and attributes that belong to the ndarray object
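As a small illustration of the points above, the sketch below creates a short array (the name `v` and its values are made up for this example) and uses a few attributes and methods of the `ndarray` object.

```python
import numpy as np

v = np.array([3, 1, 2])

print(v.ndim)    # 1, the number of dimensions
print(v.shape)   # (3,)
print(v.size)    # 3, the total number of elements
print(v[0])      # 3, indexing works as for lists
print(v[1:])     # [1 2], and so does slicing

v[0] = 10        # elements can be modified in place
print(v.sum())   # 13
print(v.max())   # 10
```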

## Let us create a one-dimensional NumPy array

- Make sure all the values are numeric

In [20]:
# A one dimensional array

numpy_array = np.array([-1, 4, -3, 10, 7, 3.0, 8, 10])
print(numpy_array)
numpy_array.dtype # the type of the elements in the array

[-1.  4. -3. 10.  7.  3.  8. 10.]


dtype('float64')

In [21]:
# It is possible to mix different types, but NumPy then converts every
# element to a string and we lose most of the advantages of using NumPy

numpy_array_with_different_types = np.array([-1.5, 4.0, 'j', 10, 7.99, 3, 8, 'hello!'])
print(numpy_array_with_different_types)
numpy_array_with_different_types.dtype

['-1.5' '4.0' 'j' '10' '7.99' '3' '8' 'hello!']


dtype('<U32')

In [22]:
# You can convert an array of strings into an array of floats (with NaNs) 

numpy_array_numerics = np.genfromtxt(numpy_array_with_different_types)
numpy_array_numerics.sort()
print(numpy_array_numerics)
numpy_array_numerics.dtype


[-1.5   3.    4.    7.99  8.   10.     nan   nan]


dtype('float64')
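After such a conversion we often want to drop the entries that could not be parsed. A minimal sketch, using a made-up list `mixed` and the function `np.isnan` to build a boolean mask:

```python
import numpy as np

mixed = ['-1.5', '4.0', 'j', '10', 'hello!']  # hypothetical example data

numbers = np.genfromtxt(mixed)        # non-numeric strings become nan
is_missing = np.isnan(numbers)        # True where the value is nan
clean = numbers[~is_missing]          # keep only the valid numbers

print(is_missing.sum())  # 2 values could not be converted
print(clean)             # -1.5, 4.0 and 10.0 remain
```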

## The dimensions of the `ndarray`

In [23]:
# What are the dimensions of this ndarray?

numpy_array = np.array([-1.5, 4.0, -3, 10, 7.99, 3, 8, 9])
print(numpy_array.shape)

(8,)


In [24]:
# Alternatively, we can use two sets of brackets

numpy_array_alt = np.array([[-1.5, 4.0, -3, 10, 7.99, 3, 8, 9]])
print(numpy_array_alt.shape)

(1, 8)


## Two-dimensional arrays

- When the array has two dimensions, the shape of the array is given as $rows \times columns$

An example of a $3 \times 4$ matrix:

|        | column 0 | column 1  | column 2  | column 3  |
|---     |---       |---|---|---|
| row 0  | 10       | 12        | 14        | 16        |
| row 1  | 10.5     | 12.5      | 14.5      | 16.5      |
| row 2  | 11       | 13        | 15        | 17        | 

## Let's create this $3 \times 4$ matrix in NumPy

In [25]:
three_times_four = np.array([[10, 12, 14, 16],
                             [10.5, 12.5, 14.5, 16.5],
                             [11, 13, 15, 17]])

In [26]:
print(three_times_four)

[[10.  12.  14.  16. ]
 [10.5 12.5 14.5 16.5]
 [11.  13.  15.  17. ]]


In [27]:
# What is the shape of the array?

three_times_four.shape

(3, 4)

In [28]:
# The type of the elements of the array

three_times_four.dtype

dtype('float64')

## We could have started with a one-dimensional array

In [29]:
a = np.array([10, 12, 14, 16, 10.5, 12.5, 14.5, 16.5, 11, 13, 15, 17])
a.shape

(12,)

In [30]:
# And we can set the dimensions as a tuple

a.shape = (3, 4)  # let's try using -1 as one of the elements in the tuple 

In [31]:
a

array([[10. , 12. , 14. , 16. ],
       [10.5, 12.5, 14.5, 16.5],
       [11. , 13. , 15. , 17. ]])

In [32]:
a.shape

(3, 4)
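The comment above suggests trying `-1` in the shape. When one entry is `-1`, NumPy infers that dimension from the number of elements. A small sketch with a fresh example array (the names `b` and `c` are only for illustration):

```python
import numpy as np

a = np.arange(12)        # 12 elements: 0, 1, ..., 11

b = a.reshape(3, -1)     # 3 rows, NumPy infers 4 columns
c = a.reshape(-1, 6)     # 6 columns, NumPy infers 2 rows

print(b.shape)  # (3, 4)
print(c.shape)  # (2, 6)
```

Only one entry may be `-1`, and the total number of elements must be divisible by the product of the other entries.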

## Accessing the elements of `a`

In [33]:
# Accessing the first (0th) row

a[0]

array([10., 12., 14., 16.])

In [34]:
# Accessing the last column

a[:, 3] # all the rows in the last column (index 3; -1 also works)

array([16. , 16.5, 17. ])

In [35]:
# Accessing the element in row 1 and column 2, counting from 0 (14.5)

a[1, 2]

14.5

## The `ndarray` is mutable and we can change the values  

In [36]:
a[2, 2] = 100  # the previous value was 15

In [37]:
a

array([[ 10. ,  12. ,  14. ,  16. ],
       [ 10.5,  12.5,  14.5,  16.5],
       [ 11. ,  13. , 100. ,  17. ]])

## Slicing and indexing work as with regular lists

In [38]:
# Select the last two rows and the last two columns

a[1:3, 2:4] # Remember the endpoint in the slice is non-inclusive

array([[ 14.5,  16.5],
       [100. ,  17. ]])

In [39]:
# Alternatively
a[1:, 2:]

array([[ 14.5,  16.5],
       [100. ,  17. ]])
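One difference from lists is worth keeping in mind: a slice of a NumPy array is a view of the same data, whereas a slice of a list is a new list. The sketch below uses a fresh example array to show the effect; `.copy()` gives an independent array when that is what we want.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

block = a[1:, 2:]         # a view into a, not a copy
block[0, 0] = -1          # this also changes a[1, 2]
print(a[1, 2])            # -1

independent = a[1:, 2:].copy()
independent[0, 0] = 99    # a is not affected
print(a[1, 2])            # still -1

# For comparison: slicing a list creates a new list
lst = [0, 1, 2, 3]
lst_slice = lst[1:]
lst_slice[0] = 100
print(lst[1])             # still 1
```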

## Filtering arrays

In [40]:

# Create a new array with some values

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

In [41]:
# Create a new array with the values of a that are greater than 5

a[a>5]

array([ 6,  7,  8,  9, 10, 11, 12])
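The expression `a > 5` is itself a boolean array, and several conditions can be combined with `&` (and) and `|` (or). A short sketch of this, with each condition wrapped in parentheses; the variable names are chosen for this example only:

```python
import numpy as np

a = np.arange(1, 13)                       # 1, 2, ..., 12

mask = a > 5                               # boolean array, True where a > 5
between = a[(a > 3) & (a < 9)]             # both conditions must hold
even_or_big = a[(a % 2 == 0) | (a > 10)]   # at least one condition holds
capped = np.where(a > 5, 5, a)             # replace values above 5 with 5

print(mask.sum())     # 7 elements are greater than 5
print(between)        # [4 5 6 7 8]
print(even_or_big)    # [ 2  4  6  8 10 11 12]
print(capped)         # [1 2 3 4 5 5 5 5 5 5 5 5]
```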

# 3. NumPy methods

## Some special array methods

In [42]:
# A range of numbers (similar to the range() function, but for NumPy arrays)

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [43]:
# A matrix of zeros (4 rows and 4 columns)

np.zeros((4,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [44]:
# A matrix of ones (2*2)
np.ones((2,2))

array([[1., 1.],
       [1., 1.]])

In [45]:
# The identity matrix

np.eye(4,4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
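A few related constructors follow the same pattern. The sketch below shows `np.linspace`, `np.full` and `np.zeros_like`; the shapes and values are arbitrary examples.

```python
import numpy as np

grid = np.linspace(0, 1, 5)       # 5 evenly spaced points from 0 to 1 (inclusive)
sevens = np.full((2, 3), 7)       # a 2 x 3 matrix filled with the value 7
template = np.array([[1, 2], [3, 4]])
zeros = np.zeros_like(template)   # zeros with the same shape and dtype as template

print(grid)     # [0.   0.25 0.5  0.75 1.  ]
print(sevens)
print(zeros)
```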

## Arithmetic operations with NumPy

In [46]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
a.shape = 4, 3

In [47]:
print(a)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


### Multiplying the elements  (element wise multiplication)

In [48]:
a * a

array([[  1,   4,   9],
       [ 16,  25,  36],
       [ 49,  64,  81],
       [100, 121, 144]])

### Adding elements 

In [49]:
a + a 

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18],
       [20, 22, 24]])

### Raise all elements to the power of 3

In [50]:
a ** 3

array([[   1,    8,   27],
       [  64,  125,  216],
       [ 343,  512,  729],
       [1000, 1331, 1728]])
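Arithmetic also works between arrays of different shapes when NumPy can stretch the smaller one to match, which is called broadcasting. A minimal sketch with a scalar and a row vector (the name `row` is made up for this example):

```python
import numpy as np

a = np.arange(1, 13).reshape(4, 3)

scaled = a * 10                  # the scalar is applied to every element
row = np.array([1, 0, -1])
shifted = a + row                # the row is added to each of the 4 rows

column_means = a.mean(axis=0)    # mean of each column
centered = a - column_means      # subtract each column's mean from that column

print(shifted)
print(column_means)              # [5.5 6.5 7.5]
```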

## Some more array functions 

#### Make a matrix one dimensional with the `ravel`  method

In [51]:
A = np.array([[2, 3, 4],
              [14, 10, 6]])
A.shape

(2, 3)

In [52]:
A_flattened = A.ravel()

In [53]:
# ravel() does not change the original matrix

A

array([[ 2,  3,  4],
       [14, 10,  6]])
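Although `ravel` leaves the shape of `A` untouched, the flattened array is a view of the same data whenever that is possible, so writing to it can change `A`. The related method `flatten` always returns a copy. A small sketch of the difference:

```python
import numpy as np

A = np.array([[2, 3, 4],
              [14, 10, 6]])

flat_copy = A.flatten()   # always an independent copy
flat_copy[0] = -1
print(A[0, 0])            # still 2

flat_view = A.ravel()     # a view here, since A is stored contiguously
flat_view[0] = 99
print(A[0, 0])            # now 99
print(A.shape)            # (2, 3), the shape of A is unchanged
```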

## Random numbers with NumPy

In [54]:
# random integers below a given value (here 10 integers below 1000)

rand_ints = np.random.randint(1000, size=(10))
print(rand_ints)

[ 95 710   2 834 733 768  86 352 635  99]


In [55]:
# random samples from a uniform distribution over [0, 1)

rand_floats = np.random.rand(3,3)
print(rand_floats)

[[0.18373158 0.74435425 0.2833022 ]
 [0.07159149 0.32419837 0.91077249]
 [0.19366387 0.9290263  0.14915245]]


## Draw an array from the standard normal distribution



In [56]:
x = np.random.randn(1000, 1)
#print(x)

In [57]:
# Calculate the mean of the numbers

np.mean(x)

-0.04101970528310343

In [58]:
# Calculate the standard deviation of the numbers

np.std(x)

0.9764967486189323
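The functions under `np.random` used above draw from a global random state, so the numbers change on every run. NumPy also offers a generator object created with `np.random.default_rng`, which can be seeded to make results reproducible. The sketch below is one way to write the same draws with it; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # seeded generator

ints = rng.integers(0, 1000, size=10)     # 10 integers in [0, 1000)
floats = rng.random((3, 3))               # uniform samples over [0, 1)
normals = rng.standard_normal(1000)       # standard normal samples

print(ints)
print(normals.mean(), normals.std())      # close to 0 and 1

# A new generator with the same seed reproduces the same numbers
repeat = np.random.default_rng(seed=42).integers(0, 1000, size=10)
print(np.array_equal(ints, repeat))       # True
```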

## We can also use random choice with NumPy

In [59]:
def drawRandomCard():
    '''
    This function returns a random card from a deck of 52 cards.
    '''
    card_value = np.random.choice(
        ['J', 'Q', 'K', 'A'] + [str(i+2) for i in range(9)]
    )

    card_suit = np.random.choice(
        ['Hearts', 'Diamonds', 'Clubs', 'Spades']
    )

    return card_suit + '-' + card_value

In [60]:
drawRandomCard()

'Diamonds-5'
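Calling the function several times can return the same card twice. To deal a hand without duplicates, one option is to build the whole deck and sample from it with `replace=False`. This is a sketch under that assumption; the names `deck` and `hand` and the seed are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility

values = [str(v) for v in range(2, 11)] + ['J', 'Q', 'K', 'A']
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']

# Every combination of suit and value: 4 * 13 = 52 cards
deck = [f'{suit}-{value}' for suit in suits for value in values]

# Draw 5 different cards
hand = rng.choice(deck, size=5, replace=False)
print(hand)
```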

## NumPy methods

In [61]:
# NumPy has its own random number generator
# (despite the name rand_ints, randn returns floats)

rand_ints = np.random.randn(10)

In [62]:
# Sorting the array
# This will change the original array

rand_ints.sort()
print(rand_ints)

[-0.64608155 -0.64282067 -0.55422714 -0.36694679  0.07344789  0.48558739
  0.48960269  0.53007004  0.55985312  1.89184768]
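If the original order must be kept, the function `np.sort` returns a sorted copy instead of sorting in place. The sketch below also shows descending order and `np.argsort`, which returns the indices that would sort the array; the example values are made up.

```python
import numpy as np

values = np.array([3.0, -1.0, 2.0])

sorted_copy = np.sort(values)      # new array, values is left unchanged
descending = np.sort(values)[::-1] # reverse the sorted copy
order = np.argsort(values)         # indices of the elements in sorted order

print(values)       # [ 3. -1.  2.]
print(sorted_copy)  # [-1.  2.  3.]
print(descending)   # [ 3.  2. -1.]
print(order)        # [1 2 0]
```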


## Working with NaNs

In [63]:
rand_ints = np.random.randn(10)
rand_ints[3] = np.nan
rand_ints

array([ 1.18998221, -0.41813104, -1.83304688,         nan, -0.41761412,
        1.68148214,  0.04486807, -1.66298718, -0.69684782,  1.08758214])

In [64]:
# NaN values are of type float

type(rand_ints[3])

numpy.float64

In [65]:
# Arithmetic with a NaN returns NaN

rand_ints[3]*3

nan

In [66]:
# We can convert the nans to zeros

np.nan_to_num(rand_ints)

array([ 1.18998221, -0.41813104, -1.83304688,  0.        , -0.41761412,
        1.68148214,  0.04486807, -1.66298718, -0.69684782,  1.08758214])
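Replacing NaNs with zeros changes statistics such as the mean. An alternative is to detect the missing values with `np.isnan` or to use the NaN-aware functions such as `np.nanmean`. A short sketch with made-up data:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

print(np.isnan(data))        # [False  True False]
print(np.mean(data))         # nan, a single NaN spoils the ordinary mean
print(np.nanmean(data))      # 2.0, NaNs are ignored
print(np.nan_to_num(data).mean())  # 1.333..., the zero is counted as a value

# NaN is not equal to anything, not even itself, so use np.isnan to test for it
print(np.nan == np.nan)      # False
```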

# 4. Linear algebra with NumPy

## Let's have a brief look at some methods for linear algebra

- NumPy is widely used for linear algebra. Most of that functionality is beyond the scope of this course, but let's have a quick look at some more advanced features.

- Let us define two square matrices $(2 \times 2)$

In [67]:
# reshape gives the same result as A.shape = (2, 2), but returns a reshaped array instead of modifying the shape in place

A = np.array([1, 2, 3, 4]).reshape((2,2)) 


B = np.array([2, 4, 6, 8]).reshape((2,2)) 

In [68]:
print('The A matrix:')
print(A)
print('')
print('The B matrix')
print(B)

The A matrix:
[[1 2]
 [3 4]]

The B matrix
[[2 4]
 [6 8]]


### The transpose of a matrix
   - The transpose of a matrix is found by turning its rows into columns (and its columns into rows).

In [69]:
A

array([[1, 2],
       [3, 4]])

In [70]:
A.T

array([[1, 3],
       [2, 4]])

### Matrix multiplication (dot product)
- Matrix multiplication is a row-by-column multiplication: the entries in the $i$th row of $A$ are multiplied by the corresponding entries in the $j$th column of $B$, and the products are summed to give the entry $C_{ij}$.

$A = \left[\begin{matrix}
 A_{11} & A_{12} \\
 A_{21} & A_{22}
\end{matrix}\right]= \left[\begin{matrix}
 1 & 2 \\
 3 & 4
\end{matrix}\right]\;\;$ and $\;\;B = \left[\begin{matrix}
 B_{11} & B_{12} \\
 B_{21} & B_{22}
\end{matrix}\right]= \left[\begin{matrix}
 2 & 4 \\
 6 & 8
\end{matrix}\right]$

We want to compute

$C = AB = \left[\begin{matrix}
 C_{11} & C_{12} \\
 C_{21} & C_{22}
\end{matrix}\right]$

Let's compute $C$

$C_{11} = (A_{11}*B_{11}) + (A_{12}*B_{21}) = (1*2) + (2*6) = 14$

$C_{12} = (A_{11}*B_{12}) + (A_{12}*B_{22}) = (1*4) + (2*8) = 20$

$C_{21} = (A_{21}*B_{11}) + (A_{22}*B_{21}) = (3*2) + (4*6) = 30$

$C_{22} = (A_{21}*B_{12}) + (A_{22}*B_{22}) = (3*4) + (4*8) = 44$

### Let's calculate the whole C-matrix using NumPy:

In [71]:
A = np.array([1, 2, 3, 4]).reshape((2,2)) 
B = np.array([2, 4, 6, 8]).reshape((2,2)) 

In [72]:
# There are several ways of computing the dot product
import numpy as np

np.dot(A, B)

array([[14, 20],
       [30, 44]])

In [73]:
# Alternative syntax 2

A.dot(B)

array([[14, 20],
       [30, 44]])

In [74]:
# Alternative syntax 3

A @ B

array([[14, 20],
       [30, 44]])
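To connect the hand calculation with the NumPy result, the sketch below computes each entry $C_{ij}$ with explicit loops and compares it with `A @ B`. It also shows that `A * B` is something different, namely element-wise multiplication. The loop version is only for illustration; in practice `@` is the natural choice.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[2, 4],
              [6, 8]])

# C[i, j] is the sum over k of A[i, k] * B[k, j]
C = np.zeros((2, 2), dtype=int)
for i in range(2):
    for j in range(2):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(2))

print(C)                          # [[14 20] [30 44]]
print(np.array_equal(C, A @ B))   # True

print(A * B)                      # element-wise: [[ 2  8] [18 32]]
```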

### The matrix inverse 

- If $A$ is a square matrix and $B$ is its inverse, then the product of the two matrices is equal to the identity matrix.
- The inverse of a matrix $A$ is denoted as $A^{-1}$, and it is defined as the matrix that satisfies the following condition:
$$A^{-1}A = AA^{-1} = I$$
- The inverse of a matrix does not always exist. If the inverse of a matrix exists, it is unique.
- The inverse of a matrix can be found using the `inv` function in NumPy's `linalg` module.

In [75]:
B

array([[2, 4],
       [6, 8]])

In [76]:
invB = np.linalg.inv(B)

In [77]:
invB

array([[-1.  ,  0.5 ],
       [ 0.75, -0.25]])

In [80]:
# The dot product of B and invB is the identity matrix (up to floating-point rounding error)

np.dot(B, invB)

array([[1.0000000e+00, 0.0000000e+00],
       [8.8817842e-16, 1.0000000e+00]])
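Because of floating-point rounding, the product is only approximately the identity matrix, so an exact comparison with `==` can fail. A tolerance-based check with `np.allclose` is the usual approach. The sketch below also shows what happens when the inverse does not exist; the matrix `singular` is a made-up example whose second row is twice the first.

```python
import numpy as np

B = np.array([[2.0, 4.0],
              [6.0, 8.0]])
invB = np.linalg.inv(B)

# Compare with the identity matrix, allowing for rounding error
is_identity = np.allclose(B @ invB, np.eye(2))
print(is_identity)   # True

# A matrix with linearly dependent rows has no inverse
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
try:
    np.linalg.inv(singular)
    inverse_exists = True
except np.linalg.LinAlgError:
    inverse_exists = False
print(inverse_exists)  # False
```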