<a href="https://colab.research.google.com/github/trefftzc/cis677/blob/main/CIS677_Fall2024_numpy_and_numba.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CIS 677 -- A COLAB notebook to learn the basics about numpy and numba


Save this notebook on your Google account:

1. Click on File and select Save on Drive
2. Rename the file to your last name
3. Make sure that your notebook is visible to others: click on the Share button at the top, click on General Access and select Anyone with the link


It is possible to interact with the operating system of the computer at Google that is hosting this notebook.

To interact with the operating system, start with the ! character in a code cell.

When you run the cell below, you will see the current list of files on the host computer.

In [6]:
!ls

A_100.Text  A_200.Text	B_100.Text  B_200.Text	matrix_multiplication.py  sample_data


# Matrix multiplication

Matrix multiplication is a fundamental operation in numerical linear algebra.

Given two matrices:
1. Matrix A with size $l \times m$
2. Matrix B with size $m \times n$

The product will be a matrix C of size $l \times n$.
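
Written out entry by entry, each entry of C is the dot product of a row of A with a column of B. The symbols $a_{ik}$, $b_{kj}$ and $c_{ij}$ for the entries of A, B and C are notation introduced here for illustration:

$$c_{ij} = \sum_{k=1}^{m} a_{ik}\, b_{kj}, \qquad 1 \le i \le l, \quad 1 \le j \le n$$

For example, when $l = m = n = 2$, the top-left entry is $c_{11} = a_{11} b_{11} + a_{12} b_{21}$.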

This is Wikipedia's entry on Matrix Multiplication:

https://en.wikipedia.org/wiki/Matrix_multiplication#:~:text=For%20matrix%20multiplication%2C%20the%20number,B%20is%20denoted%20as%20AB.

To simplify things, we will use square matrices, with the same number of rows and columns, for the following examples.

Let's create a couple of sample matrices.

It is possible to create a file in COLAB by starting a code cell with the directive
%%writefile nameOfTheFile
The rest of the cell is then written to a file with that name.

In [7]:
%%writefile A_4.Text
4
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

Writing A_4.Text


In [8]:
%%writefile B_4.Text
4
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1

Writing B_4.Text


Now, let's create a program that reads both files, multiplies the corresponding matrices and then prints the result.

In [9]:
%%writefile matrix_multiplication.py
import sys
import time

def read_file(file_name):
  # The first line of the file holds the size N of the N x N matrix;
  # each of the following N lines holds one row of integers
  with open(file_name, "r") as file_object:
    size = int(file_object.readline())
    matrix = []
    for i in range(size):
      row = list(map(int, file_object.readline().split()))
      matrix.append(row)
  # Display the matrix
  print("The matrix contained in the file ",file_name," is: ")
  for row in matrix:
    print(row)
  return matrix,size

# Main code

# Read the content of the files passed in the command line
# that contain the matrices to be multiplied
A,size = read_file(sys.argv[1])
B,size = read_file(sys.argv[2])

# Initialize the result matrix to 0s
C = [[0 for x in range(size)] for y in range(size)]

# Multiply the matrices
N = size
start_time = time.time()
for i in range(N):
  for j in range(N):
    for k in range(N):
      C[i][j] += A[i][k]*B[k][j]

end_time = time.time()
elapsed_time = end_time - start_time
print("Time required to carry out the computation: ",elapsed_time)

# Print out the resulting matrix
for i in range(N):
  for j in range(N):
    print(C[i][j]," ",end="")
  print("\n")

Overwriting matrix_multiplication.py


In [10]:
!python3 matrix_multiplication.py A_4.Text B_4.Text

The matrix contained in the file  A_4.Text  is: 
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]
The matrix contained in the file  B_4.Text  is: 
[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]
Time required to carry out the computation:  4.38690185546875e-05
1  2  3  4  

5  6  7  8  

9  10  11  12  

13  14  15  16  
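
The time reported above comes from a single measurement with time.time(), which can be noisy for a run this short. Here is a minimal sketch of a slightly more robust pattern: it repeats the computation and keeps the fastest run, using time.perf_counter(). The names multiply and best_time are hypothetical helpers introduced here, not part of the program above.

```python
import time

def multiply(A, B, N):
    """Triple-loop product of two N x N matrices stored as lists of lists."""
    C = [[0 for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                C[i][j] += A[i][k] * B[k][j]
    return C

def best_time(function, *args, repeats=5):
    """Hypothetical helper: call function(*args) several times, keep the fastest run."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = function(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

A = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]
C, seconds = best_time(multiply, A, identity, 2)
print("Result:", C)
print("Best of 5 runs (seconds):", seconds)
```

Keeping the minimum rather than the average is a design choice: it filters out runs that were slowed down by other activity on the host.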



Let's try with slightly larger matrices:

In [11]:
%%writefile A_8.Text
8
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64


Writing A_8.Text


In [12]:
%%writefile B_8.Text
8
1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1

Writing B_8.Text


Now, let's execute the program again with these larger files:


In [13]:
!python3 matrix_multiplication.py A_8.Text B_8.Text

The matrix contained in the file  A_8.Text  is: 
[1, 2, 3, 4, 5, 6, 7, 8]
[9, 10, 11, 12, 13, 14, 15, 16]
[17, 18, 19, 20, 21, 22, 23, 24]
[25, 26, 27, 28, 29, 30, 31, 32]
[33, 34, 35, 36, 37, 38, 39, 40]
[41, 42, 43, 44, 45, 46, 47, 48]
[49, 50, 51, 52, 53, 54, 55, 56]
[57, 58, 59, 60, 61, 62, 63, 64]
The matrix contained in the file  B_8.Text  is: 
[1, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 0, 1]
Time required to carry out the computation:  0.00040984153747558594
1  2  3  4  5  6  7  8  

9  10  11  12  13  14  15  16  

17  18  19  20  21  22  23  24  

25  26  27  28  29  30  31  32  

33  34  35  36  37  38  39  40  

41  42  43  44  45  46  47  48  

49  50  51  52  53  54  55  56  

57  58  59  60  61  62  63  64  



And now, a third pair of matrices, this time of size 12.

In [14]:
%%writefile A_12.Text
12
1 2 3 4 5 6 7 8 9 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81 82 83 84
85 86 87 88 89 90 91 92 93 94 95 96
97 98 99 100 101 102 103 104 105 106 107 108
109 110 111 112 113 114 115 116 117 118 119 120
121 122 123 124 125 126 127 128 129 130 131 132
133 134 135 136 137 138 139 140 141 142 143 144

Writing A_12.Text


In [15]:
%%writefile B_12.Text
12
1 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1

Writing B_12.Text
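
Typing these files by hand becomes tedious as the size grows. The sketch below generates a pair of files in the same layout (the size on the first line, then one row per line). The helper names and the file names A_16.Text and B_16.Text are hypothetical, chosen here only for illustration.

```python
def write_matrix_file(file_name, matrix):
    """Hypothetical helper: write a square matrix in the layout used above."""
    with open(file_name, "w") as file_object:
        file_object.write(f"{len(matrix)}\n")
        for row in matrix:
            file_object.write(" ".join(str(value) for value in row) + "\n")

def counting_matrix(n):
    """n x n matrix filled row by row with 1, 2, 3, ..."""
    return [[i * n + j + 1 for j in range(n)] for i in range(n)]

def identity_matrix(n):
    """n x n matrix with 1s on the diagonal and 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

size = 16
write_matrix_file("A_16.Text", counting_matrix(size))
write_matrix_file("B_16.Text", identity_matrix(size))
```

The resulting files can be passed to matrix_multiplication.py in the same way as the hand-written ones.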


And now we run the code again:


In [16]:
!python3 matrix_multiplication.py A_12.Text B_12.Text

The matrix contained in the file  A_12.Text  is: 
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
[25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
[37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
[49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
[61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]
[73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84]
[85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96]
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108]
[109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120]
[121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
[133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144]
The matrix contained in the file  B_12.Text  is: 
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0

Let's download a couple of larger pairs of matrices, of sizes $100 \times 100$ and $200 \times 200$, to carry out some more time-consuming computations.

In [17]:
!wget https://faculty.computing.gvsu.edu/trefftzc/cs677/A_100.Text
!wget https://faculty.computing.gvsu.edu/trefftzc/cs677/B_100.Text
!wget https://faculty.computing.gvsu.edu/trefftzc/cs677/A_200.Text
!wget https://faculty.computing.gvsu.edu/trefftzc/cs677/B_200.Text

--2024-09-21 04:54:40--  https://faculty.computing.gvsu.edu/trefftzc/cs677/A_100.Text
Resolving faculty.computing.gvsu.edu (faculty.computing.gvsu.edu)... 104.17.87.18, 104.17.88.18, 2606:4700::6811:5812, ...
Connecting to faculty.computing.gvsu.edu (faculty.computing.gvsu.edu)|104.17.87.18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 48998 (48K) [text/plain]
Saving to: ‘A_100.Text.1’


2024-09-21 04:54:41 (840 KB/s) - ‘A_100.Text.1’ saved [48998/48998]

--2024-09-21 04:54:41--  https://faculty.computing.gvsu.edu/trefftzc/cs677/B_100.Text
Resolving faculty.computing.gvsu.edu (faculty.computing.gvsu.edu)... 104.17.87.18, 104.17.88.18, 2606:4700::6811:5812, ...
Connecting to faculty.computing.gvsu.edu (faculty.computing.gvsu.edu)|104.17.87.18|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20104 (20K) [text/plain]
Saving to: ‘B_100.Text.1’


2024-09-21 04:54:41 (743 KB/s) - ‘B_100.Text.1’ saved [20104/20104]

--2024-09-21 04:54:41-- 

# Numpy

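numpy is a widely used library for numerical computing with arrays, including numerical linear algebra.
Its array operations run in compiled code, so they execute much faster than equivalent loops written in regular Python.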

This is numpy's web site: https://numpy.org/

Numpy uses arrays that are much faster to operate on than regular Python lists.

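For floating-point arrays, numpy hands linear algebra operations to an optimized library. Depending on how numpy was built, this may be Intel's MKL (https://en.wikipedia.org/wiki/Math_Kernel_Library) or another library such as OpenBLAS; these libraries are multi-threaded and take advantage of multiple cores.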

The following code is similar in purpose to the previous matrix multiplication code, but it uses numpy arrays instead of Python's lists and it uses the numpy function matmul.

This version of the code performs the matrix multiplication twice: first with the original triple loop and then with the matmul function in numpy. In the runs below, numpy is roughly 180 to 200 times faster.
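
Before looking at timings, it helps to confirm that both approaches produce the same result. This is a small sketch on a $2 \times 2$ example; the matrices are made up for illustration.

```python
import numpy as np

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Triple loop on Python lists, as in the earlier program
N = 2
C_lists = [[0 for _ in range(N)] for _ in range(N)]
for i in range(N):
    for j in range(N):
        for k in range(N):
            C_lists[i][j] += A[i][k] * B[k][j]

# The same product with numpy arrays
a_numpy = np.array(A)
b_numpy = np.array(B)
c_numpy = np.matmul(a_numpy, b_numpy)  # a_numpy @ b_numpy is equivalent

print(c_numpy)
print("Same result as the triple loop:",
      np.array_equal(c_numpy, np.array(C_lists)))
```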

In [18]:
%%writefile matrix_multiplication.py
import sys
import time
import numpy as np

def read_file(file_name):
  # The first line of the file holds the size N of the N x N matrix;
  # each of the following N lines holds one row of integers
  with open(file_name, "r") as file_object:
    size = int(file_object.readline())
    matrix = []
    for i in range(size):
      row = list(map(int, file_object.readline().split()))
      matrix.append(row)
  # Display the matrix
  # print("The matrix contained in the file ",file_name," is: ")
  # for row in matrix:
  #   print(row)
  return matrix,size

# Main code
def matrix_multiplication():
# Read the content of the files passed in the command line
# that contain the matrices to be multiplied
  A,size = read_file(sys.argv[1])
  B,size = read_file(sys.argv[2])

# Initialize the result matrix to 0s
  C = [[0 for x in range(size)] for y in range(size)]

# Multiply the matrices
  N = size
  start_time = time.time()
  for i in range(N):
    for j in range(N):
      for k in range(N):
        C[i][j] += A[i][k]*B[k][j]

  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation with regular python: ",elapsed_time)

  # Now use numpy to compare the execution times
  a_numpy = np.array(A)
  b_numpy = np.array(B)
  start_time = time.time()
  # matmul allocates and returns the result array
  c_numpy = np.matmul(a_numpy,b_numpy)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation with numpy: ",elapsed_time)


if __name__ == "__main__":
  matrix_multiplication()

Overwriting matrix_multiplication.py
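
The program above builds a list of lists and then converts it with np.array. As an alternative, the file can be read straight into a numpy array with np.loadtxt. This is a sketch only: read_matrix and the file name example_3.Text are hypothetical, introduced here for illustration.

```python
import numpy as np

def read_matrix(file_name):
    """Hypothetical alternative to read_file that returns a numpy array."""
    # skiprows=1 skips the first line, which holds the matrix size;
    # ndmin=2 keeps the result two-dimensional even for a 1 x 1 matrix
    matrix = np.loadtxt(file_name, dtype=np.int64, skiprows=1, ndmin=2)
    size = matrix.shape[0]
    return matrix, size

# Write a small file in the same layout as the files above, then read it back
with open("example_3.Text", "w") as file_object:
    file_object.write("3\n1 2 3\n4 5 6\n7 8 9\n")

matrix, size = read_matrix("example_3.Text")
print(size)
print(matrix)
```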


Now, let's compare the execution times with the matrices of sizes $100 \times 100$ and $200 \times 200$


In [19]:
!python3 matrix_multiplication.py A_100.Text B_100.Text

Time required to carry out the computation with regular python:  0.1957845687866211
Time required to carry out the computation with numpy:  0.0009784698486328125


In [20]:
!python3 matrix_multiplication.py A_200.Text B_200.Text

Time required to carry out the computation with regular python:  1.5662422180175781
Time required to carry out the computation with numpy:  0.008792400360107422
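
These measurements can be checked against a simple cost model. The triple loop performs $N^3$ multiply-add steps, so if $T(N)$ denotes the running time for size $N$ (a symbol introduced here for illustration), doubling the size should multiply the time by about 8:

$$\frac{T(2N)}{T(N)} \approx \frac{(2N)^3}{N^3} = 8$$

The regular Python timings above give $1.566 / 0.1958 \approx 8.0$, in line with the model. The numpy timings give $0.008792 / 0.0009785 \approx 9.0$, which is close to the same ratio; single measurements this short are noisy, so the agreement should be read as approximate.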


# NUMBA
Numba (https://numba.pydata.org/) is another useful library in Python.

Python is an interpreted language and that makes it very flexible but rather slow.

There is an open-source compiler framework called LLVM (https://llvm.org/). It is very flexible and it has been used to write compilers for many different programming languages and execution environments.

Numba uses LLVM to compile specific functions (chosen by the programmer) into machine code. This machine code will execute much faster than the original Python function.

One imports the numba library at the top of the Python program.

One then chooses the function to compile into machine code and places a decorator on the line just before the definition of that function.

The compilation happens the first time the function is called, so that call takes some additional time; after the code has been compiled, the execution will be much faster.

The code below uses numba to compile the function that multiplies the matrices. In this particular example, the matrices are converted into numpy arrays before calling the matrix multiplication function. This allows numba to compile the function and helps make the execution faster.
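
The cost of compilation can be seen in isolation with a smaller function. The following is a minimal sketch, not the notebook's program: sum_of_squares is a made-up example, njit is numba's shorthand for jit(nopython=True), and the fallback in the except branch is an assumption added so that the sketch still runs (without any speed-up) where numba is not installed.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, fall back to the plain Python function
    def njit(function):
        return function

@njit
def sum_of_squares(values):
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total

data = np.arange(1000, dtype=np.float64)

# The first call triggers the compilation for float64 arrays
start = time.perf_counter()
first_result = sum_of_squares(data)
first_call = time.perf_counter() - start

# The second call reuses the compiled machine code
start = time.perf_counter()
second_result = sum_of_squares(data)
second_call = time.perf_counter() - start

print("First call (includes compilation):", first_call)
print("Second call (compiled code only): ", second_call)
```

This is why the program below calls the compiled function once before starting the timer.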


In [21]:
%%writefile matrix_multiplication.py
import sys
import time
from numba import jit
import numpy as np

def read_file(file_name):
  # The first line of the file holds the size N of the N x N matrix;
  # each of the following N lines holds one row of integers
  with open(file_name, "r") as file_object:
    size = int(file_object.readline())
    matrix = []
    for i in range(size):
      row = list(map(int, file_object.readline().split()))
      matrix.append(row)
  # Display the matrix
  # print("The matrix contained in the file ",file_name," is: ")
  # for row in matrix:
  #   print(row)
  return matrix,size

@jit(nopython=True)
def matrix_multiplication_core(A,B,C,N):
  for i in range(N):
    for j in range(N):
      for k in range(N):
        C[i][j] += A[i][k]*B[k][j]



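def matrix_multiplication_core_with_numpy(A,B,C):
  # Write the product into C; a plain assignment (C = ...) would only
  # rebind the local name and the caller would never see the result
  np.matmul(A,B,out=C)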


# Main code
def matrix_multiplication():
# Read the content of the files passed in the command line
# that contain the matrices to be multiplied
  A,size = read_file(sys.argv[1])
  B,size = read_file(sys.argv[2])

# Initialize the result matrix to 0s
  C = [[0 for x in range(size)] for y in range(size)]
  a_numpy = np.array(A)
  b_numpy = np.array(B)
  c_numpy = np.array(C)
# To time without the cost of the compilation, execute a couple of dry runs
# without timing
  N = size
  matrix_multiplication_core(a_numpy,b_numpy,c_numpy,N)
  matrix_multiplication_core_with_numpy(a_numpy,b_numpy,c_numpy)

# Initialize C again to be ready for the timed executions
  C = [[0 for x in range(size)] for y in range(size)]
  c_numpy = np.array(C)
# Multiply the matrices

  start_time = time.time()
  matrix_multiplication_core(a_numpy,b_numpy,c_numpy,N)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation with python compiled with numba: ",elapsed_time)

# Now use numpy to compare the execution times
  a_numpy = np.array(A)
  b_numpy = np.array(B)
  C = [[0 for x in range(size)] for y in range(size)]
  c_numpy = np.array(C)
  start_time = time.time()
  matrix_multiplication_core_with_numpy(a_numpy,b_numpy,c_numpy)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation with numpy: ",elapsed_time)


if __name__ == "__main__":
  matrix_multiplication()

Overwriting matrix_multiplication.py


In [22]:
!python3 matrix_multiplication.py A_100.Text B_100.Text

Time required to carry out the computation with python compiled with numba:  0.0006916522979736328
Time required to carry out the computation with numpy:  0.0009198188781738281


In [23]:
!python3 matrix_multiplication.py A_200.Text B_200.Text

Time required to carry out the computation with python compiled with numba:  0.007699489593505859
Time required to carry out the computation with numpy:  0.010648250579833984
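
In these runs the numba version comes out slightly ahead of numpy. One plausible explanation, offered here as an assumption rather than something measured in this notebook, is the data type: the files contain integers, so np.array produces integer arrays, and numpy only hands matmul to its optimized linear algebra library for floating-point and complex arrays. The sketch below compares the two cases on randomly generated matrices (the sizes and values are made up for illustration).

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a_int = rng.integers(0, 10, size=(200, 200))
b_int = rng.integers(0, 10, size=(200, 200))

# Same values, stored as 64-bit floating-point numbers
a_float = a_int.astype(np.float64)
b_float = b_int.astype(np.float64)

def time_matmul(a, b):
    """Return the product and the time of a single np.matmul call."""
    start = time.perf_counter()
    result = np.matmul(a, b)
    return result, time.perf_counter() - start

c_int, t_int = time_matmul(a_int, b_int)
c_float, t_float = time_matmul(a_float, b_float)

print("Integer arrays:       ", t_int)
print("Floating-point arrays:", t_float)
# The entries are small integers, so both products hold the same values
same_values = np.array_equal(c_int, c_float)
print("Same values:", same_values)
```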


And now with prange, to parallelize the outermost loop.

In [27]:
%%writefile matrix_multiplication.py
import sys
import time
from numba import jit,prange
import numpy as np

def read_file(file_name):
  # The first line of the file holds the size N of the N x N matrix;
  # each of the following N lines holds one row of integers
  with open(file_name, "r") as file_object:
    size = int(file_object.readline())
    matrix = []
    for i in range(size):
      row = list(map(int, file_object.readline().split()))
      matrix.append(row)
  # Display the matrix
  # print("The matrix contained in the file ",file_name," is: ")
  # for row in matrix:
  #   print(row)
  return matrix,size

@jit(nopython=True,parallel=True)
def matrix_multiplication_core(A,B,C,N):
  for i in prange(N):
    for j in range(N):
      for k in range(N):
        C[i][j] += A[i][k]*B[k][j]



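def matrix_multiplication_core_with_numpy(A,B,C):
  # Write the product into C; a plain assignment (C = ...) would only
  # rebind the local name and the caller would never see the result
  np.matmul(A,B,out=C)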


# Main code
def matrix_multiplication():
# Read the content of the files passed in the command line
# that contain the matrices to be multiplied
  A,size = read_file(sys.argv[1])
  B,size = read_file(sys.argv[2])

# Initialize the result matrix to 0s
  C = [[0 for x in range(size)] for y in range(size)]
  a_numpy = np.array(A)
  b_numpy = np.array(B)
  c_numpy = np.array(C)
# To time without the cost of the compilation, execute a couple of dry runs
# without timing
  N = size
  matrix_multiplication_core(a_numpy,b_numpy,c_numpy,N)
  matrix_multiplication_core_with_numpy(a_numpy,b_numpy,c_numpy)

# Initialize C again to be ready for the timed executions
  C = [[0 for x in range(size)] for y in range(size)]
  c_numpy = np.array(C)
# Multiply the matrices

  start_time = time.time()
  matrix_multiplication_core(a_numpy,b_numpy,c_numpy,N)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation with python compiled with numba: ",elapsed_time)

# Now use numpy to compare the execution times
  a_numpy = np.array(A)
  b_numpy = np.array(B)
  C = [[0 for x in range(size)] for y in range(size)]
  c_numpy = np.array(C)
  start_time = time.time()
  matrix_multiplication_core_with_numpy(a_numpy,b_numpy,c_numpy)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation with numpy: ",elapsed_time)


if __name__ == "__main__":
  matrix_multiplication()

Overwriting matrix_multiplication.py


In [28]:
!python3 matrix_multiplication.py A_100.Text B_100.Text

Time required to carry out the computation with python compiled with numba:  0.0008451938629150391
Time required to carry out the computation with numpy:  0.0009076595306396484


In [29]:
!python3 matrix_multiplication.py A_200.Text B_200.Text

Time required to carry out the computation with python compiled with numba:  0.006304264068603516
Time required to carry out the computation with numpy:  0.008056163787841797
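
Parallelizing the outermost loop is safe in the program above because each value of i writes only to row i of C, so no two iterations update the same entry. The sketch below shows the same pattern on a smaller problem. The function row_sums is a made-up example, and the fallback in the except branch is an assumption added so that the sketch still runs sequentially where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: without numba, run the same loops sequentially
    prange = range
    def njit(*args, **kwargs):
        def wrap(function):
            return function
        return wrap

@njit(parallel=True)
def row_sums(matrix):
    n_rows, n_cols = matrix.shape
    sums = np.zeros(n_rows)
    # Iterations of the outer loop may run on different threads;
    # iteration i only ever writes to sums[i]
    for i in prange(n_rows):
        for j in range(n_cols):
            sums[i] += matrix[i, j]
    return sums

matrix = np.arange(12, dtype=np.float64).reshape(3, 4)
result = row_sums(matrix)
print(result)
print("Matches numpy:", np.allclose(result, matrix.sum(axis=1)))
```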
