
# A Skeleton for the fourth programming project

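You will modify a sequential version of the Python code that solves the maximum independent set problem by introducing mpi4py calls. The sequential code finds the answer by brute force: it examines every subset of the vertices and keeps the largest subsets in which no two vertices are adjacent.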

As a first step, look at this notebook, provided in class, which solves the partition problem using mpi4py:

https://github.com/trefftzc/partition_COLAB_notebooks/blob/main/partition_mpi4py.ipynb

Notice that you need to add the following code:

1. At the very beginning of the code:

       from mpi4py import MPI

2. At the beginning of the main block (`if __name__ == "__main__":`):

       comm = MPI.COMM_WORLD
       rank = comm.Get_rank()
       number_nodes = comm.Get_size()

3. The coordinator node, with rank 0, reads the size of the problem and the adjacency matrix.
4. The coordinator node should broadcast the size of the problem and the adjacency matrix to all other nodes (for example with `comm.bcast`).
5. Every node calculates which portion of the main loop's iterations (the subset encodings from 0 to `2**size - 1`) it should work on.
6. Every node works on a different portion of the main loop.
7. Perform a reduction to find the largest independent set (for example with `comm.reduce`, or by gathering the partial results with `comm.gather` and combining them on node 0). The results should be placed on node 0, the coordinator. Node 0 will print the result.
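Step 5 can be done with a block decomposition of the main loop, which in the sequential code runs over the integers 0 to `2**size - 1` (each integer encodes one subset of vertices). The sketch below is one possible approach, not the required one, and `get_my_range` is a hypothetical helper name. Each node would call it with its own `rank` and `number_nodes`, then loop over `range(start, end)` instead of `range(size_of_power_set)`:

```python
def get_my_range(rank, number_nodes, total):
    """Return the half-open range [start, end) of subset encodings that
    node `rank` examines when `total` encodings are split among
    `number_nodes` nodes as evenly as possible (hypothetical helper)."""
    base, extra = divmod(total, number_nodes)
    # The first `extra` nodes each take one additional encoding.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# Example: the 2**4 = 16 subsets of a 4-vertex graph shared among 3 nodes.
for rank in range(3):
    print(rank, get_my_range(rank, 3, 16))
# 0 (0, 6)
# 1 (6, 11)
# 2 (11, 16)
```

When the number of subsets is not a multiple of the number of nodes, the first few nodes take one extra subset, so the iteration counts differ by at most one. A cyclic distribution, `range(rank, total, number_nodes)`, is a simple alternative.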

Let's start with several test files:


In [1]:
%%writefile k4.txt
4
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0

Writing k4.txt


In [2]:
%%writefile no_edges_4.txt
4
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

Writing no_edges_4.txt


In [3]:
%%writefile k16.txt
16
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

Writing k16.txt
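All three files use the same format: the number of vertices on the first line, followed by the adjacency matrix, one row per line. `k4.txt` and `k16.txt` are complete graphs, so their maximum independent sets have size 1, while `no_edges_4.txt` has a maximum independent set containing all 4 vertices. If you want more test cases, a small helper like the one below could write files in the same format; the function names and the 6-cycle example are illustrative assumptions, not part of the project:

```python
def graph_text(size, edges):
    """Return the contents of a test file for an undirected graph with
    `size` vertices and the given list of (u, v) edges."""
    matrix = [[0] * size for _ in range(size)]
    for u, v in edges:
        # The graph is undirected, so the matrix must be symmetric.
        matrix[u][v] = 1
        matrix[v][u] = 1
    lines = [str(size)] + [" ".join(map(str, row)) for row in matrix]
    return "\n".join(lines) + "\n"


def complete_graph_edges(size):
    """All pairs of distinct vertices, as in k4.txt and k16.txt."""
    return [(u, v) for u in range(size) for v in range(u + 1, size)]


def write_test_graph(file_name, size, edges):
    with open(file_name, "w") as f:
        f.write(graph_text(size, edges))


# A 6-cycle 0-1-2-3-4-5-0: its maximum independent sets, {0, 2, 4} and
# {1, 3, 5}, have size 3.
write_test_graph("cycle6.txt", 6, [(i, (i + 1) % 6) for i in range(6)])
```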


Now the original sequential Python code. It is a good idea to keep this copy unmodified so that you can compare its results with those of your parallel version.

In [4]:
%%writefile original_python.py
import sys
import time
import numpy as np

def read_adjacency_matrix(file_name):
  # The first line holds the number of vertices; each following line
  # holds one row of the adjacency matrix
  with open(file_name, "r") as file_object:
    size = int(file_object.readline())
    matrix = []
    for i in range(size):
      row = list(map(int, file_object.readline().split()))
      matrix.append(row)

  return matrix,size

# Convert an integer into a set of nodes
def convert_from_int_to_set(integer,size):
  set_of_nodes = []
  mask = 1
  for i in range(size):
    if ((mask & integer) != 0):
      set_of_nodes.append(i)
    mask = mask * 2
  return set_of_nodes

# Find the maximum independent set
def find_max_ind_set(adj_mat_numpy,size):
  max_independent_set_size = 0
  max_independent_set = []

  size_of_power_set = 1
  for i in range(size):
    size_of_power_set *= 2
  # print("The power set has ",size_of_power_set," elements")
  array_with_sizes = np.zeros(size_of_power_set)
  for i in range(size_of_power_set):
    this_set = convert_from_int_to_set(i,size)
    is_independent = True
    for n1 in this_set:
      for n2 in this_set:
        if (adj_mat_numpy[n1][n2] == 1):
          is_independent = False
    if (is_independent):
      array_with_sizes[i] = len(this_set)
    else:
      array_with_sizes[i] = 0


  max_independent_set_size = np.max(array_with_sizes)
  max_independent_set = np.where(array_with_sizes == max_independent_set_size)[0]
  print("The max independent sets are encoded by: ",max_independent_set)
  return max_independent_set_size



if __name__ == "__main__":
  # Read the adjacency matrix from the file whose name is passed
  # on the command line
  adj_matrix,size = read_adjacency_matrix(sys.argv[1])
  adj_matrix_numpy = np.array(adj_matrix)
  start_time = time.time()
  max_independent_set_size = find_max_ind_set(adj_matrix_numpy,size)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation in python: ",elapsed_time)
  print("The size of the maximum independent set is: ",max_independent_set_size)


Writing original_python.py
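Steps 6 and 7 change the loop `for i in range(size_of_power_set)` inside `find_max_ind_set`. A minimal sketch, assuming each node has already been given a half-open range `[start, end)` of encodings: `search_range`, `combine_results` and `is_independent_set` are hypothetical names, and the MPI calls are left out so the logic can be tested in a single process. Each node returns its best size together with the encodings that reach it, and rank 0 merges the per-node results. In the MPI version, rank 0 could obtain the list of per-node results with `comm.gather(local_result, root=0)`, which returns a list on the root and `None` on the other ranks.

```python
def convert_from_int_to_set(integer, size):
    # Compact equivalent of the original: bit i set means vertex i is in the set.
    return [i for i in range(size) if integer & (1 << i)]


def is_independent_set(adj, nodes):
    # Independent means no pair of vertices in `nodes` is adjacent.
    return not any(adj[n1][n2] == 1 for n1 in nodes for n2 in nodes)


def search_range(adj, size, start, end):
    """Examine encodings start..end-1; return (best_size, encodings)."""
    best_size = 0
    best = []
    for i in range(start, end):
        nodes = convert_from_int_to_set(i, size)
        if is_independent_set(adj, nodes):
            if len(nodes) > best_size:
                best_size, best = len(nodes), [i]
            elif len(nodes) == best_size:
                best.append(i)
    return best_size, best


def combine_results(results):
    """Merge the (best_size, encodings) pairs produced by all nodes."""
    best_size = max(size for size, _ in results)
    best = sorted(e for size, encs in results if size == best_size for e in encs)
    return best_size, best


k4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
# Simulate two nodes splitting the 16 subsets of K4.
partial = [search_range(k4, 4, 0, 8), search_range(k4, 4, 8, 16)]
print(combine_results(partial))  # (1, [1, 2, 4, 8])
```

Unlike the original, this sketch keeps only the best encodings found so far rather than an array with one entry per subset, so a node's memory use does not grow with the size of its range. If only the size is needed, `comm.reduce(best_size, op=MPI.MAX, root=0)` is enough.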


In [5]:
!python3 original_python.py k4.txt


The max independent sets are encoded by:  [1 2 4 8]
Time required to carry out the computation in python:  0.0005950927734375
The size of the maximum independent set is:  1.0
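The encodings printed above are integers whose binary digits mark the vertices in each set: bit i is 1 exactly when vertex i belongs to the set, which is what `convert_from_int_to_set` computes with its doubling mask. For K4, the four maximum independent sets are therefore the single vertices {0}, {1}, {2} and {3}. A quick check, using a hypothetical helper `decode_subset` that is equivalent to `convert_from_int_to_set`:

```python
def decode_subset(integer, size):
    # Bit i of `integer` set means vertex i belongs to the set.
    return [i for i in range(size) if (integer >> i) & 1]


for code in [1, 2, 4, 8]:
    # The binary string is read right to left: vertex 0 is the rightmost bit.
    print(code, format(code, "04b"), decode_subset(code, 4))
# 1 0001 [0]
# 2 0010 [1]
# 4 0100 [2]
# 8 1000 [3]
```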


Install the mpi4py library:

In [6]:
!pip install mpi4py

Collecting mpi4py
  Downloading mpi4py-4.0.1.tar.gz (466 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/466.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━[0m [32m194.6/466.2 kB[0m [31m5.6 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m466.2/466.2 kB[0m [31m7.2 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Building wheels for collected packages: mpi4py
  Building wheel for mpi4py (pyproject.toml) ... [?25l[?25hdone
  Created wheel for mpi4py: filename=mpi4py-4.0.1-cp310-cp310-linux_x86_64.whl size=4266342 sha256=17e0b8a36f58b3c0415fef35e449e1e3bcccc57bbdd74ffdd28945add0853441
  Stored in directory: /root/.cache/pip/wheels/3c/ca/

Let's test that mpi4py is working correctly with a very small program.

In [7]:
%%writefile small_test.py
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
number_nodes = comm.Get_size()
print("I am node: ",rank)
print("There are ",number_nodes," copies of this program in this execution.")

Writing small_test.py


In [8]:
!OMPI_ALLOW_RUN_AS_ROOT=1 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 mpiexec --allow-run-as-root -n 2 --oversubscribe python small_test.py

I am node:  1
There are  2  copies of this program in this execution.
I am node:  0
There are  2  copies of this program in this execution.


Now a second copy of the original sequential Python code.
This is the version you should change to use mpi4py; compare its output with that of the original to make sure it works as expected.

In [9]:
%%writefile with_mpi4py.py
import sys
import time
import numpy as np

def read_adjacency_matrix(file_name):
  # The first line holds the number of vertices; each following line
  # holds one row of the adjacency matrix
  with open(file_name, "r") as file_object:
    size = int(file_object.readline())
    matrix = []
    for i in range(size):
      row = list(map(int, file_object.readline().split()))
      matrix.append(row)

  return matrix,size

# Convert an integer into a set of nodes
def convert_from_int_to_set(integer,size):
  set_of_nodes = []
  mask = 1
  for i in range(size):
    if ((mask & integer) != 0):
      set_of_nodes.append(i)
    mask = mask * 2
  return set_of_nodes

# Find the maximum independent set
def find_max_ind_set(adj_mat_numpy,size):
  max_independent_set_size = 0
  max_independent_set = []

  size_of_power_set = 1
  for i in range(size):
    size_of_power_set *= 2
  # print("The power set has ",size_of_power_set," elements")
  array_with_sizes = np.zeros(size_of_power_set)
  for i in range(size_of_power_set):
    this_set = convert_from_int_to_set(i,size)
    is_independent = True
    for n1 in this_set:
      for n2 in this_set:
        if (adj_mat_numpy[n1][n2] == 1):
          is_independent = False
    if (is_independent):
      array_with_sizes[i] = len(this_set)
    else:
      array_with_sizes[i] = 0


  max_independent_set_size = np.max(array_with_sizes)
  max_independent_set = np.where(array_with_sizes == max_independent_set_size)[0]
  print("The max independent sets are encoded by: ",max_independent_set)
  return max_independent_set_size



if __name__ == "__main__":
  # Read the adjacency matrix from the file whose name is passed
  # on the command line
  adj_matrix,size = read_adjacency_matrix(sys.argv[1])
  adj_matrix_numpy = np.array(adj_matrix)
  start_time = time.time()
  max_independent_set_size = find_max_ind_set(adj_matrix_numpy,size)
  end_time = time.time()
  elapsed_time = end_time - start_time
  print("Time required to carry out the computation in python: ",elapsed_time)
  print("The size of the maximum independent set is: ",max_independent_set_size)

Writing with_mpi4py.py


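And now the command to execute the code that incorporates mpi4py functions. Until you modify `with_mpi4py.py`, each of the two processes runs the complete sequential search, which is why every line of the output below appears twice.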

In [10]:
!OMPI_ALLOW_RUN_AS_ROOT=1 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 mpiexec --allow-run-as-root -n 2 --oversubscribe python with_mpi4py.py k4.txt

The max independent sets are encoded by:  [1 2 4 8]
The max independent sets are encoded by:  [1 2 4 8]
Time required to carry out the computation in python:  0.0025606155395507812
The size of the maximum independent set is:  1.0
Time required to carry out the computation in python:  0.004187107086181641
The size of the maximum independent set is:  1.0
