## Demonstrating serial vs parallel processing - Example #2

**Week02, Example 2**

ISM6562 

&copy; 2023 Dr. Tim Smith


<a target="_blank" href="https://colab.research.google.com/github/prof-tcsmith/bd-f23/blob/main/W02/W02.2-multiprocessing-ex2.ipynb#offline=1">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>


---

In [1]:
# uncomment the following line to install the package on google colab
# ! pip install multiprocess

# uncomment the following line to install the package on your local machine with conda
# ! conda install -c conda-forge multiprocess

In [2]:
import multiprocess
from multiprocess import Pool
import requests

number_of_cores = multiprocess.cpu_count()

print(f"The computer you are running this on has {number_of_cores} cores.")

The computer you are running this on has 32 cores.


In [3]:
# the number of cores to use is dependent on a number factors
#  * how many other processes are running on your computer
#  * how many cores your computer has
#  * newer CPUs have p-cores (performance cores) and e-cores (efficiency cores) 
#       - the p-cores are faster but use more power, the e-cores are slower but use less power

# Uncomment on of the following to set the number of cores to use


#number_of_cores_to_use = 2 # if you are on colaboratory, you have 2 cores

# if you are running this on your local machine, you likely have 4 or more cores
number_of_cores_to_use = number_of_cores - 1 # leave one core for the OS 

In this notebook, we repeat the same approach as per Example 01, but this time we process a large text file.

In [4]:
import re

def task(s):
    vowel_count = len([ch for ch in s if ch in 'aeiou'])
    consonant_count = len([ch for ch in s if ch not in 'bcdfghjklmnpqrstvwyz'])
    blank_count = len([ch for ch in s if ch == ' '])
    return [vowel_count, consonant_count, blank_count]
    

Load our datafile

In [5]:
# download the data from the class repo
# Thie will make it easier to run this demo, as you will not need to download the text file to your local computer.
text = requests.get('https://raw.githubusercontent.com/prof-tcsmith/big-data-f23/main/W02/data/war_and_peace.txt').text
print(text[10_000:10_300])
text = text*20 # make the text 20 times longer; to make the demo require more processing

d grown old in society and at court. He went up to Anna Pávlovna,
kissed her hand, presenting to her his bald, scented, and shining head,
and complacently seated himself on the sofa.

“First of all, dear friend, tell me how you are. Set your friend’s
mind at rest,” said he without altering his 


Let's use a single core approach

In [6]:
%%time
results = task(text)
print(f"Vowels: {results[0]:,d}")
print(f"Consonants: {results[1]:,d}")
print(f"Blanks: {results[2]:,d}")
print()

Vowels: 18,487,820
Consonants: 35,371,760
Blanks: 10,339,260

CPU times: user 2.77 s, sys: 84.5 ms, total: 2.86 s
Wall time: 2.86 s


Next, let's do this using mulutiple cores in parallel

In [7]:
%%time
from multiprocess import Pool

parts = [text[i:i+len(text)//number_of_cores] for i in range(0, len(text), len(text)//number_of_cores)]

with Pool(number_of_cores_to_use) as p:
    count = p.map(task, parts)

print(f"Vowels: {results[0]:,d}")
print(f"Consonants: {results[1]:,d}")
print(f"Blanks: {results[2]:,d}")                  

Vowels: 18,487,820
Consonants: 35,371,760
Blanks: 10,339,260
CPU times: user 39.5 ms, sys: 49.5 ms, total: 89 ms
Wall time: 1.59 s
