##### College of Engineering, Construction and Living Sciences<br>Bachelor of Information Technology<br>IN608: Intermediate Application Development Concepts<br>Level 6, Credits 15<br><br> 

# Python 22: Concurrency & Parallelism

In this practical, you will complete a series of tasks covering today's lecture. 

Before you start, in your practicals repository, create a new branch called **22-practical**.

**Note:** Some of this code does not like being run in a Jupyter
Notebook. You may need to enter the code in a text editor and run it directly.

## <ins>Programming Activity</ins>
 
**Question 1:** Answer the following questions:
1. What is concurrency?
2. What is the difference between CPU bound & I/O bound?
3. What is a thread?
4. When the code cell below is executed, what is happening?
5. The `Thread` constructor is called with three arguments. What are these arguments & their purpose?
6. What does the `Thread` method `start` do?
7. What does the `Thread` method `join` do?

Resource - https://docs.python.org/3/library/threading.html

In [None]:
# Write your answers here

# 1.
# 2.
# 3.
# 4.
# 5.
# 6.
# 7.

In [None]:
from threading import Thread
from time import perf_counter, sleep

def sleeping(secs):
    print(f'Going to sleep for {secs} second(s)')
    sleep(secs)
    print(f'Woke up after {secs} second(s)')

def main():
    start = perf_counter()

    threads = [Thread(target=sleeping, args=[5], daemon=True) for _ in range(5)]

    for t in threads:
        t.start()

    for t in threads:
        t.join()

    finish = perf_counter()
    
    elapsed_time = round(finish - start, 2)
    
    print(f'Process finished in {elapsed_time} second(s)')

if __name__ == '__main__':
    main()



### Threading & Picsum API

**Question 2:** In this question, you will download 10 images from the **Picsum API** to your current directory. Downloading the 10 images one after another takes about eight seconds; with threading, it takes about two seconds. Your timings will vary with your network connection. Use the hints in the `main` function to download the 10 images using threads.

In [None]:
from concurrent.futures import ThreadPoolExecutor
from requests import get
from time import perf_counter

def download_img(url):
    img_bytes = get(url).content
    img_name = ''.join(url.split('/')[4:])
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as f:
        f.write(img_bytes)
        print(f'{img_name} was downloaded.')

def main():
    start = perf_counter()
    
    req = get('https://picsum.photos/v2/list?limit=10')
    download_urls = [url['download_url'] for url in req.json()]
    
    for img in download_urls:
        download_img(img)
        
    # Instead of the loop above, use threads.    
    # Use a context manager with ThreadPoolExecutor() as executor
    # Call the executor's map method - pass in a function & iterable

    finish = perf_counter()
    
    elapsed_time = round(finish - start, 2)
    
    print(f'Process finished in {elapsed_time} second(s)')

if __name__ == '__main__':
    main()
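
The hints above describe a general pattern: open a `ThreadPoolExecutor` in a context manager, then hand its `map` method a function and an iterable. The sketch below shows that pattern on its own. It assumes a hypothetical `fake_fetch` function that sleeps briefly in place of a real network request, and a hypothetical `fetch_all` helper; neither is part of the practical code.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep

def fake_fetch(item_id):
    # Hypothetical stand-in for an I/O-bound call such as an HTTP request
    sleep(0.2)
    return f'item-{item_id}'

def fetch_all(item_ids):
    # Leaving the with block waits for every submitted task to finish
    with ThreadPoolExecutor() as executor:
        # map calls fake_fetch once per element and yields results in input order
        return list(executor.map(fake_fetch, item_ids))

start = perf_counter()
results = fetch_all(range(5))
elapsed = perf_counter() - start

print(results)
print(f'Finished in {elapsed:.2f} second(s)')
```

Because the threads spend their time waiting rather than computing, the five calls overlap, so the total time should be much closer to one sleep than to five.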

### Race Conditions
**Question 3:** When multiple threads read from & write to the same memory in an unpredictable order, we can get race conditions that produce hard-to-find bugs. The code below has such a race condition.

Answer the following questions:
1. Before you run the code below, what do you think the value of `racy_counter` will be when it completes?
2. What was the actual result?
3. Increase `max_workers` to 40. Do you expect the final value of `racy_counter` to increase, decrease, or stay the same?
4. What was the actual result?
5. Explain what you've observed.

(Disclaimer: This code is unpredictable. You'll get different results at different times and on different platforms.)

In [None]:
# Write your answers here

# 1.
# 2.
# 3.
# 4.
# 5.

In [1]:
from concurrent.futures import ThreadPoolExecutor
from time import sleep, perf_counter
from random import randint

racy_counter = 0

def racy(i):
    global racy_counter
    sleep(randint(1, 2))
    local_count = racy_counter
    sleep(randint(1, 2))
    racy_counter = local_count + 1
    return 0

def do_racy_tasks(num):
    tasks = list(range(num))
    with ThreadPoolExecutor(max_workers=20) as ex:
        ex.map(racy, tasks)
        
def main():
    count = 500
    
    start = perf_counter()
    
    do_racy_tasks(count)
    
    finish = perf_counter()
    
    elapsed_time = round(finish - start, 2)
    
    print(f'Process finished in {elapsed_time} second(s)')
    print(f'racy_counter = {racy_counter}')

if __name__ == '__main__':
    main()

Process finished in 77.14 second(s)
racy_counter = 45
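
One common way to remove this kind of race is to guard the read-modify-write with a `threading.Lock`. The sketch below is a possible approach rather than the expected answer. The names `safe_counter`, `counter_lock`, `safe` and `do_safe_tasks` are made up for illustration, and the task count and sleep are much smaller than in the cell above so that it finishes quickly.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from time import sleep

safe_counter = 0
counter_lock = Lock()  # Hypothetical lock shared by all worker threads

def safe(i):
    global safe_counter
    # Only one thread at a time may run the read-modify-write below
    with counter_lock:
        local_count = safe_counter
        sleep(0.001)  # Short pause that would invite a race without the lock
        safe_counter = local_count + 1

def do_safe_tasks(num):
    with ThreadPoolExecutor(max_workers=20) as ex:
        # Consume the iterator so any exception raised in a task surfaces here
        list(ex.map(safe, range(num)))

do_safe_tasks(100)
print(f'safe_counter = {safe_counter}')
```

The trade-off is that the locked section runs one thread at a time, so the more work that happens while holding the lock, the less the threads can overlap.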


### Multi-Processing & Parallelism
**Question 4:** Answer the following questions:
1. What is parallelism?
2. When the following cell is executed, what is happening?
3. What does the `Process` method `start` do?
4. What does the `Process` method `join` do?

Resource - https://docs.python.org/3/library/multiprocessing.html 

In [None]:
# Write your answers here

# 1.
# 2.
# 3.
# 4.

In [None]:
from multiprocessing import Process
from time import perf_counter, sleep

def sleeping(secs):
    print(f'Going to sleep for {secs} second(s)')
    sleep(secs)
    print(f'Woke up after {secs} second(s)')

def main():     
    start = perf_counter()

    processes = [Process(target=sleeping, args=[5]) for _ in range(5)]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    finish = perf_counter()
    
    elapsed_time = round(finish - start, 2)
    
    print(f'Process finished in {elapsed_time} second(s)')

if __name__ == '__main__':
    main()

### Alternative - ProcessPoolExecutor

`ProcessPoolExecutor` from `concurrent.futures` is an alternative to creating `Process` objects from the `multiprocessing` module directly.

In [None]:
from concurrent.futures import ProcessPoolExecutor
from time import perf_counter, sleep

def sleeping(secs):
    print(f'Going to sleep for {secs} second(s)')
    sleep(secs)
    return f'Woke up after {secs} second(s)'

def main():
    start = perf_counter()

    with ProcessPoolExecutor() as ex:
        secs = list(range(5, 0, -1))
        # map yields each task's return value in the order the inputs were given
        for result in ex.map(sleeping, secs):
            print(result)

    finish = perf_counter()
    
    elapsed_time = round(finish - start, 2)
    
    print(f'Process finished in {elapsed_time} second(s)')

if __name__ == '__main__':
    main()

### Multi-Processing & Photo Filtering

**Question 5:** 
1. In this question, you will apply an image filter to each downloaded image using multi-processing. Use the `glob` module to get all downloaded images in the current directory with the extension `.jpg`. Append these images to a list called `imgs`.
2. Make a note of the process time. Refactor the code so that it is using `ThreadPoolExecutor` instead of `ProcessPoolExecutor`. Execute the code & compare the process time. Note the differences in a comment at the bottom of the cell.

Resource - https://docs.python.org/3/library/glob.html 
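
As a rough illustration of how `glob` pattern matching works, the sketch below creates a few empty files in a temporary directory and selects only those ending in `.jpg`. The helper `find_by_extension` and the file names are hypothetical and exist only for this example; the practical itself works in the current directory.

```python
from glob import glob
from os.path import basename, join
from tempfile import TemporaryDirectory

def find_by_extension(directory, extension):
    # Build a pattern such as '<directory>/*.jpg'; * matches any run of characters
    pattern = join(directory, f'*{extension}')
    # glob returns paths in arbitrary order, so sort for a predictable result
    return sorted(glob(pattern))

with TemporaryDirectory() as tmp:
    # Create three empty files, only two of which should match
    for name in ['a.jpg', 'b.jpg', 'notes.txt']:
        open(join(tmp, name), 'w').close()

    found = find_by_extension(tmp, '.jpg')
    names = [basename(path) for path in found]
    print(names)
```

`glob` already returns a list, so the result can be passed straight to an executor's `map` method as the iterable.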

In [None]:
from concurrent.futures import ProcessPoolExecutor
from glob import glob
from os import chdir
from PIL import Image, ImageFilter
from time import perf_counter

def filter_img(img_name):
    img = Image.open(img_name)
    img = img.filter(ImageFilter.GaussianBlur(15))
    img.save(img_name)
    print(f'{img_name} was processed.')

def main():
    start = perf_counter()

    # Get all images in the current directory with the extension .jpg and append to a list

    # Use a context manager with ProcessPoolExecutor() as ex
        # Call the ex's map method - pass in a function & iterable

    finish = perf_counter()
    
    elapsed_time = round(finish - start, 2)
    
    print(f'Process finished in {elapsed_time} second(s)')

if __name__ == '__main__':
    main()

# Submission
Create a new pull request and assign **tclark** to review your submission.

**Note:** Please don't merge your own pull request.