## Lecture 9: Python Concurrency
### March 29, 2023

Partly based on [https://nyu-cds.github.io/python-concurrency/](https://nyu-cds.github.io/python-concurrency/)


## Improving performance by using concurrency

Concurrency vs parallelism:

    Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.
    
[source](https://medium.com/@itIsMadhavan/concurrency-vs-parallelism-a-brief-review-b337c8dac350)

We will illustrate some benefits of concurrency with a program downloading images from the `imgur.com` website.

For this you will need to:

- create an account in [imgur.com](https://imgur.com/)
- register your application [here](https://api.imgur.com/oauth2/addclient)
  - Authorization Type: __OAuth 2 authorization with a callback URL__
  - Authorization Callback URL: __https://www.getpostman.com/oauth2/callback__
  - email:
  - Description:
  

---
The functions below fetch a list of images and download them from the __imgur__ repository: 
[https://imgur.com/](https://imgur.com/)

- We will start with a version that downloads images sequentially, or one at a time

- Then improve the performance by introducing multiprocessing and threading

---
We will split the functionality into three separate functions (see the file `download.py`):
- `get_links`
- `download_link`
- `setup_download_dir`
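
The file `download.py` is not reproduced in these notes. Below is a minimal sketch of what the three functions could look like, using only the standard library. The endpoint is the one that appears in the request log later in these notes, and the `images` directory name matches the directory listing. The JSON layout (a `data` list whose items have a `link` field), the extension filter and the helper `local_path` are assumptions for illustration, not the actual course code.

```python
import json
import posixpath
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Endpoint taken from the request log shown in these notes.
API_URL = "https://api.imgur.com/3/gallery/random/random/"


def setup_download_dir(name="images"):
    """Create the download directory if needed and return it as a Path."""
    download_dir = Path(name)
    download_dir.mkdir(parents=True, exist_ok=True)
    return download_dir


def get_links(client_id):
    """Return image URLs from the gallery endpoint (assumed JSON layout)."""
    request = Request(API_URL, headers={"Authorization": "Client-ID " + client_id})
    with urlopen(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    links = [item.get("link", "") for item in payload["data"]]
    # Assumption: keep only direct links to image files.
    return [link for link in links if link.endswith((".jpg", ".png", ".gif"))]


def local_path(directory, link):
    """Hypothetical helper: map an image URL to a file inside `directory`."""
    return Path(directory) / posixpath.basename(urlsplit(link).path)


def download_link(directory, link):
    """Download one image and write it into `directory`."""
    with urlopen(link) as image, open(local_path(directory, link), "wb") as f:
        f.write(image.read())
```

Keeping `download_link(directory, link)` as a plain two-argument function is what makes it easy to hand to a process pool or a thread later on.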

In [5]:
from time import time

from download import setup_download_dir, get_links, download_link

# replace with your own client ID
CLIENT_ID = 'e7bea2539b6e0ac'

ts = time()
download_dir = setup_download_dir()

links = [l for l in get_links(CLIENT_ID)]

for i, link in enumerate(links):
    print("%2d %s" % (i, link))
    download_link(download_dir, link)

print('Took {}s'.format(time() - ts))

 0 http://i.imgur.com/nvcycnkh.gif
 1 https://i.imgur.com/nvWmpzV.jpg
 2 http://i.imgur.com/nvsVWNXh.gif
 3 https://i.imgur.com/nvL2F.png
 4 https://i.imgur.com/nvqX8.jpg
 5 https://i.imgur.com/nvRcyJM.jpg
 6 http://i.imgur.com/nvh22YSh.gif
 7 http://i.imgur.com/nvvAMgjh.gif
 8 https://i.imgur.com/nvcplSu.jpg
 9 http://i.imgur.com/nvxsuyHh.gif
10 https://i.imgur.com/nvj6pYU.png
11 https://i.imgur.com/nvW6yNi.png
12 http://i.imgur.com/nvXT5pth.gif
13 https://i.imgur.com/nvr7149.gif
Took 10.135595083236694s


In [6]:
ls images/

2420PVx.png   4TDSRam.jpg   4TzeI21.gif   LsAMj.jpg     nvqX8.jpg
247syip.jpg   4TGGmHjh.gif  Ix3EQ.gif     LsEW9kS.png   nvr7149.gif
249ocqC.jpg   4THHJ0M.jpg   Ix3OxTE.png   LsM6KI8.gif   nvsVWNXh.gif
24E9yfU.png   4TI7x.jpg     Ix46rpd.jpg   LsP5KiT.jpg   nvvAMgjh.gif
24arB.jpg     4TI9qrf.jpg   Ix9nmD2.jpg   LscbnDs.gif   nvxsuyHh.gif
24cqIGgh.gif  4TOxno6.gif   IxBXYx6h.gif  Lsf7aH4.gif   oy6kw.jpg
24fmdAy.gif   4TPwKnVh.gif  IxBwClK.jpg   Lsg9qgrh.gif  oy9qwAoh.gif
24gu7yX.jpg   4TQP5jI.jpg   IxEGPdi.png   LsgNC5y.gif   oyK8HcJh.gif
24inH.gif     4TRPpvI.jpg   IxFRq2R.gif   LsmnU4w.gif   oyKsobC.jpg
24jkqkjh.gif  4TW82.jpg     IxLvnI8h.gif  LsqzOvk.jpg   oyNSra8.png
24nyADh.jpg   4TZ22ZX.gif   IxXF0CW.jpg   nvL2F.png     oyP1W.jpg
24oiWvMh.gif  4TggCMGh.gif  IxZjY7B.gif   nvRcyJM.jpg   oyTPssF.jpg
24pWw.jpg     4Th1kuph.gif  IxaPpPV.gif   nvW6yNi.png   oyeeIgm.jpg
24vCA0Dh.gif  4ThYR6u.gif   Ixd38.jpg     nvWmpzV.jpg   oyiQl.jpg
24vhZSE.jpg   4Tk29JA.jpg   Ixjycvc.p

---

- To improve the performance of the image downloader we can run **multiple copies** of the program at the same time. 


- However, we would need to know what images are available so that we could ensure that one process didn’t download an image that had already been downloaded by a different process.  


- Fortunately the multiprocessing module is available for this purpose.

---

### Pool

- To use multiple processes we need a multiprocessing **Pool**. 


- The Pool class provides a `map` method that applies a function to every element of a supplied iterable, distributing the calls across the worker processes. 


- The iterable is divided into a number of chunks, so that each process gets roughly the same number of elements. 


- We will pass the list of URLs to the pool, which starts 8 new processes and uses them to download the images in parallel.
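
As a rough illustration of the chunking described above, the sketch below reproduces the rule that, to the best of our knowledge, CPython's `Pool.map` applies when no `chunksize` is given: the items are cut into about four chunks per worker. The helper names `default_chunksize` and `split_into_chunks` are hypothetical and are not part of the `multiprocessing` API.

```python
from itertools import islice


def default_chunksize(n_items, n_workers):
    """Chunk size assumed to be chosen by Pool.map when chunksize is None."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def split_into_chunks(items, chunksize):
    """Yield consecutive lists of at most `chunksize` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


print(default_chunksize(14, 8))            # 14 links, 8 workers
print(list(split_into_chunks(range(10), 4)))
```

With 14 links and 8 workers the chunk size is 1, so every download is dispatched as its own task. For many small tasks, passing a larger `chunksize` to `map` reduces the communication overhead.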

In [7]:
from multiprocessing import cpu_count
print("number of CPU cores:", cpu_count())

number of CPU cores: 8


In [8]:
from functools import partial
from multiprocessing.pool import Pool

def multi_processes_download():
    ts = time()
    download_dir = setup_download_dir()
    links = [l for l in get_links(CLIENT_ID)]

    # functools.partial makes a new version of a function 
    # with one or more fixed arguments
    download = partial(download_link, download_dir)
   
    with Pool(8) as p:  # 8 = number of cores; should be <= your core count
        p.map(download, links)
        
    print('Took {}s'.format(time() - ts))

multi_processes_download()

Took 1.7109696865081787s
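
The role of `functools.partial` in the cell above can be seen in isolation with a toy function. In this sketch `target_path` is a hypothetical stand-in for `download_link`: it takes the same two arguments but only builds a string.

```python
from functools import partial


def target_path(directory, name):
    """Toy two-argument function standing in for download_link."""
    return "{}/{}".format(directory, name)


# Fix the first argument; the result is a one-argument function,
# which is the shape that map() expects.
to_images = partial(target_path, "images")

paths = list(map(to_images, ["a.jpg", "b.png"]))
print(paths)  # ['images/a.jpg', 'images/b.png']
```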


---

Although easy to implement, this form of parallelism has some drawbacks:
- each process contains **a copy of the entire memory**
- it does not handle processes that depend on each other

These issues can be tackled with shared memory and message passing mechanisms, which we will cover in later lessons.

## Using Threads

Threading is a well-known approach to attaining concurrency: 
- typically threads are lighter weight than processes
- **lower memory requirements**, as **they share the same memory space**

A basic way to use threads is through `ThreadPoolExecutor` in `concurrent.futures`, which provides a similar interface to `multiprocessing.Pool`.

For more refined behavior we will rely on the `Thread` class, which provides a `run` method that should be overridden with a method that does the actual work of the thread.
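
As a minimal sketch of overriding `run`, the hypothetical `SquareWorker` class below computes one value per thread and stores it on the instance. Calling `start()` launches the thread, which then executes `run()`. Calling `join()` waits for it to finish.

```python
from threading import Thread


class SquareWorker(Thread):
    """Hypothetical worker: squares a number in its own thread."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.result = None

    def run(self):
        # This method is executed in the new thread after start() is called.
        self.result = self.value ** 2


workers = [SquareWorker(v) for v in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()   # wait until the thread has finished

results = [w.result for w in workers]
print(results)  # [0, 1, 4, 9, 16]
```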

In [None]:
## Simple example with ThreadPoolExecutor

from functools import partial
from concurrent.futures import ThreadPoolExecutor

def multithreaded_download():
    ts = time()
    download_dir = setup_download_dir()
    links = [l for l in get_links(CLIENT_ID)]

    download = partial(download_link, download_dir)
   
    with ThreadPoolExecutor(max_workers=8) as ex:
        ex.map(download, links)
        
    print('Took {}s'.format(time() - ts))

multithreaded_download()

### Thread Safety

- Variables in the program are shared by all the threads and should not be accessed the way you would normally access a variable. One thread may change the variable while another thread is reading it, or worse, two threads may try to update the variable at the same time. 


- This is known as a **race condition**. It is one of the leading sources of errors in threaded programs and needs to be addressed properly.



- One way to achieve thread safety is to use the __Queue class__, which handles the required locking internally.
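
Before turning to queues, the update problem described above can be made concrete with an explicit lock. In this sketch the hypothetical `SafeCounter` class protects a read-modify-write update with `threading.Lock`, so that increments from different threads cannot interleave.

```python
from threading import Lock, Thread


class SafeCounter:
    """Hypothetical counter whose updates are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self):
        # Only one thread at a time can execute this block.
        with self._lock:
            self.value += 1


def work(counter, repeats):
    for _ in range(repeats):
        counter.increment()


counter = SafeCounter()
threads = [Thread(target=work, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

Without the lock, `self.value += 1` is a separate read and write, and two threads could read the same old value and lose an update.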

In [1]:
# Understanding Queue 
from queue import Queue

def do_work(q):
    while not q.empty():
        item = q.get()
        print(str(item)) 
        q.task_done()    # this is important when combining Queue with Threads

q = Queue() # FIFO queue

for i in range(20):
    q.put(i)

do_work(q)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
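
The `while not q.empty()` loop above is fine with a single consumer. With several threads, another thread can take the last item between the `empty()` check and the `get()` call, leaving the first thread blocked forever. A common alternative, sketched below under the assumption that the queue is filled before the threads start, is to call `get_nowait()` and stop on `queue.Empty`. The function name `drain` is hypothetical.

```python
from queue import Empty, Queue
from threading import Thread


def drain(q, results):
    """Take items until the queue is empty, without blocking."""
    while True:
        try:
            item = q.get_nowait()
        except Empty:
            return              # nothing left: the thread ends
        results.put(item)       # Queue.put is thread-safe
        q.task_done()


q = Queue()
for i in range(20):
    q.put(i)

results = Queue()
threads = [Thread(target=drain, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

seen = sorted(results.get() for _ in range(20))
print(seen == list(range(20)))  # True
```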


A simple example with threads before going back to the image downloader code:

In [10]:
# in this example each thread prints an element of the queue

from time import sleep
from queue import Queue
from threading import Thread
import logging  

# set up a logger
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.basicConfig(format='(%(threadName)-9s) %(message)s', level=logging.DEBUG)

def do_work(q):
    while True:   # queue implementation: 1st in 1st out
        item = q.get()
        logger.debug("e" + str(item) + ' ')
        print(str(item) + ' ')
        q.task_done()
        sleep(2)
    
q = Queue()
num_threads = 10

for i in range(num_threads):   
    worker = Thread(target=do_work, args=(q,), name='thread_' + str(i))
    worker.daemon = True   # daemon threads are stopped when the program quits (setDaemon is deprecated)
    worker.start()         # start the thread

# now we have started 10 threads

for i in range(50):  # 50 tasks for 10 threads; which thread handles which task is not fixed
    q.put(i)

q.join() # block until task_done() has been called for every queued item

(thread_2 ) e0 
(thread_0 ) e1 
(thread_3 ) e2 
(thread_5 ) e3 
(thread_1 ) e4 
(thread_6 ) e5 
(thread_7 ) e6 
(thread_8 ) e7 
(thread_4 ) e8 
(thread_9 ) e9 


0 
1 
2 
3 
4 
5 
6 
7 
8 
9 


(thread_1 ) e10 
(thread_0 ) e11 
(thread_3 ) e12 
(thread_7 ) e13 
(thread_2 ) e14 
(thread_5 ) e15 
(thread_6 ) e16 
(thread_8 ) e17 
(thread_4 ) e18 
(thread_9 ) e19 


10 
11 
12 
13 
14 
15 
16 
17 
18 
19 


(thread_1 ) e20 
(thread_0 ) e21 
(thread_3 ) e22 
(thread_7 ) e23 
(thread_2 ) e24 
(thread_5 ) e25 
(thread_4 ) e26 
(thread_9 ) e27 
(thread_6 ) e28 
(thread_8 ) e29 


20 
21 
22 
23 
24 
25 
26 
27 
28 
29 


(thread_1 ) e30 
(thread_0 ) e31 
(thread_3 ) e32 
(thread_7 ) e33 
(thread_2 ) e34 
(thread_5 ) e35 
(thread_4 ) e36 
(thread_9 ) e37 
(thread_6 ) e38 
(thread_8 ) e39 


30 
31 
32 
33 
34 
35 
36 
37 
38 
39 


(thread_1 ) e40 
(thread_0 ) e41 
(thread_3 ) e42 
(thread_7 ) e43 
(thread_2 ) e44 
(thread_5 ) e45 
(thread_4 ) e46 
(thread_9 ) e47 
(thread_6 ) e48 
(thread_8 ) e49 


40 
41 
42 
43 
44 
45 
46 
47 
48 
49 
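
The workers above loop forever and rely on being daemon threads to disappear when the program exits. An alternative, sketched here with a hypothetical `STOP` sentinel, is to put one stop marker per worker into the queue so that every thread returns from its loop and can be joined.

```python
from queue import Queue
from threading import Thread

STOP = None  # sentinel value (an assumption: real tasks are never None)


def worker(q, results):
    while True:
        item = q.get()
        if item is STOP:
            q.task_done()
            break                   # leave the loop, the thread ends
        results.put(item * item)
        q.task_done()


q, results = Queue(), Queue()
threads = [Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()

for i in range(10):
    q.put(i)
for _ in threads:
    q.put(STOP)                     # one sentinel per worker, after the tasks

q.join()                            # every item has been marked done
for t in threads:
    t.join()                        # every thread has actually finished

squares = sorted(results.get() for _ in range(10))
print(squares)
```

Because the queue is first in, first out, the sentinels are only reached after all real tasks have been taken.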


In [11]:
from queue import Queue
from threading import Thread

class DownloadWorker(Thread):
    def __init__(self, queue):
        super(DownloadWorker, self).__init__()
        self.queue = queue
    
    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            (directory, link) = self.queue.get()
            # call the function download_link (from download.py)
            download_link(directory, link)
            self.queue.task_done()

            
def threaded_download():
    ts = time()
    download_dir = setup_download_dir()
    links = [l for l in get_links(CLIENT_ID)]
    
    # Create a queue to communicate with the worker threads
    queue = Queue()
    
    # Create 8 worker threads
    for _ in range(8):
        worker = DownloadWorker(queue)
        # Setting daemon to True will let the main thread exit 
        # even if the workers are blocking
        worker.daemon = True
        worker.start()

    
    # Put the tasks into the queue as a tuple
    for link in links:
        print('Queueing: {}'.format(link))
        queue.put((download_dir, link))
    
    # Causes the main thread to wait for the queue to finish processing all the tasks
    queue.join()
    
    print('Took {}s'.format(time() - ts))

threaded_download()

(MainThread) Starting new HTTPS connection (1): api.imgur.com:443
(MainThread) https://api.imgur.com:443 "GET /3/gallery/random/random/ HTTP/1.1" 200 21997
(Thread-23) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-27) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-26) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-30) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-28) Starting new HTTP connection (1): i.imgur.com:80
(Thread-25) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-24) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-29) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-28) http://i.imgur.com:80 "GET /24oiWvMh.gif HTTP/1.1" 301 0
(Thread-28) Starting new HTTPS connection (1): i.imgur.com:443


Queueing: https://i.imgur.com/24nyADh.jpg
Queueing: https://i.imgur.com/24pWw.jpg
Queueing: https://i.imgur.com/24arB.jpg
Queueing: https://i.imgur.com/247syip.jpg
Queueing: https://i.imgur.com/24fmdAy.gif
Queueing: https://i.imgur.com/24wUPcI.jpg
Queueing: http://i.imgur.com/24oiWvMh.gif
Queueing: https://i.imgur.com/24inH.gif
Queueing: http://i.imgur.com/24jkqkjh.gif
Queueing: http://i.imgur.com/24cqIGgh.gif
Queueing: https://i.imgur.com/2420PVx.png
Queueing: https://i.imgur.com/24E9yfU.png
Queueing: https://i.imgur.com/24vhZSE.jpg
Queueing: http://i.imgur.com/24vCA0Dh.gif
Queueing: https://i.imgur.com/249ocqC.jpg
Queueing: https://i.imgur.com/24gu7yX.jpg
Queueing: https://i.imgur.com/24xbFz0.jpg


(Thread-27) https://i.imgur.com:443 "GET /247syip.jpg HTTP/1.1" 200 45570
(Thread-23) https://i.imgur.com:443 "GET /24fmdAy.gif HTTP/1.1" 200 9050688
(Thread-30) https://i.imgur.com:443 "GET /24wUPcI.jpg HTTP/1.1" 200 53006
(Thread-26) https://i.imgur.com:443 "GET /24arB.jpg HTTP/1.1" 200 377324
(Thread-28) https://i.imgur.com:443 "GET /24oiWvMh.gif HTTP/1.1" 200 116590
(Thread-25) https://i.imgur.com:443 "GET /24pWw.jpg HTTP/1.1" 200 474068
(Thread-24) https://i.imgur.com:443 "GET /24nyADh.jpg HTTP/1.1" 200 207131
(Thread-29) https://i.imgur.com:443 "GET /24inH.gif HTTP/1.1" 200 1714576
(Thread-27) Starting new HTTP connection (1): i.imgur.com:80
(Thread-27) http://i.imgur.com:80 "GET /24jkqkjh.gif HTTP/1.1" 301 0
(Thread-30) Starting new HTTP connection (1): i.imgur.com:80
(Thread-27) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-28) Starting new HTTPS connection (1): i.imgur.com:443
(Thread-30) http://i.imgur.com:80 "GET /24cqIGgh.gif HTTP/1.1" 301 0
(Thread-30) Startin

Took 1.9802379608154297s


## The Global Interpreter Lock
#### Not really parallel!

- Python has a **Global Interpreter Lock (GIL)**, which allows only **one thread to execute Python code at a time** within a process. Therefore, **this code is concurrent but not parallel**. 

- It is still faster because the image downloader is an input/output (I/O) bound task: the majority of the time is spent waiting for the network, and a thread that is waiting releases the GIL. This is why threading can provide a large speed increase. 

- **The interpreter can switch between the threads** whenever one of them is **ready** to do some work.



- If the program were performing a CPU-bound task, using the threading module in Python (or any other interpreted language with a GIL) could actually reduce performance, because the threads compete for the GIL instead of running in parallel.

- For CPU bound tasks and truly parallel execution in Python, the multiprocessing module is a better option.

- Some parallelism is still possible with threads if the executed functions rely on low-level code that releases the GIL (e.g. many NumPy/SciPy functions). This includes custom Cython programs (see the `nogil` keyword [here](https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html) and [here](https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html))

- Other packages for parallelization: task/job queues (e.g. [python-rq](https://python-rq.org/)), [joblib](https://joblib.readthedocs.io/en/latest/parallel.html), [dask](https://dask.org/)
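
The effect of waiting on I/O can be imitated without a network. In the sketch below the hypothetical function `fake_io` simply sleeps, and `time.sleep` releases the GIL while waiting, so four threads overlap their waiting time.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep


def fake_io(delay):
    """Stand-in for a network request: just wait for `delay` seconds."""
    sleep(delay)
    return delay


delays = [0.2] * 4

t0 = perf_counter()
sequential = [fake_io(d) for d in delays]
sequential_time = perf_counter() - t0

t0 = perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    threaded = list(ex.map(fake_io, delays))
threaded_time = perf_counter() - t0

print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential_time, threaded_time))
```

The sequential loop needs at least 0.8 s, while the threaded version should take roughly as long as a single wait. Replacing `sleep` with a pure Python computation would remove most of this gain.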


### Example: sum of array elements in parallel

In [None]:
n = int(1e8)

In [None]:
# Sequential version
from time import time

ts = time()
s = 0
for i in range(n):
    s = s + i
print(s, '-->', time()-ts,'s')   
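
Since the loop adds the integers 0, 1, ..., n - 1, the result can be checked against the closed form n(n - 1)/2. The short sketch below (the function name `sum_below` is hypothetical) gives a reference value for validating the parallel versions.

```python
def sum_below(n):
    """Sum of 0 + 1 + ... + (n - 1), using the closed form n(n - 1)/2."""
    return n * (n - 1) // 2


n = int(1e8)
expected = sum_below(n)
print(expected)  # 4999999950000000
```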

In [None]:
# multiprocessing version
from time import time
from multiprocessing.pool import Pool

from download import sum_multi_processes_1, sum_multi_processes_2

def sum_multi_processes_1_(chunk):
    y = 0
    for i in chunk:
        y = y + i
    return y


def sum_multi_processes_2_(start, end):
    y = 0
    for i in range(start, end):  # range is faster than list iteration
        y = y + i
    return y

chunks1 = [list(range(i,i + 100)) for i in range(0, n, 100)]
chunks2 = [(i,i + 100) for i in range(0, n, 100)]

print(len(chunks1), 'chunks')

ts = time()
with Pool(8) as p:
    results = p.map(sum_multi_processes_1, chunks1)
    # results = p.starmap(sum_multi_processes_2, chunks2)

print(sum(results), '-->', time()-ts,'s')   
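
With chunks of 100 elements the pool has to dispatch a million small tasks, and the communication overhead can dominate the actual work. A possible alternative, sketched below with a hypothetical helper `split_range`, is to create one `(start, end)` pair per worker. Such pairs have the same form as `chunks2` and could be passed to `p.starmap` in the same way.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, end) pairs of near-equal size."""
    step, extra = divmod(n, parts)
    bounds = []
    start = 0
    for k in range(parts):
        # The first `extra` parts get one additional element.
        end = start + step + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


bounds = split_range(10, 4)
print(bounds)  # [(0, 3), (3, 6), (6, 8), (8, 10)]

# Sequential check that the pieces cover the whole range exactly once.
print(sum(sum(range(s, e)) for s, e in split_range(1000, 8)) == sum(range(1000)))
```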

In [17]:
# Thread version
from queue import Queue
from threading import Thread
from threading import Lock

x = 0
lock = Lock()
def sum_chunk(q):
    global x  # the running total is shared by all threads
    while True:
        start, end = q.get()
        for i in range(start, end):
            with lock:  # force synchronization
                x = x + i
        q.task_done()

n = int(1e8)
chunks = [(i, i + 100) for i in range(0, n, 100)]

ts = time()
q = Queue()
num_threads = 10

for i in range(num_threads):
    worker = Thread(target=sum_chunk, args=(q, ))
    worker.daemon = True   # daemon threads are stopped when the program quits (setDaemon is deprecated)
    worker.start()         # start the thread

for chunk in chunks:
    q.put(chunk)

q.join()
print(x, '-->', time() - ts, 's')    

4999999950000000 --> 38.928231954574585 s
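
Much of the running time above goes into acquiring the lock once per addition. A sketch of a cheaper variant is given below: each thread sums its chunk into a local variable and takes the lock only once per chunk. The names `sum_chunks` and `total`, the `None` sentinel and the small value of `n` are choices made for this illustration. Because of the GIL this is still not expected to beat the sequential loop.

```python
from queue import Queue
from threading import Lock, Thread


def sum_chunks(q, total, lock):
    while True:
        chunk = q.get()
        if chunk is None:            # sentinel: no more work
            q.task_done()
            return
        start, end = chunk
        local = sum(range(start, end))   # local variable, no lock needed
        with lock:                   # one lock acquisition per chunk
            total[0] += local
        q.task_done()


n = 100_000                          # kept small for the illustration
total = [0]                          # one-element list shared by the threads
lock = Lock()
q = Queue()

threads = [Thread(target=sum_chunks, args=(q, total, lock)) for _ in range(4)]
for t in threads:
    t.start()

for start in range(0, n, 100):
    q.put((start, start + 100))
for _ in threads:
    q.put(None)                      # one sentinel per thread

for t in threads:
    t.join()

print(total[0])  # 4999950000
```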


### Example: Pi Simulation

In [None]:
from download import monte_carlo_pi
import numpy as np

def monte_carlo_pi_(n):
    s = 0
    for i in range(n):
        x = np.random.uniform(0, 1)
        y = np.random.uniform(0, 1)
        if (x**2 + y**2) < 1:
            s += 1
    return 4*s/n

In [None]:
%%time
result = [monte_carlo_pi(int(3e5)) for _ in range(10)]

In [None]:
np.array(result)

In [None]:
from multiprocessing.pool import Pool

In [None]:
%%time
with Pool(8) as pool:
    result = pool.map(monte_carlo_pi, [int(3e5) for _ in range(10)])

In [None]:
np.array(result)
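
One point to watch when running a simulation in several processes: if the workers are created by forking, they inherit a copy of the parent's global random state, so different workers can produce the same sequence of random numbers. A possible remedy, sketched below with the standard library's `random.Random` instead of NumPy, is to give every task its own seed. The function name `monte_carlo_pi_seeded` and the seeds are assumptions for this illustration.

```python
import random


def monte_carlo_pi_seeded(n, seed):
    """Estimate pi from n random points, using a private generator."""
    rng = random.Random(seed)        # independent of the global random state
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            inside += 1
    return 4 * inside / n


# One (n, seed) pair per task; the pairs could be passed to pool.starmap.
tasks = [(20_000, seed) for seed in range(4)]
estimates = [monte_carlo_pi_seeded(n, seed) for n, seed in tasks]
print(estimates)
```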