![Py4Eng](img/logo.png)

# Concurrency
## Yoav Ram

# Threading

https://learn-gevent-socketio.readthedocs.org/en/latest/general_concepts.html

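Threads are very useful for maintaining several program flows that run (quasi-)simultaneously. In Python, however, all threads run in the same process, and in standard CPython builds the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. The main benefit of threading is therefore that one job, while it waits, doesn't block the other jobs from running.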

If some of the jobs are doing I/O, we can also benefit from asynchronous programming: see the relevant [session](async.ipynb).

Let's start with a simple example: a worker thread counts from 1 to 10, waiting one second between numbers, without blocking the main thread, which counts from 11 to 20 (also waiting one second between numbers).

We use the [threading](https://docs.python.org/3.5/library/threading.html) module from the standard library.

In [1]:
import threading
import time

In [2]:
def create_counting_task(start, end):
    def task():
        for i in range(start, end):
            print(" ", i, end=" ")
            time.sleep(1)
    return task

In [3]:
main_task = create_counting_task(11, 21)
worker_task = create_counting_task(1, 11)
worker = threading.Thread(target=worker_task)
worker.start()
main_task()
worker.join()

  1   11   2   12   3   13   4   14   5   15   6   16   7   17   8   18   9   19   10   20 
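To see the benefit in numbers, here is a minimal sketch that times the same waiting jobs run one after the other and then in threads. The names `sleepy_task`, `run_sequential` and `run_threaded` are made up for this illustration, and `time.sleep` stands in for any job that mostly waits.

```python
import threading
import time

def sleepy_task(seconds):
    # stand-in for a job that mostly waits (for example on I/O)
    time.sleep(seconds)

def run_sequential(n_jobs, seconds):
    # run the jobs one after the other and return the elapsed time
    tic = time.perf_counter()
    for _ in range(n_jobs):
        sleepy_task(seconds)
    return time.perf_counter() - tic

def run_threaded(n_jobs, seconds):
    # run each job in its own thread and return the elapsed time
    tic = time.perf_counter()
    threads = [
        threading.Thread(target=sleepy_task, args=(seconds,))
        for _ in range(n_jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - tic

sequential = run_sequential(4, 0.2)
threaded = run_threaded(4, 0.2)
print("Sequential: {:.2f} seconds".format(sequential))
print("Threaded:   {:.2f} seconds".format(threaded))
```

The sequential run should take roughly the sum of the waiting times, while the threaded run should take roughly as long as a single job, because the threads wait at the same time.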

## Communicating with queues


https://docs.python.org/3/library/queue.html
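Before the larger example, here is a minimal sketch of the pattern, with no network access involved: the main thread puts numbers on one queue, a single worker thread squares them and puts the results on a second queue, and a `None` sentinel tells the worker to stop. The names `square_worker`, `tasks` and `results` are made up for this illustration.

```python
import queue
import threading

def square_worker(tasks, results):
    # read numbers from `tasks` until the None sentinel arrives
    while True:
        n = tasks.get()
        if n is None:
            tasks.task_done()
            break
        results.put((n, n * n))
        tasks.task_done()

tasks = queue.Queue()
results = queue.Queue()
worker = threading.Thread(target=square_worker, args=(tasks, results))
worker.start()

for n in range(5):
    tasks.put(n)
tasks.put(None)   # sentinel: no more work

tasks.join()      # block until every item was marked as done
worker.join()     # wait for the worker thread to exit

# the worker has finished, so it is safe to drain the results queue
squares = {}
while not results.empty():
    n, squared = results.get()
    squares[n] = squared
print(squares)
```

`queue.Queue` does its own locking, so the two threads never need to share a lock explicitly; they only exchange items through the queues.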

In [1]:
import threading
import queue
import collections
import urllib.request
import time

In [2]:
stop_words_url = 'https://github.com/Alir3z4/stop-words/raw/25c6a0aea665871e887f155b883e950c3743ce50/english.txt'
with urllib.request.urlopen(stop_words_url) as f:
    stop_words = [line.decode().strip() for line in f]
print(stop_words[:3])

['a', 'about', 'above']


In [3]:
urls = {
    'Gulliver':'https://raw.githubusercontent.com/yoavram/Py4Eng/master/data/gulliver.txt',
    'Alice in Wonderland':'https://raw.githubusercontent.com/yoavram/Py4Eng/master/data/alice.txt',
    'Pride and prejudice':'https://www.gutenberg.org/cache/epub/1342/pg1342.txt',
    'Yellow wallpaper':'https://www.gutenberg.org/cache/epub/1952/pg1952.txt',
    'Metamorphosis':'https://www.gutenberg.org/cache/epub/5200/pg5200.txt',
    'A Tale of Two Cities':'https://www.gutenberg.org/ebooks/98.txt.utf-8',
    'The Importance of Being Earnest':'https://www.gutenberg.org/ebooks/844.txt.utf-8',
    'Frankenstein':'https://www.gutenberg.org/ebooks/84.txt.utf-8'
}

def most_common_word(name, url):
    # download the text and count how often each whitespace-separated word appears
    counter = collections.Counter()
    with urllib.request.urlopen(url) as f:
        for line in f:
            line = line.decode().lower()
            counter.update(line.split())
    # drop very common words; pop(word, None) avoids a KeyError if a word is absent
    for word in ['the', 'a', 'is', 'of', 'and', 'to', 'i', 'in']:
        counter.pop(word, None)
    word, appearances = counter.most_common(1)[0]
    return 'Most common word in {} is "{}" ({} appearances)'.format(name, word, appearances)

In [None]:
tic = time.time()

# process all texts one after the other, as a baseline for the threaded version
for name, url in urls.items():
    print(most_common_word(name, url))
    
toc = time.time()
print("Elapsed time: {:.2f} seconds".format(toc - tic))

In [4]:
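def task():
    # worker loop: reads (name, url) items from the global queue `q`
    while True:
        item = q.get()
        if item is None:
            # None is the sentinel that tells the worker to stop
            break
        name, url = item
        try:
            print(most_common_word(name, url))
        except Exception as e:
            # report the failure but keep the worker alive
            print('Failed to process {}: {}'.format(name, e))
        finally:
            q.task_done()  # always mark the item as done, so q.join() cannot hang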

In [None]:
tic = time.time()

num_workers = 3
q = queue.Queue()
workers = []

for i in range(num_workers):
    w = threading.Thread(target=task)
    w.start()
    workers.append(w)

for item in urls.items():
    q.put(item)
    
# block until all tasks are done
q.join()

# stop workers
for i in range(num_workers):
    q.put(None)
for w in workers:
    w.join()
    
toc = time.time()
print("Elapsed time: {:.2f} seconds".format(toc - tic))
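The standard library also offers `concurrent.futures.ThreadPoolExecutor`, which manages the worker threads and the queue internally. This is a different API from the one used above, shown only as a minimal sketch of the same worker-pool idea. To avoid network access, the hypothetical `slow_word_count` function and the `texts` dictionary stand in for `most_common_word` and `urls`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_word_count(item):
    # stand-in for a download: wait a little, then count the words
    name, text = item
    time.sleep(0.1)
    return name, len(text.split())

# made-up data for this illustration
texts = {
    'short': 'to be or not to be',
    'shorter': 'hello world',
    'shortest': 'hi',
}

# the executor starts up to 3 worker threads and joins them on exit
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map returns the results in the order of the inputs
    word_counts = dict(executor.map(slow_word_count, texts.items()))

print(word_counts)
```

With this approach there are no sentinels or explicit `join` calls to manage, at the cost of less control over how the workers consume the items.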

# Multi-processing

https://docs.python.org/3/library/multiprocessing.html

In [5]:
from multiprocessing import Pool

In [6]:
pool = Pool(3)
pool.close()

In [None]:
tic = time.time()

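def f(item):
    # unpack the (name, url) pair and return the result so that pool.map can collect it
    return most_common_word(*item)

# create the pool after f is defined, so that the worker processes can find it
with Pool(3) as pool:
    results = pool.map(f, urls.items())
for r in results:
    print(r)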

toc = time.time()
print("Elapsed time: {:.2f} seconds".format(toc - tic))
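Separate processes are most useful for CPU-bound work, because each process has its own interpreter. The following minimal sketch assumes a deliberately slow, made-up function, `count_primes`, and maps it over a few inputs with a `Pool`. The `if __name__ == "__main__":` guard prevents worker processes from re-running the pool code when they import the main module.

```python
from multiprocessing import Pool

def count_primes(limit):
    # count the primes below `limit` by trial division (deliberately CPU-bound)
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [10, 100, 1000]
    # each input is handled by one of the 3 worker processes
    with Pool(3) as pool:
        counts = pool.map(count_primes, limits)
    print(counts)
```

On platforms that start new processes by spawning rather than forking, functions defined interactively (for example in a notebook cell) may not be visible to the workers; in that case, one option is to move the function into an importable module.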