# High-Performance Python

## Objectives

- Describe basic components of a computer
- Describe basic components of an operating system (OS)
- State difference between processes & threads
- List issues involved in parallelizing computation

## Multi-Processing vs. Multi-Threading

Q: What is the difference between *multi-processing* and *multi-threading*?
- Multi-threading (one form of concurrency) splits the work between threads that run inside a single process and share its memory.
- When one thread is blocked, the processor works on the tasks for the next one.
- Multi-processing splits work across separate processes, which can run on different processors or even different machines.
- Multi-threading works better if you need to exchange data between the threads.
- Multi-processing works better if the different processes do not need to pass much data to each other.

### Pop Quiz

<details>
<summary>Q: I have to process a very large dataset and run it through a CPU-intensive algorithm. Should I use multi-processing or multi-threading to speed it up?</summary>
A: Multi-processing will produce a result faster. This is because it will be able to split the work across different processors or machines.
</details>

<details>
<summary>Q: I have a web scraping application that spends most of its time waiting for web servers to respond. Should I use multi-processing or multi-threading to speed it up?
</summary>
A: Multi-threading will produce a bigger payoff. While one thread waits for a server to respond, the other threads can keep working, so the program does not waste time blocked on input.
</details>

### Analogies

Multi-Threading | Multi-Processing
---|---
Laundromat | Everyone has a washer-dryer
Uber or Carpool | Everyone has a car

## Multi-Threading

Let's write a multi-threaded program that prints `"hello"` in different threads.

- Import `threading`

In [1]:
import threading

- Define a "print after delay" function.

In [2]:
from time import sleep

def print_with_delay(d, x):
    sleep(d)
    print(x)

- Create threads for printing.

In [12]:
t1 = threading.Thread(target = print_with_delay,
                      args = (5, 'hello with delay 5'))
t2 = threading.Thread(target = print_with_delay,
                      args = (2, 'hello with delay 2'))
t3 = threading.Thread(target = print_with_delay,
                      args = (3, 'hello with delay 3'))

- Start the threads.

In [13]:
t1.start()
print('{} started'.format(t1.name))
t2.start()
print('{} started'.format(t2.name))
t3.start()
print('{} started'.format(t3.name))


Thread-7 started
Thread-8 started
Thread-9 started


- Wait for threads to finish.

In [14]:
threading.current_thread().name

'MainThread'

In [15]:
print(threading.current_thread().name)

t1.join()
print('{} finished'.format(t1.name))
t2.join()
print('{} finished'.format(t2.name))
t3.join()
print('{} finished'.format(t3.name))

MainThread
hello with delay 2
hello with delay 3
hello with delay 5
Thread-7 finished
Thread-8 finished
Thread-9 finished


In [16]:
t1.name

'Thread-7'
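
Because the three threads sleep at the same time, the run above takes about as long as the longest delay (5 seconds) rather than the sum of the delays (10 seconds). A minimal sketch that measures this, assuming shorter delays so it finishes quickly; `wait_and_record` and `finished` are hypothetical names introduced only for this illustration:

```python
import threading
import time

finished = []  # hypothetical shared list recording which thread finished when

def wait_and_record(delay, label):
    """Sleep for `delay` seconds, then record `label`."""
    time.sleep(delay)
    finished.append(label)

delays = [0.5, 0.2, 0.3]
threads = [
    threading.Thread(target=wait_and_record, args=(d, "delay {}".format(d)))
    for d in delays
]

start = time.perf_counter()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
elapsed = time.perf_counter() - start

print(finished)
print("elapsed: {:.2f}s, sum of delays: {:.2f}s".format(elapsed, sum(delays)))
```

The elapsed time should land close to the longest delay, since the waiting overlaps.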

What if our function returned something instead of printing?

In [17]:
def count_string(string):
    return len(string)

In [18]:
t1 = threading.Thread(target = count_string,
                      args = ("here's a string",))
t2 = threading.Thread(target = count_string,
                      args = ("here's another",))
t3 = threading.Thread(target = count_string,
                      args = ("watch out for a third",))

In [19]:
for thread in [t1, t2, t3]:
    thread.start()

In [20]:
for thread in [t1, t2, t3]:
    thread.join()

Nothing. `threading.Thread` discards the return value of its target, so the output of the functions was `return`ed into the ether. Let's set up a data structure to keep our results in, and have the functions explicitly populate that data structure.

In [52]:
def count_and_store(string, results_container):
    results_container.append(len(string+string.upper()))

In [53]:
results = []

t1 = threading.Thread(target = count_and_store,
                      args = ("here's a string"*1000000, results))
t2 = threading.Thread(target = count_and_store,
                      args = ("here's another"*1000000, results))
t3 = threading.Thread(target = count_and_store,
                      args = ("watch out for a third", results))

In [54]:
for thread in [t1, t2, t3]:
    thread.start()
print(results)  # printed before join(), so it may be incomplete
for thread in [t1, t2, t3]:
    thread.join()
print(results)

[30000000, 42]
[30000000, 42, 28000000]
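
The order of `results` above reflects which thread finished first, not the order in which the threads were started. If the position of each result matters, one option is to give every thread its own slot in a pre-allocated list. A minimal sketch, where `count_into_slot`, `inputs`, and `slots` are hypothetical names not used elsewhere in this notebook:

```python
import threading

def count_into_slot(string, slots, index):
    """Store the length of `string` at position `index` of `slots`."""
    slots[index] = len(string)

inputs = ["here's a string", "here's another", "watch out for a third"]
slots = [None] * len(inputs)  # one pre-allocated slot per thread

threads = [
    threading.Thread(target=count_into_slot, args=(s, slots, i))
    for i, s in enumerate(inputs)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(slots)  # [15, 14, 21], in the same order as `inputs`
```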


Instead of populating a list, you may find it safer to have a database and let each function populate the database.
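
If all you need is each function's return value, the standard library's `concurrent.futures.ThreadPoolExecutor` is another option (it is not used elsewhere in this notebook). It runs the function in worker threads and hands the return values back, so no shared container is needed. A minimal sketch that repeats the `count_string` function from above so the block is self-contained; `inputs` and `lengths` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor

def count_string(string):
    return len(string)

inputs = ["here's a string", "here's another", "watch out for a third"]

# The `with` block waits for all worker threads to finish before exiting.
with ThreadPoolExecutor(max_workers=3) as executor:
    # executor.map yields results in the order of `inputs`,
    # regardless of which thread finishes first.
    lengths = list(executor.map(count_string, inputs))

print(lengths)  # [15, 14, 21]
```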

## Multi-Processing

Let's calculate the word count of strings using multi-processing.

In [55]:
import multiprocessing
multiprocessing.cpu_count()

4

In [56]:
from multiprocessing import Pool

- Define how to count words in a string.

In [57]:
def word_count(string):
    return len(string.split())

- Define counting words sequentially.

In [58]:
def sequential_word_count(strings):
    return sum([word_count(string) for string in strings])

- First, here's the multi-threaded ("concurrent") version

In [59]:
# each thread will execute this function, which counts words & appends the
# result to the specified list
def thread_word_count(string, results_container):
    results_container.append(word_count(string))
    
# this function creates a thread for each string
# in strings, then sums the results when they've
# all finished executing
def concurrent_word_count(strings):
    threads = []
    thread_results = []
    for string in strings:
        thread = threading.Thread(
            target = thread_word_count,
            args = (string, thread_results))
        threads.append(thread)
        
    for thread in threads:
        thread.start()
        
    for thread in threads:
        thread.join()
        
    return sum(thread_results)

- Here's the truly parallel (multiprocessing) version

In [77]:
def parallel_word_count(strings):
    # the `with` block shuts the worker processes down once the work is done,
    # so they do not linger in the background
    with Pool(processes = 4) as pool:
        results = pool.map(word_count, strings)
    return sum(results)

In [78]:
parallel_word_count(['nice hat', 
                     'for a clown to wear',
                     'to the circus. idiot.']*100)

1100

How does this work? What is `pool.map`?

In [79]:
Pool?

Signature: Pool(processes=None, initializer=None, initargs=(), maxtasksperchild=None)
Docstring: Returns a process pool object
File:      ~/anaconda3/lib/python3.7/multiprocessing/context.py
Type:      method


In [81]:
Pool().map

<bound method Pool.map of <multiprocessing.pool.Pool object at 0x7f5b29f13438>>

- Well, perhaps you've seen the built-in function `map`. It takes a function and an iterable, and applies the function to each element of that iterable.

In [82]:
map?

Init signature: map(self, /, *args, **kwargs)
Docstring:
map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from
each of the iterables.  Stops when the shortest iterable is exhausted.
Type:           type
Subclasses:


In [83]:
def dum_fun(x):
    return int(x**3.2)

In [84]:
numbers = range(10)
print(list(numbers))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [85]:
map(dum_fun, numbers)

<map at 0x7f5b29f0e0b8>

In [86]:
list(map(dum_fun, numbers))

[0, 1, 9, 33, 84, 172, 309, 506, 776, 1131]

`pool.map` works in a similar way: it takes a function and an iterable, and it splits up the job of applying the function to each element across the pool's worker processes, which can run in parallel on the available processors.

Notice that our `word_count` function takes in a string and returns an `int`, and that `pool.map(word_count, strings)` returned a **list** of `int`s: all of the results got collected into one big list.
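
To see the parallel between the two, the sketch below runs the same function through the built-in `map` and through `pool.map` and compares the results. It assumes a pool of two worker processes (an arbitrary choice) and uses the hypothetical names `cube` and `compare_maps`. The `if __name__ == "__main__":` guard matters when the code is run as a script on platforms that start worker processes by importing the main module.

```python
from multiprocessing import Pool

def cube(x):
    """Hypothetical example function, defined at module level so workers can find it."""
    return x ** 3

def compare_maps(numbers):
    """Return (built-in map result, pool.map result) for the same inputs."""
    sequential = list(map(cube, numbers))
    # The `with` block terminates the worker processes on exit.
    with Pool(processes=2) as pool:
        parallel = pool.map(cube, numbers)  # blocks until every result is ready
    return sequential, parallel

if __name__ == "__main__":
    sequential, parallel = compare_maps(range(10))
    print(parallel)
    print(sequential == parallel)
```

Unlike the built-in `map`, which returns a lazy iterator, `pool.map` blocks until all results are ready and returns them as a list in input order.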

### Let's time the three versions above 
- Create a sample input.

In [99]:
strings = [
    'hello world',
    'this is another line',
    'this is yet another line'] * 10000000

- Time each one

In [100]:
%time print(sequential_word_count(strings))

110000000
CPU times: user 10.5 s, sys: 196 ms, total: 10.7 s
Wall time: 10.7 s


In [97]:
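# Left commented out: concurrent_word_count starts one thread per string,
# which would be 30 million threads for this input. The output below appears
# to come from an earlier run on a smaller input (note the lower word count).
#%time print(concurrent_word_count(strings))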

1100000
CPU times: user 14.8 s, sys: 8.67 s, total: 23.5 s
Wall time: 17.7 s


In [101]:
%time print(parallel_word_count(strings))

110000000
CPU times: user 2.39 s, sys: 298 ms, total: 2.68 s
Wall time: 7.07 s


### Pop Quiz

<details>
<summary>Q: Between sequential, parallel, and concurrent, which one is the fastest? Which one is the slowest? Why?</summary>
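1. Parallel is the fastest. Sequential is second. Concurrent is the slowest.
<br/>
2. Concurrent and parallel have a higher setup overhead: threads or processes must be created, and for processes the data must also be copied to and from the workers. This is not recovered for small problems.
<br/>
3. Word counting is CPU-bound, and in standard CPython the global interpreter lock (GIL) lets only one thread run Python code at a time, so the threads add overhead without adding parallelism.
<br/>
4. Use these only if your processing takes longer than the setup overhead.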
</details>
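
The claim that threads do not speed up CPU-bound work can be checked directly. In standard CPython, the global interpreter lock (GIL) allows only one thread at a time to execute Python bytecode, so CPU-bound threads take turns instead of running in parallel. A rough sketch, using a hypothetical `busy_sum` function as the CPU-bound workload; timings depend on the machine and the Python build, so none are shown here:

```python
import threading
import time

def busy_sum(n, out, index):
    """CPU-bound toy workload: sum of squares below n, stored in out[index]."""
    total = 0
    for i in range(n):
        total += i * i
    out[index] = total

N = 200_000
JOBS = 4

# Sequential: run the jobs one after another.
seq_out = [None] * JOBS
start = time.perf_counter()
for j in range(JOBS):
    busy_sum(N, seq_out, j)
seq_time = time.perf_counter() - start

# Threaded: one thread per job.
thr_out = [None] * JOBS
threads = [
    threading.Thread(target=busy_sum, args=(N, thr_out, j))
    for j in range(JOBS)
]
start = time.perf_counter()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
thr_time = time.perf_counter() - start

print("sequential: {:.3f}s, threaded: {:.3f}s".format(seq_time, thr_time))
print(seq_out == thr_out)  # both approaches compute the same results
```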

### Cleaning Up Zombie Python Processes

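Here is how to kill all the processes that `multiprocessing` leaves running in the background. Note that the command matches every `ipykernel` process owned by the current user, so it also kills any running notebook kernels.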

```sh
ps ux | grep ipykernel | grep -v grep | awk '{print $2}' | xargs kill -9
```
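
A gentler alternative is to make sure the pool shuts its workers down in the first place. A minimal sketch, assuming the pool is created and used inside a single function; `word_count` repeats the function defined earlier, while `safe_parallel_word_count` and its `processes=2` default are hypothetical:

```python
import multiprocessing
from multiprocessing import Pool

def word_count(string):
    return len(string.split())

def safe_parallel_word_count(strings, processes=2):
    """Count words in parallel and shut the pool down before returning."""
    pool = Pool(processes=processes)
    try:
        results = pool.map(word_count, strings)
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return sum(results)

if __name__ == "__main__":
    print(safe_parallel_word_count(["nice hat", "for a clown to wear"]))
    # After close() and join(), no pool workers should remain.
    print(multiprocessing.active_children())
```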