# High-Performance Python

## Objectives

- Describe basic components of a computer
- Describe basic components of an operating system (OS)
- State difference between processes & threads
- List issues involved in parallelizing computation

## Multi-Processing vs. Multi-Threading

Q: What is the difference between *multi-processing* and *multi-threading*?
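- Multi-threading (one form of concurrency) splits the work between threads that run inside a single process and share its memory.
- When one thread is blocked, for example while waiting on I/O, the processor works on the tasks for the next one.
- Multi-processing splits work across separate processes, which can run on different processors or even different machines.
- Multi-threading works better if you need to exchange data between the threads, because they can all see the same objects.
- Multi-processing works better if the different processes do not need to pass much data to each other. In standard CPython it is also the better fit for CPU-bound work, because the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time.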
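
As a rough illustration of how cheaply threads can exchange data, here is a minimal sketch in which one thread hands items to another through a `queue.Queue`. The names `producer`, `consumer`, and `received`, and the use of `None` as an end marker, are choices made for this example and are not part of the lesson's code.

```python
import queue
import threading

def producer(q, items):
    # Put each item on the shared queue, then a None sentinel to signal the end.
    for item in items:
        q.put(item)
    q.put(None)

def consumer(q, out):
    # Take items off the queue until the sentinel arrives.
    while True:
        item = q.get()
        if item is None:
            break
        out.append(item.upper())

q = queue.Queue()   # thread-safe FIFO queue shared by both threads
received = []       # an ordinary list, visible to both threads

t_cons = threading.Thread(target=consumer, args=(q, received))
t_prod = threading.Thread(target=producer, args=(q, ["a", "b", "c"]))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()

print(received)
```

Because both threads live in the same process, no copying or serialization is needed to pass the items along. Separate processes would have to send the same data through a pipe or similar channel.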

### Pop Quiz

<details>
<summary>Q: I have to process a very large dataset and run it through a CPU-intensive algorithm. Should I use multi-processing or multi-threading to speed it up?</summary>
A: Multi-processing will produce a result faster. This is because it will be able to split the work across different processors or machines.
</details>

<details>
<summary>Q: I have a web scraping application that spends most of its time waiting for web servers to respond. Should I use multi-processing or multi-threading to speed it up?
</summary>
A: Multi-threading will produce a bigger payoff. While one thread waits for a server to respond, the other threads can send or process their own requests, so the program does not waste time blocked on input.
</details>

### Analogies

Multi-Threading | Multi-Processing
---|---
Laundromat | Everyone has a washer-dryer
Uber or Carpool | Everyone has a car

## Multi-Threading

Let's write a multi-threaded program that prints `"hello"` in different threads.

- Import `threading`

In [None]:
import threading

- Define a "print after delay" function.

In [None]:
from time import sleep

def print_with_delay(d, x):
    sleep(d)
    print(x)

- Create threads for printing.

In [None]:
t1 = threading.Thread(target = print_with_delay,
                      args = (5, 'hello with delay 5'))
t2 = threading.Thread(target = print_with_delay,
                      args = (2, 'hello with delay 2'))
t3 = threading.Thread(target = print_with_delay,
                      args = (3, 'hello with delay 3'))

- Start the threads.

In [None]:
t1.start()
print('{} started'.format(t1.name))
t2.start()
print('{} started'.format(t2.name))
t3.start()
print('{} started'.format(t3.name))


- Wait for threads to finish.

In [None]:
threading.current_thread().name

In [None]:
print(threading.current_thread().name)

t1.join()
print('{} finished'.format(t1.name))
t2.join()
print('{} finished'.format(t2.name))
t3.join()
print('{} finished'.format(t3.name))

In [None]:
t1.name
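
To check that the threads above really wait at the same time rather than one after another, one could time the start/join pattern. The sketch below uses shorter delays than the lesson so that it runs quickly. The helper name `wait` and the delay values are assumptions made for this example.

```python
import threading
from time import perf_counter, sleep

def wait(d):
    # Stand-in for a blocking task: the thread does nothing for d seconds.
    sleep(d)

delays = [0.3, 0.1, 0.2]
threads = [threading.Thread(target=wait, args=(d,)) for d in delays]

start = perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = perf_counter() - start

# The elapsed time should be close to the longest delay, not the sum of all delays.
print(f"elapsed: {elapsed:.2f} s, longest delay: {max(delays)} s, sum: {sum(delays):.1f} s")
```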

What if our function returned something instead of printing?

In [None]:
def count_string(string):
    return len(string)

In [None]:
t1 = threading.Thread(target = count_string,
                      args = ("here's a string",))
t2 = threading.Thread(target = count_string,
                      args = ("here's another",))
t3 = threading.Thread(target = count_string,
                      args = ("watch out for a third",))

In [None]:
for thread in [t1, t2, t3]:
    thread.start()

In [None]:
for thread in [t1, t2, t3]:
    thread.join()

Nothing. The output of the functions was `return`ed into the ether, because a `Thread` discards whatever its target function returns. So let's set up a data structure to keep our results in, and have the functions explicitly populate it.

In [None]:
def count_and_store(string, results_container):
    results_container.append(len(string))

In [None]:
results = []

t1 = threading.Thread(target = count_and_store,
                      args = ("here's a string"*10000, results))
t2 = threading.Thread(target = count_and_store,
                      args = ("here's another"*10000, results))
t3 = threading.Thread(target = count_and_store,
                      args = ("watch out for a third", results))

In [None]:
results

In [None]:
for thread in [t1, t2, t3]:
    thread.start()

In [None]:
results

In [None]:
for thread in [t1, t2, t3]:
    thread.join()

In [None]:
results

Instead of populating a list, you may find it safer to have a database and let each function populate the database.
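
Another way to get return values back without a shared container is the standard library's `concurrent.futures.ThreadPoolExecutor`, which hands back a `Future` for each submitted call. The following is a minimal sketch that reuses the lesson's `count_string` function. The variable names `strings`, `futures`, and `lengths` are chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def count_string(string):
    return len(string)

strings = ["here's a string", "here's another", "watch out for a third"]

# The `with` block waits for all submitted work and then shuts the threads down.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(count_string, s) for s in strings]
    # result() blocks until that call has finished and returns its value.
    lengths = [f.result() for f in futures]

print(lengths)  # [15, 14, 21]
```

Collecting results through the futures in submission order keeps them lined up with the inputs. A shared list filled by `append` would instead be ordered by whichever thread finishes first.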

## Multi-Processing

Let's calculate the word count of strings using multi-processing.

In [None]:
import multiprocessing
multiprocessing.cpu_count()

In [None]:
from multiprocessing import Pool

- Define how to count words in a string.

In [None]:
def word_count(string):
    return len(string.split())

- Define counting words sequentially.

In [None]:
def sequential_word_count(strings):
    return sum([word_count(string) for string in strings])

- First, here's the multi-threaded ("concurrent") version

In [None]:
# each thread will execute this function, which counts words & appends the
# result to the specified list
def thread_word_count(string, results_container):
    results_container.append(word_count(string))
    
# this function creates a thread for each string
# in strings, then sums the results when they've
# all finished executing
def concurrent_word_count(strings):
    threads = []
    thread_results = []
    for string in strings:
        thread = threading.Thread(
            target = thread_word_count,
            args = (string, thread_results))
        threads.append(thread)
        
    for thread in threads:
        thread.start()
        
    for thread in threads:
        thread.join()
        
    return sum(thread_results)

- Here's the truly parallel (multiprocessing) version

In [None]:
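def parallel_word_count(strings):
    # the `with` block shuts the worker processes down once the work is done,
    # so they do not linger in the background
    with Pool(processes = 4) as pool:
        results = pool.map(word_count, strings)
    return sum(results)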

In [None]:
parallel_word_count(['nice hat', 
                     'for a clown to wear',
                     'to the circus. idiot.']*100)

How does this work? What is `pool.map`?

In [None]:
Pool?

In [None]:
Pool().map?

- Well, perhaps you've seen the built-in function `map`. It takes a function and an iterable, and applies the function to each element of that iterable.

In [None]:
map?

In [None]:
def dum_fun(x):
    return int(x**3.2)

In [None]:
numbers = range(10)
print(list(numbers))

In [None]:
map(dum_fun, numbers)

In [None]:
list(map(dum_fun, numbers))

`pool.map` works in a similar way: it takes a function and an iterable, and it splits up the job of applying the function to each element across all available processors, to be executed in parallel.

Notice that our `word_count` function takes in a string and returns an `int`, and that `pool.map(word_count, strings)` returned a **list** of `int`s: all of the results got collected into one big list, in the same order as the inputs.
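
The idea of splitting the job up can be sketched in plain Python. This is only a conceptual illustration and not how `multiprocessing` is implemented: the helpers `split_into_chunks` and `chunked_map` are hypothetical names invented here, and each chunk is processed one after another, where a real pool would hand the chunks to separate worker processes.

```python
def split_into_chunks(items, n_chunks):
    # Cut the input into at most n_chunks consecutive pieces of similar size.
    items = list(items)
    size = max(1, -(-len(items) // n_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]

def chunked_map(func, items, n_chunks=4):
    results = []
    for chunk in split_into_chunks(items, n_chunks):
        # In a real pool, each chunk would be sent to a worker process here.
        results.extend(func(x) for x in chunk)
    # Chunks are concatenated in input order, so results line up with the inputs.
    return results

def word_count(string):
    return len(string.split())

strings = ['hello world', 'this is another line', 'this is yet another line'] * 2

print(split_into_chunks(range(10), 4))   # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(chunked_map(word_count, strings))  # [2, 4, 5, 2, 4, 5]
```

The second output is the same list that the built-in `list(map(word_count, strings))` would give, which is also the shape of result that `pool.map` returns.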

### Let's time the three versions above 
- Create a sample input.

In [None]:
strings = [
    'hello world',
    'this is another line',
    'this is yet another line'] * 100

- Time each one

In [None]:
%time print(sequential_word_count(strings))

In [None]:
%time print(concurrent_word_count(strings))

In [None]:
%time print(parallel_word_count(strings))

### Pop Quiz

<details>
<summary>Q: Between sequential, parallel, and concurrent, which one is the fastest? Which one is the slowest? Why?</summary>
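1. For large, CPU-intensive inputs, parallel is the fastest, sequential is second, and concurrent is the slowest. For an input as small as the one timed above, sequential usually wins.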
<br/>
2. Concurrent and parallel have a higher setup overhead. This is not recovered for small problems.
<br/>
3. Use these only if your processing takes longer than the setup overhead.
</details>
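
The setup overhead mentioned in the quiz can be measured outside IPython as well, where the `%time` magic is not available. The sketch below times the sequential version against a thread-per-string version using `time.perf_counter`. The helper `timed` and the function `threaded_word_count` are names assumed for this example, and the timings it prints will vary from machine to machine.

```python
import threading
from time import perf_counter

def word_count(string):
    return len(string.split())

def sequential_word_count(strings):
    return sum(word_count(s) for s in strings)

def threaded_word_count(strings):
    # One thread per string, mirroring the concurrent version in the lesson.
    results = []

    def work(s):
        results.append(word_count(s))

    threads = [threading.Thread(target=work, args=(s,)) for s in strings]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock seconds.
    start = perf_counter()
    result = func(*args)
    return result, perf_counter() - start

strings = [
    'hello world',
    'this is another line',
    'this is yet another line'] * 100

seq_total, seq_seconds = timed(sequential_word_count, strings)
thr_total, thr_seconds = timed(threaded_word_count, strings)

print(f"sequential: {seq_total} words in {seq_seconds:.6f} s")
print(f"threaded:   {thr_total} words in {thr_seconds:.6f} s")
```

Both versions should report the same total. Any difference in time is overhead from creating, starting, and joining 300 threads for work that takes only microseconds per string.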

### Cleaning Up Zombie Python Processes

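Here is how to kill all the processes that `multiprocessing` will bring up in the background. Be aware that this command kills every process of yours whose command line mentions `ipykernel`, which includes all of your running Jupyter kernels and not only the leftover workers, so save your work first.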

```sh
ps ux | grep ipykernel | grep -v grep | awk '{print $2}' | xargs kill -9
```
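
A gentler option, run from inside the notebook itself, is to ask `multiprocessing` for the child processes of the current kernel and stop only those. This is a minimal sketch, assuming the leftover workers were started by the current process. The function name `terminate_children` and the `timeout` value are choices made for this example.

```python
import multiprocessing

def terminate_children(timeout=1.0):
    # active_children() lists the live child processes of the current process.
    children = multiprocessing.active_children()
    for child in children:
        child.terminate()      # ask the child process to stop
    for child in children:
        child.join(timeout)    # wait briefly for it to exit
    return len(children)

print(f"terminated {terminate_children()} child process(es)")
```

Unlike the shell command, this leaves the kernel itself and any other notebooks running.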