# Examples of parallel processing with Python

Tested on the `dev2023a` conda environment; see [here](https://github.com/vvoutilainen/dsenvs/blob/main/condaenv.md).

## Background on parallel computing

A [process](https://en.wikipedia.org/wiki/Process_(computing)) is an instance of a computer program that is being executed by one or more threads. A [thread](https://en.wikipedia.org/wiki/Thread_(computing)) is the smallest sequence of programmed instructions that can be scheduled independently, and it lives within a *process*. *Multithreading* allows several tasks within one process to be executed concurrently and, when multiple cores are available, in parallel.

A useful analogy: cores are like chefs in a kitchen and threads are meal orders. The more chefs you have, the more orders you can prepare at the same time. Conversely, many chefs are unnecessary when there are only a few orders to process. See [here](https://ioflood.com/blog/what-are-cpu-threads-cores-vs-threads-explained/).

Modern computers often have a [Central Processing Unit](https://en.wikipedia.org/wiki/Central_processing_unit) (CPU) that is a multi-core processor. This is a single piece of hardware that provides several processing units, also called *cores*. Multiple cores can work concurrently. Many Intel processors use [hyperthreading](https://en.wikipedia.org/wiki/Hyper-threading), a system that essentially splits each physical core into two *logical cores* (a.k.a. *virtual cores* or *threads*) and shares workloads between the two. See [this](https://superuser.com/a/168813/1679577).

## Parallelization in Python

### General

*Global Interpreter Lock* (GIL, from [here](https://docs.python.org/3.10/glossary.html#term-global-interpreter-lock)): "The mechanism used by the CPython interpreter to assure that only one thread executes Python bytecode at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as dict) implicitly safe against concurrent access. **Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines**."

The [threading](https://docs.python.org/3.10/library/threading.html#module-threading) module makes multithreading available. However, to make better use of the computational resources of multi-core machines, one is advised to use the [multiprocessing](https://docs.python.org/3.10/library/multiprocessing.html) package. *multiprocessing* side-steps the Global Interpreter Lock by using subprocesses instead of threads, e.g., via the `Pool` class (it also provides thread-based pools via `ThreadPool`). `Pool` offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes.
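As a minimal sketch of how `Pool` distributes inputs across processes, the snippet below maps a toy CPU-bound function over a list, first with the built-in `map` and then with `Pool.map`. The names `sum_of_squares`, `run_serial`, and `run_pool` are illustrative assumptions and do not come from the `tests` module used later in this notebook.

```python
from multiprocessing import Pool


def sum_of_squares(n):
    """Toy CPU-bound task (hypothetical): sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def run_serial(inputs):
    """Apply the task to every input in the current process."""
    return list(map(sum_of_squares, inputs))


def run_pool(inputs, n_procs=4):
    """Apply the task to every input using a pool of worker processes."""
    # Pool.map splits `inputs` into chunks, sends them to the workers,
    # and returns the results in the original input order.
    with Pool(processes=n_procs) as pool:
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    inputs = [200_000] * 16
    # Both approaches should produce the same list of results.
    print(run_pool(inputs, n_procs=4) == run_serial(inputs))
```

Whether the pool is actually faster depends on how expensive each call is relative to the overhead of starting processes and pickling inputs and results; for very cheap tasks the serial version may win.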

Multithreading might not be that useful for CPU-bound work (see comments [here](https://stackoverflow.com/q/70700809/7037299)). The Global Interpreter Lock means that only one thread can execute Python bytecode at a time. Multithreading works best when your threads spend their time waiting for external resources (e.g., network or disk I/O), not when the task calls for data parallelism.
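To illustrate the case where threads do help, here is a small sketch in which `time.sleep` stands in for waiting on an external resource. Sleeping releases the GIL, so the waits can overlap across threads. The helper names (`fake_io_task`, `run_serial`, `run_threaded`, `timed`) are assumptions made for this example only.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_io_task(delay):
    """Stand-in for an I/O-bound call: wait `delay` seconds, then return it."""
    time.sleep(delay)  # the GIL is released while sleeping
    return delay


def run_serial(delays):
    """Run the tasks one after another in the current thread."""
    return [fake_io_task(d) for d in delays]


def run_threaded(delays, n_threads=4):
    """Run the tasks in a pool of threads so that the waits overlap."""
    with ThreadPool(n_threads) as pool:
        return pool.map(fake_io_task, delays)


def timed(func, *args):
    """Return (result, elapsed seconds) for a call to func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 8
    _, t_serial = timed(run_serial, delays)
    _, t_threads = timed(run_threaded, delays, 4)
    print("serial: {:.2f} s, threads: {:.2f} s".format(t_serial, t_threads))
```

If `fake_io_task` were replaced by a pure-Python computation, the threaded version would be expected to lose most of this advantage, since the threads would then compete for the GIL.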

### Interactive interpreters like in Jupyter notebooks

Functionality within the *multiprocessing* package requires that the [`__main__`](https://docs.python.org/3.10/library/__main__.html) module be importable by the children (new Python interpreters). This means that some features of the package, like `Pool`, might not work in interactive interpreters, e.g., in Jupyter notebooks (see [this](https://docs.python.org/3.10/library/multiprocessing.html#using-a-pool-of-workers)). This can be circumvented by defining the functions that need parallelization in separate .py files.

In any case, to play it safe one should protect the “entry point” of the main program by using `if __name__ == '__main__'`. See [here](https://docs.python.org/3.10/library/multiprocessing.html#the-spawn-and-forkserver-start-methods).
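The two points above (worker functions defined in a separate file, and a protected entry point) can be sketched as a single module. The file name `workers.py` and the functions `count_shared` and `count_shared_parallel` are hypothetical and are not the contents of the `tests` module imported below.

```python
# workers.py (hypothetical file name)
from multiprocessing import Pool


def count_shared(a, b):
    """Count the distinct values that appear in both lists."""
    return len(set(a) & set(b))


def count_shared_parallel(pairs, n_procs=2):
    """Apply count_shared to each (a, b) pair using a process pool."""
    # count_shared lives at module top level, so child processes can
    # import it by name even when the caller is a notebook.
    with Pool(processes=n_procs) as pool:
        return pool.starmap(count_shared, pairs)


# Entry-point guard: this block runs only when the file is executed as a
# script, not when child processes (or a notebook) import the module.
if __name__ == "__main__":
    pairs = [([1, 2, 3], [2, 3, 4]), ([5, 6], [7, 8])]
    print(count_shared_parallel(pairs))  # prints [2, 0]
```

From a notebook, one would then import the wrapper with `from workers import count_shared_parallel` and call it with the data, keeping the function definitions out of the interactive session.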

In [1]:
import psutil
import numpy as np
from tests import (
    test_map_pool,
    test_map_threadpool,
    test_loop,
    test_loop_pool,
)

In [2]:
print(
    "{} cores, {} when counting logical cores.".format(
        psutil.cpu_count(logical=False),
        psutil.cpu_count(),
    )
)

4 cores, 8 when counting logical cores.


In [3]:
if __name__ == '__main__':
    job_list = [range(10000000)]*64
    test_map_pool(job_list, pool=4)
    test_map_threadpool(job_list, pool=4)

------------------------------------------------------------
Pool map
Time: 7.13
------------------------------------------------------------
------------------------------------------------------------
ThreadPool map
Time: 25.18
------------------------------------------------------------


In [4]:
if __name__ == "__main__":
    values_to_compare = np.random.randint(low=0, high=20, size=[20000, 5]).tolist()
    values_compared_against = np.random.randint(low=0, high=20, size=[20000, 5]).tolist()
    out1 = test_loop(values_to_compare, values_compared_against)
    out2 = test_loop_pool(values_to_compare, values_compared_against, pool=4)
    print("Lists agree? {}".format(out1==out2))

------------------------------------------------------------
test_loop
Time: 11.38
------------------------------------------------------------
------------------------------------------------------------
test_loop_pool
Time: 4.29
------------------------------------------------------------
Lists agree? True
