# Lesson 04: Concurrency & Parallelism

## 1. Parallelism

Compared to concurrency, parallelism is easier to use, and is _usually_ easier to think about and design.

We can explore this through a data-processing scenario: going through a very large CSV/JSON doc and keeping only the columns or keys we care about.
This is usually done when selecting a relevant subset of data from a very broad data set, often sourced from a 3rd party.

e.g. You want to download the Wikipedia dataset and filter for actors, movie titles and release years, so that you can make a simple and comprehensive list.

e.g. 2: You want to make a demo example for this lesson, so you have to fake the data before demonstrating the filtering.
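
To make the filtering idea concrete, here is a minimal sketch that streams newline-delimited JSON and keeps only a few keys per record. The helper name `filter_keys`, the sample records and the key names are all assumptions made up for illustration; a real run would read from a file on disk rather than an in-memory string.

```python
import io
import json


def filter_keys(lines, keep):
    """Yield one JSON string per input line, keeping only the keys in `keep`."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        # Keys that are missing from a record are simply left out
        yield json.dumps({k: record[k] for k in keep if k in record})


# Hypothetical two-record input standing in for a very large file
source = io.StringIO(
    '{"title": "A", "startYear": "1999", "region": "en_US"}\n'
    '{"title": "B", "startYear": "2004", "actors": ["X"]}\n'
)

filtered = list(filter_keys(source, keep=("title", "startYear", "actors")))
print("\n".join(filtered))
```

Because the function is a generator that handles one line at a time, memory use stays flat no matter how large the input is, which is what makes this kind of job easy to split across cores later.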

### Example 1: Generating a text file

1. We need to generate a _very_ large NDJSON file (newline-delimited JSON). For simplicity's sake, every line is valid JSON and follows the same schema.

    ```json
    {"a": "B"}
    {"a": "C"}
    ```

1.1. There are many language and OS-level optimisations around doing the _exact_ same thing, like performing the same calculation over the same file line data. This means that we have to randomise the values in order to make a good test file.

    > use `faker`

In [3]:
from faker import (Faker, providers)
F = Faker()
F.add_provider(providers.misc)
F.add_provider(providers.geo)

In [4]:
def fkr_n(fkr, n): return [fkr() for _ in range(n)]

In [5]:
def gen_movie(f=F):
    return {
        "titleId":         f.uuid4(),
        "ordering":        f.random_int(),
        "title":           f.catch_phrase(),
        "region":          f.locale(),
        "language":        f.language_name(),
        "types":           fkr_n(f.name, 5),
        "attributes":      fkr_n(f.name, 5),
        "isOriginalTitle": f.boolean(),
        "tconst":          f.uuid4(),
        "titleType":       f.domain_name(),
        "primaryTitle":    f.catch_phrase(),
        "originalTitle":   ":".join([f.company(), f.catch_phrase()]),
        "isAdult":         f.boolean(),
        "startYear":       f.date(),
        "endYear":         f.year(),
        "runtimeMinutes":  f.random_int(),
        "genres":          fkr_n(f.country, 5),
        # "tconst" is already set above; repeating the key here would silently overwrite it
        "directors":       fkr_n(f.name, 2),
        "writers":         fkr_n(f.name, 15),
        "actors":          fkr_n(f.name, 50),
    }

In [6]:
import json
from IPython.display import JSON
JSON(gen_movie(F))

<IPython.core.display.JSON object>

Woohoo! Now we just need to write this to a file.

Let's make a generator function that loops and `yield`s data.

In [7]:
class MovieTable:
    @staticmethod
    def records(fpath, n_records=10):
        print(f"Writing {n_records} records to {fpath}")
        with open(fpath, "w") as ostream:
            for record in MovieTable.iter(n_records):
                # json.dumps writes valid JSON; printing the dict would write its Python repr
                print(json.dumps(record), file=ostream)

    @staticmethod
    def iter(n_records=10):
        for _ in range(n_records):
            yield gen_movie()

In [8]:
%timeit MovieTable.records("/tmp/movies.ndjson", 20)

Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
Writing 20 records to /tmp/movies.ndjson
294 ms ± 68.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


But I need to show you the CPU usage _per core_!

What about threading?

> Use psutil

In [34]:
import psutil
def cpu(*args):
    print("\t".join(map(str, psutil.cpu_percent(percpu=True))))

In [10]:
import threading
import time

def loop_with_timer():
    timer = threading.Timer(1, cpu, args=None, kwargs=None)
    timer.start()

    # Actually do stuff here
    for i in range(5):
        print(i)
        time.sleep(1)

    timer.cancel()

loop_with_timer()

0
3.9	2.0	2.0	7.1	2.0	2.0
1
2
3
4


Huh? The `threading.Timer` only printed once?!

- The "interval" of the timer is more like a "delay"   
- from the docs:
    ```
    "Create a timer that will run function with arguments args and keyword arguments kwargs, 
    after interval seconds have passed."
    ```
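
A quick way to convince yourself of the "delay, not interval" behaviour is to count how often the callback fires. This is just a sketch with made-up names (`calls`, `tick`) and short delays so that it finishes quickly.

```python
import threading
import time

calls = []


def tick():
    calls.append(time.monotonic())


timer = threading.Timer(0.05, tick)  # runs tick once, roughly 0.05 s after start()
timer.start()
timer.join()     # Timer is a Thread subclass, so join() waits for it to finish
time.sleep(0.2)  # leave room for several more "intervals"; nothing else fires

print(len(calls))
```

The callback runs exactly once, so anything that should repeat needs its own loop.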

In [11]:
# Shout out -> https://stackoverflow.com/a/48741004
class RepeatTimer(threading.Timer):
    def run(self):
        while not self.finished.wait(self.interval):
            self.function(*self.args, **self.kwargs)

def loop_with_timer():
    timer = RepeatTimer(2, cpu, args=None, kwargs=None)
    timer.start()

    for i in range(5):
        print(i)
        time.sleep(1)

    timer.cancel()

loop_with_timer()

0
1
8.1	6.9	6.6	5.4	5.6	7.8
2
3
1.5	3.5	3.5	3.5	3.5	2.0
4


Hmmm, there's a downside here -   
In every function that we want to track CPU usage for, we have to add all of this threading code -_-

However - we could write a decorator that lets us track CPU usage for any function just by adding an `@decorator` line above it

In [12]:
import functools

def with_cpu(func):
    """Looks gross, but you only have to write it once!"""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Sample per-core CPU usage once a second while func runs
        timer = RepeatTimer(1, cpu, args=None, kwargs=None)
        timer.start()
        try:
            return func(*args, **kwargs)
        finally:
            # Stop the sampler even if func raises
            timer.cancel()
            print("fin!")
    return wrapper

Now, we can just add `@with_cpu` to any function that we need to investigate

In [13]:
@with_cpu
def poc():
    for i in range(5):
        print(i)
        time.sleep(1)

In [14]:
poc()

0
1
2.9	1.9	2.3	5.2	1.9	1.0
2
2.0	0.0	7.1	3.0	11.0	3.0
3
6.1	1.0	1.0	0.0	2.0	1.0
4
3.0	0.0	1.0	1.0	0.0	1.0
fin!


Let's add the decorator to our `MovieTable.records` method

In [15]:
class MovieTable:
    @staticmethod
    @with_cpu
    def records(fpath, n_records=10):
        print(f"Writing {n_records} records to {fpath}")
        with open(fpath, "w") as ostream:
            for record in MovieTable.iter(n_records):
                # json.dumps writes valid JSON; printing the dict would write its Python repr
                print(json.dumps(record), file=ostream)

    @staticmethod
    def iter(n_records=10):
        for _ in range(n_records):
            yield gen_movie()

In [16]:
MovieTable.records("/tmp/movies.ndjson", 500)

Writing 500 records to /tmp/movies.ndjson
1.4	6.2	2.9	7.7	45.0	1.5
1.0	1.0	0.0	5.9	100.0	0.0
0.0	1.0	2.9	2.0	100.0	0.0
1.0	1.0	0.0	5.9	100.0	1.0
0.0	1.0	2.9	3.0	99.0	1.0
1.0	2.9	1.0	5.9	100.0	0.0
fin!


Nice!

- Now we're running some code, and running another looping bit of code in a thread on the side.
- We can see that we are currently using 1 core.

- What about `async`?

In [17]:
import json
import asyncio
from dataclasses import dataclass
import functools

import psutil

@dataclass
class Timer:
    f: object
    id: int = 1
    sentinel: bool = False

    def task(self):
        async def run():
            while not self.sentinel:
                self.f(self.id)
                await asyncio.sleep(1)
        return asyncio.create_task(run())

    async def stop(self):
        self.sentinel = True

async def run_with_timer(f: functools.partial, t: Timer):
    tsk = t.task()
    await f()
    tsk.cancel()

    try:
        await tsk
    except asyncio.CancelledError:
        print("finished")

In [18]:
class AMovieTable:
    async def records(fpath, n_records=10):
        print(f"Writing {n_records} records to {fpath}")
        with open(fpath, "w") as ostream:
            for _ in range(n_records):
                print(json.dumps(gen_movie()), file=ostream, end="\n")
                await asyncio.sleep(0)
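
The `await asyncio.sleep(0)` in the loop above matters more than it looks: asyncio tasks only switch at an `await`, so without it the timer task would never get a turn while the records are being written. Here is a small sketch of that effect, meant to be run as a plain script (inside a notebook `asyncio.run` would complain about the already-running loop). The names `ticker`, `work` and `run` are made up for this demo.

```python
import asyncio


async def ticker(ticks):
    # Background task: record a tick every time the event loop gives it a turn
    while True:
        ticks.append(1)
        await asyncio.sleep(0)


async def work(n, cooperative):
    for _ in range(n):
        sum(range(1000))  # stand-in for CPU-bound work
        if cooperative:
            await asyncio.sleep(0)  # hand control back to the event loop


async def run(cooperative):
    ticks = []
    task = asyncio.create_task(ticker(ticks))
    await work(5, cooperative)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(ticks)


print("ticks without yielding:", asyncio.run(run(False)))
print("ticks with yielding:   ", asyncio.run(run(True)))
```

Without the yield the background task is cancelled before it ever runs, so it records no ticks at all. With the yield it gets a turn on each loop iteration.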

In [19]:
import asyncio

def run_async(f):
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:  # raised when no event loop is running
        loop = None

    if loop and loop.is_running():
        task = loop.create_task(f)
        task.add_done_callback(lambda _: print('fin!'))
    else:
        asyncio.run(f)

run_async(
    run_with_timer(functools.partial(AMovieTable.records, *("/tmp/movies.ndjson",500)), Timer(cpu))
)

Cool, cool. How could we do the same thing, but with a decorator?

In [28]:

def awith_cpu(func):
    @functools.wraps(func)
    async def wrapped(*args, **kwargs):
        tsk = Timer(cpu).task()
        try:
            result = await func(*args, **kwargs)
        finally:
            tsk.cancel()  # stop sampling even if func raises
        try:
            await tsk
        except asyncio.CancelledError:
            print("fin!")
        # Return outside the except block so the result can never be dropped
        return result
    return wrapped

In [21]:
class AMovieTable:
    @awith_cpu
    async def records(fpath, n_records=10):
        print(f"Writing {n_records} records to {fpath}")
        with open(fpath, "w") as ostream:
            for _ in range(n_records):
                print(json.dumps(gen_movie()), file=ostream, end="\n")
                await asyncio.sleep(0)

import asyncio

def run_async(f):
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:  # raised when no event loop is running
        loop = None

    if loop and loop.is_running():
        task = loop.create_task(f)
        task.add_done_callback(lambda _: print('callback fin!'))
    else:
        asyncio.run(f)
run_async(AMovieTable.records("/tmp/movies.ndjson",500))

In [22]:
!wc -l "/tmp/movies.ndjson" && ls -alh "/tmp/movies.ndjson"

Writing 500 records to /tmp/movies.ndjson
86.7	3.6	24.1	19.4	17.2	20.0
4 /tmp/movies.ndjson
-rw-r--r-- 1 jovyan users 23K Nov 30 22:13 /tmp/movies.ndjson


In [23]:
from multiprocessing import Pool, Process

procs = []
for fpath in ["/tmp/movies.ndjson", "/tmp/movies2.ndjson"]:
    p = Process(target=MovieTable.records, args=(fpath,500))
    p.start()
    procs.append(p)

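# Busy-wait until both workers have exited. This loop spins without sleeping,
# so the parent process keeps a core busy too; `for p in procs: p.join()`
# would wait without burning CPU.
while True:
    if not any(p.is_alive() for p in procs):
        print("process fin!")
        break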

Writing 500 records to /tmp/movies.ndjson
Writing 500 records to /tmp/movies2.ndjson
70.8	3.7	21.6	51.9	61.2	14.1
71.0	3.7	22.0	52.4	61.7	15.0
100.0	2.9	2.0	100.0	100.0	5.9100.0	2.9	1.0	100.0	100.0	4.1

100.0	5.8	2.9	100.0	99.0	9.1100.0	5.9	3.0	100.0	99.0	9.3

99.0	1.0	2.0	99.0	100.0	5.099.0	1.0	1.9	99.0	100.0	5.0

100.0	11.9	4.9	100.0	100.0	5.1100.0	11.9	4.9	100.0	100.0	5.1

99.0	4.9	2.0	100.0	99.0	0.0
99.0	5.8	1.9	100.0	99.0	0.0
100.0	10.1	3.9	99.0	100.0	5.1100.0	9.2	4.0	99.0	100.0	5.1

fin!
fin!
process fin!
94.1	6.6	6.7	88.5	91.4	6.7
0.0	0.0	0.0	0.0	0.0	0.0


In [24]:
def run_async_pool(args):
    run_async(AMovieTable.records(*args))

def _():
    with Pool(2) as p:
        p.map(run_async_pool, [("/tmp/movies.ndjson", 500), ("/tmp/movies2.ndjson", 500)])

In [25]:
# _()

### Logging

Don't forget about logging! As you can see, it's hard to tell which process or thread is producing any particular log line.

- Pass an ID/name/any metadata that you might need to the method in question, and ensure that it logs using it.
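
One way to follow that advice is to let the standard `logging` module stamp each line for you. The sketch below assumes a logger called `movies_demo`, a made-up `write_records` function and a custom `worker_id` field passed through `extra`; `%(processName)s` and `%(threadName)s` are standard `LogRecord` attributes. It only logs, it does not write any files.

```python
import io
import logging
import threading

stream = io.StringIO()  # stands in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(processName)s %(threadName)s [%(worker_id)s] %(message)s")
)

log = logging.getLogger("movies_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger


def write_records(worker_id, fpath):
    # `extra` attaches worker_id to the record so the formatter can print it
    log.info("writing to %s", fpath, extra={"worker_id": worker_id})


threads = [
    threading.Thread(
        target=write_records,
        args=(i, f"/tmp/movies_{i}.ndjson"),
        name=f"writer-{i}",
    )
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(stream.getvalue(), end="")
```

Each handler emits a record under a lock, so lines from different threads in one process do not get spliced together the way the raw `print` output above does. Separate processes writing to the same stream can still interleave, which is one more reason to put the ID in every line.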

In [44]:
def gen_fpath(fmt_s, i):
    return fmt_s.format(i)


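def awith_cpu(func):
    @functools.wraps(func)
    async def wrapped(*args, **kwargs):
        # args[0] is the worker id, so every CPU sample is tagged with it
        tsk = Timer(functools.partial(cpu, args[0])).task()
        try:
            result = await func(*args, **kwargs)
        finally:
            tsk.cancel()  # stop sampling even if func raises
        try:
            await tsk
        except asyncio.CancelledError:
            print("fin!")
        # Return outside the except block so the result can never be dropped
        return result
    return wrapped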

class AMovieTable:
    @awith_cpu
    async def records(id, fpath, n_records=10):
        print(f"Writing {n_records} records to {fpath}")
        with open(fpath, "w") as ostream:
            for _ in range(n_records):
                print(json.dumps(gen_movie()), file=ostream, end="\n")
                await asyncio.sleep(0)

def cpu(*args):
    # Prefix each sample with the caller's metadata so log lines can be told apart
    print(" - ".join([str(args), "\t".join(map(str, psutil.cpu_percent(percpu=True)))]))

# NPROCS = os.cpu_count()
NPROCS = 2

fpaths = []
for i in range(NPROCS):
    fpaths.append((i, f'/tmp/movies_{i}.ndjson', 500))

def _():
    with Pool(NPROCS) as p:
        p.map(run_async_pool, fpaths)

In [43]:
_()

Writing 500 records to /tmp/movies_0.ndjsonWriting 500 records to /tmp/movies_1.ndjson

(0,)-9.3	9.0	9.0	8.5	8.7	9.3(1,)-9.3	9.0	9.0	8.5	8.7	9.3

(1,)-11.5	9.3	83.2	5.9	6.2	98.9
(0,)-11.5	9.3	83.2	5.9	6.2	98.9
(1,)-64.8	7.4	100.0	3.7	6.4	36.1(0,)-64.8	7.4	100.0	3.7	6.5	36.1

(0,)-2.9	1.0	100.0	1.0	3.8	97.1(1,)-2.9	1.0	100.0	1.0	3.8	97.1

(0,)-2.9	5.8	99.0	3.9	2.9	100.0
(1,)-2.9	5.8	99.0	3.9	2.9	99.0
(1,)-1.0	4.8	100.0	0.0	0.0	100.0
(0,)-1.0	5.7	100.0	0.0	0.0	99.0
(1,)-44.2	7.8	55.2	1.9	2.8	99.0
(0,)-45.2	7.7	55.1	2.8	2.8	99.0
(1,)-100.0	36.3	7.7	6.1	7.9	100.0
(0,)-99.0	39.0	8.3	6.8	8.6	100.0
(1,)-99.0	20.2	5.7	2.0	1.9	99.0
(0,)-91.7	15.0	3.7	1.0	9.3	99.1
(1,)-3.7	7.4	2.8	1.9	100.0	99.1
(0,)-3.6	7.0	2.7	1.8	99.1	99.1
fin!
fin!


## Thoughts on the GIL

- Treating one Python process as one thread of execution is analogous to a microservices architecture (lambda)
- This is easy to reason about and to design
    - For a single thread, write code that keeps its core fully utilised
    - If you need parallelism, simply take this same code and run it on another core (multiprocessing)
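
As a rough sketch of "write it for one core, then run the same code on more cores": the function below is ordinary single-threaded code, and the only parallel part is handing chunks of the input to a `Pool`. The prime-counting workload and the names `count_primes` and `split` are stand-ins invented for this example, not part of the lesson's movie code.

```python
import math
from multiprocessing import Pool


def count_primes(bounds):
    """CPU-bound stand-in: count primes in [lo, hi) by trial division."""
    lo, hi = bounds
    return sum(
        1
        for n in range(max(lo, 2), hi)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    )


def split(hi, parts):
    """Split [0, hi) into at most `parts` contiguous ranges."""
    step = math.ceil(hi / parts)
    return [(lo, min(lo + step, hi)) for lo in range(0, hi, step)]


if __name__ == "__main__":
    chunks = split(20_000, 4)
    # Same function, unchanged, now running in two worker processes
    with Pool(2) as pool:
        total = sum(pool.map(count_primes, chunks))
    # The parallel total should match a single-process run
    print(total == count_primes((0, 20_000)))
```

The `if __name__ == "__main__":` guard is there because some platforms start workers by re-importing the main module, and without it each worker would try to create its own pool.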

In [26]:
import os
os.cpu_count()

6

98.1	4.0	17.6	1.9	2.9	7.3
0.0	0.0	0.0	0.0	0.0	0.0
100.0	0.0	2.9	0.0	0.0	1.0
0.0	0.0	0.0	0.0	0.0	0.0
100.0	1.0	6.9	1.9	1.9	2.0
100.0	0.0	0.0	0.0	0.0	0.0
100.0	0.0	1.0	0.0	0.0	1.0
100.0	33.3	0.0	0.0	0.0	0.0
100.0	1.0	7.0	1.0	1.0	1.0
100.0	0.0	0.0	0.0	0.0	0.0
100.0	0.0	1.0	0.0	1.0	1.0
100.0	0.0	0.0	0.0	0.0	0.0
100.0	5.8	3.8	1.0	1.0	2.0
0.0	0.0	0.0	0.0	0.0	0.0
100.0	4.7	3.7	0.9	0.0	1.0
0.0	0.0	0.0	0.0	0.0	0.0
100.0	7.7	1.0	1.0	1.0	2.9
0.0	0.0	0.0	0.0	0.0	0.0
99.1	3.8	1.9	0.0	0.9	0.0
0.0	0.0	0.0	0.0	0.0	0.0
100.0	6.7	1.0	0.9	1.9	1.0
0.0	0.0	0.0	0.0	0.0	0.0
finished
fin!
fin!
callback fin!
