# Chapter 19: Concurrency Models in Python

> Concurrency is about dealing with lots of things at once.
> 
> Parallelism is about doing lots of things at once.
> 
> Not the same, but related.
> 
> One is about structure, one is about execution.
> 
> Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.
> 
> -- Rob Pike, co-creator of Go

## A Bit of Jargon

- Concurrency: The ability to handle multiple pending tasks, making progress one at a time or in parallel (if possible) so that each of them eventually succeeds or fails. A single-core CPU is capable of concurrency if it runs an OS scheduler that interleaves the execution of different tasks. Also known as _multitasking_.

- Parallelism: The ability to run multiple tasks _at the same time_. This requires a multi-core CPU, multiple CPUs, a GPU, or a cluster of computers. Parallelism is about _speeding up_ computation by using extra computing resources.

- Execution unit: General term for objects that execute code concurrently, each with independent state and call stack. Python natively supports three kinds of execution units: processes, threads, and coroutines.

- Process: An instance of a computer program while it is running, using memory and a slice of the CPU time. Modern desktop operating systems routinely manage hundreds of processes concurrently, with each process isolated in its own private memory space. Processes communicate via pipes, sockets, or memory-mapped files -- all of which can only carry raw bytes. Python objects must be serialized (converted) into raw bytes to pass from one process to another. This is costly, and not all Python objects are serializable. A process can spawn subprocesses, each called a child process. These are also isolated from each other and from the parent. Processes allow _preemptive multitasking_: the OS scheduler _preempts_ -- i.e., suspends -- each running process periodically to allow other processes to run. This means that a frozen process can't freeze the whole system -- in theory.

- Thread: An execution unit within a single process. When a process starts, it uses a single thread: the _main thread_. A process can create more threads to operate concurrently by calling operating system APIs. Threads within a process share the same memory space, which holds live Python objects. This allows easy data sharing between threads, but can also lead to corrupted data when more than one thread updates the same object concurrently. Like processes, threads also enable _preemptive multitasking_ under the supervision of the OS scheduler. A thread consumes fewer resources than a process doing the same job.

- Coroutine: A function that can suspend itself and resume later. In Python, _classic coroutines_ are built from generator functions, and _native coroutines_ are defined with _async def_. "Classic Coroutines" on page 641 introduced the concept, and Chapter 21 covers the use of native coroutines. Python coroutines usually run within a single thread under the supervision of an _event loop_, also in the same thread. Asynchronous programming frameworks such as _asyncio_, _Curio_, or _Trio_ provide an event loop and supporting libraries for nonblocking, coroutine-based I/O. Coroutines support _cooperative multitasking_: each coroutine must explicitly cede control with the _yield_ or _await_ keywords, so that another may proceed concurrently (but not in parallel). This means that any blocking code in a coroutine blocks the execution of the event loop and all other coroutines -- in contrast with the _preemptive multitasking_ of processes and threads. On the other hand, each coroutine consumes fewer resources than a thread doing the same job.

- Queue: A data structure that lets us put and get items, usually in FIFO order: first in, first out. Queues allow separate execution units to exchange application data and control messages, such as error codes and signals to terminate. The implementation of a queue varies according to the underlying concurrency model: the _queue_ package in Python's standard library provides queue classes to support threads, while the _multiprocessing_ and _asyncio_ packages implement their own queue classes. The _queue_ and _asyncio_ packages also include queues that are not FIFO: _PriorityQueue_ and _LifoQueue_.

- Lock: An object that execution units can use to synchronize their actions and avoid corrupting data. While updating a shared data structure, the running code should hold an associated lock. This signals other parts of the program to wait until the lock is released before accessing the same structure. The simplest type of lock is also known as a mutex (for mutual exclusion). The implementation of a lock depends on the underlying concurrency model.

- Contention: Dispute over a limited asset. Resource contention happens when multiple execution units try to access a shared resource--such as a lock or storage. There's also CPU contention, when compute-intensive processes or threads must wait for the OS scheduler to give them CPU time.
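The lock and queue entries above can be illustrated with a minimal sketch using threads. The names `counter`, `counter_lock`, `results`, and `worker` are hypothetical, chosen only for this illustration. Each thread holds the lock while updating the shared counter, then reports completion through a FIFO queue.

```python
import queue
import threading

counter = 0                        # shared state updated by every thread
counter_lock = threading.Lock()    # mutex guarding `counter`
results = queue.Queue()            # FIFO channel from the workers to the main thread


def worker(worker_id: int, increments: int) -> None:
    global counter
    for _ in range(increments):
        with counter_lock:         # hold the lock while touching shared state
            counter += 1
    results.put(worker_id)         # signal completion through the queue


threads = [threading.Thread(target=worker, args=(i, 10_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One message per worker; sorted because arrival order is not deterministic.
finished = sorted(results.get() for _ in threads)
print(counter, finished)
```

Without the `with counter_lock:` block, the read-modify-write in `counter += 1` could interleave between threads and lose updates.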

## Processes, Threads, and Python's infamous GIL

Here is how the concepts we just saw apply to Python programming, in 10 points:

1. Each instance of the Python interpreter is a process. You can start additional Python processes using the `multiprocessing` or `concurrent.futures` libraries. Python's `subprocess` library is designed to launch processes to run external programs, regardless of the languages used to write them.

2. The Python interpreter uses a single thread to run the user's program and the memory garbage collector. You can start additional Python threads using the `threading` or `concurrent.futures` libraries.

3. Access to object reference counts and other internal interpreter state is controlled by a lock, the Global Interpreter Lock (GIL). Only one Python thread can hold the GIL at any time. This means that only one thread can execute Python code at any time, regardless of the number of CPU cores.

4. To prevent a Python thread from holding the GIL indefinitely, Python's bytecode interpreter pauses the current Python thread every 5ms by default, releasing the GIL. The thread can then try to reacquire the GIL, but if there are other threads waiting for it, the OS scheduler may pick one of them to proceed.

In [11]:
import sys

# Call sys.getswitchinterval() to get the current value of the switch interval
print(sys.getswitchinterval())

# Change it with sys.setswitchinterval()
sys.setswitchinterval(0.008)
print(sys.getswitchinterval())
# Restore the default of 5ms
sys.setswitchinterval(0.005)

0.005
0.008


5. When we write Python code, we have no control over the GIL. But a built-in function or an extension written in C--or any language that interfaces at the Python/C API level--can release the GIL while running time-consuming tasks.

6. Every Python standard library function that makes a syscall releases the GIL. This includes all functions that perform disk I/O, network I/O, and `time.sleep()`. Many CPU-intensive functions in the NumPy/SciPy libraries, as well as the compressing/decompressing functions from the `zlib` and `bz2` modules, also release the GIL.

7. Extensions that integrate at the Python/C API level can also launch other non-Python threads that are not affected by the GIL. Such GIL-free threads generally cannot change Python objects, but they can read from and write to the memory underlying objects that support the buffer protocol, such as `bytearray`, `array.array`, and NumPy arrays.

8. The effect of the GIL on network programming with Python threads is relatively small, because the I/O functions release the GIL, and reading or writing to the network always implies high latency--compared to reading and writing to memory. Consequently, each individual thread spends a lot of time waiting anyway, so their execution can be interleaved without major impact on the overall throughput. That's why David Beazley says: "Python threads are great at doing nothing."

9. Contention over the GIL slows down compute-intensive Python threads. Sequential, single-threaded code is simpler and faster for such tasks.

10. To run CPU-intensive Python code on multiple cores, you must use multiple Python processes.
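Point 8 can be checked with a small timing experiment. This is a sketch, not a benchmark: `wait_for_network` and `run_threads` are hypothetical names, and `time.sleep` stands in for a blocking network call. Because sleeping releases the GIL, five threads that each wait 0.2s should finish in roughly 0.2s overall, not 1s.

```python
import time
from threading import Thread


def wait_for_network(delay: float) -> None:
    # Stand-in for blocking I/O: time.sleep() releases the GIL while waiting.
    time.sleep(delay)


def run_threads(n: int, delay: float) -> float:
    """Start n waiting threads and return the total elapsed time in seconds."""
    threads = [Thread(target=wait_for_network, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(5, 0.2)
print(f'5 threads x 0.2s of waiting took {elapsed:.2f}s')
```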

Here is a good summary from the `threading` module documentation:

> CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use `multiprocessing` or `concurrent.futures.ProcessPoolExecutor`. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
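As a rough sketch of the recommendation quoted above, the following script spreads a CPU-bound check over worker processes with `concurrent.futures.ProcessPoolExecutor`. The `is_prime` function and the `NUMBERS` list are illustrative assumptions, not part of the original notes. The script is meant to be saved to a file and run from a terminal; the `if __name__ == '__main__':` guard keeps child processes from re-running `main` when they import the module.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division: deliberately CPU-bound, pure Python."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


NUMBERS = [2, 15, 97, 7919, 1_000_003, 1_000_001]


def main() -> None:
    # Each worker is a separate Python process with its own GIL.
    # Arguments and results are serialized to cross the process boundary.
    with ProcessPoolExecutor() as executor:
        for n, prime in zip(NUMBERS, executor.map(is_prime, NUMBERS)):
            print(f'{n:>9}: {prime}')


if __name__ == '__main__':
    main()
```

For inputs this small, the cost of starting processes and serializing data probably outweighs any gain; the pattern pays off only when each task does substantial computation.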

When coroutines are involved, the best practice is that one thread runs the event loop and all coroutines, while additional threads carry out specific tasks.
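One possible way to follow this practice is `asyncio.to_thread`, available since Python 3.9. In the sketch below, `blocking_io` and `ticker` are hypothetical names: the blocking call runs in a worker thread while the event loop, in the main thread, keeps driving the `ticker` coroutine. It is written as a script; in a Jupyter notebook, `asyncio.run` would need to be replaced with a top-level `await`.

```python
import asyncio
import time


def blocking_io() -> int:
    # A regular blocking function: it would freeze the event loop
    # if called directly from a coroutine.
    time.sleep(0.3)
    return 42


async def ticker(ticks: list) -> None:
    # Records a timestamp every 50ms, as long as the event loop is free.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main() -> tuple:
    ticks: list = []
    tick_task = asyncio.create_task(ticker(ticks))
    # Delegate the blocking call to a worker thread and wait for its result.
    result = await asyncio.to_thread(blocking_io)
    tick_task.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(f'result={result}, ticks recorded while waiting: {tick_count}')
```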

## A Concurrent Hello World

### Spinner with Threads

The idea of the next few examples is simple: start a function that blocks for 3 seconds while animating characters in the terminal to let the user know the program is "thinking" and not stalled.

In [4]:
# spinner_thread.py: the spin and slow functions

import itertools
import time
from threading import Thread, Event

def spin(msg: str, done: Event): # <1>
    for char in itertools.cycle('|/-\\'): # <2>
        status = f'\r{char} {msg}' # <3>
        print(status, flush=True, end='')
        if done.wait(.1): # <4>
            break # <5>
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='') # <6>
    
def slow():
    time.sleep(3) # <7>
    return 42

In [None]:
import itertools
import time
from threading import Thread, Event

def spin(msg: str, done: Event): # <1>
    for char in itertools.cycle('|/-\\'): # <2>
        status = f'\r{char} {msg}' # <3>
        print(status, flush=True, end='')
        if done.wait(.2): # <4>
            break # <5>
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='') # <6> 

done = Event()        
spin('thinking!', done)

# Calling spin() directly, as above, never returns: done.wait() can only
# return True after done.set() is called, but the line below is not
# reached until spin() has finished. To stop the loop, done.set() must
# be called from another thread while spin() is running.
done.set()

1. This function will run in a separate thread. The `done` argument is an instance of `threading.Event`, a simple object to synchronize threads.

2. This is an infinite loop because `itertools.cycle` yields one character at a time, cycling through the string forever.

3. The trick for text-mode animation: move the cursor back to the start of the line with the carriage return ASCII control character (`'\r'`).

In [6]:
# \n moves the cursor to the beginning of a new line
print('shubham\nmishra')

# \r moves the cursor back to the start of the current line,
# so 'mishra' overwrites the first six characters of 'shubham'
print('shubham\rmishra')

shubham
mishra
mishram


4. The `Event.wait(timeout=None)` method returns `True` when the event is set by another thread; if the `timeout` elapses, it returns `False`. The .1s timeout sets the "frame rate" of the animation to 10 FPS. If you want the spinner to go faster, use a smaller timeout.

5. Exit the infinite loop.

6. Clear the status line by overwriting with spaces and moving the cursor back to the beginning.

7. `slow()` will be called by the main thread. Imagine this is a slow API call over the network. Calling `sleep` blocks the main thread, but the GIL is released so the spinner thread can proceed.

By design, there is no API for terminating a thread in Python. You must send it a message to shut down.

The `threading.Event` class is Python's simplest signalling mechanism to coordinate threads. An `Event` instance has an internal boolean flag that starts as `False`. Calling `Event.set()` sets the flag to `True`. While the flag is false, if a thread calls `Event.wait()`, it is blocked until another thread calls `Event.set()`, at which time `Event.wait()` returns `True`. If a timeout in seconds is given to `Event.wait(s)`, this call returns `False` when the timeout elapses, or returns `True` as soon as `Event.set()` is called by another thread.
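The behavior just described can be observed in a few lines. This is a minimal sketch: it uses `threading.Timer` (a `Thread` subclass from the standard library) only as a convenient way to have another thread call `set()` after a short delay.

```python
import threading

done = threading.Event()
print(done.is_set())           # False: the flag starts cleared

# Nobody calls set(), so wait() gives up when the timeout elapses.
timed_out = done.wait(0.05)
print(timed_out)               # False

# A Timer runs done.set() in a separate thread after 50ms.
timer = threading.Timer(0.05, done.set)
timer.start()

# Returns as soon as the other thread sets the flag, long before 5s.
signaled = done.wait(5)
timer.join()
print(signaled)                # True
```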

In [17]:
# spinner_thread.py: the supervisor and main functions

def supervisor() -> int: #1
    done = Event() #2
    spinner = Thread(target=spin, args=('thinking!', done)) #3
    print(f'spinner object: {spinner}') #4
    spinner.start() #5
    result = slow() #6
    done.set() #7
    spinner.join()  #8
    return result

def main() -> None:
    result = supervisor() #9
    print(f'Answer: {result}')
    
if __name__ == '__main__':
    main()

spinner object: <Thread(Thread-9 (spin), initial)>
Answer: 42  


1. `supervisor` will return the result of `slow`.
   
2. The `threading.Event` instance is the key to coordinate the activities of the `main` thread and the `spinner` thread, as explained further down.

3. To create a new `Thread`, provide a function as the `target` keyword argument, and positional arguments to the `target` as a tuple passed via `args`.

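4. Display the `spinner` object. The output is similar to `<Thread(Thread-1 (spin), initial)>`, where `initial` is the state of the thread -- meaning it has not started. The thread number varies, and Python versions before 3.10 omit the target function name.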

5. Start the `spinner` thread.

6. Call `slow`, which blocks the `main` thread. Meanwhile, the secondary thread is running the spinner animation.

7. Set the `Event` flag to `True`; this will terminate the `for` loop inside the `spin` function.

8. Wait until the `spinner` thread finishes.

9. Run the `supervisor` function. I wrote separate `main` and `supervisor` functions to make this example look more like the `asyncio` version.
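To see how the state shown in the thread's `repr` changes over its lifetime, here is a small sketch that captures the representation before `start()`, while the thread is running, and after `join()`. The `nap` function is a hypothetical stand-in for real work, and the exact thread name and identifier in the output will vary.

```python
import time
from threading import Thread


def nap() -> None:
    time.sleep(0.1)   # keep the thread alive long enough to observe it


worker = Thread(target=nap)
before = repr(worker)    # state: initial (not started yet)
worker.start()
during = repr(worker)    # state: started
worker.join()            # block until the thread finishes
after = repr(worker)     # state: stopped

print(before, during, after, sep='\n')
```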

### Spinner with Processes

When you create a `multiprocessing.Process` instance, a whole new Python interpreter is started as a child process in the background. Since each Python process has its own GIL, this allows your program to use all available CPU cores--but that ultimately depends on the operating system scheduler.

In [7]:
# spinner_proc.py: only the changed parts are shown; 
# everything else is the same as spinner_thread.py

import itertools
import time

#1 The basic multiprocessing API imitates the threading API,
# but type hints and Mypy expose this difference:
# multiprocessing.Event is a function (not a class
# like threading.Event) which returns a synchronize.Event
# instance...
from multiprocessing import Process, Event

#2 ...forcing us to import multiprocessing.synchronize...
from multiprocessing import synchronize

#3 ...to write this type hint.
# spin and slow functions are unchanged.
def spin(msg: str, done: synchronize.Event) -> None: 
    for char in itertools.cycle('|/-\\'):
        status = f'\r{char} {msg}' 
        print(status, flush=True, end='')
        if done.wait(.1): 
            break 
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='') 
    
def slow():
    time.sleep(3) 
    return 42

def supervisor() -> int:
    done = Event()
    
    #4 Basic usage of the Process class is similar to Thread.
    spinner = Process(target=spin,
                        args=('thinking', done))
    #5 The spinner object is displayed as <Process
    # name='Process-1' parent=14868 initial>, where
    # 14868 is the process ID of the Python instance
    # running spinner_proc.py.
    print(f'spinner object: {spinner}')
    spinner.start()
    result = slow()
    done.set()
    spinner.join()
    return result

def main() -> None:
    result = supervisor() 
    print(f'Answer: {result}')
    
if __name__ == '__main__':
    main()
    
# Spinner with Process runs without animation. 
# Animation works in terminal but not in Jupyter Notebook cells.
# Why?

spinner object: <Process name='Process-3' parent=19196 initial>
Answer: 42


The basic APIs of threading and multiprocessing are similar, but their implementations are very different, and multiprocessing has a much larger API to handle the added complexity of multiprocess programming. For example, objects crossing process boundaries have to be serialized and deserialized, which creates overhead.

Since Python 3.8, there's a `multiprocessing.shared_memory` package in the standard library, but it does not support instances of user-defined classes. Besides raw bytes, the package allows processes to share a `ShareableList`, a mutable sequence type that holds a fixed number of items of types `int`, `float`, `bool`, and `None`, as well as `str` and `bytes` up to 10MB per item.
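The following sketch shows the basic `ShareableList` workflow. To stay short, it attaches to the shared block a second time from the same process; in a real program the second handle would usually be created in a different process, using the block's name. The variable names are illustrative.

```python
from multiprocessing import shared_memory

# Create a shared list; the number of items is fixed at creation.
original = shared_memory.ShareableList([1, 2.5, 'spam', None, True])

# Attach to the same block by name, as another process would.
attached = shared_memory.ShareableList(name=original.shm.name)
attached[0] = 10                  # the change is visible through both handles

snapshot = list(original)
print(snapshot)

# Every handle must be closed; the block is unlinked exactly once.
attached.shm.close()
original.shm.close()
original.shm.unlink()
```

Only the values are shared, not Python objects: each read builds a new object from the bytes stored in the shared block.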

### Spinner with Coroutines

Coroutines are driven by an application-level event loop that manages a queue of pending coroutines, drives them one by one, monitors events triggered by I/O operations initiated by coroutines, and passes control back to the corresponding coroutine when each event happens. The event loop and the library coroutines and the user coroutines all execute in a single thread. Therefore, any time spent in a coroutine slows down the event loop--and all other coroutines.

In [1]:
import asyncio
import itertools

#1 We don't need the Event argument that was used to
# signal that slow had completed its job in spinner_thread.py.
async def spin(msg: str) -> None:
    for char in itertools.cycle(r'\|/-'):
        status = f'\r{char} {msg}'
        print(status, flush=True, end='')
        try: 
            
            #2 Use await asyncio.sleep(.1) instead of done.wait(.1)
            # to pause without blocking other coroutines.
            await asyncio.sleep(.1)
        
        #3 asyncio.CancelledError is raised when the cancel
        # method is called on the Task controlling this coroutine.
        # Time to exit the loop.
        except asyncio.CancelledError:
            break
    blanks = ' ' * len(status)
    print(f'\r{blanks}\r', end='')
    
async def slow() -> int:
    #4 The slow coroutine also uses await asyncio.sleep(3) instead
    # of time.sleep(3).
    await asyncio.sleep(3)
    return 42

In [None]:
# spinner_async.py: the main function and supervisor coroutine

# Native coroutines are defined with async def.
async def supervisor() -> int:
    
    # asyncio.create_task schedules the spin coroutine to run,
    # and returns a Task object immediately.
    spinner = asyncio.create_task(spin('thinking!'))
    
    # The repr of the spinner object looks like <Task pending
    # name='Task-1' coro=<spin() running at 
    # /path/to/spinner_async.py:11>>
    print(f'spinner object: {spinner}')
    
    # The await keyword calls slow, blocking supervisor until
    # slow returns. the return value of slow will be assigned
    # to result.
    result = await slow()
    
    # The Task.cancel method raises a CancelledError exception
    # inside the spin coroutine.
    spinner.cancel()
    return result

# main is the only regular function defined in this program
# --the others are coroutines.
def main() -> None:
    # The asyncio.run function starts the event loop to drive the
    # coroutine that will eventually set the other coroutines in motion.
    # The main function will stay blocked until supervisor returns.
    # The return value of supervisor will be the return value of 
    # asyncio.run.
    result = asyncio.run(supervisor())
    print(f'Answer: {result}')


    
if __name__ == '__main__':
    main()


Three main ways of running a coroutine:

- `asyncio.run(coro())`: Called from a regular function to drive a coroutine object that usually is the entry point for all the asynchronous code in a program, like the `supervisor` in this example. This call blocks until the body of `coro` returns. The return value of the `run()` call is whatever the body of `coro` returns.

- `asyncio.create_task(coro())`: Called from a coroutine to schedule another coroutine to execute eventually. This call does not suspend the current coroutine. It returns a `Task` instance, an object that wraps the coroutine object and provides methods to control and query its state.

- `await coro()`: Called from a coroutine to transfer control to the coroutine object returned by `coro()`. This suspends the current coroutine until the body of `coro` returns. The value of the await expression is whatever the coroutine object returns.
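The three calls can be combined in one short script. This is a sketch with a hypothetical `double` coroutine; it also awaits the `Task` returned by `create_task` to retrieve its result, which goes slightly beyond the list above. It is intended to run as a script, outside Jupyter.

```python
import asyncio


async def double(x: int) -> int:
    await asyncio.sleep(0.01)   # pretend to wait for I/O
    return x * 2


async def supervisor() -> tuple:
    # Schedule double(10) to run; supervisor is not suspended here.
    task = asyncio.create_task(double(10))
    # Suspend supervisor until double(1) returns its value.
    direct = await double(1)
    # Awaiting the Task suspends until it is done, then yields its result.
    scheduled = await task
    return direct, scheduled


# Start the event loop and block until supervisor returns.
outcome = asyncio.run(supervisor())
print(outcome)   # (2, 20)
```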

In [None]:
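# asyncio.run() raises RuntimeError in a Jupyter Notebook cell, because
# the notebook kernel already runs an event loop. In a notebook, use
# `await my_coroutine('thinking!')` directly in the cell instead.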
import asyncio

async def my_coroutine(msg):
    print("Coroutine is running...", msg)
    return 42

result = asyncio.run(my_coroutine('thinking!'))
print(result)

Never use `time.sleep()` in `asyncio` coroutines unless you want to pause your whole program. If a coroutine needs to spend some time doing nothing, it should `await` on an `asyncio.sleep(DELAY)` call. This yields control back to the `asyncio` event loop, which can drive other pending coroutines.
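A quick timing comparison makes the difference visible. In this sketch, `polite`, `rude`, and `timed` are hypothetical names: three coroutines that `await asyncio.sleep(0.1)` overlap and finish in about 0.1s, while three that call `time.sleep(0.1)` run one after the other and take about 0.3s, because each one blocks the event loop.

```python
import asyncio
import time


async def polite(delay: float) -> None:
    await asyncio.sleep(delay)   # yields control to the event loop


async def rude(delay: float) -> None:
    time.sleep(delay)            # blocks the event loop and every other coroutine


async def timed(coro_func, delay: float) -> float:
    """Run three coroutines concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(coro_func(delay), coro_func(delay), coro_func(delay))
    return time.perf_counter() - start


cooperative = asyncio.run(timed(polite, 0.1))
blocking = asyncio.run(timed(rude, 0.1))
print(f'asyncio.sleep: {cooperative:.2f}s, time.sleep: {blocking:.2f}s')
```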