# Parallel Processing
These techniques let a Python script execute code "simultaneously" within a given application. They are useful when a process takes a long time (like calculating the first 1 trillion Fibonacci numbers [**CPU bound**] or downloading a large file [**I/O bound**]) and must be repeated multiple times. Instead of waiting for the first process to finish, the second process can execute alongside it (and the third, depending on how it's coded).

**CPU bound** is when the process is slow because of the amount of computation the CPU has to do, like calculating many digits of Pi.


**I/O bound** is when the process is slow because it waits on operations that happen on storage, memory or network (download/upload), or on anything else that is outside the control of the CPU.
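
The two kinds of slowness can be told apart with a small timing experiment. The sketch below is illustrative only: `cpu_bound_task`, `io_bound_task` and `timed` are hypothetical helpers that are not part of this notebook, and `time.sleep` merely stands in for a real download.

```python
import time


def cpu_bound_task(n: int) -> int:
    """CPU bound: the time is spent computing (sum of the squares below n)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def io_bound_task(delay_seconds: float) -> str:
    """I/O bound: the time is spent waiting (time.sleep stands in for a download)."""
    time.sleep(delay_seconds)
    return "done"


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, elapsed = timed(cpu_bound_task, 2_000_000)
    print(f"CPU bound: took {elapsed:.2f}s of computing")
    result, elapsed = timed(io_bound_task, 0.5)
    print(f"I/O bound: took {elapsed:.2f}s of waiting")
```

A faster CPU shortens the first call but not the second, which is why the two cases call for different tools.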

## Multiprocessing vs Multithreading vs Asyncio in Python
| | CPU cores used | Resources used | Main use |
| --- | --- | --- | --- |
| Multiprocessing | As configured (up to the computer's core count) | 1 CPU core and its own memory for **each** "process" | CPU bound processes |
| Multithreading | 1 | 1 thread for **each** "process"; memory shared by **all** "processes" | I/O bound processes |
| Asyncio | 1 | 1 thread and shared memory for **all** "processes" | I/O bound processes |
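- "Hyper-threading" lets 1 physical CPU core run 2 hardware threads, so it appears as 2 logical cores.
- In Python (the standard CPython interpreter), the Global Interpreter Lock (GIL) lets only 1 thread execute Python code at a time, even if multiple threads are used. Threads still help with I/O bound work because a thread that is waiting on I/O lets another one run.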

Read more about it here:

https://rednafi.github.io/digressions/python/2020/04/21/python-concurrent-futures.html#running-multiple-cpu-bound-subroutines-with-multi-processing

## Multiprocessing
This is "true" parallelism because multiple "child" processes of the application are actually run. Each process has its own copy of the memory and can run on its own CPU core.

**Because a new process is spawned and its memory is copied, this generally uses more memory. It also means there is a "slightly longer" delay (often called overhead) before each process starts executing.**
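
Since multiprocessing pays off mainly for CPU bound work, here is a minimal sketch of that case, separate from the patent download cells that follow. `count_primes` and `main` are hypothetical names used only for illustration. The example is meant to be saved and run as a script, because worker processes may need to import the function from a file.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Count the primes below `limit` by trial division (deliberately CPU heavy)."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def main() -> None:
    limits = [20_000, 30_000, 40_000, 50_000]
    # Each call runs in a child process, so the work can be spread over several cores.
    with ProcessPoolExecutor() as executor:
        for limit, primes in zip(limits, executor.map(count_primes, limits)):
            print(f"{primes} primes below {limit}")


# The guard keeps child processes from re-running main() when they import this file.
if __name__ == "__main__":
    main()
```

`executor.map` returns the results in the same order as the inputs, which is why they can be zipped back with `limits`.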

In [4]:
# Baseline: fetch each URL one after another (no parallelism)
from patents_view_api.api_utils import get_patents

# Copied from the URL calculated by the script
URLs = [
    'https://api.patentsview.org/patents/query?q=%7B%22_and%22%3A+%5B%7B%22_gte%22%3A+%7B%22patent_date%22%3A+%221980-01-01%22%7D%7D%2C+%7B%22_lte%22%3A+%7B%22patent_date%22%3A+%221980-01-31%22%7D%7D%5D%7D&f=%5B%22patent_id%22%2C+%22patent_number%22%2C+%22patent_title%22%2C+%22patent_date%22%2C+%22patent_type%22%2C+%22patent_num_us_patent_citations%22%2C+%22inventor_first_name%22%2C+%22inventor_last_name%22%2C+%22inventor_longitude%22%2C+%22inventor_latitude%22%2C+%22inventor_city%22%2C+%22inventor_state%22%5D&o=%7B%22page%22%3A+1%2C+%22per_page%22%3A+10000%7D'
] * 5

for url in URLs:
    response = get_patents(url)
    print(f'Fetched `{response["count"]}` out of `{response["total_patent_count"]}` patents...')


Getting patents...
Fetched `5902` out of `5902` patents...
Getting patents...
Fetched `5902` out of `5902` patents...
Getting patents...
Fetched `5902` out of `5902` patents...
Getting patents...
Fetched `5902` out of `5902` patents...
Getting patents...
Fetched `5902` out of `5902` patents...


In [5]:
from concurrent.futures import ProcessPoolExecutor, as_completed
from patents_view_api.api_utils import get_patents

# Copied from the URL calculated by the script
URLs = [
    'https://api.patentsview.org/patents/query?q=%7B%22_and%22%3A+%5B%7B%22_gte%22%3A+%7B%22patent_date%22%3A+%221980-01-01%22%7D%7D%2C+%7B%22_lte%22%3A+%7B%22patent_date%22%3A+%221980-01-31%22%7D%7D%5D%7D&f=%5B%22patent_id%22%2C+%22patent_number%22%2C+%22patent_title%22%2C+%22patent_date%22%2C+%22patent_type%22%2C+%22patent_num_us_patent_citations%22%2C+%22inventor_first_name%22%2C+%22inventor_last_name%22%2C+%22inventor_longitude%22%2C+%22inventor_latitude%22%2C+%22inventor_city%22%2C+%22inventor_state%22%5D&o=%7B%22page%22%3A+1%2C+%22per_page%22%3A+10000%7D'
] * 5

# Allot the resources for multiprocessing
with ProcessPoolExecutor() as executor:
    futures = {
        executor.submit(get_patents, url) for url in URLs
    }
    for future in as_completed(futures):
        response = future.result()
        print(f'Fetched `{response["count"]}` out of `{response["total_patent_count"]}` patents...')

Getting patents...Getting patents...Getting patents...Getting patents...


Getting patents...

Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...


## Multithreading
This runs the processes in multiple threads, so each process has its own thread. However, even though the processes are split across different threads, Python executes only 1 thread at a time. The benefit comes from waiting: while one thread waits on I/O, another thread can run.

**Because only a thread is spawned and the memory is shared, there is a "slightly shorter" delay compared to multiprocessing.**
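
The overlap of waiting time can be seen without touching the network. The following is a minimal sketch under stated assumptions: `fake_download` and `download_all` are hypothetical helpers, and `time.sleep` stands in for a real I/O wait (like real I/O waits, it releases the interpreter lock so other threads can run).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(name: str, delay_seconds: float) -> str:
    """Stand-in for an I/O bound call: it only waits, then returns its name."""
    time.sleep(delay_seconds)
    return name


def download_all(names, delay_seconds=0.2):
    """Run every fake download in its own thread; return (names, elapsed seconds)."""
    start = time.perf_counter()
    finished = []
    with ThreadPoolExecutor(max_workers=len(names)) as executor:
        futures = [executor.submit(fake_download, name, delay_seconds) for name in names]
        # as_completed yields each future as soon as its thread is done.
        for future in as_completed(futures):
            finished.append(future.result())
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    finished, elapsed = download_all(["a", "b", "c", "d", "e"])
    # The five waits overlap, so the total stays well below the 1 s a plain loop needs.
    print(f"Finished {sorted(finished)} in {elapsed:.2f}s")
```

The order of `finished` is not guaranteed, because `as_completed` yields results in completion order rather than submission order.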

In [7]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from patents_view_api.api_utils import get_patents

# Copied from the URL calculated by the script
URLs = [
    'https://api.patentsview.org/patents/query?q=%7B%22_and%22%3A+%5B%7B%22_gte%22%3A+%7B%22patent_date%22%3A+%221980-01-01%22%7D%7D%2C+%7B%22_lte%22%3A+%7B%22patent_date%22%3A+%221980-01-31%22%7D%7D%5D%7D&f=%5B%22patent_id%22%2C+%22patent_number%22%2C+%22patent_title%22%2C+%22patent_date%22%2C+%22patent_type%22%2C+%22patent_num_us_patent_citations%22%2C+%22inventor_first_name%22%2C+%22inventor_last_name%22%2C+%22inventor_longitude%22%2C+%22inventor_latitude%22%2C+%22inventor_city%22%2C+%22inventor_state%22%5D&o=%7B%22page%22%3A+1%2C+%22per_page%22%3A+10000%7D'
] * 5

with ThreadPoolExecutor() as executor:
    futures = {
        executor.submit(get_patents, url) for url in URLs
    }
    for future in as_completed(futures):
        response = future.result()
        print(f'Fetched `{response["count"]}` out of `{response["total_patent_count"]}` patents...')

Getting patents...
Getting patents...
Getting patents...
Getting patents...
Getting patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...


## Asyncio
This runs the processes in a single thread but lets a process "wait on the side" while another process is executed. The application uses only a single thread while still keeping multiple "waiting" processes in progress. However, the functions have to be written differently (with `async` and `await`) so that Python knows what can be "awaited".

**Because no extra thread, CPU or memory is spawned, this has the shortest delay of the three.**

In Python, this will generally be better than threading for I/O bound processes. However, not all packages support asyncio yet. For those packages, threading is the only option for I/O bound processes.
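
The sketch below shows both situations side by side, with hypothetical stand-ins rather than real network calls: `fake_fetch` plays the role of a function from an asyncio-aware package, and `blocking_fetch` the role of one from a package without asyncio support. It assumes Python 3.9 or newer, where `asyncio.to_thread` is available to run a blocking function in a worker thread.

```python
import asyncio
import time


async def fake_fetch(name: str, delay_seconds: float) -> str:
    """Stand-in for an awaitable network call."""
    await asyncio.sleep(delay_seconds)  # hands control back to the event loop while waiting
    return name


def blocking_fetch(name: str, delay_seconds: float) -> str:
    """Stand-in for a call from a package that has no asyncio support."""
    time.sleep(delay_seconds)
    return name


async def fetch_all(names, delay_seconds=0.2):
    # One coroutine per name; nothing runs until gather() schedules them.
    native = [fake_fetch(name, delay_seconds) for name in names]
    # The blocking call is pushed to a worker thread so it cannot stall the event loop.
    bridged = asyncio.to_thread(blocking_fetch, "blocking", delay_seconds)
    # gather() runs everything concurrently and returns results in input order.
    return await asyncio.gather(*native, bridged)


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b", "c"])))
```

Unlike `asyncio.as_completed`, `asyncio.gather` keeps the results in the order the awaitables were passed in, which is convenient when each result has to be matched back to its input.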

In [2]:
import asyncio
from patents_view_api.api_utils import async_get_patents

# Copied from the URL calculated by the script
URLs = [
    'https://api.patentsview.org/patents/query?q=%7B%22_and%22%3A+%5B%7B%22_gte%22%3A+%7B%22patent_date%22%3A+%221980-01-01%22%7D%7D%2C+%7B%22_lte%22%3A+%7B%22patent_date%22%3A+%221980-01-31%22%7D%7D%5D%7D&f=%5B%22patent_id%22%2C+%22patent_number%22%2C+%22patent_title%22%2C+%22patent_date%22%2C+%22patent_type%22%2C+%22patent_num_us_patent_citations%22%2C+%22inventor_first_name%22%2C+%22inventor_last_name%22%2C+%22inventor_longitude%22%2C+%22inventor_latitude%22%2C+%22inventor_city%22%2C+%22inventor_state%22%5D&o=%7B%22page%22%3A+1%2C+%22per_page%22%3A+10000%7D'
] * 5

async def async_process():
    tasks = []
    for url in URLs:
        task = asyncio.create_task(async_get_patents(url))
        tasks.append(task)

    '''
    NOTE: Doing the following will make it synchronous:
    for url in URLs:
        response = await async_get_patents(url)
    
    This is because each URL will be "awaited" before it moves on to the next URL
    '''

    for task in asyncio.as_completed(tasks):
        response = await task
        print(f'Fetched `{response["count"]}` out of `{response["total_patent_count"]}` patents...')

asyncio.run(async_process())

Getting patents...
Getting patents...
Getting patents...
Getting patents...
Getting patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
Fetched `5902` out of `5902` patents...
