Concurrency in Python is achieved through multiprocessing, multithreading, and asynchronous programming, each suited to different scenarios depending on the nature of the tasks and the desired efficiency.
References:
- Zack's Notion study notes
- Udemy course
- Udemy course GitHub code
- Python Asyncio: The Complete Guide
- Async IO in Python: A Complete Walkthrough
- Python multiprocessing official docs
- Python threading official docs
- Python asyncio official docs
- Python Coroutines and Tasks
- Python aiohttp
Multiprocessing allows for parallel execution of tasks across multiple CPU cores, each process having its own memory space.
- Key Components:
  - Process: Represents an activity that is run in a separate process.
  - Pipe and Queue: Mechanisms for inter-process communication (IPC). A `Pipe` is used for bi-directional communication between two processes; a `Queue` is used for multiple producers and consumers (see the sketch below).
  - Pool: Simplifies the process of spawning multiple tasks across processes. It provides a means to parallelize the execution of a function across multiple input values, distributing the input data across processes (data parallelism). A `Pool` can manage a group of worker processes for CPU-bound tasks that require parallel execution to speed up processing (see the example below).
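A minimal sketch of `Pipe` and `Queue` for IPC; the messages exchanged are illustrative:

```python
from multiprocessing import Process, Pipe, Queue

def worker(conn, queue):
    conn.send("pong")         # Reply over the bi-directional pipe
    queue.put("work result")  # Publish a result for any consumer

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    queue = Queue()
    p = Process(target=worker, args=(child_conn, queue))
    p.start()
    print(parent_conn.recv())  # -> "pong"
    print(queue.get())         # -> "work result"
    p.join()
```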
Example: data parallelism with `Pool` and `functools.partial`:

```python
from multiprocessing import Pool
from functools import partial

def square_number(n, multiplier):
    return n * n * multiplier

if __name__ == "__main__":
    multiplier = 2
    # Fix the multiplier argument so pool.map can call the function with one value
    partial_square_number = partial(square_number, multiplier=multiplier)
    with Pool(processes=4) as pool:  # Use 4 worker processes
        results = pool.map(partial_square_number, range(10))
    print(results)
```
Multithreading involves running multiple threads (lighter weight than processes) within the same process, sharing memory space.
- Key Component:
- threading: Python module that provides a way of using threads to achieve concurrency. Threads share the same memory space and are lighter weight than processes.
- Synchronization Primitives:
- Lock and Semaphore: Essential for preventing race conditions and ensuring thread safety by controlling access to shared resources, as in the sketch below.
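A minimal sketch of a `Semaphore` capping concurrent access; the limit of 3 and the sleep are illustrative:

```python
import threading
import time

semaphore = threading.Semaphore(3)  # At most 3 threads in the critical section

def limited_worker(i):
    with semaphore:
        print(f"Worker {i} acquired a slot")
        time.sleep(1)  # Simulate work while holding the slot

threads = [threading.Thread(target=limited_worker, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```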
Multiprocessing vs. multithreading:
- Memory Space: Separate for multiprocessing, shared for multithreading.
- Overhead: Generally higher for multiprocessing due to the cost of starting and managing new processes.
- Use Case: Multiprocessing is preferred for CPU-bound tasks, while multithreading is better suited for I/O-bound tasks.
- asyncio: A library to write concurrent code using the async/await syntax.
- Key Concepts:
- async/await: Enables asynchronous programming, allowing the program to run other tasks while waiting for an operation to complete.
- Event Loop: Orchestrates the execution of various tasks and handles all the I/O operations asynchronously.
- Coroutines and Tasks: The building blocks for asynchronous programming in Python.
- Example Coroutine:

```python
import asyncio

async def fetch_data():
    await asyncio.sleep(1)  # Simulate a non-blocking I/O wait
    return {'data': 'sample'}

# Run with: asyncio.run(fetch_data())
```
- asyncio I/O: Facilitates non-blocking I/O operations, significantly improving the efficiency of I/O-bound tasks.
- aiohttp Usage: For asynchronous HTTP requests. It supports both client and server-side operations.
- Example aiohttp Client Usage:

```python
import aiohttp

async def fetch_page(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

# Run with: asyncio.run(fetch_page("https://example.org"))
```
Explore the `multiprocessing` module in Python, which allows for the execution of multiple processes simultaneously, leveraging multiple CPU cores.
- Python `multiprocessing` module:
  - `Pool`: A convenient way to parallelize executing a function across multiple input values, distributing the input data across processes (data parallelism).
  - `cpu_count`: Returns the number of CPU cores available on your system, which can help decide how many processes to run in parallel (see the sketch below).
  - `await`: Not applicable here; `await` belongs to asyncio and asynchronous programming, whereas multiprocessing deals with concurrent execution in separate processes.
- Python `functools` module:
  - `partial`: Creates a partial function by fixing some portion of a function's arguments, which is particularly useful in multiprocessing when you want to pass additional fixed arguments to the function being executed by the pool.
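A minimal sketch of sizing a pool with `cpu_count`; the `cube` function is illustrative:

```python
from multiprocessing import Pool, cpu_count

def cube(n):
    return n ** 3

if __name__ == "__main__":
    # Use one worker process per available CPU core
    with Pool(processes=cpu_count()) as pool:
        print(pool.map(cube, range(8)))
```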
Multi-threading in Python allows multiple threads to run concurrently within a single process, sharing the same memory space.
- Key Functions:
  - `threading.active_count()`: Returns the number of Thread objects currently alive.
  - The combination of `threading.Lock().acquire()` and `threading.Lock().release()` manages locks explicitly. However, using the `with` statement for lock management is more concise and less error-prone, as it ensures that the lock is released automatically; both styles are contrasted in the sketch below.
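A minimal sketch contrasting explicit lock management with the `with` statement; the shared variable is illustrative:

```python
import threading

lock = threading.Lock()
shared_value = 0

# Explicit management: acquire() must always be paired with release(),
# even when the critical section raises an exception
lock.acquire()
try:
    shared_value += 1
finally:
    lock.release()

# Equivalent, but the lock is released automatically on exit
with lock:
    shared_value += 1
```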
Example: creating and joining threads with `threading.Thread`:
```python
import threading
import time

def thread_function(name):
    print(f"Thread {name}: starting")
    time.sleep(2)  # Simulate work
    print(f"Thread {name}: finishing")

threads = []
for i in range(5):
    x = threading.Thread(target=thread_function, args=(i,))
    threads.append(x)
    x.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```
Example: subclassing `threading.Thread` and overriding `run()`:
```python
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        print(f"Thread {self.name}: starting")
        time.sleep(2)  # Simulate work
        print(f"Thread {self.name}: finishing")

threads = []
for i in range(5):
    my_thread = MyThread(name=i)
    threads.append(my_thread)
    my_thread.start()

for thread in threads:
    thread.join()
```
Example: a thread-safe counter guarded by `threading.Lock`:
```python
import threading

class ThreadSafeCounter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self, n):
        for _ in range(n):
            with self.lock:  # Serialize access to the shared counter
                self.value += 1

def run_threaded_increments(counter, increments):
    threads = [threading.Thread(target=counter.increment, args=(increments,))
               for _ in range(4)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    counter = ThreadSafeCounter()
    run_threaded_increments(counter, 100000)
    print(f"Counter value: {counter.value}")  # Expected: 400000
YAML, a human-friendly data serialization standard, is widely used in configurations and data processing, supporting multiple documents within a single file with the `---` separator.
Example YAML configuration:
```yaml
rest:
  url: "https://example.org/primenumbers/v1"
  port: 8443
prime_numbers: [2, 3, 5, 7, 11, 13, 17, 19]
```
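A minimal sketch of parsing multi-document YAML, assuming the third-party PyYAML package (`pip install pyyaml`); the inline document string is illustrative:

```python
import yaml  # PyYAML

document = """\
rest:
  url: "https://example.org/primenumbers/v1"
  port: 8443
---
prime_numbers: [2, 3, 5, 7, 11, 13, 17, 19]
"""

# safe_load_all yields one Python object per ---separated document
for doc in yaml.safe_load_all(document):
    print(doc)
```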
AsyncIO provides a framework for writing single-threaded concurrent code using coroutines, event loops, and I/O completion callbacks, in contrast to multi-threading, which runs multiple OS threads within a single process.
`aiohttp` is an asynchronous HTTP Client/Server framework that supports the async/await syntax, ideal for non-blocking HTTP requests.
In Python, a coroutine is a special function that can be paused and resumed, allowing other code to run during the pauses. This can be useful for tasks that spend a lot of time waiting for something (like user input, file I/O, network responses, etc.), as it allows other tasks to make progress during the waiting periods.
Coroutines in Python are defined using the `async def` syntax:

```python
async def my_coroutine():
    ...
```

Within a coroutine, the `await` keyword can be used to pause execution until some other function (often another coroutine) is complete:

```python
async def my_coroutine():
    await some_other_function()
```

When `some_other_function()` is called with the `await` keyword, `my_coroutine()` is paused. When `some_other_function()` is done, `my_coroutine()` is resumed where it left off.
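A minimal runnable sketch of two coroutines making progress during each other's waits; the delays and messages are illustrative:

```python
import asyncio

async def say_after(delay, message):
    await asyncio.sleep(delay)  # Pause here; the event loop runs other tasks
    print(message)

async def main():
    # Both coroutines run concurrently: total time is ~2s, not 3s
    await asyncio.gather(say_after(1, "first"), say_after(2, "second"))

asyncio.run(main())
```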
Web-scraping project goals:
- Efficiency: Scrape 10,000 pages for each website
- Scalability: Easy to expand to other websites or other information on the same page
- Automation: Minimize human intervention
- Traceability: Access to progress
Key challenges:
- Site map: How to obtain paths to target webpages
- Information per page: How to access webpage content
- Non-human restriction: E.g., Cloudflare
- Obtain sitemap:
  - API calls: Examine the website's network requests
  - Third-party tools: pro-sitemaps.com
  - Graph search: TO DO
- Access webpage content (tool comparison; see the bulk-fetch sketch after the table):

| Tools | Speed | Universality | IP Rotation | Multi-threading Support |
| --- | --- | --- | --- | --- |
| Beautiful Soup | Fast | Low | No | Risky |
| Selenium (headless browser) | Slow (20-30s/page) | Medium | No | Risky |
| Smart Proxy (Residential) | Fast | Medium | Yes | Secured |
| Smart Proxy (web scrape) | Slow (30-40s/page) | High | Yes | Secured |
| Cloudflare local solver | Slow (30s+/page) | Very High | Yes | Multiple container |
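As a rough sketch of how the async pieces above could combine for bulk page fetching; the URL list and concurrency limit are illustrative, and real targets may need the proxies or Cloudflare handling noted in the table:

```python
import asyncio
import aiohttp

# Hypothetical URL list; a real run would take these from the sitemap
URLS = [f"https://example.org/page/{i}" for i in range(100)]

async def fetch(session, url, semaphore):
    async with semaphore:  # Cap in-flight requests to limit load on the site
        async with session.get(url) as response:
            return await response.text()

async def main():
    semaphore = asyncio.Semaphore(10)
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(
            *(fetch(session, url, semaphore) for url in URLS)
        )
    print(f"Fetched {len(pages)} pages")

asyncio.run(main())
```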