Here are some typical jobs for the multiprocessing module:
- Parallelize a CPU-bound task with Process or Pool objects.
- Parallelize an I/O-bound task in a Pool with threads using the (oddly named) dummy module.
- Share pickled work via a Queue.
- Share state between parallelized workers, including bytes, primitive datatypes, dictionaries, and lists.
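
The second job in the list, running an I/O-bound task in a thread-backed Pool, might look roughly like the sketch below. `fake_io_task` and `run_in_thread_pool` are hypothetical names introduced here, and `time.sleep` stands in for a real blocking call such as a network request.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # same Pool API, backed by threads


def fake_io_task(delay_s):
    """Hypothetical stand-in for blocking I/O: sleep, then return the delay."""
    time.sleep(delay_s)
    return delay_s


def run_in_thread_pool(delays, nbr_threads=4):
    """Map fake_io_task over delays using a pool of threads."""
    # map() blocks until every result is collected, so leaving the
    # with-block (which terminates the pool) is safe afterwards.
    with ThreadPool(nbr_threads) as pool:
        return pool.map(fake_io_task, delays)


if __name__ == "__main__":
    t1 = time.time()
    results = run_in_thread_pool([0.2] * 4)
    print("Results:", results)
    print("Took {:.2f}s".format(time.time() - t1))
```

Because the threads spend their time waiting rather than computing, the four sleeps should overlap and the total should be close to one delay rather than four.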

By using processes, we run a number of Python interpreters in parallel. Each has a private memory space and its own GIL, and each executes its own code serially.

Main components of the multiprocessing module:
- **Process** - A forked copy of the current process; this creates a new process identifier and the task runs as an independent child process in the operating system. You can start and query the state of the Process and provide it with a target method to run.
- **Pool** - Wraps the Process or threading.Thread API into a convenient pool of workers that share a chunk of work and return an aggregated result.
- **Queue** - A FIFO queue allowing multiple producers and consumers.
- **Pipe** - A uni- or bidirectional communication channel between two processes.
- **Manager** - A high-level managed interface to share Python objects between processes.
- **ctypes** - Allows sharing of primitive datatypes (e.g., integers, floats, and bytes) between processes after they have forked.
- **Synchronization primitives** - Locks and semaphores to synchronize control flow between processes.
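
As a rough illustration of how several of these components fit together, the sketch below feeds integers to worker processes through a Queue and accumulates them in a ctypes-backed `Value` guarded by its built-in lock. The `worker` and `sum_with_workers` functions and the `None` sentinel convention are assumptions made for this example, not part of the module.

```python
from multiprocessing import Process, Queue, Value


def worker(task_queue, total):
    """Add queued integers to the shared total until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        # += on a shared Value is not atomic, so hold its lock
        with total.get_lock():
            total.value += item


def sum_with_workers(items, nbr_workers=2):
    """Sum items using nbr_workers child processes."""
    task_queue = Queue()
    total = Value("i", 0)  # shared C int, created with a lock by default
    workers = [Process(target=worker, args=(task_queue, total))
               for _ in range(nbr_workers)]
    for w in workers:
        w.start()
    for item in items:
        task_queue.put(item)
    for _ in workers:
        task_queue.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return total.value


if __name__ == "__main__":
    print("Sum:", sum_with_workers(range(10)))
```

A Manager would allow richer objects such as dictionaries and lists to be shared, usually at a higher cost per access than a raw ctypes value.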

## Estimating Pi Using Processes and Threads

In [1]:
"""Estimate Pi using blocks of serial work on 1 CPU"""
import time
import numpy as np


def estimate_nbr_points_in_circle(nbr_samples):
    # set random seed for numpy in each new process
    # else the fork will mean they all share the same state
    np.random.seed()
    xs = np.random.uniform(0, 1, int(nbr_samples))
    ys = np.random.uniform(0, 1, int(nbr_samples))
    estimate_inside_quarter_unit_circle = (xs * xs + ys * ys) <= 1
    nbr_trials_in_quarter_unit_circle = np.sum(
        estimate_inside_quarter_unit_circle)
    return nbr_trials_in_quarter_unit_circle


nbr_samples_in_total = 1e7 # 1e8 causes memory error in the later example

nbr_parallel_blocks = 4
nbr_samples_per_worker = nbr_samples_in_total / nbr_parallel_blocks
print("Making {} samples per worker".format(nbr_samples_per_worker))

t1 = time.time()
nbr_in_circle = 0
for npb in range(nbr_parallel_blocks):
    nbr_in_circle += estimate_nbr_points_in_circle(nbr_samples_per_worker)
print("Took {}s".format(time.time() - t1))
pi_estimate = float(nbr_in_circle) / nbr_samples_in_total * 4
print("Estimated pi", pi_estimate)
print("Pi", np.pi)

Making 2500000.0 samples per worker
Took 1.5402233600616455s
Estimated pi 3.141894
Pi 3.141592653589793


In [2]:
from multiprocessing import Pool

def estimate_nbr_points_in_quarter_circle(nbr_samples):
    np.random.seed()
    xs = np.random.uniform(0, 1, int(nbr_samples))
    ys = np.random.uniform(0, 1, int(nbr_samples))
    estimate_inside_quarter_unit_circle = (xs * xs + ys * ys) <= 1
    nbr_trials_in_quarter_unit_circle = np.sum(
        estimate_inside_quarter_unit_circle)
    return nbr_trials_in_quarter_unit_circle

nbr_samples_in_total = 1e7 # 1e8 causes memory error 
nbr_parallel_blocks = 4

pool = Pool()

nbr_samples_per_worker = nbr_samples_in_total / nbr_parallel_blocks
print("Making {} samples per worker".format(nbr_samples_per_worker))

# confirm we have an integer number of jobs to distribute
assert nbr_samples_per_worker == int(nbr_samples_per_worker)
map_inputs = [nbr_samples_per_worker] * nbr_parallel_blocks
t1 = time.time()
results = pool.map(estimate_nbr_points_in_quarter_circle, map_inputs)
pool.close()
print("Dart throws in unit circle per worker:", results)
print("Took {}".format(time.time() - t1))
nbr_in_circle = sum(results)
combined_nbr_samples = sum(map_inputs)

pi_estimate = float(nbr_in_circle) / combined_nbr_samples * 4
print("Estimated pi", pi_estimate)
print("Pi", np.pi)

Making 2500000.0 samples per worker
Dart throws in unit circle per worker: [1963029, 1964054, 1963190, 1962623]
Took 1.0072505474090576
Estimated pi 3.1411584
Pi 3.141592653589793


### GIL Battle
David Beazley explains the "GIL battle" in ["Understanding the Python GIL"](http://www.dabeaz.com/GIL/).
- Threads in Python are great for I/O-bound tasks, but they’re a poor choice for CPU-bound problems.
- A single-core system with multiple threads has no “GIL battle.”
- Four-thread visualization: http://www.dabeaz.com/GIL/gilvis/fourthread.html
<img src="http://apprize.info/python/high/high.files/image076.jpg" width="400">

In [None]:
from threading import Thread
import time

def countdown(n):
    while n > 0:
        n -= 1

COUNT = 10000000

# Split the countdown evenly across four threads (// keeps n an integer)
t1 = Thread(target=countdown, args=(COUNT // 4,))
t2 = Thread(target=countdown, args=(COUNT // 4,))
t3 = Thread(target=countdown, args=(COUNT // 4,))
t4 = Thread(target=countdown, args=(COUNT // 4,))
start = time.time()
t1.start(); t2.start(); t3.start(); t4.start()
t1.join(); t2.join(); t3.join(); t4.join()
end = time.time()
print(end - start)
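
To put the threaded timing in context, one could run the same countdown serially and compare. The sketch below does that; `time_serial` and `time_threaded` are hypothetical helpers added for this comparison. On a standard CPython build with the GIL, the threaded run is typically no faster than the serial one for this CPU-bound loop, though exact numbers depend on the machine and Python version.

```python
import time
from threading import Thread


def countdown(n):
    while n > 0:
        n -= 1


def time_serial(count, nbr_blocks=4):
    """Run the blocks one after another on the main thread; return elapsed seconds."""
    start = time.time()
    for _ in range(nbr_blocks):
        countdown(count // nbr_blocks)
    return time.time() - start


def time_threaded(count, nbr_threads=4):
    """Run the same blocks in separate threads; return elapsed seconds."""
    threads = [Thread(target=countdown, args=(count // nbr_threads,))
               for _ in range(nbr_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    COUNT = 10000000
    print("Serial:   {:.3f}s".format(time_serial(COUNT)))
    print("Threaded: {:.3f}s".format(time_threaded(COUNT)))
```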