## Multiprocessing

### Running a function (do_something) synchronously

In [16]:
import time

start = time.perf_counter()


def do_something():
    print(f'Sleeping 1 second...')
    time.sleep(1)
    return f'Done Sleeping...'

do_something()
do_something()

finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')

Sleeping 1 second...
Sleeping 1 second...
Finished in 2.0 second(s)


### When tasks do not need to run synchronously
* we can use the multiprocessing module to spread these tasks across other CPUs and run them at the same time
  + both CPU-bound and IO-bound tasks can benefit
* CPU-bound tasks spend most of their time computing on the CPU
* IO-bound tasks spend most of their time waiting for input/output operations and use little CPU
  + for example, file system and network operations such as downloading files
* multi-threading gains little for CPU-bound tasks because all threads run inside the same process, and in standard CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time
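
The CPU-bound versus IO-bound distinction above can be made concrete by comparing wall-clock time with CPU time for one call of each kind. This is only a minimal sketch: `cpu_bound`, `io_bound`, and `measure` are hypothetical helpers, the sleep stands in for real input/output, and the workload sizes are arbitrary.

```python
import time


def cpu_bound(n):
    """CPU-bound stand-in: sum the squares of 0..n-1 in pure Python."""
    return sum(i * i for i in range(n))


def io_bound(seconds):
    """IO-bound stand-in: sleep to imitate waiting on a disk or network."""
    time.sleep(seconds)


def measure(func, *args):
    """Return (wall-clock seconds, CPU seconds) spent in one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


if __name__ == '__main__':
    # Arbitrary sizes, chosen only so both calls take a noticeable time.
    for label, func, arg in [('cpu_bound', cpu_bound, 2_000_000),
                             ('io_bound', io_bound, 0.5)]:
        wall, cpu = measure(func, arg)
        print(f'{label}: wall {wall:.2f} s, cpu {cpu:.2f} s')
```

For the CPU-bound call the two numbers should be close to each other, while for the sleeping call the CPU time should stay near zero, because `time.process_time()` does not count time spent sleeping.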

#### Creating multiple processes, one per task, using the multiprocessing module
* create two process objects with multiprocessing.Process
  + pass the do_something function (its name, without parentheses) as the target of multiprocessing.Process()
  + start the processes with p1.start() and p2.start()
* without join, the program reports that it finished in about 0.01 s even though both processes sleep for 1 second
  + start() returns immediately, so the main program goes on to execute finish = time.perf_counter() and the print statement while p1 and p2 are still sleeping
  + in other words, the program kicks off p1 and p2 and then carries on with the statements after them
* since we want the program to wait until p1 and p2 have finished before calculating and printing the time, we use join
  + p1.join() and p2.join() block until p1 and p2 have finished, and only then does the program move on
* when we run the following code, p1 and p2 start at nearly the same time, each sleeps for 1 s, and the whole program finishes in about 1 s

In [17]:
import multiprocessing
import time

start = time.perf_counter()


def do_something():
    print(f'Sleeping 1 second...')
    time.sleep(1)
    print(f'Done Sleeping...')

p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)

p1.start()
p2.start()

p1.join()
p2.join()

finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')

Sleeping 1 second...
Sleeping 1 second...
Done Sleeping...
Done Sleeping...
Finished in 1.03 second(s)
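
The cell above always joins, so the "finished too early" behaviour described in the notes is never shown. The sketch below times both variants side by side. The helper name `time_two_workers`, its `wait` and `worker_cls` parameters, and the 0.3 s sleep are all assumptions made for illustration. `worker_cls` exists only so the same helper can also be tried with `threading.Thread`, which offers the same `start()` and `join()` methods.

```python
import multiprocessing
import time


def do_something(seconds=0.3):
    """Stand-in task: sleep for a short, arbitrary time."""
    time.sleep(seconds)


def time_two_workers(wait, worker_cls=multiprocessing.Process):
    """Start two workers and return the elapsed time seen by the main program.

    With wait=True the clock stops after both joins; with wait=False it
    stops as soon as both start() calls have returned.
    """
    start = time.perf_counter()
    workers = [worker_cls(target=do_something) for _ in range(2)]
    for worker in workers:
        worker.start()
    if wait:
        for worker in workers:
            worker.join()
    elapsed = time.perf_counter() - start
    # Join in every case so no worker is left running after the measurement.
    for worker in workers:
        worker.join()
    return elapsed


if __name__ == '__main__':
    print(f'without join: {time_two_workers(wait=False):.2f} s')
    print(f'with join:    {time_two_workers(wait=True):.2f} s')
```

The `if __name__ == '__main__':` guard matters on platforms where multiprocessing starts children with the spawn method, because each child re-imports the main module and would otherwise try to start processes of its own.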


#### Starting many more processes at the same time

In [18]:
import multiprocessing
import time

start = time.perf_counter()


def do_something():
    print(f'Sleeping 1 second...')
    time.sleep(1)
    print(f'Done Sleeping...')

processes = []

for _ in range(10):
    p = multiprocessing.Process(target=do_something)
    p.start()
    processes.append(p)

for process in processes:
    process.join()
    
finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')

Sleeping 1 second...
Sleeping 1 second...
Sleeping 1 second...Sleeping 1 second...
Sleeping 1 second...

Sleeping 1 second...Sleeping 1 second...
Sleeping 1 second...
Sleeping 1 second...
Sleeping 1 second...

Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...Done Sleeping...

Done Sleeping...
Done Sleeping...
Done Sleeping...Done Sleeping...Done Sleeping...


Finished in 1.12 second(s)


### Using functions that accept arguments
* unlike with threads, arguments passed to a multiprocessing process must be serializable with pickle
  + pickling converts Python objects into a format that can be deconstructed and then reconstructed in another Python process

In [19]:
import multiprocessing
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    print(f'Done Sleeping...')

processes = []

for _ in range(10):
    p = multiprocessing.Process(target=do_something, args=[1.5])
    p.start()
    processes.append(p)

for process in processes:
    process.join()
    
finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')

Sleeping 1.5 second(s)...
Sleeping 1.5 second(s)...Sleeping 1.5 second(s)...Sleeping 1.5 second(s)...
Sleeping 1.5 second(s)...
Sleeping 1.5 second(s)...


Sleeping 1.5 second(s)...Sleeping 1.5 second(s)...Sleeping 1.5 second(s)...


Sleeping 1.5 second(s)...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...Done Sleeping...Done Sleeping...


Done Sleeping...
Finished in 1.61 second(s)
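
The pickle requirement mentioned above can be checked directly, without starting any process. The following sketch uses a hypothetical helper, `is_picklable`, to test whether the standard `pickle` module can serialize a value. The sample values are illustrative only.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    args = [1.5]
    # A round trip through pickle gives back an equal, but separate, object.
    restored = pickle.loads(pickle.dumps(args))
    print('equal:', restored == args, '| same object:', restored is args)

    print('list of floats picklable:', is_picklable(args))
    # A lambda has no importable name, so pickle cannot serialize it.
    print('lambda picklable:', is_picklable(lambda seconds: seconds))
```

A check like this can be a quick way to find out why a particular argument is rejected before it is handed to `multiprocessing.Process`.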


#### Using concurrent.futures
* to schedule function calls one at a time, use the submit method
  + it schedules a function to be executed and returns a future object
  + each call to submit() schedules one call of the function
  + a future object encapsulates the execution of our function and lets us check on it after it has been scheduled
    - we can check whether it is running or done, and get the function's return value as its result
    - future.result() waits until the result is available

In [20]:
import concurrent.futures
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    return 'Done Sleeping...'

with concurrent.futures.ProcessPoolExecutor() as executor:
    f1 = executor.submit(do_something, 1)
    f2 = executor.submit(do_something, 1)
    print(f1.result())
    print(f2.result())
    
finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')


Sleeping 1 second(s)...Sleeping 1 second(s)...

Done Sleeping...
Done Sleeping...
Finished in 1.04 second(s)
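
The notes mention that a future can be asked whether it is running or done. As a rough illustration, the sketch below wraps those checks in a hypothetical helper, `describe`, and calls it before and after `result()`. The 0.5 s sleep is arbitrary, and the state printed right after `submit()` depends on timing.

```python
import concurrent.futures
import time


def do_something(seconds):
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'


def describe(future):
    """Return a short label for the current state of a future."""
    if future.done():
        return 'done'
    if future.running():
        return 'running'
    return 'pending'


if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        future = executor.submit(do_something, 0.5)
        # State right after scheduling; it may be pending or running.
        print('after submit:', describe(future))
        # result() blocks until the return value is available.
        print(future.result())
        # Once result() has returned, the future is done.
        print('after result:', describe(future))
```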


#### Running multiple tasks with concurrent.futures and as_completed
* collect the future objects with a list comprehension that calls executor.submit 10 times
* pass the list of future objects to concurrent.futures.as_completed, which returns an iterator that yields each future as it completes
  + retrieve the result of each future as it is yielded

In [21]:
import concurrent.futures
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    return 'Done Sleeping...'

with concurrent.futures.ProcessPoolExecutor() as executor:
    # define a list of future objects
    results = [executor.submit(do_something, 1) for _ in range(10)]
    
    # retrieve result values when future objects are completed
    for f in concurrent.futures.as_completed(results):
        print(f.result())
    
finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')


Sleeping 1 second(s)...Sleeping 1 second(s)...
Sleeping 1 second(s)...Sleeping 1 second(s)...


Sleeping 1 second(s)...
Sleeping 1 second(s)...Sleeping 1 second(s)...

Sleeping 1 second(s)...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Sleeping 1 second(s)...
Sleeping 1 second(s)...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Done Sleeping...
Finished in 3.06 second(s)
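
The 3.06 s total above is longer than a single 1-second sleep. One plausible reading is that the pool had fewer than ten workers, since ProcessPoolExecutor runs only as many tasks at once as it has workers and, by default, sizes its pool from the machine's CPU count. The worker count for that run is not recorded here, so the figures below are assumptions. The hypothetical helper `estimated_seconds` gives a back-of-the-envelope estimate for equal-length tasks, ignoring startup overhead.

```python
import math
import os


def estimated_seconds(n_tasks, workers, seconds_per_task):
    """Rough estimate: tasks run in ceil(n_tasks / workers) back-to-back waves."""
    waves = math.ceil(n_tasks / workers)
    return waves * seconds_per_task


if __name__ == '__main__':
    print('CPUs reported by os.cpu_count():', os.cpu_count())
    # Worker counts below are illustrative, not measured values.
    for workers in (2, 4, 10):
        print(workers, 'workers ->', estimated_seconds(10, workers, 1), 's')
```

Under this simple model, ten 1-second tasks on four workers would take about 3 s, which is at least consistent with the output shown.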


In [22]:
import concurrent.futures
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'

with concurrent.futures.ProcessPoolExecutor() as executor:
    secs = [5, 4, 3, 2, 1]
    # define a list of future objects
    results = [executor.submit(do_something, sec) for sec in secs]
    
    # retrieve result values when future objects are completed
    for f in concurrent.futures.as_completed(results):
        print(f.result())
    
finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')


Sleeping 5 second(s)...Sleeping 4 second(s)...Sleeping 3 second(s)...
Sleeping 2 second(s)...


Sleeping 1 second(s)...
Done Sleeping...2
Done Sleeping...3
Done Sleeping...1
Done Sleeping...4
Done Sleeping...5
Finished in 5.05 second(s)
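
Because as_completed yields futures in completion order, it is not obvious from a result alone which input produced it unless the return value says so. A common pattern, sketched here under assumed names (`run_and_label` and `future_to_input` are hypothetical), is to keep a dictionary that maps each future back to its input. The executor is passed in as a parameter so the helper works with any executor from concurrent.futures.

```python
import concurrent.futures
import time


def do_something(seconds):
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'


def run_and_label(executor, func, inputs):
    """Submit func once per input; return (input, result) pairs in completion order."""
    future_to_input = {executor.submit(func, item): item for item in inputs}
    pairs = []
    # Iterating over the dict passes its keys (the futures) to as_completed.
    for future in concurrent.futures.as_completed(future_to_input):
        pairs.append((future_to_input[future], future.result()))
    return pairs


if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Short, arbitrary durations so the example finishes quickly.
        for secs, result in run_and_label(executor, do_something, [0.3, 0.2, 0.1]):
            print(secs, '->', result)
```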


#### map
* the submit method schedules one function call at a time
  + to submit a whole list of calls, we need a list comprehension or a loop
* we can use the map method to run the function over each item of the input iterable
  + do_something will be executed on each element of the secs list
* submit returns a future object, whereas map returns an iterator over the results
  + map yields the results in the order the inputs were given, not in the order the tasks finish
  + we can iterate over these results with a for loop
* if the function raises an exception in a worker process, the exception is not raised at that moment in the main program
  + it is raised when that result is retrieved while iterating over the results, which is where error-handling code can go
* on exit, the context manager waits until all submitted tasks have completed

In [23]:
import concurrent.futures
import time

start = time.perf_counter()


def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    return f'Done Sleeping...{seconds}'


with concurrent.futures.ProcessPoolExecutor() as executor:
    secs = [5, 4, 3, 2, 1]
    results = executor.map(do_something, secs)

    for result in results:
        print(result)

finish = time.perf_counter()

print(f'Finished in {round(finish-start, 2)} second(s)')

Sleeping 5 second(s)...
Sleeping 4 second(s)...Sleeping 3 second(s)...Sleeping 2 second(s)...


Sleeping 1 second(s)...
Done Sleeping...5
Done Sleeping...4
Done Sleeping...3
Done Sleeping...2
Done Sleeping...1
Finished in 5.05 second(s)
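
The note about exceptions in the map section can be sketched as follows. The task `check_seconds` and the helper `collect_until_error` are hypothetical, and the ValueError is raised on purpose for the demonstration. The error only reaches the main program when the loop gets to the failing item, and the results iterator cannot be continued after that.

```python
import concurrent.futures


def check_seconds(seconds):
    """Hypothetical task: reject negative durations, otherwise report success."""
    if seconds < 0:
        raise ValueError(f'negative duration: {seconds}')
    return f'Done Sleeping...{seconds}'


def collect_until_error(executor, func, inputs):
    """Return (results retrieved before the first failure, the exception or None)."""
    collected = []
    results = executor.map(func, inputs)
    try:
        for result in results:
            collected.append(result)
    except ValueError as exc:
        # The exception raised in the worker surfaces here, during iteration.
        return collected, exc
    return collected, None


if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        done, error = collect_until_error(executor, check_seconds, [2, 1, -1, 3])
    print('retrieved before the error:', done)
    print('error:', error)
```

When every result is needed regardless of individual failures, submit together with as_completed may be a better fit, since each future's result() can then be wrapped in its own try/except.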
