# Threading vs Multiprocessing

Threads and multiple processes are two common approaches to run code in parallel to achieve multitasking and speed up the program.

## Process

Process is an instance of a program

Advantages
* Takes advantage of multiple CPUs and cores
* Separate memory space-->memory is not shared between processes
* Great for CPU-bound processing
* New process is started independently from other processes
* Killable/interruptable

Disadvantages
* Heavyweight
* Starting a process is slower than starting a thread
* More memory

In [1]:
from multiprocessing import Process
import os
import time

Create dummy function to show how to use multiprocessor and threading

In [2]:
def square_numbers():
    for i in range(10):
        i*i
        # wait some time to show the different processes
        time.sleep(0.1)


Show the simple usage of multiprocessor

In [3]:

processes=[]
# count the number of CPU in the machine
num_processes=os.cpu_count()
print('total number of CPU is: '+str(num_processes))

# create processes
for i in range(num_processes):
    p=Process(target=square_numbers)
    processes.append(p)

# start the process
for p in processes:
    p.start()

#join
for p in processes:
    p.join()

print('end main')


total number of CPU is: 8
end main


## Threads

Threads is an entity within a process that can be schedule. A process can spawn multiple threads.

Advantages
* Lightweight
* All threads within a process share the same memory
* Starting a thread is faster than starting a process
* Great for I/O-bound tasks

Disadvantages
* Only one thread at a time
* No effect for CPU-bound tasks
* Not interruptable/killable

In [4]:
from threading import Thread
import os
import time

Show the simple usage of threading

In [5]:
threads=[]
num_threads=10

# create processes
for i in range(num_threads):
    t=Thread(target=square_numbers)
    threads.append(t)

# start the process
for t in threads:
    t.start()

#join
for t in threads:
    t.join()

print('end main')

end main


More example of Multithreading. Use Lock to prevent another thread to access the code at the same time (race condition).

In [8]:
from threading import Lock

# define global variable
database_value = 0

#define target function
def increase(lock):
    global database_value 

    lock.acquire()

    local_copy=database_value
    # processing
    local_copy+=1
    time.sleep(0.1)
    database_value=local_copy
    # release lock every time!
    lock.release()

if __name__=="__main__":

    lock=Lock()

    print('start value',database_value)

    # comma to tell python that argument is a tuple
    thread1=Thread(target=increase,args=(lock,))
    thread2=Thread(target=increase,args=(lock,))

    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()
    
    print('end value',database_value)

    print('end main')

start value 0
end value 2
end main


Use lock as a contect manager.

In [9]:
from threading import Lock

# define global variable
database_value = 0

#define target function
def increase(lock):
    global database_value 

    with lock:
        local_copy=database_value
        # processing
        local_copy+=1
        time.sleep(0.1)
        database_value=local_copy

if __name__=="__main__":

    lock=Lock()

    print('start value',database_value)

    # comma to tell python that argument is a tuple
    thread1=Thread(target=increase,args=(lock,))
    thread2=Thread(target=increase,args=(lock,))

    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()
    
    print('end value',database_value)

    print('end main')

start value 0
end value 2
end main


Queues are excellent for thread-safe/process-safe data exchanges and data processing both in a multithreaded and a multiprocessing environment.

In [11]:
from queue import Queue
import time

def worker(q,lock):
    while True:
        value=q.get()

        #processing...
        print(f'in {current_thread().name} got {value}')
        q.task_done()

if __name__=="__main__":

    # create Queue object
    q=Queue()

    num_threads=10
    for i in range(num_threads):
        thread=Thread(target=worker)
        #use deamon thread
        thread.daemon=True
        thread.start()

    print('end main')

1
end main
