### 2. Multithreading

> 1. What is a thread?  
> &rarr; A thread is the smallest unit of execution (a task or part of a program) that an operating system can schedule.
  
> 2. When is it useful?   
> &rarr; It suits network- and I/O-bound problems, such as downloading images from a website.  
> &rarr; It works especially well for tasks that run largely independently of each other.
  
> 3. Thread properties:  
    One or more threads can run concurrently within a single process.  
    Threads in the same process share its state and memory space.  
    Worker threads should report their results back to the main thread.
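
The shared-memory property can be illustrated with a minimal sketch. The names `counter`, `counter_lock`, and `add_many` are made up for this example; the lock is there because several threads update the same variable.

```python
import threading

counter = 0                      # state shared by every thread in this process
counter_lock = threading.Lock()  # protects the read-modify-write on `counter`

def add_many(n):
    """Increment the shared counter n times, taking the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

workers = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()   # the main thread waits for every worker to finish

print(counter)  # 40000
```

All four workers write to the same `counter` object, which is only possible because they live in one process. Without the lock, the final count would not be guaranteed.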

In [1]:
import threading
from threading import Thread
import time

#### Get main thread

In [3]:
print('main thread = ', threading.main_thread())

main thread =  <_MainThread(MainThread, started 10188)>


#### Get a list of all active threads

In [4]:
for thread in threading.enumerate():
    print('Thread name is {}'.format(thread.name))  # getName() is deprecated since Python 3.10

Thread name is MainThread
Thread name is Thread-4
Thread name is Thread-5
Thread name is IPythonHistorySavingThread
Thread name is Thread-3


#### Get the number of active threads

In [5]:
threading.active_count()

5

#### Use Timer() to run a function in a new thread after a delay

In [6]:
def to_do():
    print("It will print after 3 seconds!")

thread = threading.Timer(3, to_do)
thread.start()

It will print after 3 seconds!
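
A timer that has not fired yet can also be stopped. The sketch below is an illustration only: the `fired` list and the 0.5 second delay are chosen for the example, and it relies on `Timer.cancel()` from the standard `threading` module.

```python
import threading

fired = []  # records whether the callback ever ran

def to_do():
    fired.append("ran")

timer = threading.Timer(0.5, to_do)
timer.start()
timer.cancel()  # stops the timer while it is still waiting
timer.join()    # returns once the timer thread has exited

print(fired)  # []
```

`cancel()` only has an effect while the timer is still in its waiting stage; once the function has started, it runs to completion.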


#### Thread: simple example

In [8]:
def make_sleep(i):
    print('Thread {} going to sleep for 3 seconds'
          .format(threading.current_thread()))
    time.sleep(3)
    print('Thread {} is now wake up'
          .format(threading.current_thread()))

number_of_threads = 2
for i in range(number_of_threads):
    th = Thread(target=make_sleep,  # target: the function the thread will run
                args=(i,))          # args: arguments passed to the target function.
                                    # Here it is a one-element tuple, hence the trailing comma
    th.start()  # start the thread; it runs the target function
    print("Current Thread count {} "
          .format(threading.active_count()))


Thread <Thread(Thread-11, started 13552)> going to sleep for 3 seconds
Current Thread count 6 
Thread <Thread(Thread-12, started 13972)> going to sleep for 3 seconds
Current Thread count 7 
Thread <Thread(Thread-11, started 13552)> is now wake up
Thread <Thread(Thread-12, started 13972)> is now wake up


### Normal code vs. Multithreading

In [None]:
import time
from threading import Thread
def do_expensive_computation(values):
    results = []
    for v in values:
        time.sleep(v)  # Assuming this part will take time
        results.append(v*2) # Just an example to return something 
    return results    

values = [1, 2, 3, 4]

#### Normal case

In [None]:
start_time = time.time()
results = do_expensive_computation(values)
end_time=time.time()
print('results = ', results)
print('Total running time = {}'.format(end_time - start_time))

# OUTPUT:
# results =  [2, 4, 6, 8]
# Total running time = 10.0024

#### Single threading: Part 1. Cannot get return values

In [None]:
start_time = time.time()
th = Thread(target=do_expensive_computation, 
            args=(values,))
th.start()
results = th.join()
end_time=time.time()
print('results = ', results)
print('Total running time = {}'.format(end_time - start_time))

# OUTPUT:
# results =  None  <-- join() always returns None, so the return value is lost
# Total running time = 10.0033: no improvement, because all the work still runs on ONE thread

#### Single threading: Part 2. Using Queue to get return values

In [None]:
import queue
start_time = time.time()

que = queue.Queue()
th = Thread(target=lambda q, arg1: q.put(do_expensive_computation(arg1)),
            args=(que, values))

th.start()
th.join()
results = que.get()

end_time=time.time()
print('results = ', results)
print('Total running time = {}'.format(end_time - start_time))

# OUTPUT:
# results =  [2, 4, 6, 8]
# Total running time = 10.004

#### Multithreading: PART 1. Note about join()

In [None]:
start_time = time.time()
threads = []  # keep a handle on every thread so we can join them later

list_values = [[1, 4], [2, 3]]  # split the values above into sublists, one per thread
                                # Here we run TWO threads, so we divide them into two sublists.
que = queue.Queue()  # a single queue shared by all threads
for values in list_values: 
    th = Thread(target=lambda q, arg1: q.put(do_expensive_computation(arg1)),
                args=(que, values))
    th.start()
    threads.append(th)

# Wait for every thread to finish
for th in threads:
    th.join()
 
# NOTE: each thread put one item on the shared queue, so read one item per thread.
# Items come back in completion order, not in the order the threads were started.
results = [que.get() for _ in threads]

end_time=time.time()
print('results = ', results)
print('Total running time = {}'.format(end_time - start_time))

# OUTPUT:
# results =  [[4, 6], [2, 8]]
# Total running time = 5.002: with 2 threads and a balanced split of the values (each sublist sums to 5), it runs about 2 times faster.

#### Multithreading: PART 2. Note about join()

> If the list of values is not divided evenly, the run takes longer than in PART 1,  
> because join() blocks the program until all threads have finished, so the slowest thread sets the total time.

In [None]:
import queue
start_time = time.time()
threads = []  # keep a handle on every thread so we can join them later

list_values = [[1, 2], [3, 4]]  # we run TWO threads and simply split the list into two sublists.

que = queue.Queue()  # a single queue shared by all threads
for values in list_values: 
    th = Thread(target=lambda q, arg1: q.put(do_expensive_computation(arg1)),
                args=(que, values))
    th.start()
    threads.append(th)

# Wait for every thread to finish
for th in threads:
    th.join()
 
# NOTE: each thread put one item on the shared queue, so read one item per thread.
# Items come back in completion order, not in the order the threads were started.
results = [que.get() for _ in threads]

end_time=time.time()
print('results = ', results)
print('Total running time = {}'.format(end_time - start_time))

# OUTPUT:
# results =   [[2, 4], [6, 8]]
# Total running time = 7.005: the first thread takes about 3s but the second takes about 7s, and the slowest one sets the total
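
One way to avoid an uneven split is to balance the sublists by their total cost. The sketch below is a simple greedy heuristic, not an optimal partitioner. It assumes each value equals its cost in seconds, as in the cells above, and `split_balanced` is a hypothetical helper name.

```python
import heapq

def split_balanced(values, num_chunks):
    """Greedy split: hand each value, largest first, to the chunk with the smallest total so far."""
    chunks = [[] for _ in range(num_chunks)]
    heap = [(0, i) for i in range(num_chunks)]  # (current total, chunk index)
    for v in sorted(values, reverse=True):
        total, i = heapq.heappop(heap)          # chunk with the lightest load
        chunks[i].append(v)
        heapq.heappush(heap, (total + v, i))
    return chunks

chunks = split_balanced([1, 2, 3, 4], 2)
print(chunks)                    # [[4, 1], [3, 2]]
print([sum(c) for c in chunks])  # [5, 5]
```

Both sublists sum to 5, so two threads given this split would finish in about 5 seconds, like the hand-picked split in PART 1.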

#### Multithreading: PART 3. Using ThreadPool()
> Note: TensorFlow uses a thread pool to transform data in parallel. https://www.tensorflow.org/api_docs/python/tf/data/Dataset#interleave

In [None]:
from multiprocessing.pool import ThreadPool
start_time = time.time()

list_values = [[1, 2], [3, 4]]  # we run TWO threads and simply split the list into two sublists.
num_of_threads = len(list_values)

pool = ThreadPool(num_of_threads)

# Submit one task per sublist; each apply_async() call returns an AsyncResult
async_results = [pool.apply_async(do_expensive_computation, (values,))
                 for values in list_values]

results = [r.get() for r in async_results]  # get() blocks until that task is done
pool.close()
pool.join()
end_time=time.time()
print('results = ', results)
print('Total running time = {}'.format(end_time - start_time))

# OUTPUT:
# results =   [[2, 4], [6, 8]]
# Total running time = 7.006: the first thread takes about 3s but the second takes about 7s, and the slowest one sets the total
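
The standard library's `concurrent.futures.ThreadPoolExecutor` offers a similar pool with a `with` block that handles shutdown. This is a sketch of an alternative, not the approach used above. The `unit` parameter is a hypothetical addition that scales the sleep time so the demo finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def do_expensive_computation(values, unit=0.01):
    # `unit` (seconds slept per value) is an assumption added for this demo
    results = []
    for v in values:
        time.sleep(v * unit)    # stands in for slow I/O
        results.append(v * 2)
    return results

list_values = [[1, 2], [3, 4]]

with ThreadPoolExecutor(max_workers=len(list_values)) as executor:
    # map() yields results in input order, regardless of which thread finishes first
    results = list(executor.map(do_expensive_computation, list_values))

print('results = ', results)  # results =  [[2, 4], [6, 8]]
```

Leaving the `with` block waits for all submitted work to finish, so no explicit `close()` or `join()` call is needed.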