# Multithreading

## 1  Process and Thread

### 1.1 Process
**Process(task): is a program in execution**

Each process is an `independent entity to which system resources` such as CPU time, memory, etc. are allocated and each process is executed in `a separate address space`. 

> 一个进程是一个正在运行的程序, 是资源（CPU、内存等）分配的基本单位

### 1.2 Thread

**A thread is a `separate` flow of execution that can run `concurrently(并发）` to solve a problem in the process.** 

This means that your program will have two things happening at once.

For example

* a Web browser uses one thread to load an image from the Internet while using another thread to format and display text.

More exactly it is `Thread of Execution` which is the **smallest** unit of processing.

>线程: 进程中可以并发运行的独立执行流,是CPU调度和分派的基本单位(程序执行时的最小单位)

### 1.3 The difference between processes and threads


**The `key` difference between processes and threads** is that 

* `multiple` threads `share(共享)` parts of their state. 

* a thread is a particular `execution path(运行路径)` of a process. 
>
>多个线程共享进程资源，每个线程有自己的堆栈和局部变量共享状态
>
>线程包含在进程之中,是进程的一个运行路径

![thread](./img/linux/ThreadDiagram.jpg)



## 2 Python threading module

**Python threading module**

Python includes sophisticated tools for managing concurrent operations using processes and threads. Even many relatively simple programs can be made to run faster by applying techniques for running parts of the job concurrently using these modules.

The **threading** module includes a high-level, object-oriented API for working with concurrency from Python. 

Thread objects run **concurrently** within the same process and share memory. 

### 2.1 Thread Objects

The simplest way to use a Thread is to instantiate it with 

* a **target** function and 

* call **start()** to let it begin working.

The following example's output is five lines with **Worker** on each.

In [None]:
import threading
import time
def worker():
    """thread worker function"""
    time.sleep(0.5)
    print('Worker')

print("Main Begin")
for i in range(5):
    t = threading.Thread(target=worker)
    t.start()
print("Main End")  

### 2.2 arguments

It is useful to be able to spawn a thread and pass **arguments** that tell it which work to do.

* `Any type of object` can be passed as `argument` to the thread. 

The next example passes a number, which the thread then prints.

* The integer argument is now included in the message printed by each thread.


In [None]:
import threading
import time

def worker(num):
    """thread worker function"""
    time.sleep(0.5)
    print('The worker:{:d}'.format(num))

print("Main Begin")

threads=[]
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    
for item in threads:
    item.start()
print("Main End")    

### 2.3  join() Method 

In some situations, we will have to **wait for the finalization of a thread**. 

**Join** thread: The function returns when the thread execution has `completed`.

We can use the `join()` method of the Thread class. 

When we call this method using a thread object, it `suspends` the execution of the calling thread until the object called finishes its execution.

>主线程等待子线程的终止
>
>也就是说主线程的代码块中，如果碰到了t.join()方法，此时主线程需要等待（阻塞）
>
>等子线程结束了(Waits for this thread to die.),才能继续执行t.join()之后的代码块。


In [None]:
import threading
import time

def worker(num):
    """thread worker function"""
    time.sleep(0.5)
    print('The worker:{:d}'.format(num))

print("Main Begin")

threads=[]
for i in range(5):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    
for item in threads:
    item.start()
    item.join()
print("Main End")  

## 3 Thread in C++

### 3.1 std::thread

In C++, threads are created using the **std::thread** class.



In [None]:
%%file ./demo/src/demo_thread.cpp

#include <iostream>
#include <thread> 

using namespace std;

void worker(int a)
{ 
    cout << "the worker thread: " <<a<< endl;
}

int main()
{ cout <<"Main Begin"<<endl;
  // Create and execute the thread
  thread t1(worker,1);
  cout <<"Main End"<<endl;
  return 0;
}

In [None]:
!g++ -o ./demo/bin/demo_thread ./demo/src/demo_thread.cpp

In [None]:
!.\demo\bin\demo_thread

###  3.2 std::thread::join



In [None]:
%%file ./demo/src/demo_thread_join.cpp

// example for thread::join
#include <iostream>      
#include <thread>        
#include <chrono>        
 
void pause_thread(int n) 
{
  std::this_thread::sleep_for (std::chrono::seconds(n));
  std::cout << "pause of " << n << " seconds ended\n";
}
 
int main() 
{
  std::cout << "Spawning 3 threads...\n";
  std::thread t1 (pause_thread,1);
  std::thread t2 (pause_thread,2);
  std::thread t3 (pause_thread,3);
  std::cout << "Done spawning threads. Now waiting for them to join:\n";

  t1.join();
  t2.join();
  t3.join();
  
  std::cout << "All threads joined!\n";
  return 0;
}

In [None]:
!g++ -o ./demo/bin/demo_thread_join ./demo/src/demo_thread_join.cpp

In [None]:
!.\demo\bin\demo_thread_join

## 4 Parallelism and Parallel Computing 


### 4.1 Concurrency vs Parallelism

Concurrency(并发) means multiple tasks which start, run, and complete within the same time frame, in no specific order

Parallelism(并行) is the task of running multiple computations at the same moment. It requires multiple CPU units or cores.

><font color="blue">并发</font>：在**一个时间段**内，有多个任务都处于已启动、运行和运行完成之间. **宏观上**看起来多个任务都在**同一时刻运行**，但多个任务指令在**交织**着运行。
>
><font color="blue">并行</font>：在**同一时刻**，有多个任务**同时**在执行
>
>并发是在**同一时段**发生,并行是在**同一时刻**发生

Concurrent = `Two` queues and `one` coffee machine.

Parallel = `Two` queues and two `coffee` machines.

 ![](./img/linux/con_and_par.jpg)

The term **parallelism** means that an application **splits** its tasks up into smaller **subtasks** which can be processed in parallel, 


### 4.2 Parallel Computing

In the simplest sense, parallel computing is the simultaneous use of **multiple**compute resources to solve a computational problem:

* A problem is broken into discrete parts that can be solved concurrently

* Each part is further broken down to a series of instructions

* Instructions from each part execute simultaneously on different processors

* An overall control/coordination mechanism is employed

![merge sort](./img/linux/parallel_problem.gif)

### 4.3 Parallelizing merge sort

#### 4.3.1 Parallelizing a problem

When you face a complex issue, the first thing to be done is to **decompose** the problem in order to identify parts of it that may be handled **independently**.

In general, the <font color="blue">parallelizable parts</font> in a solution are in pieces that can be **divided** and distributed for them to be processed by <font color="blue">different workers</font>.

The technique of **dividing and conquering（分治）** involves **splitting** the domain `recursively` until **an indivisible unit** of the complete issue is found and solved. 

The **merge sort**(归并排序）algorithm can be resolved by using this approach



#### 4.3.2 Parallelizing merge sort with multi-thread

In [40]:
%%file ./demo/include/merge_sort.h

#ifndef MERGE_SORT_H
#define MERGE_SORT_H

#include <stdlib.h>

#ifdef __cplusplus
extern "C"
{
#endif

void merge(int a[], int iLeft, int middle, int iRight, int work[]);
void merge_sort_range(int a[], int iLeft, int iRight, int work[]);    
void merge_sort(int a[], int size);

#ifdef __cplusplus

}
#endif

#endif



Overwriting ./demo/include/merge_sort.h


In [41]:
%%file ./demo/src/merge_sort.c

#include "merge_sort.h"

// Merge two halves in  [iLeft,middle], [middle+1, iRight]
void merge(int a[], int iLeft, int middle, int iRight, int work[])
{
    int size = iRight - iLeft + 1;
    // 1 comparing the values of elements
    //  Look at the first element of each list, and move the smaller of the two to the end of the result list.
    int iL = iLeft;
    int iR = middle+1;
    int iResult = 0;
    while (iL <= middle && iR <= iRight)
    {
         if (a[iL] <= a[iR])
         {
             work[iResult++] = a[iL++];
          }
          else
          {
             work[iResult++] = a[iR++];
          }
   }
  
  // 2 Copy the remaining left or right into work
   while (iL <= middle)
         work[iResult++] = a[iL++];
   while (iR <= iRight)
         work[iResult++] = a[iR++];
 
 
   // 3 Copy the work back to the original array
   for (iResult = 0, iL = iLeft; iResult < size; ++iResult, ++iL)
   {
      a[iL] = work[iResult];
   }
}

// Sort the given array in [iLeft, iRight]
void merge_sort_range(int a[], int iLeft, int iRight, int work[])
{
   if ((iRight - iLeft) >= 1) { 
      // more than 1 elements, divide and sort
      // Divide into left and right half
      int middle = (iRight + iLeft) / 2;   // truncate
     
      // Recursively sort each half
      merge_sort_range(a, iLeft, middle, work);
      merge_sort_range(a, middle+1, iRight, work);
 
      // Merge two halves
      merge(a, iLeft,middle, iRight, work);
   }
}
 
// Sort the given array of size
void merge_sort(int a[], int size)
{
   int *work;
   work = (int *)malloc(sizeof(int) * size);
   merge_sort_range(a, 0, size - 1, work);
   free(work); 
}


Overwriting ./demo/src/merge_sort.c


In [42]:
%%file ./demo/src/para_merge_sort.cpp
/*
 Parallelizing merge sort
*/
#include <iostream>
#include <thread>
#include <ctime>
#include <atomic>
#include <string>
#include <cstring>
#include<algorithm>

#include "merge_sort.h"


using namespace std;

atomic<int> CPU(8); //Maximum number of threads

void parallel_merge_sort_range(int a[], int l, int r, int b[])
{
    int minparallelsize =1000;
    //int minparallelsize =10;
    if (l >= r)
        return;
    int mid = (l + r) / 2;

    thread LeftThread;
    thread RightThread;
    if (CPU > 0 && (r-l)>minparallelsize )
    {
        //cout <<"Left Thread "<<CPU <<endl;
        CPU--;
        LeftThread = thread(parallel_merge_sort_range, a, l, mid, b);
    }
    else
    {
        //cout <<"Left "<<CPU <<endl;
        merge_sort_range(a, l, mid, b);
    }   
   
    if (CPU > 0 &&(r-l)>minparallelsize)
    {
        //cout <<"Right Thread "<<CPU <<endl;
        CPU--;
        RightThread = thread(parallel_merge_sort_range, a, mid + 1, r, b);
    }
    else
    {
        //cout <<"Right "<<CPU <<endl;
        merge_sort_range(a, mid + 1, r, b);
    }   
    if (LeftThread.joinable())
        LeftThread.join();
    if (RightThread.joinable())
        RightThread.join();
    merge(a, l, mid,r, b);
    CPU++;
    //cout <<"After Merge "<<CPU <<endl;
}

void parallel_merge_sort(int a[], int size)
{
    int *b;
    b = (int *)malloc(sizeof(int) * size);
    parallel_merge_sort_range(a, 0, size - 1, b);
    free(b);
}

void sortingtimes(string sortname, void (*f)(int *a, int size), int *a, int size)
{
    clock_t StartTime, EndTime;
    StartTime = clock();
    f(a, size);
    EndTime = clock();
    cout << "The "<<sortname<<" run times: " << (double)(EndTime - StartTime) << endl;
    cout <<"\t";
    for (int i = 0; i <20; i++)
        cout << a[i] << " ";
    cout << endl;
}


int main()
{
    const int size = 10000000;//1e7 
    int *v;
    int *a;
    v = (int *)malloc(sizeof(int) * size);
    a = (int *)malloc(sizeof(int) * size);
 
    cout <<"SIZE "<<size<<endl;
    srand((unsigned)time(NULL));
    for (int i = 0; i < size; i++)
        v[i] = (int)rand() % size+i;

    cout <<"\t";
    for (int i = 0; i < 20; i++)
        cout << v[i] << " ";
    cout << endl;

    // Merge Sort
    memcpy(a, v, sizeof(int) * size);
    sortingtimes("Merge Sort",merge_sort,a,size);

    // Parallel Merge Sort
    memcpy(a, v, sizeof(int) * size);
    sortingtimes("Parallel Merge Sort",parallel_merge_sort,a,size);
    
    // std::sort
    memcpy(a, v, sizeof(int) * size);
    clock_t StartTime, EndTime;
    StartTime = clock();
    sort(a,a+size);
    EndTime = clock();
    cout << "The std::sort run times: " << (double)(EndTime - StartTime) << endl;
    cout <<"\t";
    for(int i=0;i<20;i++)
        cout<<a[i]<<" ";
    free(a);
    free(v);
}



Overwriting ./demo/src/para_merge_sort.cpp


In [43]:
!g++ -O3 -o ./demo/bin/para_merge_sort ./demo/src/para_merge_sort.cpp ./demo/src/merge_sort.c -I./demo/include

In [44]:
!.\demo\bin\para_merge_sort 

SIZE 10000000
	10691 25439 21949 76 7712 17630 1531 16611 31470 7233 23277 17465 12519 26600 10182 26056 28269 6715 23704 16230 
The Merge Sort run times: 1114
	76 337 364 389 625 636 659 799 801 834 863 872 885 890 898 915 920 968 969 981 
The Parallel Merge Sort run times: 473
	1146186 1148891 1144741 1161860 1148891 1152771 1153223 1161860 1162674 1166390 3772370 3775717 3776141 3777986 3778813 3780236 3780769 3783021 3784460 3785207 
The std::sort run times: 609
	76 337 364 389 625 636 659 799 801 834 863 872 885 890 898 915 920 968 969 981 
