### Homework 07: Concurrency

## Due Date: Apr 5, 2021, 04:00pm

#### Firstname Lastname: WenxinZhang

#### E-mail: wz2164@nyu.edu

#### Enter your solutions and submit this notebook


In [1]:
from time import time
import numpy as np
import math
from multiprocess import Pool
from queue import Queue
from threading import Thread, Lock
import logging  
import requests

---

**Problem 1** **(60 Points)**

Let us consider the Gamma function, or the Euler integral of the second kind: 

$$\Gamma(x) = \int_{0} ^ \infty t ^{x - 1} e^{-t} dt, $$

and in this HW we consider real $x > 0$.

(Here is more on the Gamma function https://en.wikipedia.org/wiki/Gamma_function .
It is not needed for this HW assignment.) 

**1.1 (Points 15)**: 

Write a function (in the cell below) that sequentially calculates the given Gamma integral.


# Answer to 1.1

In [2]:
def calculate_gamma(x, bound_1, bound_2, number_of_steps):
    # sequential version to calculate Gamma(x):
    # where we approximate the given integral,
    # like this a discrete sum in number_of_steps
    # equidistant points on the interval [bound_1, bound_2]
    
    # return Gamma(x)
    gamma = 0
    for t in np.linspace(bound_1, bound_2, number_of_steps):
        gamma += ((bound_2-bound_1)/number_of_steps) * t ** (x-1) * math.exp(-t) 
    return gamma

**1.2 (Points 5)** 

Evaluate, $\Gamma(6)$ by using `calculate_gamma(x, bound_1, bound_2, number_of_steps)` and the error of this computation.


As arguments, use `x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000`. We know that $\Gamma(x) = x!$, so $\Gamma(6) = 5! = 120$. 


# Answer to 1.2

In [3]:
x = 6 
bound_1 = 0
bound_2 = 1000
number_of_steps = 10_000_000

In [4]:
# sequentially calculate the given Gamma integral
start = time()
evaluated_gamma = calculate_gamma(x=x, bound_1=bound_1, bound_2=bound_2, number_of_steps=number_of_steps)
print(f'The evaluated gamma equals to {evaluated_gamma}')
print(f'The error of this computation equals to {evaluated_gamma-120}')
end = time()
print(f'The time needed to sequentially calculate the given Gamma integral is: {end-start} seconds')

The evaluated gamma equals to 119.99998799994694
The error of this computation equals to -1.2000053061456128e-05
The time needed to sequentially calculate the given Gamma integral is: 9.172012090682983 seconds


---

Write two functions to calculate $\Gamma(x)$ by using:



**1.3.1 (Points 15)**
**threading** with N=4 threads; 

**1.3.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**1.3.3 (Points 10)** 
Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

    

# Answer to 1.3.1

**1.3.1 (Points 15)**
**threading** with N=4 threads; 

In [5]:
lock = Lock()
gamma = 0 
def sum_chunk_thread(q):
    while True:
        global gamma
        x, bound_1, bound_2, number_of_steps = q.get()
        for t in np.linspace(bound_1, bound_2, number_of_steps):
            lock.acquire()
            gamma += ((bound_2-bound_1)/number_of_steps) * t ** (x-1) * math.exp(-t) 
            lock.release()
        q.task_done()
    return gamma

In [6]:
# N = 4
# Threading
num_thread = 4
chunks = [[x, i*(bound_2-bound_1)/num_thread, (i+1)*(bound_2-bound_1)/num_thread, int(number_of_steps/num_thread)] for i in range(num_thread)]


q = Queue()
start = time()
for i in range(num_thread):
    worker = Thread(target=sum_chunk_thread, args=(q,), name='thread_' + str(i))
    worker.setDaemon(True)  
    worker.start()         
    

for chunk in chunks:
    q.put(chunk)
q.join()
end = time()


print(f'The evaluated gamma equals to {gamma}')
print(f'The error of this computation equals to {gamma-120}')
print(f'The time needed to sequentially calculate the given Gamma integral is: {end-start} seconds')  

The evaluated gamma equals to 119.99995199994225
The error of this computation equals to -4.800005774541205e-05
The time needed to sequentially calculate the given Gamma integral is: 11.765336990356445 seconds


# Answer to 1.3.2

**1.3.2 (Points 15)**
**multiprocessing** with N=4 processes. 

In [7]:
def sum_chunk_multiprocesses(input):
    x, bound_1, bound_2, num_of_steps = input
    gamma = 0
    for t in np.linspace(bound_1, bound_2, number_of_steps):
        gamma += ((bound_2-bound_1)/number_of_steps) * t ** (x-1) * math.exp(-t) 
    return gamma

In [8]:
# N = 4
# Multiprocessing
num_process = 4
chunks = [[x, i*(bound_2-bound_1)/num_process, (i+1)*(bound_2-bound_1)/num_process, int(number_of_steps/num_process)] for i in range(num_process)]


start = time()
with Pool(num_process) as pool:
    results = pool.map(sum_chunk_multiprocesses, chunks)
end = time()


print(f'The evaluated gamma equals to {sum(results)}')
print(f'The error of this computation equals to {sum(results)-120}')
print(f'The time needed to sequentially calculate the given Gamma integral is: {end-start} seconds') 

The evaluated gamma equals to 119.99998799977556
The error of this computation equals to -1.2000224444363994e-05
The time needed to sequentially calculate the given Gamma integral is: 11.823247194290161 seconds


# Answer to 1.3.3

**1.3.3 (Points 10)** 
Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

### Compare the times of the three versions: 

- The sequentially calculated version requires nearly 9.21 seconds.
- The threading version (with N = 4 threads) requires nearly 11.77 seconds.
- The multiprocessing version (with N = 4 processes) requires nearly 11.82 seconds. 
---
- I observe that neiter the threading version nor the multiprocessing version increases the speed of calculating gammma.
- This phenomenon can be explained by the fact that both of these are not really parallel. Since python has a GIL, this code is cocurrent but not parallel.

### How does the answer change when N = 8: 
- The threading version (with N = 8 threads) requires nearly 12.39 seconds.
- The multiprocessing version (with N = 8 processes) requires nearly 25.33 seconds. 
--- 
- I obeserve that the time required for the threading version and the multiprocessing version both increase.
- This phenomenon can be explained by the fact that the new task(N=8) is CPU bound. Thus using the threading or processing module in Python with a GIL could actually result in reduced performance.


In [9]:
# N = 8
# Threading
num_thread = 8
chunks = [[x, i*(bound_2-bound_1)/num_thread, (i+1)*(bound_2-bound_1)/num_thread, int(number_of_steps/num_thread)] for i in range(num_thread)]


q = Queue()
start = time()
for i in range(num_thread):
    worker = Thread(target=sum_chunk_thread, args=(q,), name='thread_' + str(i))
    worker.setDaemon(True)  
    worker.start()         
    

for chunk in chunks:
    q.put(chunk)
q.join()
end = time()


print(f'The evaluated gamma equals to {gamma}')
print(f'The error of this computation equals to {gamma-120}')
print(f'The time needed to sequentially calculate the given Gamma integral is: {end-start} seconds')  

The evaluated gamma equals to 239.99985599982531
The error of this computation equals to 119.99985599982531
The time needed to sequentially calculate the given Gamma integral is: 12.390151977539062 seconds


In [10]:
# N = 8
# Multiprocessing
num_process = 8
chunks = [[x, i*(bound_2-bound_1)/num_process, (i+1)*(bound_2-bound_1)/num_process, int(number_of_steps/num_process)] for i in range(num_process)]


start = time()
with Pool(num_process) as pool:
    results = pool.map(sum_chunk_multiprocesses, chunks)
end = time()


print(f'The evaluated gamma equals to {sum(results)}')
print(f'The error of this computation equals to {sum(results)-120}')
print(f'The time needed to sequentially calculate the given Gamma integral is: {end-start} seconds') 

The evaluated gamma equals to 119.99998799954126
The error of this computation equals to -1.2000458738725683e-05
The time needed to sequentially calculate the given Gamma integral is: 25.326115131378174 seconds


---

**Problem 2 (40 points)**

__Website uptime__ is the time that a website or web service is available to the users over a given period.

The task is to build an application that checks the uptime of websites. 

- The application should go over a list of website URLs and checks if those websites are up.
- Instead of performing a classic HTTP GET request, it performs a HEAD request so that it does not affect traffic significantly.
- If the HTTP status is in the danger ranges (400+, 500+), a message is casted. 

Here are some useful functions:

In [11]:
#### _website uptimer_ ####
class WebsiteDownException(Exception):
    pass
 
def ping_website(address, timeout=20):
    """
    Check if a website is down. A website is considered down 
    if either the status_code >= 400 or if the timeout expires
     
    Throw a WebsiteDownException if any of the website down conditions are met
    """
    try:
        response = requests.head(address, timeout=timeout)
        if response.status_code >= 400:
            logging.warning("Website %s returned status_code=%s" % (address, response.status_code))
            raise WebsiteDownException()
    except requests.exceptions.RequestException:
        logging.warning("Timeout expired for website %s" % address)
        raise WebsiteDownException()
         
def check_website(address):
    """
    Utility function: check if a website is down, if so, notify the user
    """
    try:
        ping_website(address)
    except WebsiteDownException:
        print('The websie ' + address + ' is down')

---

You need a website list to try our system out. Create your own list or use the following one. 

---

In [12]:
WEBSITE_LIST = [
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://really-cool-available-domain.com',
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://shopify.com',
    'http://another-really-interesting-domain.co',
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

---

A serial version of the _website uptimer_ can be written as: 

---


In [13]:
# The serial version
start = time()
for address in WEBSITE_LIST:
    check_website(address)
end = time()        
 
    
print(f'Website uptimer of the serial version takes {end-start} seconds')



The websie http://facebook.com is down




The websie http://google.com is down




The websie http://google.fr is down




The websie http://google.es is down




The websie http://google.co.uk is down




The websie http://gmail.com is down




The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down




The websie http://instagram.com is down




The websie http://youtube.com is down




The websie http://netflix.com is down




The websie http://wordpress.com is down
Website uptimer of the serial version takes 148.788743019104 seconds


You should build two versions of the **website uptimer**, by using:

**2.1 (Points 15)**
**threading** with N=4 threads; 

**2.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**2.3 (Points 10)** 

Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?


# Answer to 2.1

**2.1 (Points 15)**
**threading** with N=4 threads; 

In [14]:
def check_websites(q):
    while True:
        address=q.get()
        try:
            ping_website(address)
        except WebsiteDownException:
            print('The websie ' + address + ' is down')
        q.task_done()

In [15]:
# N = 4
# Threading
start = time()
q = Queue()
num_threads = 4
for _ in range(num_threads):
    worker = Thread(target=check_websites, args=(q, ))
    worker.setDaemon(True)
    worker.start()


for address in WEBSITE_LIST:
    q.put(address)
q.join()
end = time()


print(f'Website uptimer takes {end-start} seconds using threading with N = 4 threads')



The websie http://facebook.com is down
The websie http://google.com is down




The websie http://google.fr is down
The websie http://google.es is down




The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down




The websie http://google.co.uk is down
The websie http://gmail.com is down




The websie http://netflix.com is down
The websie http://wordpress.com is down




The websie http://instagram.com is down




The websie http://youtube.com is down
Website uptimer takes 60.049705028533936 seconds using threading with N = 4 threads


# Answer to 2.2


**2.2 (Points 15)**
**multiprocessing** with N=4 processes. 

In [16]:
# N = 4
# Multiprocessing
start = time()
num_process = 4
with Pool(num_process) as p:
     results = p.map(check_website, WEBSITE_LIST)      
end = time()     

 
print(f'Website uptimer takes {end-start} seconds using multiprocessing with N = 4 processes')



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down




The websie http://google.fr is down




The websie http://google.co.uk is down




The websie http://facebook.com is down




The websie http://google.com is down




The websie http://instagram.com is down




The websie http://netflix.com is down




The websie http://wordpress.com is down




The websie http://google.es is down




The websie http://gmail.com is down




The websie http://youtube.com is down
Website uptimer takes 80.59258794784546 seconds using multiprocessing with N = 4 processes


# Answer to 2.3

**2.3 (Points 10)** 

Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

### Compare the times of the three versions:

- The serial version requires nearly 148.80 seconds.
- The threading version (with N = 4 threads) requires nearly 60.05 seconds.
- The multiprocessing version (with N = 4 processes) requires nearly 80.59 seconds. 
---
- I observe that both the threading version and the multiprocessing version increase the speed of the website uptimer, while threading even performs better. 
- This phenomenon can be explained by the fact that: although python has a GIL, this HEAD request problem is actually an input/output bound task and the majority of the time is spent waiting for the network. Hence, threading and multiprocessing can provide a large speed increase. Here, though multiprocessing can speed up the task to some extent,  in each process, it still needs to wait until it gets the response from the website, so that it can send the request to the next website. While in threading, once we send the request and wait for the response, the next thread is ready to go. 

### How does the answer change when N=8 and why?

- The threading version (with N = 8 threads) requires nearly 22.14 seconds.
- The multiprocessing version (with N = 8 processes) requires nearly 42.47 seconds. 
---
- I obeserve that the time required for the threading version and the multiprocessing version both decrease.
- This phenomenon can be explained by the fact that the new task(N=8) can reduce more time in the HEAD request process, since more threads and processes can work concurrently.

In [17]:
# N = 8
# Threading
start = time()
q = Queue()
num_threads = 8
for _ in range(num_threads):
    worker = Thread(target=check_websites, args=(q, ))
    worker.setDaemon(True)
    worker.start()


for address in WEBSITE_LIST:
    q.put(address)
q.join()
end = time()


print(f'Website uptimer takes {end-start} seconds using threading with N = 8 threads')



The websie http://google.com is down




The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down




The websie http://netflix.com is down
The websie http://wordpress.com is down




The websie http://google.es is down
The websie http://gmail.com is down
The websie http://facebook.com is down
The websie http://google.co.uk is down
The websie http://google.fr is down




The websie http://instagram.com is down




The websie http://youtube.com is down
Website uptimer takes 22.139533042907715 seconds using threading with N = 8 threads


In [18]:
# N = 8
# Multiprocessing
start = time()
num_process = 8
with Pool(num_process) as p:
     results = p.map(check_website, WEBSITE_LIST)      
end = time()     

 
print(f'Website uptimer takes {end-start} seconds using multiprocessing with N = 8 processes')



The websie http://really-cool-available-domain.com is down




The websie http://another-really-interesting-domain.co is down




The websie http://google.fr is downThe websie http://google.es is downThe websie http://facebook.com is downThe websie http://google.co.uk is downThe websie http://gmail.com is down








The websie http://google.com is down




The websie http://wordpress.com is down




The websie http://netflix.com is down




The websie http://instagram.com is down




The websie http://youtube.com is down
Website uptimer takes 42.47116804122925 seconds using multiprocessing with N = 8 processes
