### Homework 07: Concurrency

## Due Date: Apr 5, 2023, 11:59pm

#### Firstname Lastname: Ching-Tsung (Deron) Tsai

#### E-mail: ct2840@nyu.edu

#### Enter your solutions and submit this notebook


---

**Problem 1** **(60 Points)**

Let us consider the Gamma function, or the Euler integral of the second kind: 

$$\Gamma(x) = \int_{0} ^ \infty t ^{x - 1} e^{-t} dt, $$

and in this HW we consider real $x > 0$.

(Here is more on the Gamma function https://en.wikipedia.org/wiki/Gamma_function .
It is not needed for this HW assignment.) 

**1.1 (Points 15)**: 

Write a function (in the cell below) that sequentially calculates the given Gamma integral.


In [6]:
import numpy as np
import math 
from time import time
def calculate_gamma(x, bound_1, bound_2, number_of_steps):
    # Sequential version to calculate Gamma(x):
    # approximate the given integral by a discrete sum
    # over number_of_steps equidistant points
    # on the interval [bound_1, bound_2].
    start_t = time()
    output = 0
    step = (bound_2-bound_1)/number_of_steps     # size of step 
    ts = np.linspace(bound_1, bound_2, number_of_steps)  # the input t
    for t in ts:
        output += step * (t**(x-1)*math.exp(-t))
    print('Time spent without concurrency: %f,\nEstimation: %f'%((time() - start_t), output))
    return output


**1.2 (Points 5)** 

Evaluate $\Gamma(6)$ using `calculate_gamma(x, bound_1, bound_2, number_of_steps)` and report the error of this computation.


As arguments, use `x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000`. We know that $\Gamma(n) = (n-1)!$ for positive integers $n$, so $\Gamma(6) = 5! = 120$. 
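As a quick sanity check of the reference value, the identity $\Gamma(n) = (n-1)!$ can be confirmed with the standard library. This is only a minimal sketch; the variable name `reference` is introduced here for illustration and is not part of the assignment.

```python
import math

# For positive integers n, Gamma(n) = (n - 1)!, so Gamma(6) should equal 5! = 120.
reference = math.gamma(6)
assert math.isclose(reference, math.factorial(5))
print(reference)
```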


In [7]:
gamma_estimate = calculate_gamma(x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000)
print('The error is %f'%(120-gamma_estimate))

Time spent without concurrency: 6.003440,
Estimation: 119.999988
The error is 0.000012
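
One possible reading of the remaining error, offered as a sketch rather than a rigorous bound: with $a$ = `bound_1`, $b$ = `bound_2` and $N$ = `number_of_steps`, `np.linspace` spaces the $N$ sample points $t_i$ by $(b-a)/(N-1)$, while the code weights each sample by $(b-a)/N$. The integrand $f(t) = t^{5} e^{-t}$ is zero at $t=0$ and negligible at $t=1000$, so the sum is approximately the integral scaled by $(N-1)/N$:

$$\sum_{i=0}^{N-1} f(t_i)\,\frac{b-a}{N} = \frac{N-1}{N}\sum_{i=0}^{N-1} f(t_i)\,\frac{b-a}{N-1} \approx \left(1-\frac{1}{N}\right)\Gamma(6) = 120\,(1-10^{-7}) = 119.999988, $$

which agrees with the printed estimate and with the error of $0.000012$.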


---

Write two functions to calculate $\Gamma(x)$ by using:



**1.3.1 (Points 15)**
**threading** with N=4 threads; 

**1.3.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**1.3.3 (Points 10)** 
Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?

    

In [8]:
def calculate_gamma_small(x, steps):  # calculate gamma in smaller range
    output = 0
    number_of_steps = len(steps)
    bound_1, bound_2 = steps[0], steps[-1]
    step = (bound_2-bound_1)/number_of_steps     # size of step 
    ts = np.linspace(bound_1, bound_2, number_of_steps)  # the input t
    for t in ts:
        output += step * (t**(x-1)*math.exp(-t))
    return output
from functools import partial
calculate_gamma_small_new = partial(calculate_gamma_small, 6) # set fix input x

In [9]:
# inputs:
num_threads = 4
# separate the steps based on num_threads:
bound_1=0
bound_2=1000
number_of_steps=10_000_000
new_bounds = np.array_split(np.linspace(bound_1, bound_2, number_of_steps), num_threads)
# 1.3.1 threading:
start_t = time()
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=num_threads) as ex:
    results_mt = ex.map(calculate_gamma_small_new, new_bounds)
print('Time spent with multithread: %f,\nEstimation: %f'%((time() - start_t), sum(results_mt)))

Time spent with multithread: 5.872165,
Estimation: 119.999952


In [10]:
# 1.3.2 multiprocessing:
import multiprocess as mp

# multiprocessing:
start_t = time()
with mp.Pool(num_threads) as p:
    results_mp = p.map(calculate_gamma_small_new, new_bounds)
print('Time spent with multiprocessing: %f,\nEstimation: %f'%((time() - start_t), sum(results_mp)))

Time spent with multiprocessing: 1.623876,
Estimation: 119.999952


### 1.3.3 (a):
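Based on the timings printed above, multiprocessing with 4 processes cut the run time from about 6.00 s to 1.62 s (roughly a 3.7x speedup), while multithreading with 4 threads took 5.87 s, essentially the same as the sequential version. The computation is CPU-bound pure-Python arithmetic, and CPython's global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the threads cannot run the loop in parallel. Separate processes each have their own interpreter and can run on separate cores.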

### 1.3.3 (b):

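The runs below with N=8 give nearly the same times as with N=4. One reason is that `new_bounds` is still split into the 4 chunks created earlier, so there are only 4 tasks and the extra workers stay idle. Even with 8 chunks, threads would remain limited by the GIL, and multiprocessing could only improve further if more than 4 CPU cores are available.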

In [11]:
calculate_gamma(x=6, bound_1=0, bound_2=1000, number_of_steps=10_000_000)
num_threads = 8
start_t = time()
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=num_threads) as ex:
    results_mt = ex.map(calculate_gamma_small_new, new_bounds)
print('Time spent with multithread: %f,\nEstimation: %f'%((time() - start_t), sum(results_mt)))

# multiprocessing:
start_t = time()
with mp.Pool(num_threads) as p:
    results_mp = p.map(calculate_gamma_small_new, new_bounds)
print('Time spent with multiprocessing: %f,\nEstimation: %f'%((time() - start_t), sum(results_mp)))



Time spent without concurrency: 5.531827,
Estimation: 119.999988
Time spent with multithread: 5.427624,
Estimation: 119.999952
Time spent with multiprocessing: 1.620782,
Estimation: 119.999952
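
In the N=8 run above, `new_bounds` still holds the 4 chunks built earlier, so only 4 tasks are submitted. A minimal sketch of tying the number of chunks to the number of workers is shown below. The names `gamma_chunk` and `gamma_parallel` are hypothetical, the chunk sum is vectorized with NumPy, and a single global step is used for every chunk, which is a small departure from `calculate_gamma_small`. Threads are used only to keep the sketch easy to run anywhere; the same chunking could be handed to a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import numpy as np


def gamma_chunk(x, ts, step):
    # Contribution of the sample points in `ts`, each weighted by the same global step.
    return float(np.sum(ts ** (x - 1) * np.exp(-ts)) * step)


def gamma_parallel(x, bound_1, bound_2, number_of_steps, num_workers):
    ts = np.linspace(bound_1, bound_2, number_of_steps)
    step = (bound_2 - bound_1) / number_of_steps
    # One chunk per worker, so changing num_workers really changes the task count.
    chunks = np.array_split(ts, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        parts = list(ex.map(partial(gamma_chunk, x, step=step), chunks))
    return sum(parts), len(chunks)


# Fewer steps than in the homework, to keep the sketch quick to run.
estimate, n_chunks = gamma_parallel(6, 0, 1000, 1_000_000, num_workers=8)
print(n_chunks, estimate)
```

With this structure, rerunning with a different `num_workers` re-splits the points, so a timing comparison between N=4 and N=8 would measure what it is meant to measure.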


---

**Problem 2 (40 points)**

__Website uptime__ is the time that a website or web service is available to the users over a given period.

The task is to build an application that checks the uptime of websites. 

- The application should go over a list of website URLs and check whether those websites are up.
- Instead of performing a classic HTTP GET request, it performs a HEAD request so that it does not affect traffic significantly.
- If the HTTP status is in the danger ranges (400+, 500+), a warning message is issued. 

Here are some useful functions:

In [12]:
#### _website uptimer_ ####

import time
import logging
import requests
 
class WebsiteDownException(Exception):
    pass
 
def ping_website(address, timeout=20):
    """
    Check if a website is down. A website is considered down 
    if either the status_code >= 400 or if the timeout expires
     
    Throw a WebsiteDownException if any of the website down conditions are met
    """
    try:
        response = requests.head(address, timeout=timeout)
        if response.status_code >= 400:
            logging.warning("Website %s returned status_code=%s" % (address, response.status_code))
            raise WebsiteDownException()
    except requests.exceptions.RequestException:
        logging.warning("Timeout expired for website %s" % address)
        raise WebsiteDownException()
         
def check_website(address):
    """
    Utility function: check if a website is down, if so, notify the user
    """
    try:
        ping_website(address)
    except WebsiteDownException:
        print('The website ' + address + ' is down')
        

---

You need a website list to try our system out. Create your own list or use the following one. 

---

In [13]:
WEBSITE_LIST = [
    'http://amazon.co.uk',
    'http://amazon.com',
    'http://facebook.com',
    'http://google.com',
    'http://google.fr',
    'http://google.es',
    'http://google.co.uk',
    'http://gmail.com',
    'http://stackoverflow.com',
    'http://github.com',
    'http://heroku.com',
    'http://really-cool-available-domain.com',  
    'http://djangoproject.com',
    'http://rubyonrails.org',
    'http://basecamp.com',
    'http://trello.com',
    'http://shopify.com',
    'http://another-really-interesting-domain.co', 
    'http://airbnb.com',
    'http://instagram.com',
    'http://snapchat.com',
    'http://youtube.com',
    'http://baidu.com',
    'http://yahoo.com',
    'http://live.com',
    'http://linkedin.com',
    'http://netflix.com',
    'http://wordpress.com',
    'http://bing.com',
]

---

A serial version of the _website uptimer_ can be written as: 

---


In [14]:
import time
 
start_time = time.time()
 
for address in WEBSITE_LIST:
    check_website(address)
         
end_time = time.time()        
 
print("Time for serial: %ssecs" % (end_time - start_time))

The website http://really-cool-available-domain.com is down
The website http://another-really-interesting-domain.co is down
Time for serial: 3.830073833465576secs


You should build two versions of the **website uptimer**, by using:

**2.1 (Points 15)**
**threading** with N=4 threads; 

**2.2 (Points 15)**
**multiprocessing** with N=4 processes. 


**2.3 (Points 10)** 

Compare the times of the three versions and write a short explanation of what you are observing.

How does the answer change when N=8 and why?


In [15]:
# 2.1 multithreading
from queue import Queue
from threading import Thread
from time import time
def check_website_thread(q):
    """
    Utility function: check if a website is down, if so, notify the user
    """
    while True:
        address = q.get()
        check_website(address)
        q.task_done()
start_time = time()
q = Queue()
num_threads = 4
for _ in range(num_threads):
    # daemon=True replaces the deprecated setDaemon(True): workers exit with the main program
    worker = Thread(target=check_website_thread, args=(q,), daemon=True)
    worker.start()
for address in WEBSITE_LIST:  
    q.put(address)
q.join()
print("Time for multithreading: %ssecs" % (time() - start_time))

The website http://really-cool-available-domain.com is down
The website http://another-really-interesting-domain.co is down
Time for multithreading: 0.920421838760376secs


In [16]:
# 2.2 multiprocessing:
start_time = time()
with mp.Pool(num_threads) as p:    # don't need to close
    results_mp = p.map(check_website, WEBSITE_LIST) 
print("Time for multiprocessing: %ssecs" % (time() - start_time))

The website http://really-cool-available-domain.com is down
The website http://another-really-interesting-domain.co is down
The website http://baidu.com is down
Time for multiprocessing: 1.205902099609375secs


### 2.3

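Both multithreading and multiprocessing are much faster than the serial version (3.83 s). With N=4, threading took 0.92 s and multiprocessing 1.21 s, so threading performed best. With N=8, both improve further, to about 0.70 s each. This is because most of the time is spent waiting for the websites to respond: the task is I/O-bound, and a thread releases the GIL while it waits on the network, so threads overlap the waits effectively without the start-up and inter-process communication overhead of processes.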
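
The effect of overlapping waits can be illustrated without touching the network. The sketch below is an assumption-laden toy, not the uptimer itself: `fake_ping` and `timed_run` are hypothetical helpers, the addresses are made up and never contacted, and `sleep` stands in for the time spent waiting on a server.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter, sleep


def fake_ping(address, delay=0.05):
    # Stand-in for a HEAD request: the thread only waits, as it would on a slow server.
    sleep(delay)
    return address


def timed_run(addresses, num_threads):
    # Run fake_ping over all addresses with a pool of num_threads threads and time it.
    start = perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as ex:
        results = list(ex.map(fake_ping, addresses))
    return perf_counter() - start, results


addresses = ["http://example-%d.invalid" % i for i in range(8)]  # made-up, never contacted
serial_time, _ = timed_run(addresses, num_threads=1)
threaded_time, results = timed_run(addresses, num_threads=4)
print("1 thread: %.2fs, 4 threads: %.2fs" % (serial_time, threaded_time))
```

Because a sleeping thread does not hold the GIL, the four workers wait at the same time, and the 4-thread run should finish in a fraction of the single-thread time, mirroring the behaviour observed for the real requests.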


In [17]:
start_time = time()
q = Queue()
num_threads = 8
for _ in range(num_threads):
    # daemon=True replaces the deprecated setDaemon(True): workers exit with the main program
    worker = Thread(target=check_website_thread, args=(q,), daemon=True)
    worker.start()
for address in WEBSITE_LIST:  
    q.put(address)
q.join()
print("Time for multithreading: %ssecs" % (time() - start_time))

start_time = time()
with mp.Pool(num_threads) as p:    # don't need to close
    results_mp = p.map(check_website, WEBSITE_LIST) 
print("Time for multiprocessing: %ssecs" % (time() - start_time))



The website http://really-cool-available-domain.com is down
The website http://another-really-interesting-domain.co is down
Time for multithreading: 0.7011837959289551secs




The website http://really-cool-available-domain.com is down




The website http://another-really-interesting-domain.co is down
Time for multiprocessing: 0.6960799694061279secs
