# Monte-Carlo Estimate of $\pi$

We want to estimate $\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods). The area of a quarter circle of unit radius is $\pi/4$, so the probability that a point chosen uniformly at random in the unit square lies inside the unit circle centered at a corner of that square is also $\pi/4$. For $N$ randomly chosen pairs $(x, y)$ with $x\in[0, 1)$ and $y\in[0, 1)$, we count the number $N_{circ}$ of pairs that also satisfy $x^2 + y^2 < 1$ and estimate $\pi \approx 4 \cdot N_{circ} / N$.

[<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif" 
     width="50%" 
     align=top
     alt="PI monte-carlo estimate">](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods)
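
Before bringing in Dask, the counting argument can be sketched in plain Python. This is a minimal illustration only, using the standard-library `random` module; the function name `estimate_pi` and the `seed` parameter are assumptions made for this sketch and are not part of the notebook.

```python
import math
import random


def estimate_pi(num_pairs, seed=None):
    """Estimate pi from num_pairs uniform random points in the unit square."""
    rng = random.Random(seed)  # the seed only makes runs repeatable
    n_circ = 0
    for _ in range(num_pairs):
        x, y = rng.random(), rng.random()  # both drawn from [0, 1)
        if x * x + y * y < 1:  # point lies inside the quarter circle
            n_circ += 1
    return 4 * n_circ / num_pairs


if __name__ == "__main__":
    pi_mc = estimate_pi(100_000, seed=0)
    print(f"MC pi: {pi_mc:.5f}  Err: {abs(pi_mc - math.pi):.3e}")
```

The Dask version used in this notebook performs the same counting, but on chunked arrays that can be distributed over several workers.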

## Core Lessons

- Short Dask recap (assuming that `LocalCluster`, `Client`, and `dask.array` are familiar)
- Scaling (local) clusters
- Adaptive (local) clusters

## Set up a local cluster

In [1]:
from dask.distributed import LocalCluster, Client

In [2]:
cluster = LocalCluster(n_workers=1, threads_per_worker=1, memory_limit=1e9)
client = Client(cluster)
client

Client  Scheduler: tcp://127.0.0.1:41403  Dashboard: http://127.0.0.1:8787/status
Cluster  Workers: 1  Cores: 1  Memory: 1000.00 MB


## The Monte Carlo Method

In [3]:
import dask.array as da
import numpy as np

In [4]:
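def calc_pi_mc(size_in_bytes):
    """Calculate PI using a Monte Carlo estimate."""
    # Each (x, y) pair takes two float64 values of 8 bytes each.
    num_pairs = int(size_in_bytes / 8 / 2)
    xy = da.random.uniform(0, 1,
                           size=(num_pairs, 2),
                           chunks=(int(100e6 / 8), 2))  # integer chunk sizes

    # A pair lies inside the quarter circle if x^2 + y^2 < 1.
    in_circle = ((xy ** 2).sum(axis=-1) < 1)
    pi = 4 * in_circle.mean()

    return pi.compute()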
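
To see how the `size` and `chunks` arguments above translate into work for the cluster, the arithmetic can be reproduced without Dask. The helper `chunk_layout` below is hypothetical and only mirrors the numbers passed to `da.random.uniform`; it does not query Dask itself.

```python
import math

BYTES_PER_FLOAT64 = 8


def chunk_layout(size_in_bytes, chunk_rows=int(100e6 / 8)):
    """Return (rows, number of chunks, bytes per full chunk) for the (rows, 2) array."""
    rows = int(size_in_bytes / BYTES_PER_FLOAT64 / 2)  # one row per (x, y) pair
    num_chunks = math.ceil(rows / chunk_rows)  # chunks are split along rows only
    chunk_bytes = chunk_rows * 2 * BYTES_PER_FLOAT64  # each chunk holds both columns
    return rows, num_chunks, chunk_bytes


if __name__ == "__main__":
    for size in (1e9, 2e9, 3e9):
        rows, num_chunks, chunk_bytes = chunk_layout(size)
        print(f"{size / 1e9} GB: {rows} pairs in {num_chunks} chunks "
              f"of {chunk_bytes / 1e6:.0f} MB")
```

Note that because each chunk spans both columns, a full chunk holds 200 MB of random numbers, so a 1 GB problem is split into 5 chunks.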

In [5]:
def print_pi_stats(size, pi, time_delta, num_workers):
    """Print pi, calculate offset from true value, and print some stats."""
    print(f"{size / 1e9} GB\n"
          f"\tMC pi: {pi : 13.11f}"
          f"\tErr: {abs(pi - np.pi) : 10.3e}\n"
          f"\tWorkers: {num_workers}"
          f"\t\tTime: {time_delta : 7.3f}s")

## The actual calculations

We loop over different volumes of double-precision random numbers and estimate $\pi$ as described above.
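
It helps to know what accuracy to expect for a given volume of random numbers. Each pair is an independent trial that lands in the circle with probability $p = \pi/4$, so the standard deviation of the estimate is $4\sqrt{p(1-p)/N}$. The sketch below evaluates this; the function name `expected_std_error` is an assumption made for illustration.

```python
import math


def expected_std_error(size_in_bytes):
    """Standard deviation of the MC estimate of pi for a given data volume."""
    num_pairs = int(size_in_bytes / 8 / 2)  # two float64 values per pair
    p = math.pi / 4  # probability that a pair falls inside the quarter circle
    return 4 * math.sqrt(p * (1 - p) / num_pairs)


if __name__ == "__main__":
    for size in (1e9, 2e9, 3e9):
        print(f"{size / 1e9} GB -> expected error ~ {expected_std_error(size):.2e}")
```

For $10^9$ bytes this is roughly $2 \times 10^{-4}$, and it shrinks only with $1/\sqrt{N}$: halving the error requires four times as many random numbers.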

In [6]:
from time import time

In [7]:
for size in (1e9 * n for n in (1, 2, 3)):
    
    start = time()
    pi = calc_pi_mc(size)
    elaps = time() - start

    print_pi_stats(size, pi,
                   time_delta=elaps,
                   num_workers=len(cluster.workers))

1.0 GB
	MC pi:  3.14127948800	Err:  3.132e-04
	Workers: 1		Time:   5.114s
2.0 GB
	MC pi:  3.14166908800	Err:  7.643e-05
	Workers: 1		Time:  10.141s
3.0 GB
	MC pi:  3.14187464533	Err:  2.820e-04
	Workers: 1		Time:  15.123s


## Scaling the Cluster

We double the number of workers and then re-run the experiments.

In [8]:
from time import sleep

In [9]:
new_num_workers = 2 * len(cluster.workers)

print(f"Scaling from {len(cluster.workers)} to {new_num_workers} workers.")

cluster.scale(new_num_workers)

sleep(3)

Scaling from 1 to 2 workers.


In [10]:
client

Client  Scheduler: tcp://127.0.0.1:41403  Dashboard: http://127.0.0.1:8787/status
Cluster  Workers: 2  Cores: 2  Memory: 2.00 GB


In [11]:
for size in (1e9 * n for n in (1, 2, 3)):
    
    start = time()
    pi = calc_pi_mc(size)
    elaps = time() - start
    print_pi_stats(size, pi, time_delta=elaps,
                   num_workers=len(cluster.workers))

1.0 GB
	MC pi:  3.14160531200	Err:  1.266e-05
	Workers: 2		Time:   3.171s
2.0 GB
	MC pi:  3.14142736000	Err:  1.653e-04
	Workers: 2		Time:   5.252s
3.0 GB
	MC pi:  3.14184736000	Err:  2.547e-04
	Workers: 2		Time:   8.289s
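
To quantify the benefit of the second worker, the timings printed above can be turned into speedup and parallel efficiency. The sketch below simply copies the reported wall-clock times; the helper name `speedup_and_efficiency` is hypothetical, and timings from a single run like this are noisy.

```python
# Size in GB -> (seconds with 1 worker, seconds with 2 workers), copied from the outputs above.
timings = {
    1.0: (5.114, 3.171),
    2.0: (10.141, 5.252),
    3.0: (15.123, 8.289),
}


def speedup_and_efficiency(t_one, t_many, num_workers):
    """Return (speedup, efficiency) relative to the single-worker time."""
    speedup = t_one / t_many
    return speedup, speedup / num_workers


if __name__ == "__main__":
    for size_gb, (t_one, t_two) in timings.items():
        speedup, efficiency = speedup_and_efficiency(t_one, t_two, num_workers=2)
        print(f"{size_gb} GB: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The speedup stays below the ideal factor of 2 in all three cases, and it is lowest for the smallest problem, where fixed overheads weigh more.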


## Automatically Scaling the Cluster

We want each calculation to take approximately the same time irrespective of the actual workload.

_**Watch** how the cluster will scale down to the minimum a few (three!) seconds after being made adaptive._

In [17]:
# Check docstring of distributed.Adaptive for keywords
ca = cluster.adapt(
    minimum=1, maximum=4,
    target_duration="10s",
    scale_factor=1);

sleep(4)  # Allow for scale-down

In [18]:
client

Client  Scheduler: tcp://127.0.0.1:41403  Dashboard: http://127.0.0.1:8787/status
Cluster  Workers: 1  Cores: 1  Memory: 1000.00 MB


Repeat the calculation from above with larger workloads.  (And watch the dashboard!)

In [None]:
for size in (n * 1e9 for n in (2, 4, 8)):
    
    start = time()
    pi = calc_pi_mc(size)
    elaps = time() - start
    
    print_pi_stats(size, pi, time_delta=elaps,
                   num_workers=len(cluster.workers))
    
    sleep(4)  # allow for scale-down time

2.0 GB
	MC pi:  3.14154998400	Err:  4.267e-05
	Workers: 2		Time:   6.208s
4.0 GB
	MC pi:  3.14154673600	Err:  4.592e-05
	Workers: 3		Time:   8.414s
8.0 GB
	MC pi:  3.14142765600	Err:  1.650e-04
	Workers: 4		Time:  12.693s
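
A rough way to reason about these worker counts is a back-of-the-envelope model: estimate the single-worker runtime from the first experiment (1 GB in 5.114 s), divide by the target duration, round up, and clamp to the allowed range. This is a simplified sketch under those assumptions, not the algorithm that `distributed.Adaptive` implements; the function `workers_for_target` is hypothetical.

```python
import math

SECONDS_PER_GB_ONE_WORKER = 5.114  # measured earlier: 1 GB on a single worker


def workers_for_target(size_gb, target_seconds=10.0, minimum=1, maximum=4):
    """Naive estimate of the workers needed to finish within target_seconds."""
    single_worker_seconds = size_gb * SECONDS_PER_GB_ONE_WORKER
    wanted = math.ceil(single_worker_seconds / target_seconds)
    return max(minimum, min(maximum, wanted))  # clamp like minimum= / maximum=


if __name__ == "__main__":
    for size_gb in (2, 4, 8):
        print(f"{size_gb} GB -> {workers_for_target(size_gb)} workers")
```

For 2, 4, and 8 GB this naive model gives 2, 3, and 4 workers, which agrees with the counts reported above, although that agreement may be partly coincidental. The 8 GB case is capped by `maximum=4`, which is consistent with its runtime exceeding the 10 s target.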


## Complete listing of software used here

In [15]:
%pip list

Package            Version 
------------------ --------
asn1crypto         0.24.0  
attrs              19.1.0  
backcall           0.1.0   
bleach             3.1.0   
bokeh              1.2.0   
certifi            2019.3.9
cffi               1.12.3  
chardet            3.0.4   
Click              7.0     
cloudpickle        1.2.1   
conda              4.6.14  
cryptography       2.7     
cytoolz            0.9.0.1 
dask               1.2.2   
dask-jobqueue      0.5.0   
decorator          4.4.0   
defusedxml         0.5.0   
distributed        1.28.1  
docrep             0.2.7   
entrypoints        0.3     
heapdict           1.0.0   
idna               2.8     
ipykernel          5.1.1   
ipython            7.5.0   
ipython-genutils   0.2.0   
ipywidgets         7.4.2   
jedi               0.13.3  
Jinja2             2.10.1  
jsonschema         3.0.1   
jupyter-client     5.2.4   
jupyter-core       4.4.0   
jupyterlab         0.35.6  
jupyterlab-server  0.2.0   
locket             0

In [16]:
%conda list --explicit

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2019.6.16-hecc5488_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-8.2.0-hdf63c60_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libgfortran-3.0.0-1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-8.2.0-hdf63c60_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/pandoc-2.7.3-0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/jpeg-9c-h14c3975_1001.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.2.1-hd88cf55_4.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libsodium-1.0.16-h14c3975_1001.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/lz4-c-1.8.3-he1b5a44_1001.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.1-he6710b0_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/openblas-0.3.