# Monte-Carlo Estimate of $\pi$

We want to estimate $\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods). The area of a quarter circle of unit radius is $\pi/4$, so a point chosen uniformly at random in the unit square lies inside the unit circle centered at a corner of the square with probability $\pi/4$. For $N$ randomly chosen pairs $(x, y)$ with $x\in[0, 1)$ and $y\in[0, 1)$, we count the number $N_{circ}$ of pairs that also satisfy $x^2 + y^2 < 1$ and estimate $\pi \approx 4 \cdot N_{circ} / N$.

[<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif" 
     width="50%" 
     align=top
     alt="PI monte-carlo estimate">](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods)
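
Before bringing in Dask, the counting argument above can be tried on a single machine. The following is a minimal sketch in plain NumPy; the function name `estimate_pi`, the fixed seed, and the sample count of one million pairs are illustrative choices and not part of the original notebook.

```python
import numpy as np


def estimate_pi(num_pairs, seed=0):
    """Estimate pi from num_pairs uniform random points in the unit square."""
    rng = np.random.default_rng(seed)
    xy = rng.uniform(0.0, 1.0, size=(num_pairs, 2))
    # A point lies inside the quarter circle if x^2 + y^2 < 1.
    in_circle = (xy ** 2).sum(axis=-1) < 1
    # The fraction of points inside approximates pi / 4.
    return 4 * in_circle.mean()


pi_estimate = estimate_pi(1_000_000)
print(f"estimate: {pi_estimate:.5f}, error: {abs(pi_estimate - np.pi):.2e}")
```

The Dask version used later follows the same three steps (draw pairs, test against the circle, average) and only swaps the in-memory array for a chunked one.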

## Core Lessons

- Short Dask recap (assuming that `LocalCluster`, `Client`, and `dask.array` are familiar)
- Scaling (local) clusters
- Adaptive (local) clusters

## Set up a local cluster

In [1]:
from dask.distributed import LocalCluster, Client

cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit=1e9)
client = Client(cluster)
client

Client: Scheduler: tcp://127.0.0.1:36821, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 2, Cores: 2, Memory: 2.00 GB


## The Monte Carlo Method

In [2]:
import dask.array as da
import numpy as np


def calc_pi_mc(size_in_bytes):
    """Calculate PI using a Monte Carlo estimate."""
    # One row per (x, y) pair of float64 numbers; sizes must be integers.
    xy = da.random.uniform(0, 1,
                           size=(int(size_in_bytes / 8 / 2), 2),
                           chunks=(int(100e6 / 8), 2))
    
    in_circle = ((xy ** 2).sum(axis=-1) < 1)
    pi = 4 * in_circle.mean()

    return pi.compute()


def print_pi_stats(size, pi, time_delta, num_workers):
    """Print pi, calculate offset from true value, and print some stats."""
    print(f"{size / 1e9} GB\n"
          f"\tMC pi: {pi : 13.11f}"
          f"\tErr: {abs(pi - np.pi) : 10.3e}\n"
          f"\tWorkers: {num_workers}"
          f"\t\tTime: {time_delta : 7.3f}s")
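
To see how `calc_pi_mc` splits its input, the shape arithmetic can be replayed without Dask. This is a sketch only: the helper `chunk_layout` is hypothetical, and it assumes the chunk length of `100e6 / 8` rows used in `calc_pi_mc` above.

```python
import math

BYTES_PER_FLOAT64 = 8


def chunk_layout(size_in_bytes, rows_per_chunk=int(100e6 / 8)):
    """Return (rows, number of chunks, bytes per full chunk) for calc_pi_mc."""
    rows = int(size_in_bytes / BYTES_PER_FLOAT64 / 2)  # one row per (x, y) pair
    num_chunks = math.ceil(rows / rows_per_chunk)
    chunk_bytes = rows_per_chunk * 2 * BYTES_PER_FLOAT64  # two columns per row
    return rows, num_chunks, chunk_bytes


for size in (2e9, 4e9, 8e9, 16e9):
    rows, num_chunks, chunk_bytes = chunk_layout(size)
    print(f"{size / 1e9:4.1f} GB: {rows:>13,d} rows in {num_chunks:3d} chunks "
          f"of {chunk_bytes / 1e6:.0f} MB")
```

Because each chunk has two columns, a full chunk holds 200 MB of random numbers, which is worth keeping in mind next to the per-worker `memory_limit=1e9` set for the cluster.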

## The actual calculations

We loop over different volumes of double-precision random numbers and estimate $\pi$ as described above.
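
It helps to know what accuracy to expect before looking at the results. Each point is a Bernoulli trial with success probability $p = \pi/4$, so the standard error of $4 \cdot N_{circ} / N$ is $4\sqrt{p(1-p)/N}$. The sketch below evaluates this for the data volumes used here; the helper name `expected_std_error` is an illustrative assumption.

```python
import math


def expected_std_error(size_in_bytes):
    """Standard error of 4 * N_circ / N for N pairs of float64 numbers."""
    num_pairs = int(size_in_bytes / 8 / 2)
    p = math.pi / 4  # probability that a point lands in the quarter circle
    return 4 * math.sqrt(p * (1 - p) / num_pairs)


for size in (2e9, 4e9, 8e9, 16e9):
    print(f"{size / 1e9:4.1f} GB: expected error ~ {expected_std_error(size):.2e}")
```

The error shrinks only with the square root of the number of pairs, so quadrupling the data volume halves the expected error. Individual runs scatter around this value, so a single error can be several times smaller or somewhat larger.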

In [3]:
from time import time, sleep

In [4]:
for size in (1e9 * n for n in (2, 4, 8, 16)):
    
    start = time()
    pi = calc_pi_mc(size)
    elaps = time() - start

    print_pi_stats(size, pi,
                   time_delta=elaps,
                   num_workers=len(cluster.workers))

2.0 GB
	MC pi:  3.14162057600	Err:  2.792e-05
	Workers: 2		Time:   3.544s
4.0 GB
	MC pi:  3.14150860800	Err:  8.405e-05
	Workers: 2		Time:   6.864s
8.0 GB
	MC pi:  3.14157203200	Err:  2.062e-05
	Workers: 2		Time:  13.447s
16.0 GB
	MC pi:  3.14164922400	Err:  5.657e-05
	Workers: 2		Time:  26.375s


## Scaling the Cluster

We double the number of workers and then re-run the experiments.

In [5]:
new_num_workers = 2 * len(cluster.workers)

print(f"Scaling from {len(cluster.workers)} to {new_num_workers} workers.")

cluster.scale(new_num_workers)

sleep(10)

Scaling from 2 to 4 workers.
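
The fixed `sleep(10)` above gives the new workers time to start. An alternative is to poll until a condition holds, which returns as soon as the workers are ready. The helper `wait_until` below is a hypothetical, Dask-independent sketch; with a real cluster the predicate would compare the number of connected workers against `new_num_workers`. Recent `distributed` releases also provide `Client.wait_for_workers`, but whether the version listed at the end of this notebook has it is not confirmed here.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.1):
    """Poll predicate() until it is true; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition met right at the deadline still counts.
    return predicate()


# Stand-in for a cluster: pretend the workers are ready after 0.3 seconds.
ready_at = time.monotonic() + 0.3
workers_ready = wait_until(lambda: time.monotonic() >= ready_at, timeout=5.0)
print(workers_ready)
```

Polling with a timeout avoids both waiting longer than necessary and silently continuing with too few workers.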


In [6]:
client

Client: Scheduler: tcp://127.0.0.1:36821, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 4, Cores: 4, Memory: 4.00 GB


In [7]:
for size in (1e9 * n for n in (2, 4, 8, 16)):
    
    start = time()
    pi = calc_pi_mc(size)
    elaps = time() - start
    print_pi_stats(size, pi, time_delta=elaps,
                   num_workers=len(cluster.workers))

2.0 GB
	MC pi:  3.14176985600	Err:  1.772e-04
	Workers: 4		Time:   2.259s
4.0 GB
	MC pi:  3.14155955200	Err:  3.310e-05
	Workers: 4		Time:   3.654s
8.0 GB
	MC pi:  3.14164058400	Err:  4.793e-05
	Workers: 4		Time:   7.145s
16.0 GB
	MC pi:  3.14152887600	Err:  6.378e-05
	Workers: 4		Time:  14.237s
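
To quantify the effect of doubling the workers, the timings printed above can be compared directly. The sketch below copies the wall-clock times from the two runs in this notebook and computes speedup and parallel efficiency; the variable names are illustrative.

```python
# Wall-clock times in seconds, keyed by data volume in GB, copied from the runs above.
times_2_workers = {2: 3.544, 4: 6.864, 8: 13.447, 16: 26.375}
times_4_workers = {2: 2.259, 4: 3.654, 8: 7.145, 16: 14.237}

speedups = {}
for size_gb, t2 in times_2_workers.items():
    speedup = t2 / times_4_workers[size_gb]
    speedups[size_gb] = speedup
    # Doubling the workers gives an ideal speedup of 2.
    print(f"{size_gb:2d} GB: speedup {speedup:.2f}, efficiency {speedup / 2:.0%}")
```

The smallest problem benefits least from the extra workers, which is consistent with fixed overheads taking a larger share of a short run.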


## Automatically Scaling the Cluster

We want Dask to choose a cluster size between 1 and 16 workers.

_**Watch** how the cluster will scale down to the minimum a few (three!) seconds after being made adaptive._

In [8]:
# Check docstring of distributed.Adaptive for keywords
ca = cluster.adapt(
    minimum=1, maximum=16);

sleep(4)  # Allow for scale-down

In [9]:
client

Client: Scheduler: tcp://127.0.0.1:36821, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 1, Cores: 1, Memory: 1000.00 MB


Repeat the calculation from above with larger workloads.  (And watch the dashboard!)

In [10]:
for size in (n * 1e9 for n in (2, 4, 8, 16, 32)):
    
    start = time()
    pi = calc_pi_mc(size)
    elaps = time() - start
    
    print_pi_stats(size, pi, time_delta=elaps,
                   num_workers=len(cluster.workers))
    
    sleep(4)  # allow for scale-down time

2.0 GB
	MC pi:  3.14182329600	Err:  2.306e-04
	Workers: 4		Time:   3.909s
4.0 GB
	MC pi:  3.14164428800	Err:  5.163e-05
	Workers: 12		Time:   3.906s
8.0 GB
	MC pi:  3.14159373600	Err:  1.082e-06
	Workers: 14		Time:   4.087s
16.0 GB
	MC pi:  3.14156051200	Err:  3.214e-05
	Workers: 16		Time:   6.652s
32.0 GB
	MC pi:  3.14160921000	Err:  1.656e-05
	Workers: 16		Time:   8.946s


## Complete listing of software used here

In [11]:
%pip list

Package            Version          
------------------ -----------------
asciitree          0.3.3            
aspy.yaml          1.2.0            
backcall           0.1.0            
bokeh              1.1.0            
certifi            2019.3.9         
cfgv               1.6.0            
cftime             1.0.3.4          
Click              7.0              
cloudpickle        1.0.0            
cycler             0.10.0           
cytoolz            0.9.0.1          
dask               1.2.0            
dask-jobqueue      0.4.1+32.g9c3371d
decorator          4.4.0            
distributed        1.27.1           
docrep             0.2.5            
fasteners          0.14.1           
heapdict           1.0.0            
identify           1.4.3            
importlib-metadata 0.13             
ipykernel          5.1.1            
ipython            7.5.0            
ipython-genutils   0.2.0            
jedi               0.13.3           
Jinja2             2.10.1           
j

In [12]:
%conda list --explicit

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/git-lfs-2.7.2-0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2019.3.9-hecc5488_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-8.2.0-hdf63c60_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgfortran-ng-7.3.0-hdf63c60_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-8.2.0-hdf63c60_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.6-h14c3975_1002.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/expat-2.2.5-hf484d3e_1002.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/icu-58.2-hf484d3e_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/jpeg-9c-h14c3975_1001.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-he1b5a44_1006.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/