# Monte-Carlo Estimate of $\pi$

We want to estimate $\pi$ using a [Monte-Carlo method](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods). The area of a quarter circle of unit radius is $\pi/4$, so a point chosen uniformly at random in the unit square lies inside the unit circle centered at a corner of the square with probability $\pi/4$. For $N$ randomly chosen pairs $(x, y)$ with $x\in[0, 1)$ and $y\in[0, 1)$, we count the number $N_{circ}$ of pairs that also satisfy $x^2 + y^2 < 1$ and estimate $\pi \approx 4 \cdot N_{circ} / N$.

[<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif" 
     width="50%" 
     align=top
     alt="PI monte-carlo estimate">](https://en.wikipedia.org/wiki/Pi#Monte_Carlo_methods)
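
As a point of reference, the estimator can be sketched serially with only the Python standard library. This is a minimal illustration of the counting rule described above, not the distributed implementation used later in this notebook; the function name `estimate_pi` and the fixed seed are assumptions made for the example.

```python
import random


def estimate_pi(num_points, seed=None):
    """Estimate pi from num_points uniform random points in the unit square."""
    rng = random.Random(seed)
    num_in_circle = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()  # both uniform in [0, 1)
        if x * x + y * y < 1:              # point lies inside the quarter circle
            num_in_circle += 1
    return 4 * num_in_circle / num_points


pi_estimate = estimate_pi(200_000, seed=42)
print(pi_estimate)
```

The loop is deliberately simple; the Dask version below evaluates the same condition on large arrays chunk by chunk.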

## Core Lessons

- Recap adaptive clusters
- Tuning the adaptivity

## Set up a Slurm cluster

In [1]:
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
import os

cluster = SLURMCluster(
    cores=24,
    processes=2,
    memory="100GB",
    shebang='#!/usr/bin/env bash',
    queue="medium",
    walltime="00:30:00",
    death_timeout="15s",
    interface="ib0")

client = Client(cluster)
client

Client  Scheduler: tcp://10.246.201.1:39656  Dashboard: http://10.246.201.1:8787/status
Cluster  Workers: 0  Cores: 0  Memory: 0 B


## The job scripts

In [2]:
print(cluster.job_script())

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=24
#SBATCH --mem=94G
#SBATCH -t 00:30:00
JOB_ID=${SLURM_JOB_ID%;*}



/home/shkifmwr/miniconda3/envs/dask_jobqueue_workshop/bin/python -m distributed.cli.dask_worker tcp://10.246.201.1:39656 --nthreads 12 --nprocs 2 --memory-limit 50.00GB --name dask-worker--${JOB_ID}-- --death-timeout 15s --interface ib0



## Scale the cluster to two nodes

In [3]:
cluster.scale(4)
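
The numbers in the job script follow from the `SLURMCluster` arguments: each Slurm job starts `processes=2` workers that share `cores=24` and `memory="100GB"`. The sketch below redoes that arithmetic in plain Python; the helper names are hypothetical and are not part of `dask_jobqueue`. Reading the `--mem=94G` line as 100 GB rounded up to whole GiB is an assumption about how the request is formatted.

```python
import math


def per_worker_resources(cores, processes, memory_gb):
    """Split the resources of one job evenly across its worker processes."""
    threads = cores // processes
    memory_limit_gb = memory_gb / processes
    return threads, memory_limit_gb


def jobs_needed(num_workers, processes):
    """Number of Slurm jobs required to host num_workers workers."""
    return math.ceil(num_workers / processes)


threads, memory_limit_gb = per_worker_resources(cores=24, processes=2, memory_gb=100)
print(threads, memory_limit_gb)      # matches --nthreads 12 and --memory-limit 50.00GB
print(jobs_needed(4, processes=2))   # scaling to 4 workers needs 2 jobs
print(math.ceil(100e9 / 1024**3))    # 100 GB expressed in whole GiB
```

With two workers per job, scaling to 4 workers corresponds to the two nodes named in the heading.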

## The Monte Carlo Method

In [4]:
import dask.array as da
import numpy as np


def calc_pi_mc(size_in_bytes, chunksize_in_bytes=200e6):
    """Calculate PI using a Monte Carlo estimate."""
    
    size = int(size_in_bytes / 8)
    chunksize = int(chunksize_in_bytes / 8)
    
    xy = da.random.uniform(0, 1,
                           size=(size // 2, 2),      # shapes must be integers
                           chunks=(chunksize // 2, 2))
    
    in_circle = ((xy ** 2).sum(axis=-1) < 1)
    pi = 4 * in_circle.mean()

    return pi


def print_pi_stats(size, pi, time_delta, num_workers):
    """Print pi, calculate offset from true value, and print some stats."""
    print(f"{size / 1e9} GB\n"
          f"\tMC pi: {pi : 13.11f}"
          f"\tErr: {abs(pi - np.pi) : 10.3e}\n"
          f"\tWorkers: {num_workers}"
          f"\t\tTime: {time_delta : 7.3f}s")

## The actual calculations

We loop over different volumes of double-precision random numbers and estimate $\pi$ as described above.
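
To get a feeling for how much parallelism each volume offers, the sketch below reproduces the size arithmetic of `calc_pi_mc` in plain Python and counts the resulting chunks. The helper `chunk_layout` is hypothetical and only mirrors the integer conversions in the function above; it assumes one chunk per block of rows, with a possibly smaller last chunk.

```python
import math


def chunk_layout(size_in_bytes, chunksize_in_bytes=200e6):
    """Return (rows, rows_per_chunk, num_chunks) for the (rows, 2) array."""
    rows = int(size_in_bytes / 8) // 2             # 8 bytes per float64, 2 per point
    rows_per_chunk = int(chunksize_in_bytes / 8) // 2
    num_chunks = math.ceil(rows / rows_per_chunk)
    return rows, rows_per_chunk, num_chunks


for size in (1e9, 10e9, 100e9):
    rows, rows_per_chunk, num_chunks = chunk_layout(size)
    print(f"{size / 1e9:6.1f} GB: {rows} points in {num_chunks} chunks")
```

With the default chunk size, the 1 GB case splits into only 5 chunks, fewer than the 48 threads available on 4 workers, so the smallest run cannot keep the whole cluster busy.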

In [5]:
from time import time, sleep

In [6]:
for size in (1e9 * n for n in (1, 10, 100)):
    
    start = time()
    pi = calc_pi_mc(size).compute()
    elaps = time() - start

    print_pi_stats(size, pi, time_delta=elaps,
                   num_workers=len(cluster.scheduler.workers))

1.0 GB
	MC pi:  3.14176032000	Err:  1.677e-04
	Workers: 4		Time:   0.933s
10.0 GB
	MC pi:  3.14154535040	Err:  4.730e-05
	Workers: 4		Time:   1.632s
100.0 GB
	MC pi:  3.14158865152	Err:  4.002e-06
	Workers: 4		Time:   8.331s
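
The reported errors can be compared with the statistical error expected for this estimator. Each point is a hit with probability $p = \pi/4$, so the standard deviation of $4 \cdot N_{circ} / N$ is $4\sqrt{p(1-p)/N} = \sqrt{\pi(4-\pi)/N}$. The sketch below evaluates this for the three volumes, assuming 8 bytes per number and two numbers per point as in `calc_pi_mc`; the function name `expected_mc_error` is hypothetical.

```python
import math


def expected_mc_error(size_in_bytes):
    """One-sigma error of the estimate for a given volume of float64 numbers."""
    num_pairs = size_in_bytes / 8 / 2  # 8 bytes per number, 2 numbers per point
    p = math.pi / 4                    # probability of landing inside the circle
    return 4 * math.sqrt(p * (1 - p) / num_pairs)


for size in (1e9, 10e9, 100e9):
    print(f"{size / 1e9:6.1f} GB  expected error ~ {expected_mc_error(size):.2e}")
```

The errors printed above are of comparable size to, or smaller than, these one-sigma values. The expected error shrinks only with $1/\sqrt{N}$, so 100 times more data buys one more decimal digit.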


## Scaling the Cluster to twice its size

We double the number of workers and then re-run the experiments.

In [7]:
new_num_workers = 2 * len(cluster.scheduler.workers)

print(f"Scaling from {len(cluster.scheduler.workers)} to {new_num_workers} workers.")

cluster.scale(new_num_workers)

sleep(3)

Scaling from 4 to 8 workers.


In [9]:
client

Client  Scheduler: tcp://10.246.201.1:39656  Dashboard: http://10.246.201.1:8787/status
Cluster  Workers: 4  Cores: 48  Memory: 200.00 GB
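
The client still reports 4 workers here, presumably because the additional jobs had not started within the 3 second sleep. Instead of a fixed sleep, one could poll until the workers have arrived. The helper below is a generic sketch with hypothetical names (`wait_for`, `condition`); recent versions of `distributed` also offer `client.wait_for_workers(n)` for the same purpose.

```python
import time


def wait_for(condition, timeout=60.0, poll_interval=0.5):
    """Poll condition() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    return bool(condition())  # one last check at the deadline


# Stand-in condition so the sketch runs without a cluster. With the cluster
# above, the condition could be:
#   lambda: len(cluster.scheduler.workers) >= new_num_workers
start = time.monotonic()
ready = wait_for(lambda: time.monotonic() - start > 0.2,
                 timeout=5.0, poll_interval=0.05)
print(ready)
```

A timeout keeps the notebook from hanging if the queue is full and the jobs never start.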


## Re-run same experiments with doubled cluster

In [None]:
for size in (1e9 * n for n in (1, 10, 100)):
    
    start = time()
    pi = calc_pi_mc(size).compute()
    elaps = time() - start

    print_pi_stats(size, pi,
                   time_delta=elaps,
                   num_workers=len(cluster.scheduler.workers))

## Automatically scale the cluster towards a target duration

Previously, we saw how to let Dask figure out the optimal cluster size.  Here, we target a wall time of 30 seconds per computation.

_**Watch** how the cluster will scale down to the minimum a few seconds after being made adaptive._

In [None]:
ca = cluster.adapt(
    minimum=2, maximum=30,
    target_duration="360s",  # measured in CPU time per worker
                             # -> 30 seconds at 12 cores / worker
    scale_factor=1.0  # prevent scaling up because of CPU or memory demand
);

sleep(10)  # Allow for scale-down
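
The conversion between the `target_duration` value and the intended wall time can be written out explicitly. The sketch follows the comment in the cell above, which treats `target_duration` as CPU time per worker, and uses the 12 threads per worker from the job script; both helper names are hypothetical.

```python
def wall_time_target(target_duration_s, threads_per_worker):
    """Wall time implied by a per-worker CPU-time target."""
    return target_duration_s / threads_per_worker


def target_duration_for(wall_time_s, threads_per_worker):
    """Per-worker CPU-time target needed for a desired wall time."""
    return wall_time_s * threads_per_worker


print(wall_time_target(360, threads_per_worker=12))     # seconds of wall time
print(target_duration_for(30, threads_per_worker=12))   # value for target_duration
```

Under this reading, changing the number of threads per worker would require adjusting `target_duration` to keep the same wall-time goal.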

In [None]:
client

## Repeat the calculation from above with larger work loads

(And watch the dashboard!)

In [None]:
for size in (n * 1e9 for n in (200, 400, 800)):
    
    
    start = time()
    pi = calc_pi_mc(size, min(size / 1000, 500e6)).compute()
    elaps = time() - start

    print_pi_stats(size, pi, time_delta=elaps,
                   num_workers=len(cluster.scheduler.workers))
    
    sleep(20)  # allow for scale-down time

## Complete listing of software used here

In [None]:
%pip list

In [None]:
%conda list --explicit