# Overview
Apache Spark is a distributed compute environment. One of the classic use cases for distributed compute is running Monte Carlo simulations.

In this notebook, we explore the "Hello, World!" example found in most Spark tutorials, commonly referred to as Spark Pi. It uses a Monte Carlo simulation to approximate the value of pi.

# 1. The Spark Pi Problem
We are going to run the Spark Pi example, which uses a Monte Carlo method and the "circle method" to approximate the value of pi.
In short, we will generate a large number of random points within a unit square and determine the fraction of them that fall within a circle inscribed in that square. This fraction gives us an approximation for the value of pi.

Recall that the area of a circle is defined as:
$$ A_c = \pi r^2 $$
Consider a circle inscribed in a unit square (side length $1$). The circle's diameter equals the side length, so $r = 0.5$, and therefore

$$ A_c = 0.5^2 \pi  = 0.25\pi = \frac{\pi}{4}$$
Recall that the area of a square is defined as:
$$ A_s = l^2 = 1^2 = 1 $$
If we divide the area of the circle (smaller) by the area of the square (larger) we have the following equality:

$$ \frac{A_c}{A_s} = \frac{\pi / 4}{1} = \frac{\pi}{4}$$
And therefore we can say:
$$ \pi = 4 \frac{A_c}{A_s} $$
With this equation, we can compute pi from the ratio of the circle's area to the square's area.
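As a quick numeric check of the identities above, the following plain-Python sketch (standard library only) plugs $r = 0.5$ and $l = 1$ into the area formulas and confirms that four times their ratio recovers pi.

```python
import math

# Inscribed circle: the square has side length 1, so the circle's radius is 0.5
side_length = 1.0
radius = side_length / 2

area_circle = math.pi * radius ** 2  # A_c = pi * r^2 = pi / 4
area_square = side_length ** 2       # A_s = l^2 = 1

ratio = area_circle / area_square    # A_c / A_s = pi / 4
pi_from_ratio = 4 * ratio            # pi = 4 * A_c / A_s

print(ratio, pi_from_ratio)
```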


We can approximate the ratio of these areas using a set of random numbers and a bit of logic.


If we generate pairs of uniform random numbers, we can treat each pair as the $(x, y)$ coordinates of a point in the square.
The fraction of points that fall inside the circle approximates the ratio of the circle's area to the square's area.

$$ \frac{\text{number of points in circle}}{\text{total number of points}} \approx \frac{A_c}{A_s} $$

As the number of random points increases, the fraction of points inside the circle converges to the true ratio of the areas, and thus the estimate converges to the true value of pi.
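To see this convergence without a cluster, here is a minimal single-machine sketch using Python's `random` module. The function name `estimate_pi` and the fixed seed are illustrative choices, not part of the original notebook; the seed only makes runs reproducible. It uses the inscribed circle from the derivation above (center $(0.5, 0.5)$, radius $0.5$).

```python
import random

def estimate_pi(num_points, seed=None):
    """Estimate pi from num_points uniform random points in the unit square."""
    rng = random.Random(seed)  # dedicated generator; a fixed seed makes runs reproducible
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        # Inscribed circle: center (0.5, 0.5), radius 0.5, so r^2 = 0.25
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    # The fraction inside approximates A_c / A_s = pi / 4
    return 4 * inside / num_points

# Larger samples tend to give estimates closer to pi
for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n, seed=42))
```

The convergence is not monotonic from one sample size to the next; it is the typical error that shrinks as the number of points grows.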

<center><img src='images/Convergence of Monte Carlo.gif' width="300px"/></center>

We can determine which points are inside the circle and which are not by using the Pythagorean theorem.
For a right triangle, the length of the hypotenuse $C$ follows from the lengths of the other two sides $A$ and $B$:
$$ A^2 + B^2 = C^2 $$

$$ C = \sqrt{A^2 + B^2} $$
If we compare the hypotenuse (the distance from the circle's center to the point) with the radius of the circle, we can determine whether the point lies within the circle.

<center><img src='images/Circle Method Pythagorean Diameter.png' width="300px"/></center>

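A point $(X, Y)$, measured from the circle's center, lies inside the circle when its distance from the center is at most the radius. The criterion for being inside the circle thus becomes:

$$ \sqrt{X^2 + Y^2} \le r $$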

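The Spark code in the next section samples points in the square $[0, 1) \times [0, 1)$ and tests them against a quarter circle of radius $r = 1$ centered at the origin. The quarter circle's area is $\frac{1}{4}\pi \cdot 1^2 = \frac{\pi}{4}$, which gives the same ratio $\frac{A_c}{A_s} = \frac{\pi}{4}$ as before. With $r = 1$ we have $r^2 = 1$, so squaring both sides of the criterion gives a test that avoids the square root:

$$ X^2 + Y^2 \le 1 $$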
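The inside-the-circle test can be written as a small Python helper. The names `is_inside_circle` and `is_inside_circle_hypot` and their `radius`/`center` parameters are hypothetical, chosen for illustration. The first version compares squared distances, which avoids the square root; the second uses `math.hypot` for the direct distance form. Both give the same answer.

```python
import math

def is_inside_circle(x, y, radius=1.0, center=(0.0, 0.0)):
    """Return True if (x, y) lies inside or on the circle with the given center and radius."""
    dx, dy = x - center[0], y - center[1]
    # Squared distance vs. squared radius: equivalent to sqrt(dx^2 + dy^2) <= radius
    return dx * dx + dy * dy <= radius * radius

def is_inside_circle_hypot(x, y, radius=1.0, center=(0.0, 0.0)):
    """Equivalent check using the Euclidean distance directly."""
    return math.hypot(x - center[0], y - center[1]) <= radius

print(is_inside_circle(0.6, 0.7))  # 0.36 + 0.49 = 0.85 <= 1, so True
print(is_inside_circle(0.8, 0.7))  # 0.64 + 0.49 = 1.13 > 1, so False
```

Passing `radius=0.5, center=(0.5, 0.5)` gives the inscribed-circle test instead of the quarter circle of radius 1.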

# 2. The Spark Pi Code
First, we create the SparkContext using the `create_spark_context` helper from `spark_helper`.

```python
from spark_helper import create_spark_context
spark_app_name = "spark-jupyter-win"
docker_image = "tschneider/pyspark:v5"
k8_master_ip = "15.4.7.11"
sc = create_spark_context(spark_app_name, docker_image, k8_master_ip)
```

Next, we define a function that runs a single Monte Carlo trial, then distribute the trials across the Spark cluster and count how many land inside the circle.

```python
# Define a function that generates a pair of random numbers and determines whether
# they correspond to a point within the circle
import random

def monte_carlo_trial(_):
    # Generate random coordinates for x and y in [0, 1)
    x, y = random.random(), random.random()
    # The point is inside the quarter circle of radius 1 if x^2 + y^2 < 1
    inside_circle = x * x + y * y < 1
    return inside_circle

# Set the number of trials for the Monte Carlo simulation
number_of_trials = 10000

# Use the SparkContext to run the trials in parallel and count the points inside the circle
count = sc.parallelize(range(0, number_of_trials)).filter(monte_carlo_trial).count()

# Compute the value of pi from the fraction of points inside the circle
pi = 4 * count / number_of_trials

# Print the value of pi
print(pi)
```

Output: `3.1808`
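The printed estimate, 3.1808, differs from pi in the second decimal place. To gauge how much run-to-run variation to expect: with $N$ trials, the count of points inside the circle follows a binomial distribution with success probability $p = \pi/4$, so the estimate $4 \cdot \text{count} / N$ has standard error $4\sqrt{p(1-p)/N}$. A small sketch of that formula (the function name `pi_standard_error` is hypothetical):

```python
import math

def pi_standard_error(num_trials):
    """Standard error of the Monte Carlo pi estimate 4 * count / num_trials."""
    p = math.pi / 4  # probability that a uniform point in the square lands inside the circle
    return 4 * math.sqrt(p * (1 - p) / num_trials)

for n in (10_000, 1_000_000, 100_000_000):
    print(n, round(pi_standard_error(n), 5))
```

At $N = 10{,}000$ the standard error is roughly 0.016, so an estimate from this many trials is only reliable to about the second decimal place. Because the error scales as $1/\sqrt{N}$, reducing it tenfold requires a hundredfold more trials.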


```python
# Cleanup
sc.stop()
```