# Timing and Speed-Optimizing Notebook Code
https://towardsdatascience.com/speed-up-jupyter-notebooks-20716cbe2025

## Approximating $\pi$ with Monte Carlo integration
A Monte Carlo simulation estimates the answer to a problem by randomly generating samples. Such methods are primarily suited to “brute force” approximations of systems that may be high-dimensional; DeepMind’s AlphaGo Zero, for example, uses Monte Carlo tree search.

We will define a slow method that estimates pi from randomly generated data points and then look for ways to optimize it. Remember that the quarter of a circle with radius 1 that lies inside the unit square covers an area of exactly a quarter of pi.

We get the value of $\pi$ by multiplying by 4 the ratio of the area of the quarter circle to the area of the square, which we approximate by the fraction of random points that fall inside the circle:

$$\frac{\text{area of quarter circle}}{\text{area of square}} = \frac{\pi}{4} \approx \frac{\text{points in circle}}{\text{total points}}$$


## Code
$$ I = \int_0^1\int_0^1 f(x,y)\,dx\,dy = \frac{\pi}{4}$$
where
$$f(x,y) = \begin{cases}1 & \text{if } x^2+y^2\leq 1\\ 0 & \text{otherwise}\end{cases}$$
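
Written out as an estimator, the integral becomes an average over $n$ points drawn uniformly from the unit square. This is a sketch; the symbols $\hat{\pi}_n$, $(x_i, y_i)$ and $\operatorname{SE}$ are notation introduced here, not taken from the original:

$$\hat{\pi}_n = \frac{4}{n}\sum_{i=1}^{n} f(x_i, y_i), \qquad \operatorname{SE}(\hat{\pi}_n) = \sqrt{\frac{\pi(4-\pi)}{n}}$$

For $n = 10^7$ the standard error is roughly $5\times10^{-4}$, so only about the first three decimal places of a single run can be expected to be stable.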

### Normal Function

In [12]:
from random import random

def estimate_pi(n=1e7) -> "area":
  """Estimate pi with monte carlo simulation.
    
  Arguments:
    n: number of simulations
  """
  in_circle = 0
  total = n
    
  while n != 0:
    prec_x = random()
    prec_y = random()
    if pow(prec_x, 2) + pow(prec_y, 2) <= 1:
      in_circle += 1 # inside the circle
    n -= 1
        
  return 4 * in_circle / total

### Recursive Function

In [13]:
from random import random

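def estimate_pi_recursive(n=500) -> "area":
  # Each sample adds a stack frame, so n has to stay below Python's
  # recursion limit (1000 by default); n=1e7 would raise RecursionError.
  def helper(in_circle, n):
    if n == 0:
      return in_circle
    if random()**2 + random()**2 <= 1:
      return helper(in_circle+1, n-1)
    return helper(in_circle, n-1)

  in_circle = helper(0, n)
  return 4 * in_circle / n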
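
A recursive helper like this one uses one stack frame per sample, which runs into CPython's recursion limit long before ten million samples. The following minimal sketch shows the effect; `count_down` and `exceeds_recursion_limit` are hypothetical names used only for illustration, and the default limit of 1000 is assumed.

```python
import sys


def count_down(n):
    """Recurse n levels deep, mirroring the call depth of a recursive helper."""
    if n == 0:
        return 0
    return 1 + count_down(n - 1)


def exceeds_recursion_limit(depth):
    """Return True if recursing `depth` levels raises RecursionError."""
    try:
        count_down(depth)
    except RecursionError:
        return True
    return False


limit = sys.getrecursionlimit()  # 1000 by default in CPython
print(limit, exceeds_recursion_limit(10), exceeds_recursion_limit(2 * limit))
```

Raising the limit with `sys.setrecursionlimit` only postpones the problem, so an iterative loop is usually the safer choice for large `n`.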

## Timing
###  `%time`

In [14]:
%time estimate_pi()

CPU times: user 4.8 s, sys: 0 ns, total: 4.8 s
Wall time: 4.8 s


3.1415212
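
Outside a notebook, the wall and CPU times that `%time` reports can be approximated with the standard library. This is a rough sketch, not what IPython does internally; `estimate_pi_small` and `timed` are hypothetical helpers, and the sample count is reduced so the example finishes quickly.

```python
import time
from random import random


def estimate_pi_small(n=100_000):
    """Scaled-down stand-in for estimate_pi so the sketch runs fast."""
    in_circle = 0
    for _ in range(n):
        if random() ** 2 + random() ** 2 <= 1:
            in_circle += 1
    return 4 * in_circle / n


def timed(func, *args):
    """Return (result, wall_seconds, cpu_seconds) for a single call."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


result, wall, cpu = timed(estimate_pi_small)
print(f"pi ~ {result:.4f}, wall {wall:.3f} s, cpu {cpu:.3f} s")
```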

### `%timeit`
Here `-r` sets the number of runs and `-n` the number of loops per run:

In [15]:
%timeit -r 2 -n 5 estimate_pi()

5.06 s ± 36 ms per loop (mean ± std. dev. of 2 runs, 5 loops each)
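
The same measurement can be sketched with the standard `timeit` module, where `repeat` plays the role of `-r` and `number` the role of `-n`. `estimate_pi_small` is a hypothetical scaled-down stand-in used only to keep the run short.

```python
import timeit
from random import random


def estimate_pi_small(n=100_000):
    """Scaled-down stand-in for estimate_pi so the sketch runs fast."""
    return 4 * sum(1 for _ in range(n) if random() ** 2 + random() ** 2 <= 1) / n


# repeat=2 corresponds to -r 2, number=5 to -n 5
totals = timeit.repeat(estimate_pi_small, repeat=2, number=5)

# timeit.repeat returns the total time of each run; divide by the loop count
per_loop = [total / 5 for total in totals]
print(per_loop)
```

`%timeit` then summarizes such per-loop times as a mean and standard deviation across the runs.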


## Profile
### cProfile

In [16]:
%prun estimate_pi()

 

         40000004 function calls in 9.068 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.496    5.496    9.068    9.068 <ipython-input-12-4ca324407b38>:3(estimate_pi)
 20000000    2.496    0.000    2.496    0.000 {built-in method builtins.pow}
 20000000    1.076    0.000    1.076    0.000 {method 'random' of '_random.Random' objects}
        1    0.000    0.000    9.068    9.068 {built-in method builtins.exec}
        1    0.000    0.000    9.068    9.068 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

In [17]:
%prun -D pi.prof estimate_pi()

 
*** Profile stats marshalled to file 'pi.prof'. 


         40000004 function calls in 9.060 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.490    5.490    9.060    9.060 <ipython-input-12-4ca324407b38>:3(estimate_pi)
 20000000    2.495    0.000    2.495    0.000 {built-in method builtins.pow}
 20000000    1.075    0.000    1.075    0.000 {method 'random' of '_random.Random' objects}
        1    0.000    0.000    9.060    9.060 {built-in method builtins.exec}
        1    0.000    0.000    9.060    9.060 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

In [18]:
%prun -s cumulative estimate_pi()

 

         40000004 function calls in 9.185 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    9.185    9.185 {built-in method builtins.exec}
        1    0.000    0.000    9.185    9.185 <string>:1(<module>)
        1    5.568    5.568    9.185    9.185 <ipython-input-12-4ca324407b38>:3(estimate_pi)
 20000000    2.529    0.000    2.529    0.000 {built-in method builtins.pow}
 20000000    1.088    0.000    1.088    0.000 {method 'random' of '_random.Random' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
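
The three `%prun` calls above can be approximated in a plain script with `cProfile` and `pstats`. This is a sketch under assumptions: `estimate_pi_small` is a hypothetical scaled-down function, and the stats file is written to a temporary directory rather than the working directory.

```python
import cProfile
import io
import os
import pstats
import tempfile
from random import random


def estimate_pi_small(n=50_000):
    """Scaled-down stand-in for estimate_pi, keeping the pow() calls."""
    in_circle = 0
    for _ in range(n):
        if pow(random(), 2) + pow(random(), 2) <= 1:
            in_circle += 1
    return 4 * in_circle / n


profiler = cProfile.Profile()
profiler.enable()
estimate_pi_small()
profiler.disable()

# Counterpart of `-D pi.prof`: write the raw stats to a file
path = os.path.join(tempfile.mkdtemp(), "pi.prof")
profiler.dump_stats(path)

# Counterpart of `-s cumulative`: reload the file and sort by cumulative time
buffer = io.StringIO()
stats = pstats.Stats(path, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```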

### Line Profiler

In [19]:
!pip install line_profiler
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [20]:
%lprun -f estimate_pi estimate_pi()

Timer unit: 1e-06 s

Total time: 32.4931 s
File: <ipython-input-12-4ca324407b38>
Function: estimate_pi at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           def estimate_pi(n=1e7) -> "area":
     4                                             """Estimate pi with monte carlo simulation.
     5                                               
     6                                             Arguments:
     7                                               n: number of simulations
     8                                             """
     9         1          1.0      1.0      0.0    in_circle = 0
    10         1          1.0      1.0      0.0    total = n
    11                                               
    12  10000001    4609664.0      0.5     14.2    while n != 0:
    13  10000000    5229776.0      0.5     16.1      prec_x = random()
    14  10000000    4893684.0      0.5     15.1      prec_y = random()
    15  1

In [21]:
!pip install py-heat-magic
%load_ext heat

The heat extension is already loaded. To reload it, use:
  %reload_ext heat


In [25]:
%reload_ext heat

%%heat

%lprun -f estimate_pi estimate_pi()

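SyntaxError: invalid syntax (<ipython-input-25-8a7aad335bb9>, line 3)

The cell fails because `%%heat` is a cell magic and must be the first line of its cell; here it comes after `%reload_ext heat`. The body of a `%%heat` cell should also be plain Python code (for example the function definition and a call to it) rather than another magic such as `%lprun`.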

## Optimize
### Pythonic
Remember that every function call comes with overhead, so the large number of calls inside the loop is what bogs us down. The while loop merely increments a counter by one whenever a certain condition is met. To shorten the code, we use `sum()` with a generator expression and replace `pow()` with the `**` operator.

In [26]:
from random import random

def estimate_pi(n=1e7) -> "area":
  """Estimate pi with monte carlo simulation.
  
  Arguments:
    n: number of simulations
  """
  return 4 * sum(1 for _ in range(int(n)) if random()**2 + random()**2 <= 1) / n

In [27]:
%lprun -f estimate_pi estimate_pi()

Timer unit: 1e-06 s

Total time: 5.91442 s
File: <ipython-input-26-7f18539ec1cd>
Function: estimate_pi at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           def estimate_pi(n=1e7) -> "area":
     4                                             """Estimate pi with monte carlo simulation.
     5                                             
     6                                             Arguments:
     7                                               n: number of simulations
     8                                             """
     9         1    5914423.0 5914423.0    100.0    return 4 * sum(1 for _ in range(int(n)) if random()**2 + random()**2 <= 1) / n

In [28]:
%timeit -r 2 -n 5 estimate_pi()

3.13 s ± 2.04 ms per loop (mean ± std. dev. of 2 runs, 5 loops each)
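
The cost of `pow()` can also be checked in isolation. The sketch below times three equivalent ways of squaring a float; on most CPython builds the function call is expected to be the slowest because of the name lookup and call overhead, but the exact numbers depend on the machine and Python version.

```python
import timeit

x = 0.5
runs = 200_000

# Three ways to square a float; all produce the same value
t_pow = timeit.timeit("pow(x, 2)", globals={"x": x}, number=runs)
t_op = timeit.timeit("x ** 2", globals={"x": x}, number=runs)
t_mul = timeit.timeit("x * x", globals={"x": x}, number=runs)

print(f"pow(x, 2): {t_pow:.4f} s, x ** 2: {t_op:.4f} s, x * x: {t_mul:.4f} s")
```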


### Vectorization
Where possible, we should avoid explicit Python loops altogether. In data science we are familiar with NumPy and pandas, highly optimized libraries for numerical computation. A big advantage of NumPy is that its arrays are backed by C arrays stored in a contiguous block of memory (a data buffer).

In [44]:
import numpy as np

def estimate_pi(n=10000000) -> "area":
  """Estimate pi with monte carlo simulation.
  
  Arguments:
    n: number of simulations
  """
  xy = np.random.rand(n, 2)
  inside = np.sum(xy[:, 0]**2 + xy[:, 1]**2 <= 1)
  return 4 * inside / n

In [30]:
%timeit estimate_pi()

333 ms ± 4.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
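
For repeatable benchmarks it can help to seed the random numbers. A possible variant uses NumPy's `Generator` API (`np.random.default_rng`, available since NumPy 1.17); `estimate_pi_seeded` and its `seed` parameter are assumptions made for this sketch, not part of the original code.

```python
import numpy as np


def estimate_pi_seeded(n=1_000_000, seed=0):
    """Vectorized estimate using a seeded Generator for repeatable results."""
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))  # n points in the unit square
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1)
    return 4 * inside / n


print(estimate_pi_seeded())
```

Calling the function twice with the same seed returns the same estimate, which makes timing runs easier to compare.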


## Memory Profiling
### `%memit`

In [35]:
!pip install memory_profiler
%load_ext memory_profiler



In [36]:
%memit estimate_pi()

peak memory: 237.92 MiB, increment: 152.69 MiB
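
The reported increment is close to the size of the random array itself. Assuming the measured function is the `np.random.rand(n, 2)` version with `float64` values, the footprint can be estimated without allocating anything; `array_mib` is a hypothetical helper.

```python
import numpy as np


def array_mib(n, columns=2, dtype=np.float64):
    """Size in MiB of an (n, columns) array of the given dtype."""
    return n * columns * np.dtype(dtype).itemsize / 2**20


print(f"{array_mib(10_000_000):.2f} MiB")  # 152.59 MiB
```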


In [45]:
%mprun -f estimate_pi estimate_pi()

ERROR: Could not find file <ipython-input-44-da149c15bd54>
NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment.
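
As the note says, the function has to live in a physical file. One way to arrange that, sketched here under assumptions, is to write the source to a module file and import it; `pi_module`, `SOURCE` and `load_from_file` are hypothetical names, and the file is placed in a temporary directory.

```python
import importlib.util
import tempfile
from pathlib import Path

SOURCE = '''
import numpy as np

def estimate_pi(n=10_000_000):
    xy = np.random.rand(n, 2)
    inside = np.sum(xy[:, 0]**2 + xy[:, 1]**2 <= 1)
    return 4 * inside / n
'''


def load_from_file(source, name="pi_module"):
    """Write `source` to <tmpdir>/<name>.py and import it as a module."""
    path = Path(tempfile.mkdtemp()) / f"{name}.py"
    path.write_text(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


pi_module = load_from_file(SOURCE)
print(pi_module.estimate_pi(10_000))
```

In a notebook, `%mprun -f pi_module.estimate_pi pi_module.estimate_pi()` should then be able to locate the source, assuming the magic resolves dotted names the way `%lprun` does.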





In [46]:
import numpy as np

def estimate_pi(n=10000000) -> "area":
  """Estimate pi with monte carlo simulation.
    
  Arguments:
    n: number of simulations
  """
  return np.sum(np.random.random(n)**2 + np.random.random(n)**2 <= 1) / n * 4

In [47]:
%timeit estimate_pi()

298 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Out of Memory
The size of the array is limited by the available system memory, and the allocation grows linearly with the input parameter `n`. If, for example, we set the number of simulations to 1e10, the kernel would crash while creating the array. Luckily, this method does not require one large array; we can split the work into smaller chunks instead, which lets us scale the number of simulations independently of system memory.

In [49]:
import numpy as np

def estimate_pi_mem_block(n=10000000) -> "area":
  """Estimate pi with monte carlo simulation.
   
  Arguments:
      n: number of simulations
  """
  size = 10000000 # 1e7
  n_blocks, remainder = divmod(int(n), size)  # int() so float inputs such as 1e10 work
  memory_blocks = [size] * n_blocks + [remainder]
  inside = sum(np.sum(np.random.random(block)**2 + np.random.random(block)**2 <= 1) for block in memory_blocks)
  return 4 * inside / n

Here we predefine the size of each block (line 9): 1e7 `float64` values, which equals 80 MB per array. We split the simulations into blocks of that size and add each block's count to `inside` one at a time.

If we had set `n` to `1e10` before, we would have tried to allocate an array of 80 GB (!), which is simply not feasible on a standard machine. With the updated method, calculation time scales linearly with `n`.
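
The splitting step can be sketched on its own with a generator, which avoids building the list of block sizes and skips an empty trailing block. `chunk_sizes` and `estimate_pi_chunked` are hypothetical names, and the smaller default block size and the seeded `Generator` are choices made for this example only.

```python
import numpy as np


def chunk_sizes(n, size):
    """Yield block sizes that sum to n, each at most `size`."""
    n_blocks, remainder = divmod(int(n), size)
    for _ in range(n_blocks):
        yield size
    if remainder:
        yield remainder


def estimate_pi_chunked(n, size=1_000_000, seed=0):
    """Estimate pi block by block so memory use is bounded by `size`."""
    rng = np.random.default_rng(seed)
    inside = 0
    for block in chunk_sizes(n, size):
        x = rng.random(block)
        y = rng.random(block)
        inside += int(np.count_nonzero(x * x + y * y <= 1))
    return 4 * inside / int(n)


print(list(chunk_sizes(25, 10)))  # [10, 10, 5]
print(estimate_pi_chunked(2_500_000))
```

Only a handful of arrays of `size` floats exist at any moment, so peak memory stays roughly constant while `n` grows.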

## Optimize with a different algorithm
### Chudnovsky algorithm
While there are many ways to calculate pi to high precision [3], a very fast method is the Chudnovsky algorithm, which was published by the Chudnovsky brothers in 1988 and has the following form:
$$ \frac{1}{\pi} = 12\sum_{k=0}^\infty \frac{(-1)^k\,(6k)!\,(13591409+545140134k)}{(3k)!\,(k!)^3\,640320^{3k+3/2}}$$
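
The speed of the series comes from how quickly its terms shrink. As a rough sketch, the factorial ratio $(6k)!/((3k)!(k!)^3)$ grows by a factor approaching 1728 per step while the denominator grows by $640320^3$, so each term contributes about 14 more decimal digits. `terms_needed` is a hypothetical helper for this estimate.

```python
import math

# Successive terms shrink by roughly 640320**3 / 1728, where 1728 is the
# limiting growth factor of (6k)! / ((3k)! * (k!)**3).
ratio = 640320**3 / 1728
digits_per_term = math.log10(ratio)


def terms_needed(digits):
    """Rough number of series terms for the requested decimal digits."""
    return math.ceil(digits / digits_per_term)


print(f"{digits_per_term:.2f} digits per term")  # 14.18 digits per term
print(terms_needed(1000))  # 71
```

This matches a choice of about 70 loop iterations on top of the $k = 0$ term for roughly 1000 digits.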

In [52]:
from decimal import Decimal as Dec, getcontext as gc

def chudnovsky_pi(maxK=70, prec=1008, disp=1007): # parameter defaults chosen to gain 1000+ digits within a few seconds
    gc().prec = prec
    K, M, L, X, S = 6, 1, 13591409, 1, 13591409
    for k in range(1, maxK+1):
        M = (K**3 - 16*K) * M // k**3 
        L += 545140134
        X *= -262537412640768000
        S += Dec(M * L) / X
        K += 12
    pi = 426880 * Dec(10005).sqrt() / S
    pi = Dec(str(pi)[:disp]) # drop few digits of precision for accuracy
    print("PI(maxK=%d iterations, gc().prec=%d, disp=%d digits) =\n%s" % (maxK, prec, disp, pi))
    return pi

In [53]:
%timeit -r 1 -n 1000 chudnovsky_pi()

PI(maxK=70 iterations, gc().prec=1008, disp=1007 digits) =
3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231725359408128481117450284102701938521105559644622948954930381964428810975665933446128475648233786783165271201909145648566923460348610454326648213393607260249141273724587006606315588174881520920962829254091715364367892590360011330530548820466521384146951941511609433057270365759591953092186117381932611793105118548074462379962749567351885752724891227938183011949129833673362440656643086021394946395224737190702179860943702770539217176293176752384674818467669405132000568127145263560827785771342757789609173637178721468440901224953430146549585371050792279689258923542019956112129021960864034418159813629774771309960518707211349999998372978049951059731732816096318595024459455346908302642522308253344685035261931188171010003137838752886587533208381420617177669147303598253490428755468731159562863882353787