In [3]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Cost models


Today's agenda:

- Three different ways to model computational cost
  + Random Access Machine
  + Parallel Random Access Machine
  + Language Based Models

Recall first lecture:

In [None]:
def linear_search(mylist, key):        #   cost         number of times run
    for i,v in enumerate(mylist):      #   c1               n
        if v == key:                   #   c2               n
            return i                   #   c3               0
    return -1                          #   c4               1

In the worst case (the key is absent), $\hbox{Cost}(\hbox{linear-search}, n) = c_1 n + c_2 n + c_4 \in O(n)$
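
The count in the table can be checked empirically. The following is a minimal sketch: `linear_search_counted` and its `comparisons` counter are illustrative additions, not part of the lecture code. It counts how many times the `v == key` test runs.

```python
def linear_search_counted(mylist, key):
    """Linear search that also reports how many comparisons it made."""
    comparisons = 0
    for i, v in enumerate(mylist):
        comparisons += 1              # one execution of the `v == key` test
        if v == key:
            return i, comparisons
    return -1, comparisons

data = list(range(10))
print(linear_search_counted(data, 0))    # best case: (0, 1)
print(linear_search_counted(data, 99))   # worst case, key absent: (-1, 10)
```

When the key is absent, the comparison runs once per element, matching the $c_2 n$ term.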

## machine-based cost model

- define the cost of each instruction  
- runtime is sum of costs of each instruction

## Random Access Machine model

- "Simple" operations (+, *, =, if) take exactly one time step
  - no matter the size of operands
- loops consist of many single-step operations
- memory access takes one step
  - unbounded memory
  - each cell holds integers of unbounded size
- assumes sequential execution
- one input tape, one output tape

<br>


To compute the run time of an algorithm, we:
1. compute the number of steps
2. determine the number of steps per second our machine can perform
3. time = steps / steps per second
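
The three steps above can be sketched as a small calculation. The machine speed of $10^9$ steps per second and the unit costs $c_1 = c_2 = c_4 = 1$ are assumptions chosen only for illustration, and `estimated_runtime_seconds` is a hypothetical helper name.

```python
def estimated_runtime_seconds(num_steps, steps_per_second):
    """RAM-model estimate: time = steps / steps per second."""
    return num_steps / steps_per_second

n = 1_000_000
steps = 2 * n + 1              # worst-case linear_search cost with c1 = c2 = c4 = 1
machine_speed = 1e9            # assumed: one billion steps per second
print(estimated_runtime_seconds(steps, machine_speed))   # roughly 0.002 seconds
```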

<br><br>

All models make incorrect assumptions. What are some that the RAM model makes?


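- every simple operation costs the same, but in practice multiplication takes longer than addition
- every memory access costs the same, but in practice cache is faster than main memory (RAM)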

<br>

Not terrible assumptions since we are interested in asymptotic analysis.

E.g., if a cache lookup takes half the time of a RAM lookup, then we're "only" off by a factor of 2.  
Constant factors vanish asymptotically, e.g., $2n \in O(n)$


## PRAM: Parallel Random Access Machine

- The RAM model extended to have 
  - multiple processors $P_0, P_1, P_2 \ldots$
  - unbounded, **shared** memory cells $M[0], M[1], M[2] \ldots$
  - any processor $P_i$ can access any memory cell $M[j]$ in one time step
  
![pram.jpg](figures/pram.jpg)  
[source](https://www.tutorialspoint.com/parallel_algorithm/parallel_random_access_machines.htm)

<br>
The model assumes some control mechanism handles race conditions and synchronization.

<br><br>
To compute the runtime:
- compute time for the slowest processor


<br><br>
Two drawbacks of this model:

1. Mapping data to each processor can be tricky
2. Nested parallelism is hard to specify in this model (e.g., recursive fork-join)

## Language based models

- Define a language to specify algorithms
- Assign a cost to each expression
- Cost of algorithm is sum of costs for each expression

## SPARC model

- A language-based cost model defined over expressions of the SPARC language

Recall our definitions of **work** and **span**

> **work**: total number of primitive operations performed by an algorithm

> **span**: longest sequence of dependencies in computation
- time to run with an infinite number of processors
- measure of how "parallelized" an algorithm is 
- also called: *critical path length* or *computational depth*

<br>

**intuition**:  
**work**: total energy consumed by a computation  
**span**: minimum possible time that the computation requires

<br>
        
**work**: $T_1$ = time using one processor  
**span**: $T_\infty$ = time using $\infty$ processors
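
To make the difference between $T_1$ and $T_\infty$ concrete, here is a sketch that computes both for a hypothetical balanced parallel sum over $n$ values. The assumptions are illustrative: each leaf and each combining step costs 1, and the two halves run in parallel. `reduce_cost` is a made-up name.

```python
def reduce_cost(n):
    """Return (work, span) for a balanced parallel sum over n >= 1 values."""
    if n == 1:
        return 1, 1
    left_w, left_s = reduce_cost(n // 2)
    right_w, right_s = reduce_cost(n - n // 2)
    work = left_w + right_w + 1          # T_1: every operation counts
    span = max(left_s, right_s) + 1      # T_inf: only the longest chain counts
    return work, span

for n in (1, 8, 1024):
    print(n, reduce_cost(n))
# 1 (1, 1)
# 8 (15, 4)
# 1024 (2047, 11)
```

Work grows linearly with $n$ while span grows only logarithmically, so extra processors can help substantially here.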

For a given SPARC expression $e$, we will analyze the work $W(e)$ and span $S(e)$

**value** (v): irreducible unit of computation
- e.g.: $\mathbb{N}$, *true*, -, *and*
- *functions* are also values (it is a functional language)

Assumptions:

**values have unit work and span**  
$W(v)=S(v)=1$  
$W(\hbox{lambda}\: p \: . \: e) = S(\hbox{lambda} \: p \: . \: e) = 1$

<br>

**operators have unit work and span**  
$W(e_1 \: \hbox{op} \: e_2) = W(e_1) + W(e_2) + 1$
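
As a worked instance of the rule above, take $e_1 = 2$ and $e_2 = 3$. Both operands are values, so each has unit work:

$W(2 + 3) = W(2) + W(3) + 1 = 1 + 1 + 1 = 3$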



## Composition

<img src="figures/composition.png" width="50%"/>


-   $(e_1, e_2)$: Sequential Composition

    -   Add work and span

-   $e_1 || e_2$: Parallel Composition

    -   Add work but **take the maximum span**
    
    

### Rules of composition


|        $\mathbf{e}$   |        $\mathbf{W(e)}$         |        $\mathbf{S(e)}$         |
| --------------------- | ------------------------------ | ------------------------------ |
|     $(e_1, e_2)$      |     $1 + W(e_1) + W(e_2)$      |     $1 + S(e_1) + S(e_2)$      |
|    $(e_1 \parallel e_2)$     |     $1 + W(e_1) + W(e_2)$      |   $1 + \max(S(e_1), S(e_2))$   |
|   `let val` $x=e_1$ `in` $e_2$ `end`  |  $1 + W(e_1) + W([\hbox{Eval}(e_1)/x]e_2)$       |     $1 + S(e_1) + S([\hbox{Eval}(e_1)/x]e_2)$  |
| $\{f(x)\mid x\in A\}$ |   $1+\sum_{x\in A} W(f(x))$    |   $1+\max_{x\in A} S(f(x))$    |



$[\hbox{Eval}(e_1)/x]e_2$: the expression $e_2$ with all free occurrences of $x$ replaced by $\hbox{Eval}(e_1)$, the value that $e_1$ evaluates to
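
The pairing and set-comprehension rows of the table can be sketched as a small calculator. The `Cost` class and the helpers `seq`, `par`, and `par_map` are hypothetical names introduced only for this illustration. `par_map` assumes a non-empty set $A$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    work: int
    span: int

VALUE = Cost(1, 1)   # values have unit work and span

def seq(c1, c2):
    """(e1, e2): add work and add span, plus 1 for the composition."""
    return Cost(1 + c1.work + c2.work, 1 + c1.span + c2.span)

def par(c1, c2):
    """(e1 || e2): add work, but take the maximum span."""
    return Cost(1 + c1.work + c2.work, 1 + max(c1.span, c2.span))

def par_map(costs):
    """{f(x) | x in A}: sum the work, take the maximum span (A non-empty)."""
    costs = list(costs)
    return Cost(1 + sum(c.work for c in costs),
                1 + max(c.span for c in costs))

a = seq(VALUE, VALUE)        # Cost(work=3, span=3)
print(seq(a, a))             # Cost(work=7, span=7)
print(par(a, a))             # Cost(work=7, span=4)
print(par_map([a] * 8))      # Cost(work=25, span=4)
```

Sequential and parallel composition do the same total work. Only the span differs.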


**function application**  
$W(e_1 e_2) = W(e_1) + W(e_2) + W([\hbox{Eval}(e_2)/x]e_3) + 1$  
wh