In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 6610
# Algorithms

## Cost of sequence functions


Today's agenda:  

- Review of sequence functions
- Work/Span of functions


In [47]:
# Review of primitive functions.

def tabulate(f, n):
    return [f(i) for i in range(n)]

def my_map(f, a):
    return [f(x) for x in a]

def my_filter(f, a):
    return [x for x in a if f(x)]

def iterate(f, x, a):
    if len(a) == 0:
        return x
    else:
        return iterate(f, f(x, a[0]), a[1:])

def flatten(sequences):
    return iterate(plus, [], sequences)

def reduce(f, id_, a):
    if len(a) == 0:
        return id_
    elif len(a) == 1:
        return a[0]
    else:
        return f(reduce(f, id_, a[:len(a)//2]),
                 reduce(f, id_, a[len(a)//2:]))

def scan(f, id_, a):
    return (
            [reduce(f, id_, a[:i+1]) for i in range(len(a))],
             reduce(f, id_, a)
           )

def plus(x, y):
    return x + y


The cost of these functions depends on the concrete data structure used to represent the sequence.

E.g., for an **array**:

\begin{array}{lcc}  
\mbox{Operation} & \mbox{Work} & \mbox{Span}  
\\  
\mathit{length}~a  
&
1  
&
1  
\\  
\mathit{nth}~a~i  
& 1  
& 1  
\\   
\mathit{singleton}~x  
&  
1  
&   
1  
\\  
\mathit{empty}  
&  
1  
&   
1  
\\  
\mathit{isSingleton}~x  
&  
1  
&   
1  
\\  
\mathit{isEmpty}~x  
&  
1  
&   
1  
\\   
\mathit{tabulate}~f~n  
& 1 + \displaystyle\sum_{i=0}^{n-1} W\left({f(i)}\right)  
& 1 + \displaystyle\max_{i=0}^{n-1} S\left({f(i)}\right)   
\\[2mm]  
\mathit{map}~f~a  
& 1 + \displaystyle\sum_{x \in a}  W\left({f(x)}\right)  
& 1 + \displaystyle\max_{x \in a}  S\left({f(x)}\right)   
\\[2mm]  
\mathit{filter}~f~a  
& 1 + \displaystyle\sum_{x \in a} W\left({f(x)}\right)  
& \lg \lvert a \rvert + \displaystyle\max_{x \in a} S\left({f(x)}\right)   
\\[2mm]  
\mathit{subseq}~a~(i,j)  
& 1  
& 1   
\\[2mm]  
\mathit{append}~a~b  
& 1 + \lvert a \rvert+\lvert b \rvert  
& 1   
\\[2mm]  
\mathit{flatten}~a  
& 1 + \lvert a \rvert + \displaystyle\sum_{x \in a} \lvert x \rvert  
& 1 + \lg \lvert a \rvert   
\\[2mm]  
\mathit{update}~a~(i,x)   
& 1 + \lvert a \rvert  
& 1   
\\[2mm]  
\mathit{inject}~a~b   
& 1 + \lvert a \rvert + \lvert b \rvert  
& \lg(\mathsf{degree}(b))  
\\[2mm]  
\mathit{ninject}~a~b   
& 1 + \lvert a \rvert + \lvert b \rvert  
& 1   
\\[2mm]  
\mathit{collect}~f~a  
& 1 + W\left({f}\right) \cdot \lvert a \rvert \lg \lvert a \rvert  
& 1 + S\left({f}\right) \cdot \lg^2 \lvert a \rvert  
\\[2mm]  
\mathit{iterate}~f~x~a  
&  
1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} W\left({f(y,z)}\right)  
&  
1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} S\left({f(y,z)}\right)  
\\[2mm]  
\mathit{reduce}~f~x~a   
&   
1 + \displaystyle\sum_{f(y,z) \in \mathcal{T}(-)} W\left({f(y,z)}\right)  
&  
\lg \lvert a \rvert \cdot  \displaystyle\max_{f(y,z) \in \mathcal{T}(-)} S\left({f(y,z)}\right)  
\\[2mm]  
\mathit{scan}~f~x~a  
& \lvert a \rvert  
& \lg \lvert a \rvert  
\end{array}
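
To make the `reduce` row concrete, here is a minimal sketch that counts work and span for the divide-and-conquer `reduce` defined earlier, under the assumption that every call to `f` costs 1 work and 1 span. The helper name `reduce_cost` is hypothetical and is not part of the lecture code; it only mirrors the recursion structure (split at `len(a)//2`).

```python
def reduce_cost(n):
    """Return (work, span), counted in calls to f, for reduce on n elements.

    Assumes each call to f costs 1 work and 1 span.
    """
    if n <= 1:
        return (0, 0)  # base cases never call f
    left = n // 2
    w1, s1 = reduce_cost(left)       # left half: a[:n//2]
    w2, s2 = reduce_cost(n - left)   # right half: a[n//2:]
    # Work adds across both halves; span takes the slower half.
    return (w1 + w2 + 1, max(s1, s2) + 1)

for n in [1, 2, 8, 1000]:
    work, span = reduce_cost(n)
    print(n, work, span)
```

Under this assumption, work grows as $n - 1$ and span as $\lceil \lg n \rceil$, which matches the shape of the table entries when $f$ has constant cost.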

Why does filter require logarithmic span?

```python
def my_filter(f, a):
    return [x for x in a if f(x)]
```

<br>


$S(filter \: f \: a) =  \lg \lvert a \rvert + \displaystyle\max_{x \in a} S\left({f(x)}\right)$

We need to account for the span of constructing the returned array.

<br><br>
We can't do this in constant span, because the location of one value depends on the location of other values.
<br><br>

$filter \:\: positive \:\: [-1,3,-2,4,-5,6] \rightarrow [3,4,6]$

<br><br>
**idea:** Make a first, parallel pass to create boolean values indicating if the value will be copied to the new array.

$[false,true,false,true,false,true]$

Use **scan** to determine indices in the new array. 

<br>
We'll see a version of this on the lab this week.
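
The idea above can be sketched as follows. This is only an illustration, not the lab solution: the name `filter_with_scan` is hypothetical, and `itertools.accumulate` is a sequential stand-in for a parallel `scan` with `plus`. The flagging pass and the final copy are the steps that could run in parallel, since each element writes to its own output slot.

```python
from itertools import accumulate

def filter_with_scan(pred, a):
    # Pass 1 (parallelizable map): 1 if the element is kept, else 0.
    flags = [1 if pred(x) else 0 for x in a]
    # Pass 2 (scan): inclusive prefix sums of the flags.
    # For a kept element at position i, positions[i] - 1 is its output index.
    positions = list(accumulate(flags))
    total = positions[-1] if positions else 0
    out = [None] * total
    # Pass 3 (parallelizable): every kept element writes to a distinct slot.
    for i, x in enumerate(a):
        if flags[i]:
            out[positions[i] - 1] = x
    return out

print(filter_with_scan(lambda x: x > 0, [-1, 3, -2, 4, -5, 6]))  # [3, 4, 6]
```

With a parallel scan, the only non-constant span contribution (beyond the predicate itself) comes from pass 2, which is where the $\lg \lvert a \rvert$ term originates.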


### All Contiguous Subsequences

Given a sequence $a$, generate all contiguous subsequences.

<br><br>


$\langle a \langle i, \ldots,j \rangle : 0 \le i < |a|, i \le j < |a| \rangle$

$\equiv$

$\langle a \langle i, \ldots,j \rangle : 0 \le i \le j < |a| \rangle$

$\equiv$

$flatten \langle \: \langle a[i \ldots i+j]: 0 \le j < |a| - i \rangle : 0 \le i < |a| \rangle$

$\equiv$

$flatten \: (tabulate \: (\mathtt{lambda} \: i \: . \: tabulate \: (\mathtt{lambda} \: j \: . \: a[i \ldots i+j]) \: (|a| - i)) \: |a|)$

In [48]:
# sequential solution

def all_contiguous_subseq(a):
    for i in range(len(a)):
        for j in range(i+1, len(a)+1):
            yield a[i:j]
            
list(all_contiguous_subseq([1,2,3,4,5]))

[[1],
 [1, 2],
 [1, 2, 3],
 [1, 2, 3, 4],
 [1, 2, 3, 4, 5],
 [2],
 [2, 3],
 [2, 3, 4],
 [2, 3, 4, 5],
 [3],
 [3, 4],
 [3, 4, 5],
 [4],
 [4, 5],
 [5]]

In [52]:
# nested tabulate
a = [1,2,3,4,5]
flatten(
    tabulate(lambda i: 
             tabulate(lambda j: a[i:i+j+1],
                      len(a)-i),
         len(a))
)

[[1],
 [1, 2],
 [1, 2, 3],
 [1, 2, 3, 4],
 [1, 2, 3, 4, 5],
 [2],
 [2, 3],
 [2, 3, 4],
 [2, 3, 4, 5],
 [3],
 [3, 4],
 [3, 4, 5],
 [4],
 [4, 5],
 [5]]

### Analysis of All Contiguous Subsequences

How many calls to `a[i:i+j+1]` (i.e., `subseq`)?

If $|a|=n$,

$$ \sum_{i=0}^{n-1} (n - i) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2} \in O(n^2) $$

Work and span of `subseq` are $O(1)$ (**why?**)

Therefore, total work is $O(n^2)$.

<br>

Span of inner `tabulate` is $O(1)$, and outer `tabulate` is also $O(1)$.

<br>

`flatten` at the end requires $O(\lg n)$ span.

Therefore, total span is $O(\lg n)$
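
As a quick sanity check on the call count, the sketch below counts the `subseq` calls made by the nested `tabulate` (the inner `tabulate` for index `i` makes `n - i` calls) and compares the total with the closed form $n(n+1)/2$. The function name `count_subseq_calls` is hypothetical.

```python
def count_subseq_calls(n):
    # The inner tabulate for a given i produces n - i subsequences.
    return sum(n - i for i in range(n))

for n in [1, 5, 10, 100]:
    calls = count_subseq_calls(n)
    closed_form = n * (n + 1) // 2
    print(n, calls, closed_form)
```

For $n = 5$ this gives 15, which matches the 15 subsequences listed in the output above.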


### How would work/span differ using a singly linked list?

E.g., accessing element $i$ costs?



$O(i)$ work to access the $i$th element of a singly linked list.

<br><br>

Because of this, there is little opportunity for parallelism.

E.g.

$\mathit{map}~f~a$


span is $1 + \displaystyle\sum_{x \in a}  S\left({f(x)}\right)   $


<br>

compared to the costs for an array implementation:

span is $ 1 + \displaystyle\max_{x \in a}  S\left({f(x)}\right)$
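
The contrast can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Node` class, `from_list`, `nth`, and `map_span` are hypothetical helpers, and `map_span` simply evaluates the two span formulas above for a given list of per-element spans.

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def from_list(xs):
    # Build the linked list back to front so the head holds xs[0].
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def nth(head, i):
    """Return (value, steps): reaching element i follows i next-pointers."""
    node, steps = head, 0
    while steps < i:
        node = node.next
        steps += 1
    return node.value, steps

def map_span(spans, linked):
    # Linked list: elements are reached one after another, so spans add up.
    # Array: all elements are available at once, so only the slowest matters.
    return 1 + (sum(spans) if linked else max(spans, default=0))

print(nth(from_list([10, 20, 30, 40]), 3))   # (40, 3)
print(map_span([1, 1, 1, 1], linked=True))   # 5
print(map_span([1, 1, 1, 1], linked=False))  # 2
```

Even when `f` has constant span, the linked-list version has span linear in $\lvert a \rvert$, while the array version stays constant.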
