In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Sequences


Week's agenda:

- Today: Abstract Data Types, cost specifications
- Sequences, formally



## Abstract Data Types

> interface consisting of a collection of functions (and possibly values) on a given type, and without reference to the implementation

distinguished from a **data structure,** which contains the actual implementations.

ADT: what  
data structure: how  

ADT often also includes **cost specification** (e.g., value lookup is $O(1)$, search is $O(n)$, etc).


![adt](figures/adt.png)

### how to choose the "right" implementation?

## Sequences

- one of the most common ADTs
- many useful functions for parallel algorithms


<br><br>

Simple to express by example:

$\langle 10, 20, 40 \rangle$

- We'll spend some time defining it more formally so the semantics are precise.

- We'll then define primitive operations over sequences that can be composed to solve a wide array of problems involving sequences.

<br>

First, a quick refresher of sets, relations, and functions...


## Set

> **set**: collection of distinct objects  

- each element of a set appears exactly once
- set with no elements is empty set: $\{\}$ or $\emptyset$
- can be specified by a **set comprehension**

E.g., Cartesian product of sets $A$ and $B = \{(i,j) : i \in A, j \in B\}$
- " all pairs $(i, j)$ *such that* $i$ is in $A$ and $j$ is in $B$ "
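Python's set comprehensions mirror this notation closely. A minimal sketch, where the sets `A` and `B` are made-up examples:

```python
# Hypothetical example sets; any finite sets would work.
A = {1, 2}
B = {'x', 'y'}

# Set comprehension: every pair (i, j) such that i is in A and j is in B.
cartesian = {(i, j) for i in A for j in B}
```

The result has $|A| \cdot |B|$ elements, and duplicates cannot appear because it is a set.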

## Relation

> A binary **relation** $R$ from a set $A$ to a set $B$ is a subset of the Cartesian product of $A$ and $B$.  

- $R \subseteq A \times B$
- **domain** of $R$ is the set $\{a : (a,b) \in R\}$
- **range** of $R$ is the set $\{b : (a,b) \in R\}$
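To make the definitions concrete, a relation can be sketched as a Python set of pairs. The sets and the helper names `domain` and `range_of` are illustrative assumptions, not part of the course library:

```python
# Hypothetical sets and relation for illustration.
A = {1, 2, 3}
B = {'a', 'b'}
R = {(1, 'a'), (1, 'b'), (3, 'a')}

def domain(R):
    # all first components that appear in some pair of R
    return {a for (a, b) in R}

def range_of(R):
    # all second components that appear in some pair of R
    return {b for (a, b) in R}

# R is a relation from A to B because it is a subset of A x B.
is_relation = R <= {(a, b) for a in A for b in B}
```

Note that the domain here is $\{1, 3\}$, not all of $A$: the element 2 is not related to anything.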

## Function

>  A **function** or **mapping** from $A$ to $B$ is a relation $R \subseteq A \times B$ such that: 

- $|R| = |$domain$(R)|$
- that is, for every $a$ in the domain of $R$, there is exactly one $b$ in the range of $R$ such that $(a,b) \in R$

$A$ is the **domain** and $B$ is the **co-domain**.
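The condition $|R| = |$domain$(R)|$ translates directly into a check on a set of pairs. A small sketch, where `is_function` is a hypothetical helper name:

```python
def is_function(R):
    # R is a set of (a, b) pairs; it is a function exactly when
    # no first component appears in more than one pair.
    return len(R) == len({a for (a, b) in R})

f = {(1, 'a'), (2, 'a')}        # two inputs may share an output
not_f = {(1, 'a'), (1, 'b')}    # one input with two outputs
```

Here `is_function(f)` holds while `is_function(not_f)` does not.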

## Sequence

> A **sequence** is a function whose domain is a contiguous set of natural numbers starting at zero.

An $\alpha$ **sequence** is a function from $\mathbb{N}$ to $\alpha$ with domain $\{0, \ldots, n-1\}$ for some $n \in \mathbb{N}$

- $\alpha$ specifies the type of the sequence elements

<br>

E.g., $X$ and $Y$ are equivalent sequences:

$ X = \{(0, $ '$a$'$), (1, $ '$b$'$), (2, $ '$c$'$)\} \equiv \langle $'$a$'$, \: $'$b$'$, \: $'$c$'$\rangle$

$ Y = \{(1, $ '$b$'$), (2, $ '$c$'$), (0, $ '$a$'$)\} \equiv \langle $'$a$'$, \: $'$b$'$, \: $'$c$'$\rangle$

<br>

but $Z$ is not a sequence. why not?

$ Z = \{(0, $ '$a$'$), (2, $ '$c$'$)\} $
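One way to test the definition is to represent each candidate as a set of (index, value) pairs and compare its indices with $\{0, \ldots, n-1\}$. This is only a sketch; `is_sequence` is a hypothetical helper name:

```python
def is_sequence(R):
    # R is a set of (index, value) pairs. If the indices are exactly
    # {0, ..., |R|-1}, they are all distinct, so R is also a function.
    return {i for (i, _) in R} == set(range(len(R)))

X = {(0, 'a'), (1, 'b'), (2, 'c')}
Y = {(1, 'b'), (2, 'c'), (0, 'a')}
Z = {(0, 'a'), (2, 'c')}
```

`X` and `Y` are the same set, so they are the same sequence. `Z` has two elements but its indices are $\{0, 2\}$ rather than $\{0, 1\}$.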


> with domain $\{0, \ldots, n-1\}$

- length
- indexing
- empty
- singleton
- isEmpty
- isSingleton
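A rough sketch of these primitives over Python lists follows. The snake_case names and the choice of a list as the underlying data structure are assumptions made for illustration:

```python
def length(a):
    return len(a)

def nth(a, i):
    # indexing: the value paired with index i
    return a[i]

def empty():
    return []

def singleton(x):
    return [x]

def is_empty(a):
    return len(a) == 0

def is_singleton(a):
    return len(a) == 1
```

For example, `nth([10, 20, 40], 1)` returns `20`.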


## Tabulate

**formal definition**:   
$tabulate \: (f : \: \mathbb{N} \rightarrow \alpha)\: (n :\: \mathbb{N}) : \: \mathbb{S}_\alpha = \langle f(0), f(1), \ldots, f(n-1) \rangle$

$tabulate$ is a function that takes as input:
- another function $f$
- a natural number $n$

and returns a sequence of length $n$ by applying $f$ to each element in $\langle 0, \ldots, n-1 \rangle$

<br>

**SPARC syntax**: 


$tabulate \: (\mathtt{lambda} \: i \: . \: e)\: e_n \equiv \langle e : \: 0 \le i < e_n \rangle $

- the second expression is a **sequence comprehension**
  - $e_n$ is a SPARC expression whose value is a natural number
  

<br>

e.g.

$tabulate \: fib \:\: 9 \equiv \langle fib \: i : \: 0 \le i < 9 \rangle \Rightarrow \langle 0, 1, 1, 2, 3, 5, 8, 13, 21 \rangle$

In [40]:
def tabulate(f, n):
    return [f(i) for i in range(n)]

tabulate(lambda x: x**2, 10)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
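The $fib$ example can be checked the same way. The sketch below repeats `tabulate` so that it runs on its own, and assumes the convention $fib(0) = 0$, $fib(1) = 1$:

```python
def tabulate(f, n):
    return [f(i) for i in range(n)]

def fib(i):
    # iterative Fibonacci with fib(0) = 0 and fib(1) = 1
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

first_nine = tabulate(fib, 9)  # [0, 1, 1, 2, 3, 5, 8, 13, 21]
```

The result has exactly $n = 9$ elements, one for each index $0 \le i < 9$.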

### Each call to f(i) can be done in parallel!

In [42]:
from multiprocessing.pool import ThreadPool

def parallel_tabulate(f, n, nthreads=5):
    with ThreadPool(nthreads) as pool:
        results = []
        # launch all tasks
        for i in range(n): 
            results.append(pool.apply_async(f, [i]))
        # wait for all to finish
        return [r.get() for r in results]
    
list(parallel_tabulate(lambda x: x**2, 10))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

## Map

-  like $tabulate$, but applies $f$ to *elements* of sequence, rather than integers.

**formal definition**: 

$ map \: (f : \alpha \rightarrow \beta)(a : \mathbb{S}_\alpha) : \mathbb{S}_\beta = \{(i, f(x)) : (i, x) \in a\}$

$map$ is a function that takes as input:
- another function $f : \alpha \rightarrow \beta$
- a sequence $a$ of type $\mathbb{S}_\alpha$

and returns a sequence of type $\mathbb{S}_\beta$ with length $|a|$ by applying $f$ to each element $x \in a$

<br>

**SPARC syntax**: 

$\langle e : p \in e_s \rangle \equiv map\:(\mathtt{lambda} \: p \: . \: e)\: e_s$

<br>

e.g., assume $a = \langle 2, 4, 5, 7\rangle$

$map\: (\mathtt{lambda} \: x \: . \: x^2)\: a \equiv \langle x^2 : x \ \in a \rangle \Rightarrow \langle  4, 16, 25, 49 \rangle$

In [45]:
def my_map(f, a):
    return [f(x) for x in a]

my_map(lambda x: x**2, [4, 16, 25, 49])

[16, 256, 625, 2401]

In [48]:
# In fact, map is built into Python:
list(map(lambda x: x**2, [4, 16, 25, 49]))

[16, 256, 625, 2401]

## Filter

like $map$, but $f$ is a boolean function, and the returned sequence contains only the elements $x$ for which $f(x)$ is True.

**formal definition**:   
(convoluted because we have to make sure the returned sequence has domain $\{0, \ldots, n-1\}$)

$ filter \: (f : \alpha \rightarrow \mathbb{B})(a : \mathbb{S}_\alpha) : \mathbb{S}_\alpha = \{\: (\: \vert \: \{(j,y) \in a | j < i \land f(y)\} \: \vert, \: x) : (i,x) \in a | f(x) \: \}$

$filter$ is a function that takes as input:
- another function $f : \alpha \rightarrow \mathbb{B}$
- a sequence $a$ of type $\mathbb{S}_\alpha$

and returns a sequence of type $\mathbb{S}_\alpha$ with length $\le |a|$ by applying $f$ to each element $x \in a$ and retaining only those where $f(x)$ is $\mathtt{True}$.
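To see what the formal definition is doing, here is a literal (and deliberately inefficient) transcription into Python. It is a sketch only: `filter_formal` is a hypothetical name, and the elements are assumed to be hashable so they can be stored in a set:

```python
def filter_formal(f, a):
    # view the sequence as a set of (index, value) pairs
    pairs = set(enumerate(a))
    # the new index of a kept element x at position i is the number of
    # kept elements that appear before position i
    result = {
        (len({(j, y) for (j, y) in pairs if j < i and f(y)}), x)
        for (i, x) in pairs if f(x)
    }
    # convert the set of pairs back to a list ordered by index
    return [x for (_, x) in sorted(result)]

evens = filter_formal(lambda x: x % 2 == 0, [2, 4, 5, 7])  # [2, 4]
```

Counting the earlier kept elements is what renumbers the survivors so that the domain is again $\{0, \ldots, m-1\}$ with no gaps.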

<br>

**SPARC syntax**: 

$\langle x \in e_s \: \vert \: e \rangle \equiv filter\:(\mathtt{lambda}\: x \: . \: e)\: e_s$

<br>

e.g., assume $a = \langle 2, 4, 5, 7\rangle$

$filter\: \mathtt{isEven} \: a \equiv \langle x : x \in a \: \vert \: \mathtt{isEven}\: x \rangle \Rightarrow \langle  2, 4 \rangle$

In [51]:
def my_filter(f, a):
    return [x for x in a if f(x)]

my_filter(lambda x: x%2==0, [4, 16, 25, 49])

[4, 16]

In [54]:
# like map, this also already exists...
list(filter(lambda x: x%2==0, [4, 16, 25, 49]))

[4, 16]

## Quick hits

### Subsequence

- $subseq(a, i, j)$: subsequence of $a$ starting at location $i$ with length $j$
- shorthand: $a[e_i \ldots e_j] \equiv subseq(a, \: e_i, \: e_j-e_i+1)$
- e.g., $subseq(\langle 1,2,3,4,5,6 \rangle, 2, 3) \Rightarrow \langle 3, 4, 5 \rangle$
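A possible Python rendering uses list slicing. The helper `slice_inclusive` is a hypothetical name for the $a[e_i \ldots e_j]$ shorthand, which includes both endpoints:

```python
def subseq(a, i, j):
    # j is a length, not an end index
    return a[i:i + j]

def slice_inclusive(a, i, j):
    # a[i ... j] with both endpoints included
    return subseq(a, i, j - i + 1)

s = subseq([1, 2, 3, 4, 5, 6], 2, 3)  # [3, 4, 5]
```

Python's own `a[i:j]` excludes index `j`, so it is not the same as $a[i \ldots j]$ here.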

### Append

- $append(a,b)$ appends sequence $b$ after sequence $a$
- shorthand: $a +\!\!+ \: b$
- e.g., $\langle 1,2,3 \rangle +\!\!+ \: \langle 4, 5 \rangle \Rightarrow \langle 1,2,3,4,5 \rangle $


### Flatten

- concatenate all the sequences in a sequence of sequences.
- $flatten \langle \langle 1,2,3 \rangle, \langle 4 \rangle, \langle 5, 6 \rangle \rangle \Rightarrow \langle 1,2,3,4,5,6 \rangle$
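Both $append$ and $flatten$ are short to sketch over Python lists. This is illustrative only and says nothing about the cost specification:

```python
def append(a, b):
    # returns a new list; a and b are left unchanged
    return a + b

def flatten(seqs):
    # seqs is a sequence of sequences
    return [x for s in seqs for x in s]

flat = flatten([[1, 2, 3], [4], [5, 6]])  # [1, 2, 3, 4, 5, 6]
```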

### Update

- $update (a, (i, x))$ updates location $i$ of sequence $a$ to have value $x$
- e.g., $a = \langle 1,2,3,4,5,6 \rangle$
- $update \: a \: (2, 99) \Rightarrow \langle 1,2,\mathbf{99},4,5,6 \rangle$
- How can we ensure data persistence here?
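One simple answer is to copy before writing, so the original sequence is never modified. The sketch below does that at $O(n)$ cost per update. Ignoring an out-of-range index is an assumption of this sketch:

```python
def update(a, i, x):
    if not 0 <= i < len(a):
        return a      # assumption: out-of-range updates are ignored
    b = list(a)       # copy, so the input stays intact
    b[i] = x
    return b

a = [1, 2, 3, 4, 5, 6]
b = update(a, 2, 99)  # [1, 2, 99, 4, 5, 6]; a is unchanged
```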

### Inject

- update multiple locations at once
- e.g., $a = \langle 1,2,3,4,5,6 \rangle$
- $inject \: a \: \langle (2, 99), (4, 100) \rangle \Rightarrow \langle 1,2,\mathbf{99},4,\mathbf{100},6 \rangle$  
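A sequential sketch of $inject$ copies once and then applies every update. Applying the updates in order means a later update to the same index would overwrite an earlier one. That tie-breaking rule is an assumption of this sketch, not something the definition above specifies:

```python
def inject(a, updates):
    b = list(a)                 # copy once, then apply all updates
    for i, x in updates:
        if 0 <= i < len(b):     # assumption: skip out-of-range indices
            b[i] = x
    return b

r = inject([1, 2, 3, 4, 5, 6], [(2, 99), (4, 100)])  # [1, 2, 99, 4, 100, 6]
```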



- what if we want to parallelize $inject$?

## Nondeterministic Inject

Can we just parallelize each update?

<br><br>
$a = \langle 1,2,3,4,5,6 \rangle$

$inject \: a \: \langle (2, 99), (2, 100) \rangle \Rightarrow $ ???

<br>

$ninject \: a \: \langle (2, 99), (2, 100) \rangle \Rightarrow$ 

$\langle 1,2,\mathbf{99},4,5,6 \rangle$  **OR**
$\langle 1,2,\mathbf{100},4,5,6 \rangle$

<br>

essentially ignore race conditions
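To get a feel for this, here is a rough sketch that performs each write in a thread pool with no coordination between writers. The name `ninject` comes from the slide; the use of `ThreadPool` and the `nthreads` parameter are assumptions made for illustration:

```python
from multiprocessing.pool import ThreadPool

def ninject(a, updates, nthreads=4):
    b = list(a)  # writers share this copy

    def write(u):
        i, x = u
        b[i] = x  # no locking: whichever write lands last wins

    with ThreadPool(nthreads) as pool:
        pool.map(write, updates)  # blocks until every write has run
    return b

r = ninject([1, 2, 3, 4, 5, 6], [(2, 99), (2, 100)])
```

Position 2 of `r` ends up as either 99 or 100, and the caller should not rely on which.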



## Iterate

- Iterate over a sequence and accumulate a result that changes at each step (e.g., "running sum")

$iterate \ (f : \alpha \times \beta \rightarrow \alpha) (x : \alpha) (a : \mathbb{S}_\beta) : \alpha$

$iterate$ is a function that takes as input:
- another function $f : \alpha \times \beta \rightarrow \alpha$
- an initial result $x$
- a sequence $a$ of type $\mathbb{S}_\beta$

and returns a value of type $\alpha$ obtained by starting from $x$ and applying $f$ to the running result and each element of $a$ in turn, from left to right.


<br>

$iterate \: f \: x \: a =
\begin{cases}
x & \hbox{if} \: |a| = 0\\
iterate \: f \:\: f(x, a[0]) \:\:\: a[1 \ldots |a|-1]& \hbox{otherwise}
\end{cases}
$


e.g.

$iterate \:\: + \:\:\: 0 \:\:\: \langle 2,5,1,6 \rangle \Rightarrow 14$

In [84]:
def iterate(f, x, a):
    print('calling %s x=%s a=%s' % (f.__name__, x, a))
    if len(a) == 0:
        return x
    else:
        return iterate(f, f(x, a[0]), a[1:])

def plus(x, y):
    return x + y

iterate(plus, 0, [2,5,1,6])

calling plus x=0 a=[2, 5, 1, 6]
calling plus x=2 a=[5, 1, 6]
calling plus x=7 a=[1, 6]
calling plus x=8 a=[6]
calling plus x=14 a=[]


14
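The recursive version copies the rest of the list with `a[1:]` on every call and is limited by Python's recursion depth. A loop-based sketch avoids both; `iterate_loop` is a hypothetical name. Python's `functools.reduce` with an initial value computes the same left-to-right accumulation:

```python
from functools import reduce

def iterate_loop(f, x, a):
    # same result as the recursive definition, without slicing or recursion
    for y in a:
        x = f(x, y)
    return x

def plus(x, y):
    return x + y

total = iterate_loop(plus, 0, [2, 5, 1, 6])  # 14
same = reduce(plus, [2, 5, 1, 6], 0)         # 14
```

Despite its name, `functools.reduce` is strictly sequential, so it corresponds to $iterate$ here.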

## Iterate Prefixes

- also returns the intermediate values computed by $iterate$

$ iteratePrefixes \: f \: x \: a = \\
~~\mathtt{let} \: g (b,x)\: y = (b +\!\!+ \langle x \rangle, f(x,y))\\
~~\mathtt{in} \: iterate\: g (\langle \rangle, x)\: a \: \mathtt{end}
$

In [85]:
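def iterate_prefixes(f, x, a):
    def g(b, y):
        # b is the pair (prefixes so far, running result); y is the next element
        print('\tb=%s y=%s' % (b, y))
        r = f(b[1], y)
        # note: this appends the *new* result r, so the returned prefixes
        # include the final value; the formal definition appends the old one
        return (b[0] + [r], r)
    return iterate(g, ([], x), a)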
    

iterate_prefixes(plus, 0, [2,5,1,6])

calling g x=([], 0) a=[2, 5, 1, 6]
	b=([], 0) y=2
calling g x=([2], 2) a=[5, 1, 6]
	b=([2], 2) y=5
calling g x=([2, 7], 7) a=[1, 6]
	b=([2, 7], 7) y=1
calling g x=([2, 7, 8], 8) a=[6]
	b=([2, 7, 8], 8) y=6
calling g x=([2, 7, 8, 14], 14) a=[]


([2, 7, 8, 14], 14)

## Problem: Rightmost Positive

> Given a sequence of integers $a$, for each element in $a$ find the rightmost positive number to its left.

E.g., 

$rpos \: \langle 1, 0, -1, 2, 3, 0, -5, 7 \rangle \Rightarrow \langle -\infty, 1, 1, 1, 2, 3, 3, 3 \rangle$

 ($-\infty$ if no positive element to the left)
 
 Let's design a solution using $iterate$

In [87]:
def extend_positive(result, x):
    # result = (last_positive_value, sequence)
    # x = new element
    if x > 0:
        return (x, result[1] + [result[0]])
    else:
        return (result[0], result[1] + [result[0]])
    
extend_positive((0, [1, 0, -1, 2]), 1)

iterate(extend_positive, (-1e100, []), [1,0,-1,2,3,0,-5,7])

calling extend_positive x=(-1e+100, []) a=[1, 0, -1, 2, 3, 0, -5, 7]
calling extend_positive x=(1, [-1e+100]) a=[0, -1, 2, 3, 0, -5, 7]
calling extend_positive x=(1, [-1e+100, 1]) a=[-1, 2, 3, 0, -5, 7]
calling extend_positive x=(1, [-1e+100, 1, 1]) a=[2, 3, 0, -5, 7]
calling extend_positive x=(2, [-1e+100, 1, 1, 1]) a=[3, 0, -5, 7]
calling extend_positive x=(3, [-1e+100, 1, 1, 1, 2]) a=[0, -5, 7]
calling extend_positive x=(3, [-1e+100, 1, 1, 1, 2, 3]) a=[-5, 7]
calling extend_positive x=(3, [-1e+100, 1, 1, 1, 2, 3, 3]) a=[7]
calling extend_positive x=(7, [-1e+100, 1, 1, 1, 2, 3, 3, 3]) a=[]


(7, [-1e+100, 1, 1, 1, 2, 3, 3, 3])
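The result above is a pair, and it uses $-10^{100}$ as a stand-in for $-\infty$. A small wrapper, sketched here under the assumption that `float('-inf')` is an acceptable sentinel, returns just the sequence. The name `rpos` comes from the problem statement; `step` is a hypothetical helper with the same logic as `extend_positive`, and `functools.reduce` plays the role of a left-to-right iterate:

```python
from functools import reduce

def step(result, x):
    # result = (last positive value seen so far, output built so far)
    last, seq = result
    # record the answer for x *before* updating last with x itself
    return (x if x > 0 else last, seq + [last])

def rpos(a):
    _, seq = reduce(step, a, (float('-inf'), []))
    return seq

out = rpos([1, 0, -1, 2, 3, 0, -5, 7])
# [-inf, 1, 1, 1, 2, 3, 3, 3]
```

Each step depends on the previous one, so this solution is inherently sequential.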
