In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Graph contractions


So far, we have seen pretty limited parallelism in our graph algorithms:

- BFS: Span=$O(d \lg^2 n)$, where $d$ is the **diameter** of the graph
  - So, serial in the worst case.
- DFS, Dijkstra, Prim, Kruskal: serial
- Bellman-Ford: Update shortest paths to each node in parallel, span = $O(|V| \lg |V|)$
- Johnson: can run Dijkstra in parallel for each source, span=$O(|E| \lg |E|)$ 



<br>

Is there any way to expose more parallelism?

<br>

What would a divide and conquer algorithm look like for graphs?


Could we partition the graph and combine solutions to each partition?

<center>
    <img src="figures/partition.jpg"/>
</center>

<br> <br>


- Want nearly equal-sized partitions -- this is hard to compute!

- The edges between partitions will make it difficult to solve subproblems independently.




Recall **contraction**, which we used to implement `scan`:

![figures/scan.png](figures/scan.png)


1. Reduce problem size by a constant factor (e.g., half as large). (**contraction**)
2. Solve this smaller problem.
3. Expand solution to solve the larger problem. (**expansion**)
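
As a reminder of how the three steps fit together, here is a minimal sketch of `scan` by contraction. It assumes an exclusive scan over an associative function `f` with an identity element; the name `scan_contract` and the `(prefixes, total)` return convention are illustrative choices and may differ from the version in the earlier lecture. The list comprehensions run serially here, but each one is a map that could run in parallel.

```python
def scan_contract(f, identity, a):
    """Exclusive scan by contraction: returns (prefixes, total)."""
    if len(a) == 0:
        return [], identity
    if len(a) == 1:
        return [identity], a[0]
    # contraction: combine adjacent pairs, roughly halving the input
    pairs = [f(a[i], a[i + 1]) if i + 1 < len(a) else a[i]
             for i in range(0, len(a), 2)]
    # solve the smaller problem
    partial, total = scan_contract(f, identity, pairs)
    # expansion: even positions copy a partial result,
    # odd positions combine it with the preceding input element
    prefixes = [partial[i // 2] if i % 2 == 0 else f(partial[i // 2], a[i - 1])
                for i in range(len(a))]
    return prefixes, total

prefixes, total = scan_contract(lambda x, y: x + y, 0, [2, 1, 3, 2, 2, 5, 4, 1])
print(prefixes, total)  # [0, 2, 3, 6, 8, 10, 15, 19] 20
```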

In the next few lectures, we'll develop contraction algorithms to achieve better parallelism in graph algorithms.

## Graph partitions

How can we partition a graph?

How do we represent the partitions?

<br><br>

Recall the notion of **graph cut** we introduced to describe the light-edge property of the MST problem:

A **graph cut** of a graph $G=(V,E)$ is a partitioning of the vertices into $V_1 \subset V$ and $V_2 = V - V_1$.

Each vertex set $V_i \subset V$ defines a **vertex-induced subgraph** consisting of edges where both endpoints are in $V_i$.

For example:

<center>
    <img src="figures/cut1.jpg"/>
</center>

In this partition, we have:

- $G_1 = (V_1, E_1)~~~~V_1=\{a,b,c,d\}, E_1 = \{(a,b), (a,c), (b,d)\}$
- $G_2 = (V_2, E_2)~~~~V_2=\{e,f\}, E_2 = \{(e,f)\}$


The **cut edges** are those that join the two subgraphs, e.g., $\{(b,e), (d,f)\}$.


<br><br>


### graph partition

a collection of graphs $\{G_1 = (V_1, E_1), \ldots, G_{k} = (V_{k}, E_{k})\}$ such that 

- $\{V_1, \ldots, V_{k}\}$ is a set partition of $V$.
- $\{G_1, \ldots, G_{k}\}$ are vertex-induced subgraphs of $G$ with respect to $\{V_1, \ldots, V_{k}\}$.


We refer to each subgraph $G_i$ as a **block** or **part** of $G$.

For a given partition, we have two types of edges in $E$:

- **cut edges:** an edge $(v_1,v_2)$ such that $v_1 \in V_i$, $v_2 \in V_j$ and $V_i \ne V_j$
- **internal edges:** an edge $(v_1,v_2)$ such that $v_1 \in V_i$, $v_2 \in V_j$ and $V_i = V_j$
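
The two edge types can be separated with a single pass over the edges. The sketch below is one possible way to do it: `split_edges` and `block_of` (a dict from vertex to block label) are hypothetical names, and the edge set assumes that the cut figure above contains exactly the edges listed in $E_1$, $E_2$, and the cut edges.

```python
def split_edges(edges, block_of):
    """Split edges into (internal, cut) given a vertex -> block label dict."""
    internal, cut = set(), set()
    for u, v in edges:
        if block_of[u] == block_of[v]:
            internal.add((u, v))   # both endpoints in the same block
        else:
            cut.add((u, v))        # endpoints in different blocks
    return internal, cut

# assumed edge set of the two-block cut example above
cut_example_edges = {('a', 'b'), ('a', 'c'), ('b', 'd'),
                     ('e', 'f'), ('b', 'e'), ('d', 'f')}
block_of = {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 2, 'f': 2}

internal, cut = split_edges(cut_example_edges, block_of)
print(sorted(internal))  # [('a', 'b'), ('a', 'c'), ('b', 'd'), ('e', 'f')]
print(sorted(cut))       # [('b', 'e'), ('d', 'f')]
```

Each edge is classified independently of the others, so the loop could be replaced by a parallel filter.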



## Graph Contraction: Intuition

![figures/contract.png](figures/partition2.png)


**contract step**:

- partition $G$ into subgraphs $\{G_1 \ldots G_k\}$
- Assign one vertex of each subgraph as a **super vertex**
  - e.g., $a$, $d$, $g$ are super vertices of first contraction step
- drop internal edges
- reroute cut edges to connect super vertices
  - e.g., $(a,g)$ is added in first contraction step because $(b,g)$ exists in first graph
  
**recursive step**:

- Recursively solve the problem on the smaller, contracted graph
- base case: stop when no more edges in the graph

**expansion step**:

- combine solutions to subgraphs to compute result for original input graph
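
To make the contract step concrete, here is a rough sketch of a single round with a hand-picked partition. The helper name `contract_once` is hypothetical. The edge list is the one this notebook uses for the example graph, and the blocks $\{a,b,c\}$, $\{d,e,f\}$, $\{g,h,i\}$ are an assumption read off from the figure.

```python
def contract_once(edges, vertex_map):
    """One contract step: drop internal edges, reroute cut edges to super vertices."""
    new_vertices = set(vertex_map.values())
    new_edges = {(vertex_map[u], vertex_map[v])
                 for u, v in edges if vertex_map[u] != vertex_map[v]}
    return new_vertices, new_edges

example_edges = {('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'g'),
                 ('d', 'e'), ('d', 'f'), ('e', 'f'),
                 ('h', 'i'), ('i', 'g'), ('h', 'g')}
# assumed blocks: {a,b,c} -> a, {d,e,f} -> d, {g,h,i} -> g
round1_map = {'a': 'a', 'b': 'a', 'c': 'a',
              'd': 'd', 'e': 'd', 'f': 'd',
              'g': 'g', 'h': 'g', 'i': 'g'}
round1_vertices, round1_edges = contract_once(example_edges, round1_map)
print(sorted(round1_vertices), sorted(round1_edges))  # ['a', 'd', 'g'] [('a', 'g')]

# a second round that merges g into a leaves no edges, which is the base case
round2_map = {'a': 'a', 'g': 'a', 'd': 'd'}
round2_vertices, round2_edges = contract_once(round1_edges, round2_map)
print(sorted(round2_vertices), sorted(round2_edges))  # ['a', 'd'] []
```

The only cut edge in the first round is $(b,g)$, which is rerouted to $(a,g)$, matching the example above.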



### Which partitions should we choose?



We want partitions that:

- respect the connectivity of the original graph.
  - i.e., vertices in the same partition should be connected
  
  
- shrink the graph by a constant fraction (geometric decrease)
  - to ensure a logarithmic span ($\lg n$ rounds of contraction)
  
  
We'll look at different ways of partitioning in a moment. For now, let's look at an example of how contraction works to solve a specific problem without worrying too much about the details of the partitioning.

In [3]:
# graph contraction
# we still need to specify partition_graph_f!

def contract_graph(vertices, edges, partition_graph_f):
    if len(edges) == 0:
        return vertices, edges
    else:
        # partition the graph
        # vertex_map is a dict from vertex->super_vertex
        # e.g., {'a': 'a', 'b': 'a', 'c': 'a'...} in above example
        new_vertices, vertex_map = partition_graph_f(vertices, edges)
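        # keep only cut edges, rerouted to connect super vertices
        # (vertex_map is a dict, so index it with [] rather than calling it)
        new_edges = set([(vertex_map[e[0]], vertex_map[e[1]])
                          for e in edges if vertex_map[e[0]] != vertex_map[e[1]]])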
        return contract_graph(new_vertices, new_edges, partition_graph_f)

vertices = set(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])
edges = set([('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'g'), 
             ('d', 'e'), ('d', 'f'), ('e', 'f'),
             ('h', 'i'), ('i', 'g'), ('h', 'g')])

# partition_graph_f is not defined yet, so this call raises a NameError until we supply one:
# contract_graph(vertices, edges, partition_graph_f)

## Contraction example: Number of connected components

Recall that in lab 10 we computed the number of connected components in a graph.

<center>
<img src="figures/components.png" width=30%/>
</center>

How did we do this? What was the worst-case span of our approach?

Now let's think how we might do this with graph contraction.

![figures/contract.png](figures/partition2.png)

What does the connectivity in the contracted graph tell us about the connectivity in the original graph?

- Since $a$, $b$, $c$ are placed in the same partition, we know they are connected.


- Since $a$ and $g$ are connected in the second graph, then every node in the $g$ partition is reachable from every node in the $a$ partition.


- Similarly, since $d$ is not connected to $a$ or $g$, then we know that no node in the $d$ partition is reachable from any node in either the $a$ or $g$ partition.


<br>

What does the final contracted graph tell us about the number of connected components?

In [None]:
# graph contraction
# we still need to specify partition_graph_f!

def num_components(vertices, edges, partition_graph_f):
    if len(edges) == 0:
        # base case: return the number of super vertices in the final partition
        return len(vertices)
    else:
        new_vertices, vertex_map = partition_graph_f(vertices, edges)
        # keep only cut edges, rerouted to connect super vertices
        # can use filter here to do in parallel: O(log|E|) span
        new_edges = set([(vertex_map[e[0]], vertex_map[e[1]])
                          for e in edges if vertex_map[e[0]] != vertex_map[e[1]]])
        # recurse with num_components (not contract_graph) so the base case returns a count
        return num_components(new_vertices, new_edges, partition_graph_f)

vertices = set(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])
edges = set([('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'g'), 
             ('d', 'e'), ('d', 'f'), ('e', 'f'),
             ('h', 'i'), ('i', 'g'), ('h', 'g')])

# partition_graph_f is not defined yet, so this call raises a NameError until we supply one:
# num_components(vertices, edges, partition_graph_f)

While we have not yet specified `partition_graph_f`, let's assume it
- returns a graph of size $|V|/2$
- has $O(\lg |V|)$ span.

What is the recurrence for the full algorithm to compute `num_components`?

$S(|V|) = S(|V|/2) + \lg (|V|)$

which evaluates to?
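
One way to evaluate it, as a sketch that assumes both properties above hold at every level and treats $S(1)$ as a constant: unroll the recurrence over the $\lg |V|$ levels of recursion and sum the span spent at each level.

$$S(|V|) = \sum_{i=0}^{\lg |V|} \lg \frac{|V|}{2^i} = \sum_{i=0}^{\lg |V|} \left(\lg |V| - i\right) = \frac{\lg |V| \, (\lg |V| + 1)}{2} \in O(\lg^2 |V|)$$

The per-level cost shrinks only additively (by 1 each level), so the sum is dominated by the number of levels times the root cost.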

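One simple way to partition is to contract pairs of vertices joined by an edge: sample edges at random and merge the endpoints of each sampled edge. But we can't have the same node in two partitions, so we keep only the sampled edges whose endpoints appear in no other sampled edge.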

In [50]:
from collections import Counter
import random
# random.seed(42)

def edge_contract(vertices, edges):
    # sample each edge with 50% chance
    sampled_edges = [e for e in edges if random.choice([True, False])]
    print('%d/%d sampled edges' % (len(sampled_edges), len(edges)))
    print(sorted(sampled_edges))
    # count how often each vertex appears in the sampled edges.
    # could do this in parallel (map-reduce) in O(log |V|) span
    vertex_counts = Counter()
    for p in sampled_edges:
        vertex_counts.update(p)
    print('\nvertex counts in sampled edges:', vertex_counts.items())        
    # now, do a filter to get those edges where both vertices
    # appear only once in sampled_edges
    valid_edges = [e for e in sampled_edges if vertex_counts[e[0]] == 1 and vertex_counts[e[1]]==1]
    print('\nkeeping these valid edges:', valid_edges)
    
    vertex_map = dict()
    for e in valid_edges:
        vertex_map[e[0]] = e[1]
    # put all the rest in a singleton partition
    for v in vertices:
        if v not in vertex_map:
            vertex_map[v] = v
    # the new vertex set contains only the super vertices
    return set(vertex_map.values()), vertex_map

edge_contract(vertices, edges)

3/10 sampled edges
[('b', 'g'), ('d', 'f'), ('e', 'f')]

vertex counts in sampled edges: dict_items([('d', 1), ('f', 2), ('e', 1), ('b', 1), ('g', 1)])

keeping these valid edges: [('b', 'g')]


({'a', 'c', 'd', 'e', 'f', 'g', 'h', 'i'},
 {'b': 'g',
  'e': 'e',
  'a': 'a',
  'g': 'g',
  'd': 'd',
  'f': 'f',
  'h': 'h',
  'c': 'c',
  'i': 'i'})

- Can we make graph search more parallelizable?
  - E.g., $\lg n$ span?
  
- We'd like to do divide and conquer, but this won't work. Why?

- Instead, we'll do contraction (recall contraction implementation for scan)

- Let's take the problem of graph connectivity:
  - recall we solved this in lab 10 (figure)
  - BFS has span proportional to diameter
      - need to solve each connected component sequentially

- Graph partitioning
  - subgraphs
  - vertex-induced subgraphs

- The graph connectivity problem is to partition an undirected graph into its components (maximal connected subgraphs)


- Graph contraction
  - shrink size of graph and solve connectivity of that smaller graph
  - solve different components in parallel
  - similar idea to divide and conquer, but for graphs
  - we'll also use this idea to solve other graph problems, like spanning trees
  
- contract: $V \mapsto V_1, V_2$
  - $V_1, V_2$ are blocks of a partition of the graph's vertices
  - each is connected, but may not be maximal (figure)
  - use a "representative node" for each partition 
  - contract code
  
Types of contraction
- Edge Contraction: Only pairs of vertices connected by an edge are contracted.
- Star Contraction: Vertices around a “center star” collapse to the “star”
- Tree Contraction: disjoint trees within the graph are identified and vertices in a tree are collapsed to the root.

Edge contraction
- find "disjoint" edges -- no vertices shared.


- walk through example

- show that it fails on stars. for tomorrow.

Lab: do an edge contraction and/or graph partitioning

## Edge Contraction

- problem: pick a set of edges such that each vertex is an endpoint of at most one picked edge

- greedy algorithm: pick one edge at a time, removing edges connected to either node.

- problem: not parallel

Not going to get any parallelism if our contraction step is serial!
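
The greedy algorithm above might look like the following sketch. The name `greedy_matching` is hypothetical, and the edges are visited in sorted order only to make the result reproducible. Rather than physically removing edges, it skips any edge that touches an already matched vertex, which has the same effect. Each decision depends on all earlier ones, which is why the loop is inherently serial.

```python
def greedy_matching(edges):
    """Pick edges one at a time so that no two picked edges share a vertex."""
    matched = set()   # vertices already used by a picked edge
    picked = []
    for u, v in sorted(edges):  # fixed order, for reproducibility only
        if u not in matched and v not in matched:
            picked.append((u, v))
            matched.update((u, v))
    return picked

matching_example_edges = {('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'g'),
                          ('d', 'e'), ('d', 'f'), ('e', 'f'),
                          ('h', 'i'), ('i', 'g'), ('h', 'g')}
matching = greedy_matching(matching_example_edges)
print(matching)  # [('a', 'b'), ('d', 'e'), ('h', 'g')]
```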

### Parallel edge matching



For which type of graph would this do poorly?
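
The notes above point at stars as the bad case. The sketch below illustrates why, under the assumption that edges are selected by sampling each one with probability 1/2 and keeping only those that share no endpoint with another sampled edge. The names `disjoint_sampled_edges` and `star_edges` are hypothetical. In a star every edge touches the center, so at most one edge can survive per round, and a round removes at most one vertex.

```python
import random

def disjoint_sampled_edges(edges, rng):
    """Sample each edge with probability 1/2; keep edges sharing no endpoint."""
    sampled = [e for e in edges if rng.random() < 0.5]
    counts = {}
    for u, v in sampled:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return [(u, v) for u, v in sampled if counts[u] == 1 and counts[v] == 1]

rng = random.Random(0)
# a star: one center joined to 8 leaves
star_edges = [('center', 'leaf%d' % i) for i in range(8)]
kept_per_round = [len(disjoint_sampled_edges(star_edges, rng)) for _ in range(1000)]
print(max(kept_per_round))  # never more than 1
```

With $n$ leaves, shrinking by at most one vertex per round means on the order of $n$ rounds rather than $\lg n$.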