In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 6610
# Algorithms

## Introduction to Graphs


Today's agenda:

- Overview of graphs

## Terminology

A graph $G$ consists of
- a set of $n$ vertices/nodes $V$ 
- a set of $m$ edges $E \subseteq V \times V$

A **graph** is a way to represent objects and their relations:
- **Node:** represents an object
- **Edge:** represents a relation between two nodes
- **Neighbor:** two nodes are *neighbors* if they are connected by an edge
- **Directed Graph:** represents asymmetric (one-way) relationships
- **Undirected Graph:** represents symmetric relationships



<img src="figures/graph.png" width="50%"/>

[Source](https://github.com/iit-cs579/main/blob/master/read/ek-02.pdf)

Examples of **directed** and **undirected** graphs?

<br><br><br><br><br>

## Graph examples

- social networks (Twitter, Facebook, real-world)
- Web link graph
- transportation
- gas pipelines
- disease spread

<img src="https://statnet.org/nme/movie.gif"/> [source](https://statnet.org/nme/)

- Six Degrees of Kevin Bacon

<img src="figures/kevin.jpg" width="50%"/>

[Source](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon)


<img src="figures/giant.png" width="50%"/>

[Source](https://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/teensex.pdf)

## Graph queries

What are some things you might want to know about a graph? E.g., consider the Facebook friendship graph, a transport network, etc.



- Who is friends with Justin Bieber?
  - find **neighbors** of a node: $N(v) = \{v_i \mid (v, v_i) \in E ~~ \hbox{or} ~~ (v_i, v) \in E\}$

- How popular is Justin Bieber?
  - **degree**: number of neighbors of a node: $|N(v)|$


- Do I know someone who is friends with Justin?
  - **path**: a sequence of nodes in which each consecutive pair are neighbors
  - **reachability**: $a$ is reachable from $b$ if there is a path from $b$ to $a$


- Am I in the same clique as Justin?
  - a graph is **connected** if there is a *path* between each pair of nodes
  - **connected component:** A maximal subset of nodes such that each pair of nodes is connected 


<img src="figures/components.png" width="30%"/>


[Source](https://github.com/iit-cs579/main/blob/master/read/ek-02.pdf)
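The neighbor and degree definitions above can be sketched directly from an edge list. This is only an illustration: the helper names `neighborhood` and `degree` are hypothetical, and the edge list is assumed to be the undirected four-node example (A-B, B-C, C-D, D-B) used later in these notes.

```python
# Assumed example: the undirected four-node graph with edges A-B, B-C, C-D, D-B.
example_edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'B')]

def neighborhood(edges, v):
    """N(v): every node that shares an edge with v, in either direction."""
    return {b if a == v else a for a, b in edges if v in (a, b)}

def degree(edges, v):
    """|N(v)|: the number of neighbors of v."""
    return len(neighborhood(edges, v))

print(sorted(neighborhood(example_edges, 'B')))  # ['A', 'C', 'D']
print(degree(example_edges, 'A'))                # 1
```

Both helpers scan the whole edge list, so they are meant to mirror the definitions rather than to be fast.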

### What data structures can we use to represent a graph?

### Adjacency matrix

<img src="figures/graph.png" width="50%"/>


$$
\text{undirected: }
\begin{bmatrix}
  & A & B & C & D\\
A & 0 & 1 & 0 & 0\\
B & 1 & 0 & 1 & 1\\
C & 0 & 1 & 0 & 1\\
D & 0 & 1 & 1 & 0\\
\end{bmatrix}
\qquad
\text{directed: }
\begin{bmatrix}
  & A & B & C & D\\
A & 0 & 1 & 0 & 0\\
B & 0 & 0 & 1 & 0\\
C & 0 & 0 & 0 & 1\\
D & 0 & 1 & 0 & 0\\
\end{bmatrix}
$$

In [2]:
import numpy as np
from tabulate import tabulate
labels = ['A', 'B', 'C', 'D']
nodes = [0,1,2,3]
edges = [(0,1), (1,2), (2, 3), (3, 1)]

def make_graph(nodes, edges, directed=False):
    graph = np.zeros((len(nodes), len(nodes)), dtype=int)
    for e in edges:
        graph[e[0], e[1]] = 1
        if not directed: # add reverse direction
            graph[e[1], e[0]] = 1
    return graph
        
graph = make_graph(nodes, edges, directed=False)
print('undirected')
print(tabulate(graph, labels))
      
digraph = make_graph(nodes, edges, directed=True)
print('\ndirected')
print(tabulate(digraph, labels))

undirected
  A    B    C    D
---  ---  ---  ---
  0    1    0    0
  1    0    1    1
  0    1    0    1
  0    1    1    0

directed
  A    B    C    D
---  ---  ---  ---
  0    1    0    0
  0    0    1    0
  0    0    0    1
  0    1    0    0


In [3]:
def neighbors_adjacency(graph, node, labels):
    result = []
    i = labels.index(node)
    # scan row i; a nonzero entry in column j means an edge to labels[j]
    for j in range(len(labels)):
        if graph[i, j] != 0:
            result.append(labels[j])
    return result
    
neighbors_adjacency(graph, 'B', labels)

['A', 'C', 'D']

Runtime to access neighbors of a node?

$O(|V|)$

### Why is this space inefficient?

If there are $|V|$ nodes, what is the maximum number of edges?




$$\frac{|V|(|V|-1)}{2} \in O(|V|^2)$$


...but if a graph is dense, there is little value in representing the data as a graph.

Luckily, most real-world graphs are extremely sparse, so most of the $|V|^2$ entries in the adjacency matrix would be zeros.

- E.g., you are probably not friends with over 1,000 people.
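To make the sparsity argument concrete, here is a rough back-of-the-envelope sketch. The function names are hypothetical, and the social-network figures (one billion users, 1,000 friends each) are made-up round numbers chosen only to show the scale, not measurements.

```python
def max_undirected_edges(n):
    """Maximum number of edges in a simple undirected graph on n nodes."""
    return n * (n - 1) // 2

def density(n, m):
    """Fraction of the possible edges that are actually present."""
    return m / max_undirected_edges(n)

# The four-node example graph has 4 of the 6 possible edges.
print(max_undirected_edges(4), density(4, 4))

# Hypothetical social network: 10**9 users with 1,000 friends each.
n = 10**9
m = n * 1000 // 2      # each friendship is counted once
print(n * n)           # adjacency-matrix entries
print(m)               # edges actually present
print(density(n, m))   # on the order of one in a million
```

Under these assumed numbers the matrix would hold about a million entries for every edge actually present.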

### It's a small world after all...

![small](figures/small.png)

Source: [*Collective dynamics of 'small-world' networks.* Duncan J. Watts & Steven H. Strogatz](http://www.nature.com/nature/journal/v393/n6684/pdf/393440a0.pdf)

## Edge Sets

We could simply store the graph as a set of edges.

In [4]:
edges = set([('A', 'B'),
             ('B', 'C'),
             ('C', 'D'),
             ('D', 'B')])
edges

{('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'B')}

What's the space requirement?

$O(|E|)$

How can we access the neighbors of a node?

In [5]:
def neighbors_set(edges, node):    
    # assuming an undirected graph
    result = []
    for e in edges:
        if e[0] == node:
            result.append(e[1])
        elif e[1] == node:
            result.append(e[0])
    return result

neighbors_set(edges, 'B')

['C', 'A', 'D']

What is the work/span of accessing neighbors using an edge set?

Work: $O(|E|)$

Span: $O(\lg |E|)$ (using filter)
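The "using filter" remark can be sketched as follows. The name `neighbors_filter` is hypothetical, and Python's built-in `filter` runs sequentially, so this only shows the shape of the computation; the logarithmic span assumes a parallel filter primitive.

```python
def neighbors_filter(edges, node):
    # Keep only the edges that touch `node` ...
    incident = filter(lambda e: node in e, edges)
    # ... then report the endpoint of each kept edge that is not `node`.
    return {b if a == node else a for a, b in incident}

# Assumed example: the same undirected edge set as above.
example_edges = {('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'B')}
print(sorted(neighbors_filter(example_edges, 'B')))  # ['A', 'C', 'D']
```

Each edge is tested independently of the others, which is what lets a parallel filter evaluate the predicate on all of them at once.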

<br><br>

Can we do better?



## Map of Neighbors

We can use a hashmap (`dict`) to store the neighbors of each node.
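One way such a map might be built from an edge set is sketched below. The helper name `edges_to_map` and its `directed` flag are assumptions made for illustration, not part of the original notebook.

```python
from collections import defaultdict

def edges_to_map(edges, directed=False):
    """Build a dict mapping each node to the set of its neighbors."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        if directed:
            # make sure b appears as a key even if it has no outgoing edges
            neighbors.setdefault(b, set())
        else:
            neighbors[b].add(a)
    return dict(neighbors)

# Assumed example: the undirected edge set from the previous section.
example_edges = {('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'B')}
example_map = edges_to_map(example_edges)
print(sorted(example_map['B']))  # ['A', 'C', 'D']
```

Building the map takes one pass over the edges, so it costs $O(|E|)$ work up front in exchange for fast neighbor lookups afterwards.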


In [6]:
graph = {
            'A': {'B'},
            'B': {'A', 'C', 'D'},
            'C': {'B', 'D'},
            'D': {'B', 'C'}
        }
graph

{'A': {'B'}, 'B': {'A', 'C', 'D'}, 'C': {'B', 'D'}, 'D': {'B', 'C'}}

What is the work of accessing neighbors?

In [7]:
def neighbors_map(graph, node):
    return graph[node]

neighbors_map(graph, 'B')

{'A', 'C', 'D'}

Constant time to look up the neighbors of a node.

But enumerating all the neighbors could take $O(|V|)$ time in the worst case.

So, same as the adjacency matrix, but with $O(|V| + |E|)$ space instead of $O(|V|^2)$.

## Graph Search

Search is one of the fundamental operations over graphs:

- Start at a *source* node $s$
- Visit all *reachable* nodes 
  - $t$ is reachable from $s$ if there is a path from $s$ to $t$
- For efficiency, visit each node at most once.


<br><br>
Search can be used to solve a number of problems:

- Is the graph *connected*?
- Is node $t$ reachable from node $s$?
- Shortest path from $s$ to $t$


## Generic Graph Search

Consider the task of crawling every web page reachable from a starting page $s$.

How would you do this?



At any point in the search, a vertex can be in one of three sets:

- **visited**: the set of vertices already visited
- **frontier**: the unvisited neighbors of the visited vertices
- **unseen**: everything else

We can then describe a generic search algorithm as follows:

> while vertices remain to be visited:
- visit some unvisited nodes in the frontier
- update the three sets


<br><br>
This week, we'll look at two common search algorithms, **breadth-first search** and **depth-first search**.

## Vertex Hopping

Just to get an idea of the problem, we'll start with an inefficient way of searching called **vertex hopping**.

<br><br>

>loop until all nodes visited:
- pick a node
- visit all its neighbors

In [8]:
def hop(graph):
    for v in graph:
        print('searching with %s' % v)
        for neighbor in graph[v]:
            print('...found %s' % neighbor)
            
hop(graph)

searching with A
...found B
searching with B
...found A
...found D
...found C
searching with C
...found D
...found B
searching with D
...found C
...found B


This of course ignores the idea of **paths** in a graph.

However, we could still build on this to solve the reachability problem. How?
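One possible direction, offered as a sketch rather than a definitive answer: start from a single source, hop only from nodes already found, and keep track of the visited and frontier sets described in the generic search. The function name `reachable` and the extra nodes `E` and `F` in the example are hypothetical.

```python
def reachable(graph, source):
    """Return the set of nodes reachable from `source` in a map-of-neighbors graph."""
    visited = set()
    frontier = {source}
    while frontier:
        visited |= frontier
        # hop from every frontier node, keeping only nodes not yet visited
        frontier = {n for v in frontier for n in graph[v]} - visited
    return visited

# Assumed example: the four-node graph plus a separate component E-F.
example_graph = {
    'A': {'B'},
    'B': {'A', 'C', 'D'},
    'C': {'B', 'D'},
    'D': {'B', 'C'},
    'E': {'F'},
    'F': {'E'},
}
print(sorted(reachable(example_graph, 'A')))  # ['A', 'B', 'C', 'D']
print(sorted(reachable(example_graph, 'E')))  # ['E', 'F']
```

Because every node enters the frontier at most once, each node's neighbor set is scanned at most once, unlike plain vertex hopping, which visits nodes with no regard to what has already been found.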