In [1]:
# setup
from IPython.display import display, HTML
display(HTML('<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>'))
display(HTML(open('../rise.css').read()))

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="whitegrid", font_scale=1.5, rc={'figure.figsize':(12, 6)})


# CMPS 2200
# Introduction to Algorithms

## Breadth-first Search


## Breadth-first search (BFS)

<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/4/46/Animated_BFS.gif" width=25%/>
</center>

**Input:**
- graph $G$
- source vertex $s$

<br><br>
1. Visit all nodes that are neighbors of $s$ (level 1)
2. Visit all unvisited nodes that are neighbors of level-1 nodes (level 2)
3. ...


<br><br>
- Nodes at level $i$ are at shortest-path distance $i$ from $s$
- BFS proceeds one level at a time, until there are no new neighbors to visit.

<br><br>
What variables will we need to keep track of?

<center>
<img src="figures/bfs_1.png" width=30%/>
</center>

- `visited` ($X$): the nodes already visited, so we don't visit them more than once
- `frontier` ($F$): the nodes to visit next.

At iteration $i$:

- `visited` contains all nodes with distance less than $i$ from $s$
- `frontier` contains all nodes with distance exactly $i$ from $s$
  - these are all the unvisited neighbors of `visited`.
  
How do we update `visited` and `frontier` at each iteration?

1. To update `visited`, we add every node in the current frontier:
  - $X_{i+1} = X_i \cup F_i$


2. To update `frontier`, we take the neighborhood of $F_i$ and remove any vertices that have already been visited:
  - $F_{i+1} = N(F_i) \setminus X_{i+1}$
  - $N(F_i)$ are the neighbors of the nodes in $F_i$

<br>

e.g. for $i=1$:
- $X_1 = \{a\}$
- $F_1 = \{b,c\}$
- update:
    - $X_2 = \{a\} \cup \{b,c\} = \{a,b,c\}$
    - $F_2 = \{a, d, e, f, g\} \setminus \{a,b,c\} = \{d,e,f,g\}$
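
The two update rules map directly onto Python set operations. The following is a minimal sketch of a single update step, assuming the graph is stored as a dict from each node to the set of its neighbors. The helper name `bfs_step` is hypothetical, and only the adjacency sets needed for this step of the worked example are filled in.

```python
def bfs_step(graph, visited, frontier):
    """Apply one BFS update and return (X_{i+1}, F_{i+1})."""
    # X_{i+1} = X_i ∪ F_i
    visited_next = visited | frontier
    # N(F_i): union of the neighbor sets of every node in the frontier
    neighbors = set().union(*(graph[v] for v in frontier))
    # F_{i+1} = N(F_i) \ X_{i+1}
    frontier_next = neighbors - visited_next
    return visited_next, frontier_next


# Assumed adjacency sets for the nodes used in the i=1 example
graph = {
    'a': {'b', 'c'},
    'b': {'a', 'd', 'e'},
    'c': {'a', 'f', 'g'},
}
X2, F2 = bfs_step(graph, visited={'a'}, frontier={'b', 'c'})
print(X2, F2)
```

Note that $a$ appears in $N(F_1)$ but is removed by the set difference, which is what keeps BFS from revisiting nodes.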
    


In [18]:
from functools import reduce


def bfs(graph, source):
    """Print the set of nodes visited at each level, starting from `source`."""
    visited = set()
    frontier = set([source])
    while len(frontier) > 0:
        # update visited: X_{i+1} = X_i | F_i
        visited_new = frontier | visited
        print('visiting', (visited_new - visited))
        visited = visited_new
        # update frontier: F_{i+1} = N(F_i) - X_{i+1}
        frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])
        frontier = frontier_neighbors - visited

# same as example above
graph = {
            'A': {'B', 'C'},
            'B': {'A', 'D', 'E'},
            'C': {'A', 'F', 'G'},
            'D': {'B'},
            'E': {'B', 'H'},
            'F': {'C'},
            'G': {'C'},
            'H': {'E'}
        }
bfs(graph, 'A')

visiting {'A'}
visiting {'C', 'B'}
visiting {'D', 'E', 'F', 'G'}
visiting {'H'}


## Work/Span of BFS

- Since we have no recurrence, we will instead add up the cost of each level.
- But the work done at each level varies with the number of nodes (and edges) it contains.

What we do know:

- Every reachable node appears in the frontier exactly **once**
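- Likewise, each edge is processed a constant number of times: **once** from each endpoint whose adjacency set contains it
- Therefore work is $O(|V| + |E|)$, provided each node and edge is handled in constant time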
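
To make the counting argument concrete, the sketch below instruments a BFS to count how many nodes enter the frontier and how many adjacency entries are scanned. The function name `bfs_counts` and both counters are hypothetical, added only for illustration. The graph is assumed to be an undirected adjacency-set dict, so each edge is stored once per endpoint.

```python
def bfs_counts(graph, source):
    """Run BFS and return (node_visits, edge_scans). Illustrative only."""
    visited = set()
    frontier = {source}
    node_visits = 0
    edge_scans = 0
    while frontier:
        visited |= frontier             # in-place update: cost proportional to |frontier|
        node_visits += len(frontier)    # each node is counted when it is in the frontier
        next_frontier = set()
        for v in frontier:
            for u in graph[v]:
                edge_scans += 1         # one scan per adjacency entry
                if u not in visited:
                    next_frontier.add(u)
        frontier = next_frontier
    return node_visits, edge_scans


graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}
node_visits, edge_scans = bfs_counts(graph, 'A')
print(node_visits, edge_scans)
```

On this graph, with 8 nodes and 7 undirected edges, the counters come out to $|V|$ and $2|E|$, which is consistent with the $O(|V| + |E|)$ bound.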

In [None]:
# improve work using sequences.
# TODO: finish
def bfs(graph, source):
    visited = set()
    frontier = set([source])
    while len(frontier) > 0:
        # update visited
        visited_new = frontier | visited
        print('visiting', (visited_new - visited))
        visited = visited_new
        # update frontier
        frontier_neighbors = reduce(set.union, [graph[f] for f in frontier])
        frontier = frontier_neighbors - visited


Notice that we didn't have a recurrence because we didn't use divide and conquer.

Why is it hard to do divide and conquer on a graph?

## Correctness of BFS

TODO: finish

How can we keep track of the distance each node is from the source?

TODO: finish
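
One possible approach, sketched here under assumptions rather than as the course's official answer: since every node in the frontier at iteration $i$ is at distance $i$ from the source, we can record the current level for each frontier node. The function name `bfs_distances` is hypothetical, and the `distance` dict doubles as the visited set.

```python
def bfs_distances(graph, source):
    """Return a dict mapping each reachable node to its distance from source."""
    distance = {}
    frontier = {source}
    level = 0
    while frontier:
        # every node in the current frontier is exactly `level` steps from source
        for v in frontier:
            distance[v] = level
        # the unvisited neighbors of the frontier form the next level
        frontier = {u for v in frontier for u in graph[v] if u not in distance}
        level += 1
    return distance


graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}
distances = bfs_distances(graph, 'A')
print(distances)
```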

How can we keep track of the paths?

TODO: finish
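
One common way to recover paths, again only a sketch under assumptions: when a node is first discovered, remember which frontier node discovered it. Following these parent links back from any target yields a shortest path to the source. The names `bfs_parents` and `path_to` are hypothetical, and `path_to` assumes the target is reachable from the source.

```python
def bfs_parents(graph, source):
    """Return a dict mapping each reachable node to the node that discovered it."""
    parent = {source: None}   # the source has no parent
    frontier = [source]
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in graph[v]:
                if u not in parent:      # first time u is seen
                    parent[u] = v
                    next_frontier.append(u)
        frontier = next_frontier
    return parent


def path_to(parent, target):
    """Follow parent links from target back to the source, then reverse."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F', 'G'},
    'D': {'B'},
    'E': {'B', 'H'},
    'F': {'C'},
    'G': {'C'},
    'H': {'E'},
}
parent = bfs_parents(graph, 'A')
path = path_to(parent, 'H')
print(path)
```

If a node can be reached from several frontier nodes at the same level, which one becomes its parent depends on iteration order, but the resulting path length is the same either way.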