-
Notifications
You must be signed in to change notification settings - Fork 2
Graph Note
class Graph {
boolean[][] adjacentMatrix;
}
class Graph {
Weight[][] adjacentMatric;
}Space: O(|V|^2) Pros: Lookup edge for a certain pair - O(1) Cons: Possible sparse matrix if |E| << |V|
class Vertex {
int id;
List<Vertex> neighbors;
}
class Graph {
List<Vertex> vertices;
}Pros: space: O(|V| + |E|) Cons: Time complexity is O(v) to check whether there is an edge from a node to the other.
Use adjacent list to compress the space needed.
Some optimaztion about lookup for a certain pair of vertices, if there is any edge connecting them.
class Vertex {
int id;
Set<Vertex> neighbors;
}
class Vertex {
int id;
Map<Vertex, Weight> neighbors;
}
class Graph {
Set<Vertex> vertices;
}
Some common scenarios:
- Using integers to denote each of the nodes in the graph.
Map<Integer, List/Set<Integer>>
vertex adjacent list(edges)
Map<Integer, Map<Integer, Integer>>
vertex neighbor, weight [{n1, n2}, {n1, n4}, {n2, n3}, {n3, 4}]
class Graph {
List<Edge> edges;
}Space: O(|E|)
A graph is connected only if it meets the requirement: From any vertex vi, there exists a path from vi to any other vertex vj in the graph.
It does not related to whether the graph is directed or not
Tips: A connected graph can be represented by one of the nodes in graph. (GraphNode seed)
A very important type of graph is Directed Acyclic Graph(DAG)
- they can both traverse all vertices reachable from starting node.
- They can both find a path from a start vertex to a goal vertex. (DFS can find all paths from starting node.)
- backtracking: related to "transition procedure" (path) instead of "state" (vertex) But
- from some problems, DFS can do but BFS can not. (eg: Backtracking)
- for the same problem, DFS can be more efficient than BFS, or BFS can be more efficient than DFS, according to different types of the input graphs.
Starting from a node, traverse all the reachable nodes using depth first heuristric, and each node is only visited once.
- mark visited when first time entering the node
- try to mark visited the node before all its decendants.
- If we don't mark visited, it may cause cycle or duplicates.
Mark the node as visited when first time entering the node.
enum State {
UNVISITED, VISITED;
}
class Vertex {
State state = UNVISITED;
List<Vertex> neighbors;
}
void dfs(Vertex v) {
if (v.state == VISITED) return;
v.state = VISITED; // first time entering the node
for (Vertex n : v.neighbors) {
dfs(n);
}
}- Time: O(|V| + |E|)
- 每个node只会被visited一次 《=》每个node只会被expand一次O(|V|
- 在expand一个node的时候,从这个node出发的所有edge都被generate一次 => 总共generate的次数 == edge的总数 O(|E|
- 每个node可能被generate多次,但是只有第一次有效(mark visited),并且会继续从这个node expand
Why/When do we care about if a node is visited once or more times?
- Usually we only cares about the "if the vertex is reachable", not which exact path reaching the vertex.
- If we care about different possible paths, it is a different story.
Comparing to DFS on Binary Tree (a special kind of graph), why we don't need to mark visited in this case?
- because for each node in the tree, there is <= 1 incoming edges, there is no alternate paths reaching the node, in another word, during DFS, there is only one way to reach the node so that marking visited is unnecessary.
Starting N1, after visited all the nodes reachable from N1, then we mark N1 visited once and only once.
- Only after visited all the reachable nodes from n, we can make n visited
- we need some additional state for each node in "visiting": a special marker of the nodes on the current DFS path to avoid duplicately visiting nodes.
unvisited
visiting
visited
class Vertex {
State state = UNVISITED;
List<Vertex> neighbors;
}
enum State {
UNVISITED, VISITING, VISITED;
}
void travers(Vertex v) {
dfs(v);
}
void dfs(Vertex v) {
if (v.state == VISITED) return;
if (v.state == VISITING) return; // there is a cycle.
v.state = VISITING; // first time entering the node
for (Vertex n : v.neighbors) {
dfs(n);
}
v.state = VISITED
}Time: O(|V| + |E|) - polynomial at most |V|^2.
- for each node, only do dfs once and only once.
- for each edge, only do dfs once and only once.
In what scenario, we will need method 2 instead of method 1?
- We would like to follow the order of mark the node visited last.
- We would like to know if there is any cycle in the graph. If we don't care about distinguishing the cycle, we can use method 1.
Mark Visited 1 Mark Visited 2
Tree do not need to mark visit at all: XXX XXX
DAG necessary not need visiting
Graph possibly with cycle OK (can not detect cycle) OK(can detect cycle)
One last thing: Starting from one node x, DFS can only help you traverse all the nodes reachable from x.
So, what we need to do:
class Graph {
List<Vertex> vertices;
}
void traverse(Graph graph) {
for (Vertex v : graph.vertices) {
//if v is not visited -> there is new (weak) connected compontent.
dfs(v); // method 1 & method 2 both applicable
}
}
Recursion:
- aspect1 implementation: function calling itself.
- aspect2 pure recursion: using subproblem to solve larger problem.
Implementation of DFS.
- can use recursion
- or use explicit stack
Backtracking vs. DFS
- Find all paths starting from root in a binary tree. See Mark visited method 3.
We care about the path, instead of if the vertex is reachable in this case.
- find all possible paths from the start node, on the path there is no duplicate vertices.
- 这个mark visited只是mark在当前DFS path上的node。 为了保证,在当前这条DFS路径上没有cycle*
enum State {
UNVISITED, VISITED;
}
class Vertex {
List<Vertex> neighbors;
State state = State.UNVISITED;
}
void allPaths(Vertex v) {
if (v.state == VISITED) return;
v.state = VISITED;
for (Vertex n : v.neighbors) {
allPaths(n);
}
v.state = UNVISITED;
} Mark Visited Mark Visited 1,2
Tree no need (reason: no cycle), O(n) O(n) = O(|V| + |E|) : **|E| = |V| - 1**
DAG no need (reason: no cycle). worst O(b^V) O(|E| + |V|)
Graph with cycle "visited" necessary, worst O(b^|V|) O(|E| + |V|)
b = branching factor
for any possible pair of nodes n1, n2 in this graph, there exists a path n1 -> n2.
for each pair (nx, ny) --> determine if ny is reachable from nx -> starting from nx do DFS 1,2 or BFS to see if ny is reachable.
for each nx: determine if all the other nodes are reachable. **using DFS mark visited 1, 2 or BFS. **
O(|V| * (|V| + |E|))
Connected graph: choose an arbitrary node n:
- all the other nodes are reachable from n
- all the other nodes can reach this node n
So,
- step 1: pick an arbitrary node n as start, just do DFS/BFS, mark visited way 1 or 2. To check if all the other nodes are reachable from n.
- step 2: reverse the whole graph (reverse the direction of each edges) and do step 1 again. to check from any other node y, x is reachable from y. O(|V| + |E|) * 3 = O(|V| + |E|)
Undirected graph connected <=> picking an arbitrary node n, if all the other nodes are reachable. Steps:
- pick an arbitrary node as start, just do DFS/BFS, mark visited way 1 or 2.
- if all nodes in the graph are reachable.
- O(|V| + |E|)
class Vertex {
List<Vertex> neighbors;
State state = State.UNVISITED;
}
List<Vertex> graph; //input is a graph, we don't know if it is connected.
boolean hasCycle(List<Vertex> graph) {
for (Vertex v : graph) {
if (hasCycle(v)) {
return true;
}
}
return false;
}
boolean hasCycle(Vertex v) {
if (v.state == State.VISITED) {
return false;
}
if (v.state == State.VISITING) {
return true; // there is a cycle.
}
v.state = State.VISITING;
for (Vertex n : v.neighbors) {
if (hasCycle(n)) {
return true;
}
}
v.state = State.VISITED;
return false;
}O(|V| + |E|)
boolean hasCycle(Vertex v) {
if (v.state == State.VISITING) {
return true; // there is a cycle.
}
v.state = State.VISITING;
for (Vertex n : v.neighbors) {
if (hasCycle(n)) {
return true;
}
}
v.state = State.UNVISITED; // backtracking
return false;
}possibly exponential - O(b^|V|)
重要概念 A directed graph does not have cycle <=> There is topological order. A directed graph has cycle <=> There is no topological order.
Assuming there is only one connected area.
enum State {
UNVISITED, VISITED;
}
class Vertex {
List<Vertex> neighbors;
State state = State.UNVISITED;
}
boolean hasCycle(Vertex v, Vertex prev) { // notice there is a prev node
if (v.state == State.VISITED) {
return true; // has cycle;
}
v.state = State.VISITED;
for (Vertex n : v.neighbors) {
if (n != prev && hasCycle(n, v)) {
return true;
}
}
return false;
}hasCycle(seed, null) O(|V| + |E|)
The undirected graph is a tree, then it satisfies the following 3 properites:
- no cycle
- |E| + |V| - 1 --> can be proved by induction
- connected
** if the undirected graph is a tree -> 1, 2, 3**
- 1,2 => the undirected graph is a tree
- 2,3 => the undirected graph is a tree
- 1,3 => the undirected graph is a tree
- from root, all the other vertices are reachable
- from root, to any other vertex, there is one and only one path
root nodes: indegree == 0, the only vertex with indegree 0.
- Find the root. - root is the node with 0 in degree.
traverse the whole graph -> traverse each of the edges (u, v)
for each edge:
v.indegree++;
O(|V| + |E|)
- from the root, DFS, mark visited for the node when first time visiting, Mark visited 1
- if there is any nodes visited more than once.
- if there is any nodes not reachable.
- O(|V| + |E|)
- from root, all the other vertices are reachable
- all the other nodes are with indegree 1. (|E| = |V| - 1)
already covered above.
- directed graph
- undirected graph
- make a reverse copy of the given directed graph. 1 and 2 are the same.
Scenario 1: The given graph is represented by List using adjacency list. for loop to traverse all the vertex and copy the vertices and the corresponding edges.
Scenario 2: The given graph is only represented by a seed node, because all the other nodes are reachable from the seed node. (week connected)
- we need some way to iterate all the nodes in the graph
- when we iterate each node cur, we copy the node and copy all the outgoing edges from the cur node
we don't care about if there is any cycle, we can just use mark visited method 1 or 2 DFS/BFS.
Map<Vertex, Vertex> map:
- record the mapping from the old node to new node
- record the visited old nodes
void copyGraph(Vertex x) {
map.put(x, new Vertex(x));
copy(x);
}
void copy(Vertex x) {
for (Vertex n : x.neighbors()) {
if (!map.containsKey(n)) {
map.put(n, new Vertex(n));
copy(n);
}
map.get(x).neighbors.add(map.get(n)); // we need to copy all the edges from current veretx x.
}
}
void copy(Vertex x) {
for (Vertex n : x.neighbors()) {
if (!map.containsKey(n)) {
map.put(n, new Vertex(n));
copy(n);
}
map.get(n).neighbors.add(map.get(x)); // we need to copy all the edges entering current veretx x.
}
}O(|V| + |E|)
dfs(Vertex v) { // do not know if v is visited, check if it is visited in this level.
if v is visited
return;
mark v visited;
for n in v.neighbors:
dfs(n);
}
vs
dfs(Vertex v) { // v is visited, check if it is visited at previous level.
for n in v.neighbors:
if n is visited:
continue;
mark n visited.
dfs(n);
}
The valid input is DAG:
- Directed
- Acyclic
-
Based on BFS
-
Based on DFS
what kind of DFS we will need?
A task can be executed only when all its dependencies are cleared!
We reverse the edge direction in the graph.
When can you make the node as visited (executed)?
Only when all vertices reachable from v are marked visited, we can mark v visited - to visit each vertex once and only once.
- Mark visited 2 to solve the problem of topological order
- topological order <==> the order of marking the nodes visited
So, Step1 - reverse the whole graph
Step2 - Run DFS mark visited 2
enum State {
UNVISITED, VISITING, EXECUTED;
}
class Vertex {
List<Vertex> dependencies;
State state = State.UNVISITED;
}
List<Vertex> topoOrder(List<Vertex> graph) {
List<Vertex> topoOrder = new ArrayList();
for (Vertex v : List<Vertex> graph) {
execute(v, topoOrder);
}
return topoOrder;
}
void execute(Vertex v, List<Vertex> topoOrder) {
if (v.state == State.EXECUTED) {
return;
}
if (v.state == State.VISITING) {
throw new IlegelStateException("Cycle detected."); // there is a cycle
}
v.state = State.VISITING;
for (Vertex n : v.dependencies) {
executed(n, topoOrder);
}
v.state = State.EXECUTED; // run the task after all its dependencies finish.
topoOrder.add(v);
}Graph -- a list of nodes
L -> Empty list that will contain the topological order of the nodes
S -> **Data Structure** of all nodes with no incoming edges candidates
Initialize S: put all nodes with in-degree 0 into S - O(|V|)
while S is non-empty do
remove a node n from S
add n to tail of L
for each node m with an edge e from n to m do
//remove edge e from the graph
m.indegree--;
if m.indegree == 0 then
insert m into S
// S is empty
if graph has nodes then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
"Topo order <===> we need a queue" XXX this is really bad.
- Determine if the directed graph has only one valid topological order.
-> check for each step, if in the S there is only one choice.
- if there are multiple possible topological order, return the lexicographically smallest one.
Queue -> PriorityQueue.
- Given an ordered list of the vertices of a given graph, determine if it is the topological order.
- Method 1: Topological order
check if each element is in S. S -> use Set; (s -> all the candidates task can be executed next step --> all indegree 0 tasks)
- Method 2: Optimize space by only checking indegree
Map<Node, Indegree>
- check if the task can be executed <===> check if the indegree of the task is 0.
- Find all possible topological orders of a given graph
for x in (1, 2, 3, 4)
if x.indegree == 0
dfs(... next level)
- Suppose you can execute multiple tasks at a time, each of the task takes the same amount of time, that is the minimum time to finish all the tasks
- Be care of the key - value order in map. For example, if A depends on B,
- Regards to DFS, key is A, value is a list of B.
- Regards to BFS, key is B, value is a list of A.
Step 1: Run DFS mark visited 2 on the given graph to get the topo order list.
Step 2: Reverse the topo order list
F(n) =F(n - 1) + F(n - 2) --> the topological order must be 1, 2, 3, 4, 5 ... n
Recursion + Memorization ==> the order of calculating the results for each of the subproblems is following the topological order.*
- how do we know that a task can be executed? - if all the dependency tasks executed!
- how do we represent this? edge(a, b) means b depends on a. We only cares about in-degree: # of edges end at the vertex.
- which is the first task to be executed? in-degree == 0 - means no dependency tasks! what do we do when we execute one task?
- if a -> b and a is executed, then b's dependency on a is eliminated, we can let b's in-degree -1
- Two int arrays with same length, how to get the dot product?
1.1 If the array is very long, but most of elements in the arrays are 0. How to improve it?
(Answer: Two lists of Pairs (index, value). Move them together)
Time: O(# of non-zero elemets in both of the two arrays)
1.2 What if one of the array has very samll number of non-zeros, the other one has a lot of non-zeros? Two lists with size n1, n2, where n1 >> n2.
(Answer: binary search. O(n2 * logn1))
- 不需要一开始把整个图构建出来,因为我们解决问题的时候很可能只会遍历其中一部分。