Skip to content

Graph Note

Xin Wan edited this page Mar 12, 2018 · 27 revisions

How to represent a graph?

1. adjacent matric

class Graph {
    boolean[][] adjacentMatrix;
}

class Graph {
    Weight[][] adjacentMatric;
}

Space: O(|V|^2) Pros: Lookup edge for a certain pair - O(1) Cons: Possible sparse matrix if |E| << |V|

2. Adjacent List

class Vertex {
    int id;
    List<Vertex> neighbors;
}

class Graph {
    List<Vertex> vertices;
}

Pros: space: O(|V| + |E|) Cons: Time complexity is O(v) to check whether there is an edge from a node to the other.

Use adjacent list to compress the space needed.

Some optimaztion about lookup for a certain pair of vertices, if there is any edge connecting them.

class Vertex {
 int id;
 Set<Vertex> neighbors;
}

class Vertex {
    int id;
    Map<Vertex, Weight> neighbors;
}

class Graph {
    Set<Vertex> vertices;
}

Some common scenarios:

  1. Using integers to denote each of the nodes in the graph.
Map<Integer, List/Set<Integer>>
     vertex    adjacent list(edges)

Map<Integer, Map<Integer, Integer>>
     vertex    neighbor, weight 

3. edge lists

[{n1, n2}, {n1, n4}, {n2, n3}, {n3, 4}]

class Graph {
    List<Edge> edges;
}

Space: O(|E|)

Connected Graph

A graph is connected only if it meets the requirement: From any vertex vi, there exists a path from vi to any other vertex vj in the graph.

It does not related to whether the graph is directed or not

Tips: A connected graph can be represented by one of the nodes in graph. (GraphNode seed)

Cyclic / Acyclic

A very important type of graph is Directed Acyclic Graph(DAG)

DFS

What kind of problems DFS/BFS are solving? Similar things and different things.

  • they can both traverse all vertices reachable from starting node.
  • They can both find a path from a start vertex to a goal vertex. (DFS can find all paths from starting node.)
  • backtracking: related to "transition procedure" (path) instead of "state" (vertex) But
  • from some problems, DFS can do but BFS can not. (eg: Backtracking)
  • for the same problem, DFS can be more efficient than BFS, or BFS can be more efficient than DFS, according to different types of the input graphs.

How would you do DFS from graph

Classical DFS - Marking visited method 1

Starting from a node, traverse all the reachable nodes using depth first heuristric, and each node is only visited once.

  • mark visited when first time entering the node
  • try to mark visited the node before all its decendants.
  • If we don't mark visited, it may cause cycle or duplicates.

Mark the node as visited when first time entering the node.

enum State {
	UNVISITED, VISITED;
}

class Vertex {
	State state = UNVISITED;
	List<Vertex> neighbors;
}

void dfs(Vertex v) {
	if (v.state == VISITED) return;

	v.state = VISITED; // first time entering the node
	for (Vertex n : v.neighbors) {
		dfs(n);
	}
}
  • Time: O(|V| + |E|)
  • 每个node只会被visited一次 《=》每个node只会被expand一次O(|V|
  • 在expand一个node的时候,从这个node出发的所有edge都被generate一次 => 总共generate的次数 == edge的总数 O(|E|
  • 每个node可能被generate多次,但是只有第一次有效(mark visited),并且会继续从这个node expand

Why/When do we care about if a node is visited once or more times?

  • Usually we only cares about the "if the vertex is reachable", not which exact path reaching the vertex.
  • If we care about different possible paths, it is a different story.

Comparing to DFS on Binary Tree (a special kind of graph), why we don't need to mark visited in this case?

  • because for each node in the tree, there is <= 1 incoming edges, there is no alternate paths reaching the node, in another word, during DFS, there is only one way to reach the node so that marking visited is unnecessary.

Mark Visited Method 2 (Can we mark it visited after all its descendants?)

Starting N1, after visited all the nodes reachable from N1, then we mark N1 visited once and only once.

  • Only after visited all the reachable nodes from n, we can make n visited
  • we need some additional state for each node in "visiting": a special marker of the nodes on the current DFS path to avoid duplicately visiting nodes.
unvisited
visiting
visited
class Vertex {
    State state = UNVISITED;
    List<Vertex> neighbors;
}

enum State {
	UNVISITED, VISITING, VISITED;
}

void travers(Vertex v) {
	dfs(v);
}

void dfs(Vertex v) {
	if (v.state == VISITED) return;
    if (v.state == VISITING) return; // there is a cycle.

	v.state = VISITING; // first time entering the node
	for (Vertex n : v.neighbors) {
		dfs(n);
	}
	v.state = VISITED
}

Time: O(|V| + |E|) - polynomial at most |V|^2.

  • for each node, only do dfs once and only once.
  • for each edge, only do dfs once and only once.

In what scenario, we will need method 2 instead of method 1?

  • We would like to follow the order of mark the node visited last.
  • We would like to know if there is any cycle in the graph. If we don't care about distinguishing the cycle, we can use method 1.

Three Categories of typical graphs

                                       Mark Visited 1                     Mark Visited 2

Tree                       do not need to mark visit at all: XXX               XXX

DAG                                    necessary                             not need visiting

Graph possibly with cycle           OK (can not detect cycle)                OK(can detect cycle)

One last thing: Starting from one node x, DFS can only help you traverse all the nodes reachable from x.

So, what we need to do:

class Graph {
	List<Vertex> vertices;
}

void traverse(Graph graph) {
	for (Vertex v : graph.vertices) {
            //if v is not visited -> there is new (weak) connected compontent.
	    dfs(v); // method 1 & method 2 both applicable
	}
}

DFS vs. Recursion:

Recursion:

  • aspect1 implementation: function calling itself.
  • aspect2 pure recursion: using subproblem to solve larger problem.

Implementation of DFS.

  • can use recursion
  • or use explicit stack

Backtracking vs. DFS

  • Find all paths starting from root in a binary tree. See Mark visited method 3.

Mark visited method 3 (O(# of paths))

We care about the path, instead of if the vertex is reachable in this case.

  • find all possible paths from the start node, on the path there is no duplicate vertices.
  • 这个mark visited只是mark在当前DFS path上的node。 为了保证,在当前这条DFS路径上没有cycle*
enum State {
	UNVISITED, VISITED;
}

class Vertex {
	List<Vertex> neighbors;
	State state = State.UNVISITED;
}

void allPaths(Vertex v) {
	if (v.state == VISITED) return;

	v.state = VISITED;
	for (Vertex n : v.neighbors) {
		allPaths(n);
	}
	v.state = UNVISITED;
}
                                     Mark Visited                  Mark Visited 1,2
Tree                       no need (reason: no cycle), O(n)          O(n) = O(|V| + |E|) : **|E| = |V| - 1**
DAG                        no need (reason: no cycle). worst O(b^V)   O(|E| + |V|)
Graph with cycle              "visited" necessary, worst O(b^|V|)               O(|E| + |V|)
b = branching factor

Determine if a graph is (strongly) connected

for any possible pair of nodes n1, n2 in this graph, there exists a path n1 -> n2.

1.1 directed graph

Method 1: Brute Force:

for each pair (nx, ny) --> determine if ny is reachable from nx -> starting from nx do DFS 1,2 or BFS to see if ny is reachable.

Method 2: Optimized brute force

for each nx: determine if all the other nodes are reachable. **using DFS mark visited 1, 2 or BFS. **

O(|V| * (|V| + |E|))

Method 3:

Connected graph: choose an arbitrary node n:

  • all the other nodes are reachable from n
  • all the other nodes can reach this node n

So,

  • step 1: pick an arbitrary node n as start, just do DFS/BFS, mark visited way 1 or 2. To check if all the other nodes are reachable from n.
  • step 2: reverse the whole graph (reverse the direction of each edges) and do step 1 again. to check from any other node y, x is reachable from y. O(|V| + |E|) * 3 = O(|V| + |E|)

Undirected graph

Undirected graph connected <=> picking an arbitrary node n, if all the other nodes are reachable. Steps:

  • pick an arbitrary node as start, just do DFS/BFS, mark visited way 1 or 2.
  • if all nodes in the graph are reachable.
  • O(|V| + |E|)

Determine if a graph has cycle

Directed graph

Method 1: DFS mark visited 2

class Vertex {
    List<Vertex> neighbors;
    State state = State.UNVISITED;
}

List<Vertex> graph; //input is a graph, we don't know if it is connected.

boolean hasCycle(List<Vertex> graph) {
    for (Vertex v : graph) {
        if (hasCycle(v)) {
            return true;
        }
    }

    return false;
}

boolean hasCycle(Vertex v) {
    if (v.state == State.VISITED) {
        return false;
    }

    if (v.state == State.VISITING) {
        return true; // there is a cycle.
    }

    v.state = State.VISITING;
    for (Vertex n : v.neighbors) {
        if (hasCycle(n)) {
            return true;
        }
    }

    v.state = State.VISITED;
    return false;
}

O(|V| + |E|)

Method2: DFS mark visited 3 (finding all paths)

boolean hasCycle(Vertex v) {
    if (v.state == State.VISITING) {
        return true; // there is a cycle.
    }

    v.state = State.VISITING;
    for (Vertex n : v.neighbors) {
        if (hasCycle(n)) {
            return true;
        }
    }

    v.state = State.UNVISITED; // backtracking
    return false;
}

possibly exponential - O(b^|V|)

Method 3: Topological Order based on BFS

重要概念 A directed graph does not have cycle <=> There is topological order. A directed graph has cycle <=> There is no topological order.

Undirected graph

Assuming there is only one connected area.

Method 1: DFS - mark visited 1

enum State {
    UNVISITED, VISITED;
}

class Vertex {
    List<Vertex> neighbors;
    State state = State.UNVISITED;
}

boolean hasCycle(Vertex v, Vertex prev) { // notice there is a prev node
    if (v.state == State.VISITED) {
        return true; // has cycle;
    }
    v.state = State.VISITED;
    for (Vertex n : v.neighbors) {
        if (n != prev && hasCycle(n, v)) {
            return true;
        }
    }

    return false;
}

hasCycle(seed, null) O(|V| + |E|)

Method 2: BFS - mark visited, record the previous node.

Method 3:

The undirected graph is a tree, then it satisfies the following 3 properites:

  1. no cycle
  2. |E| + |V| - 1 --> can be proved by induction
  3. connected

** if the undirected graph is a tree -> 1, 2, 3**

  • 1,2 => the undirected graph is a tree
  • 2,3 => the undirected graph is a tree
  • 1,3 => the undirected graph is a tree

Determine if a graph is a tree?

Directed graph

Method 1: weak connectivity

  • from root, all the other vertices are reachable
  • from root, to any other vertex, there is one and only one path

root nodes: indegree == 0, the only vertex with indegree 0.

  1. Find the root. - root is the node with 0 in degree.
traverse the whole graph -> traverse each of the edges (u, v)
for each edge:
   v.indegree++;

O(|V| + |E|)
  1. from the root, DFS, mark visited for the node when first time visiting, Mark visited 1
  • if there is any nodes visited more than once.
  • if there is any nodes not reachable.
  • O(|V| + |E|)

Method 2: (???)

  • from root, all the other vertices are reachable
  • all the other nodes are with indegree 1. (|E| = |V| - 1)

Undirected graph

already covered above.

Clone a directed / undirected graph

  1. directed graph
  2. undirected graph
  3. make a reverse copy of the given directed graph. 1 and 2 are the same.

Scenario 1: The given graph is represented by List using adjacency list. for loop to traverse all the vertex and copy the vertices and the corresponding edges.

Scenario 2: The given graph is only represented by a seed node, because all the other nodes are reachable from the seed node. (week connected)

  • we need some way to iterate all the nodes in the graph
  • when we iterate each node cur, we copy the node and copy all the outgoing edges from the cur node

we don't care about if there is any cycle, we can just use mark visited method 1 or 2 DFS/BFS.

Map<Vertex, Vertex> map:
- record the mapping from the old node to new node
- record the visited old nodes

void copyGraph(Vertex x) {
    map.put(x, new Vertex(x));
    copy(x);
}

void copy(Vertex x) {
    for (Vertex n : x.neighbors()) {
        if (!map.containsKey(n)) {
            map.put(n, new Vertex(n));
            copy(n);
        }

        map.get(x).neighbors.add(map.get(n)); // we need to copy all the edges from current veretx x.
    }
}

void copy(Vertex x) {
    for (Vertex n : x.neighbors()) {
        if (!map.containsKey(n)) {
            map.put(n, new Vertex(n));
            copy(n);
        }

        map.get(n).neighbors.add(map.get(x)); // we need to copy all the edges entering current veretx x.
    }
}

O(|V| + |E|)

Two ways to check if the node has been visited.

dfs(Vertex v) { // do not know if v is visited, check if it is visited in this level.
    if v is visited
        return;
    mark v visited;
    for n in v.neighbors:
        dfs(n);

}
vs 

dfs(Vertex v) { // v is visited, check if it is visited at previous level.
    for n in v.neighbors:
        if n is visited: 
            continue;
        mark n visited.
        dfs(n);

}

Topological Order

The valid input is DAG:

  • Directed
  • Acyclic

There are basically two solutions for Topo-Sort

  • Based on BFS

  • Based on DFS

Proposal 1 (DFS)

what kind of DFS we will need?

A task can be executed only when all its dependencies are cleared!

We reverse the edge direction in the graph.

When can you make the node as visited (executed)?

Only when all vertices reachable from v are marked visited, we can mark v visited - to visit each vertex once and only once.

  • Mark visited 2 to solve the problem of topological order
  • topological order <==> the order of marking the nodes visited

So, Step1 - reverse the whole graph

Step2 - Run DFS mark visited 2

enum State {
	UNVISITED, VISITING, EXECUTED;
}

class Vertex {
	List<Vertex> dependencies;
	State state = State.UNVISITED;
}

List<Vertex> topoOrder(List<Vertex> graph) {
	List<Vertex> topoOrder = new ArrayList();

	for (Vertex v : List<Vertex> graph) {
		execute(v, topoOrder);
	}

	return topoOrder;
}

void execute(Vertex v, List<Vertex> topoOrder) {
	if (v.state == State.EXECUTED) {
		return;
	}

	if (v.state == State.VISITING) {
		throw new IlegelStateException("Cycle detected."); // there is a cycle
	}

	v.state = State.VISITING;
	for (Vertex n : v.dependencies) {
		executed(n, topoOrder);
	}

	v.state = State.EXECUTED; // run the task after all its dependencies finish.
	topoOrder.add(v);
}

Better Proposal 2 (DFS)

Step 1: Run DFS mark visited 2 on the given graph to get the topo order list.

Step 2: Reverse the topo order list

Recursion vs. Recursion + Memorization vs. Topological Order

F(n) =F(n - 1) + F(n - 2) --> the topological order must be 1, 2, 3, 4, 5 ... n

Recursion + Memorization ==> the order of calculating the results for each of the subproblems is following the topological order.*

Classic Questions

  1. Two int arrays with same length, how to get the dot product?

1.1 If the array is very long, but most of elements in the arrays are 0. How to improve it?

(Answer: Two lists of Pairs (index, value). Move them together)

Time: O(# of non-zero elemets in both of the two arrays)

1.2 What if one of the array has very samll number of non-zeros, the other one has a lot of non-zeros? Two lists with size n1, n2, where n1 >> n2.

(Answer: binary search. O(n2 * logn1))

Clone this wiki locally