Most people used depth-first search. However, the two fastest programs used topological sort (Chris Allsopp's program got the prize, and Daniel Pike's was just a few percentage points slower in my tests). Some of the depth-first search programs also looked very good but unfortunately contained minor bugs.
The most common problem with the DFS programs was failing to explore the whole graph: they would do DFS from some node and stop, potentially missing a cycle. For example, if a graph looked as follows:
    1 --> 2
    2 --> 1
    3 -->

and the enumeration of nodes happened to start with 3, then doing DFS from 3 would not discover the cycle.
Below I discuss some of the solutions and their complexity.
To colour the nodes white, grey or black in the loops() method, we put them in corresponding hash tables. If a node is inserted in the white hash table, its colour is white, and so on. We have to make sure that each node is always in exactly one of the three hash tables.
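As a concrete illustration (a minimal sketch of my own; HashSet stands in for the hash tables, and nodes are represented by integers):

    import java.util.HashSet;
    import java.util.Set;

    // Three hash tables, one per colour; the invariant is that every node
    // is in exactly one of them at any time.
    class Colours {
        final Set<Integer> white = new HashSet<>();
        final Set<Integer> grey  = new HashSet<>();
        final Set<Integer> black = new HashSet<>();

        Colours(Iterable<Integer> nodes) {
            for (int n : nodes) white.add(n);   // initially all nodes are white
        }

        // Re-colouring removes a node from one table and inserts it into
        // another, which preserves the invariant.
        void recolour(int n, Set<Integer> from, Set<Integer> to) {
            from.remove(n);
            to.add(n);
        }
    }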
The outline of the method is as follows:
    colour all nodes white
    for each node in the graph
        if the node is white
            do DFS with cycle detection from this node
            if it reports a cycle, return true
    return false

The DFS cycle detection from node s goes like this:
    create a stack
    colour s grey
    put it on the stack
    while the stack is non-empty
        look at the node n on top of the stack
        get n's adjacency list
        iterate through the list until a white node m is found or the end is reached
            if a grey node is found, return true
            if a white node m is found
                colour m grey
                push m on the stack
                stop iterating
        if the end was reached without finding a white node
            colour n black
            pop the stack
    return false

The worst-case time complexity of this algorithm is O(V + E²), as shown below.
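Before the analysis, here is how the pseudocode might look in Java (a sketch under my own assumptions: the graph is given as a Map from each node to the list of its successors, and every node appears as a key):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class CycleDetector {

        static boolean loops(Map<Integer, List<Integer>> adj) {
            Set<Integer> white = new HashSet<>(adj.keySet()); // unvisited
            Set<Integer> grey  = new HashSet<>();             // on the current DFS path
            Set<Integer> black = new HashSet<>();             // fully explored
            for (int s : adj.keySet())
                if (white.contains(s) && dfs(s, adj, white, grey, black))
                    return true;
            return false;
        }

        static boolean dfs(int s, Map<Integer, List<Integer>> adj,
                           Set<Integer> white, Set<Integer> grey, Set<Integer> black) {
            Deque<Integer> stack = new ArrayDeque<>();
            white.remove(s); grey.add(s);
            stack.push(s);
            while (!stack.isEmpty()) {
                int n = stack.peek();
                Integer next = null;
                // Deliberately inefficient, as in the pseudocode: every time n
                // is on top of the stack, its list is rescanned from the start.
                for (int m : adj.get(n)) {
                    if (grey.contains(m)) return true;        // back edge: cycle
                    if (white.contains(m)) { next = m; break; }
                }
                if (next != null) {
                    white.remove(next); grey.add(next);
                    stack.push(next);
                } else {
                    grey.remove(n); black.add(n);             // n fully explored
                    stack.pop();
                }
            }
            return false;
        }
    }

The rescanning in the inner for loop is exactly the inefficiency analysed below.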
The worst case is when there is no cycle, so we explore the whole graph. Each node goes on the stack exactly once (it only goes on the stack if it is white, and it is re-coloured grey when pushed and black before it is popped). For each node on the stack, we backtrack to it as many times as there are outgoing edges from it: this is the number of times the node is found on top of the stack. When a node is on top of the stack, we iterate through its adjacency list to find an unvisited neighbour. This implementation does that inefficiently, always starting from the beginning of the adjacency list and going forward until such a neighbour is found. For example, if we have a graph like this:
    1 --> 2,3,4,5

then the first time node 1 is on top of the stack we make one step in the iteration until we find 2, the second time we make two steps to find node 3, the next time three steps, and so on:
    traversals of 1's adjacency list:
    2                first traversal
    2,3              second traversal
    2,3,4            third traversal
    2,3,4,5          fourth traversal

So the total number of steps in traversing a node's adjacency list (over the whole DFS) is quadratic in the length of the list (to be precise, (e₁+1)e₁/2, where e₁ is the size of 1's adjacency list; let's just say e₁²).
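Written as a sum, the k-th time node 1 is on top of the stack costs k steps, so over all e₁ visits:

    \sum_{k=1}^{e_1} k = \frac{e_1(e_1+1)}{2} = O(e_1^2)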
Since we do those traversals for every node reachable from s, the complexity of DFS from s depends on how many nodes are reachable from s and on the sizes of their adjacency lists. Assuming there are n nodes reachable from s, we push, colour and pop n nodes on the stack and do e₁² + e₂² + ... + eₙ² checks, where eᵢ is the size of the adjacency list of the i-th node. The complexity of this DFS from s is O(vₛ + eₛ²), where vₛ is the size of the subgraph reachable from s and eₛ is the number of edges in it (e₁² + e₂² + ... + eₙ² is at most eₛ² = (e₁ + ... + eₙ)²).
Now for the whole algorithm: since there is an outer loop over all nodes, it looks as if we need to multiply the time complexity of each DFS search by the number of nodes. In fact we do not do DFS from each node, only from the white ones, that is, those not involved in any previous DFS search. Essentially the graph is split into subgraphs (for example, the nodes reachable from the first node in the enumeration, and all the rest) and each subgraph is searched once, so we just add up the times it takes to search each subgraph. Supposing there are k subgraphs, each searched in time O(vᵢ + eᵢ²), the whole graph can be searched in time v₁ + v₂ + ... + vₖ + e₁² + ... + eₖ², which is O(V + E²) (where V = v₁ + v₂ + ... + vₖ and E = e₁ + ... + eₖ).
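For comparison (my own sketch, not one of the submitted programs): the quadratic factor disappears if each stack entry keeps its own cursor into its adjacency list, so that no edge is ever examined twice and the whole search becomes O(V + E). Same assumed representation as above:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class EfficientDfs {

        static boolean dfs(int s, Map<Integer, List<Integer>> adj,
                           Set<Integer> white, Set<Integer> grey, Set<Integer> black) {
            Deque<Integer> nodes = new ArrayDeque<>();
            Deque<Iterator<Integer>> cursors = new ArrayDeque<>();
            white.remove(s); grey.add(s);
            nodes.push(s); cursors.push(adj.get(s).iterator());
            while (!nodes.isEmpty()) {
                Iterator<Integer> it = cursors.peek();
                if (it.hasNext()) {
                    int m = it.next();                        // each edge consumed once
                    if (grey.contains(m)) return true;        // back edge: cycle
                    if (white.contains(m)) {
                        white.remove(m); grey.add(m);
                        nodes.push(m); cursors.push(adj.get(m).iterator());
                    }
                    // black neighbours are simply skipped
                } else {
                    int n = nodes.pop(); cursors.pop();
                    grey.remove(n); black.add(n);             // n fully explored
                }
            }
            return false;
        }
    }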
The topological sort solution deletes nodes with in-degree 0 one by one; a node on a cycle can never reach in-degree 0, so the graph has a cycle exactly when not every node gets deleted. The outline:

    create a queue
    enqueue all nodes with in-degree 0
    set the deleted counter to 0
    while the queue is non-empty
        dequeue a node n
        increment the deleted counter
        for each node in n's adjacency list
            decrement its in-degree; if it becomes 0, enqueue the node
    for each node, make its in-degree equal to the saved copy
    (restore the graph to its original state)
    return (deleted counter < graph size)

The first two operations are linear in V; each node is enqueued at most once, and each adjacency list is traversed once, so the complexity is O(V + E). The running time is better than that of my model DFS solution.
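A sketch of this in Java (same assumed graph representation as above; the in-degrees are computed into a scratch map, so the graph itself never needs restoring):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class TopSortCycleDetector {

        static boolean loops(Map<Integer, List<Integer>> adj) {
            // Compute every node's in-degree on a copy.
            Map<Integer, Integer> inDegree = new HashMap<>();
            for (int n : adj.keySet()) inDegree.put(n, 0);
            for (List<Integer> succs : adj.values())
                for (int m : succs) inDegree.merge(m, 1, Integer::sum);

            Deque<Integer> queue = new ArrayDeque<>();
            for (Map.Entry<Integer, Integer> e : inDegree.entrySet())
                if (e.getValue() == 0) queue.add(e.getKey());

            int deleted = 0;
            while (!queue.isEmpty()) {
                int n = queue.remove();
                deleted++;                                    // "delete" n from the graph
                for (int m : adj.get(n))
                    if (inDegree.merge(m, -1, Integer::sum) == 0)
                        queue.add(m);
            }
            // Nodes on a cycle never reach in-degree 0, so they are never deleted.
            return deleted < adj.size();
        }
    }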