Links from rejected documents (i.e., documents that did not pass the classification test for a given topic) are also considered for further crawling; however, we restrict the depth of additionally traversed links from such documents to two. The rationale behind this threshold is that one often has to "tunnel" through topic-unspecific welcome or table-of-contents pages before reaching a thematically relevant document again. As soon as a document along such a path passes the classification test, the depth limit for that path is lifted.
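The tunneling rule can be illustrated with a small breadth-first crawl over an in-memory link graph. This is a minimal sketch, not the BINGO! implementation: the graph dictionary, the `is_relevant` predicate standing in for the topic classifier, and the names `focused_crawl` and `max_tunnel` are all hypothetical. Each frontier entry carries a counter of links already followed out of rejected pages; an accepted page resets the counter to zero, and a rejected page's links are dropped once following them would exceed the limit of two.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

MAX_TUNNEL = 2  # depth limit for links traversed from rejected documents


def focused_crawl(
    seeds: Iterable[str],
    outlinks: Dict[str, List[str]],        # hypothetical in-memory web graph
    is_relevant: Callable[[str], bool],     # stand-in for the topic classifier
    max_tunnel: int = MAX_TUNNEL,
) -> List[str]:
    """Return pages in the order fetched, tunneling at most `max_tunnel`
    links beyond a rejected page before giving up on that path."""
    seeds = list(seeds)
    frontier = deque((url, 0) for url in seeds)  # (url, links followed from rejected pages)
    seen: Set[str] = set(seeds)                  # simplification: each URL is visited once
    fetched: List[str] = []

    while frontier:
        url, tunnel = frontier.popleft()
        fetched.append(url)

        if is_relevant(url):
            child_tunnel = 0            # accepted: the depth limit is lifted
        else:
            child_tunnel = tunnel + 1   # rejected: one more link through a rejected path
            if child_tunnel > max_tunnel:
                continue                # limit reached: do not follow this page's links

        for link in outlinks.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append((link, child_tunnel))

    return fetched


# A chain A -> B -> C -> D -> E, where only A and E are on-topic.
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}
print(focused_crawl(["A"], graph, lambda u: u in {"A", "E"}))
```

In this example the crawler tunnels through the rejected pages B and C to reach D, but because D is also rejected, its link to E is not followed. If D were on-topic, the limit would be lifted and E would be fetched.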
The BINGO! crawler imposes no global limit on crawling depth. Rather, it uses the filling degrees of the ontology's topics as a stopping criterion: when a topic holds a certain number of successfully classified documents (say 200), BINGO! suspends crawling. At this point, link analysis and re-training are performed for all topics, after which crawling resumes.
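A rough sketch of this suspend-and-resume cycle follows. The names `crawl_phase`, `run_crawler`, `classify`, and `retrain` are hypothetical, with `retrain` standing in for the link-analysis and re-training step. The text does not say how the criterion is re-armed after a pause, so this sketch *assumes* that the fill target rises by `threshold` after each phase. Without such a rule, the topic that has already reached the target would trigger another suspension on its next document.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def crawl_phase(docs: Iterator[str],
                classify: Callable[[str], Optional[str]],
                counts: Counter,
                target: int) -> bool:
    """Consume documents until some topic holds `target` documents.

    Returns True if crawling was suspended because a topic filled up,
    False if the document stream ran out first. The iterator stays
    positioned after the triggering document, so crawling can resume."""
    for doc in docs:
        topic = classify(doc)  # None means the document was rejected
        if topic is None:
            continue
        counts[topic] += 1
        if counts[topic] >= target:
            return True
    return False


def run_crawler(docs: Iterable[str],
                classify: Callable[[str], Optional[str]],
                retrain: Callable[[Counter], None],
                threshold: int = 200) -> Tuple[Counter, int]:
    counts: Counter = Counter()
    stream = iter(docs)
    target = threshold
    phases = 0
    while crawl_phase(stream, classify, counts, target):
        phases += 1
        retrain(counts)        # link analysis + re-training for all topics (stub)
        target += threshold    # ASSUMPTION: next pause after another `threshold` docs
    return counts, phases


# Toy usage: topic is derived from the first letter; "x..." documents are rejected.
snapshots: List[dict] = []
counts, phases = run_crawler(
    ["a1", "x1", "a2", "b1", "a3", "a4"],
    classify=lambda d: {"a": "A", "b": "B"}.get(d[0]),
    retrain=lambda c: snapshots.append(dict(c)),
    threshold=2,
)
print(phases, snapshots)
```

With a threshold of 2, the crawl pauses once topic A reaches two documents, re-trains, and resumes. It pauses again when A reaches four, then stops when the stream is exhausted.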