Current Release: 3.4.1
Focused (thematic) crawling is a relatively new and promising approach to improving the recall of expert search on the Web. It automatically classifies visited documents into a user- or community-specific topic hierarchy (ontology). The quality of the classifier's training data is the most critical issue and a potential bottleneck for the effectiveness and scale of a focused crawler.
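To make the crawl loop concrete, here is a minimal best-first sketch over a toy in-memory link graph. It assumes a single topic instead of a full hierarchy, a hypothetical keyword-overlap scorer (`keyword_classifier`) in place of a trained classifier, and a rule that only links found on on-topic pages are followed. None of these details are taken from BINGO! itself.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Toy "Web": URL -> (page text, outgoing links). Illustrative data only.
Web = Dict[str, Tuple[str, List[str]]]


def keyword_classifier(keywords: Set[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for a trained classifier: the fraction of
    topic keywords that occur in the page text."""
    def score(text: str) -> float:
        words = set(text.lower().split())
        return len(words & keywords) / len(keywords)
    return score


def focused_crawl(web: Web, seeds: List[str],
                  classify: Callable[[str], float],
                  threshold: float = 0.5,
                  max_pages: int = 100) -> List[str]:
    """Best-first crawl that only expands links of on-topic pages."""
    # heapq is a min-heap, so priorities are negated to pop the best first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited: Set[str] = set()
    relevant: List[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]
        score = classify(text)
        if score >= threshold:
            relevant.append(url)
            for link in links:
                if link not in visited:
                    # Outgoing links inherit the parent's score as priority.
                    heapq.heappush(frontier, (-score, link))
    return relevant


web: Web = {
    "a": ("focused crawling with classifiers", ["b", "c"]),
    "b": ("crawling and classifiers for topic search", ["d"]),
    "c": ("cooking recipes", ["e"]),
    "d": ("focused crawling archetypes", []),
    "e": ("focused crawling classifiers", []),
}
keywords = {"focused", "crawling", "classifiers"}
print(focused_crawl(web, ["a"], keyword_classifier(keywords)))  # ['a', 'b', 'd']
```

Page `e` is on-topic but is never fetched, because its only in-link comes from the off-topic page `c`. This pruning is what keeps the crawl focused, and it is also why the classifier's quality, and hence its training data, matters so much.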
BINGO! implements an approach to focused crawling that aims to overcome the limitations of the initial training data. To this end, BINGO! identifies characteristic "archetypes" among the crawled documents that were positively classified into a topic, and uses them to periodically re-train the classifier. In this way, the crawler adapts dynamically to the most significant documents seen so far.
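One archetype re-training round can be sketched as follows. This is a minimal illustration under stated assumptions, not BINGO!'s actual method: a bag-of-words centroid classifier stands in for the real classifier, and archetypes are assumed to be simply the `k` positively classified documents with the highest classifier confidence. All function names (`select_archetypes`, `retrain`, and so on) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def vectorize(text: str) -> Vector:
    """Bag-of-words term frequencies (no stemming or stop-word removal)."""
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of the training vectors (the 'model')."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_archetypes(model: Vector, crawled: List[str],
                      k: int, threshold: float) -> List[str]:
    """Assumed criterion: the k positively classified documents with the
    highest confidence (cosine similarity to the topic centroid)."""
    scored = [(cosine(vectorize(d), model), d) for d in crawled]
    positives = [(s, d) for s, d in scored if s >= threshold]
    positives.sort(key=lambda p: p[0], reverse=True)
    return [d for _, d in positives[:k]]


def retrain(training: List[str], archetypes: List[str]) -> Vector:
    """Re-train by adding the archetypes to the original training set."""
    return centroid([vectorize(d) for d in training + archetypes])


training = ["focused crawler classifier"]
crawled = ["focused crawler archetypes classifier",
           "focused web crawler",
           "chocolate cake recipe"]
model = centroid([vectorize(d) for d in training])
archetypes = select_archetypes(model, crawled, k=1, threshold=0.5)
print(archetypes)  # ['focused crawler archetypes classifier']
model = retrain(training, archetypes)
print(model["archetypes"])  # 0.5
```

Calling `select_archetypes` and then `retrain` after each crawl batch gives the periodic re-training loop; the confidence-only selection criterion is a simplification chosen for brevity.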
Preliminary experiments indicate that dynamically enhancing the training data with archetypes improves the overall precision of a focused crawler by a substantial margin.