The proliferation of knowledge-sharing communities such as Wikipedia and the progress in scalable information extraction from Web and text sources have enabled the automatic construction of very large knowledge bases. Endeavors of this kind include projects such as DBpedia, Freebase, KnowItAll, ReadTheWeb, and YAGO. These projects provide automatically constructed knowledge bases of facts about named entities, their semantic classes, and their mutual relationships. They contain millions of entities and hundreds of millions of facts about them. Such world knowledge in turn enables cognitive applications and knowledge-centric services like disambiguating natural-language text, semantic search for entities and relations in Web and enterprise data, and entity-oriented analytics over unstructured contents. Prominent examples of how knowledge bases can be harnessed include the Google Knowledge Graph and the IBM Watson question answering system. This tutorial presents state-of-the-art methods, recent advances, research opportunities, and open challenges along this avenue of knowledge harvesting and its applications. Particular emphasis will be on the twofold role of knowledge bases for big-data analytics: using scalable distributed algorithms for harvesting knowledge from Web and text sources, and leveraging entity-centric knowledge for deeper interpretation of and better intelligence with Big Data.
The tutorial takes place at the ACM SIGMOD 2013 conference in New York City on Wednesday, June 26, 2013.