
The YAGO-NAGA Project: Harvesting, Searching, and Ranking Knowledge from the Web



The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation.

We have already harvested knowledge about millions of entities, and facts about their relationships, from Wikipedia and WordNet, carefully integrating the two sources. The resulting knowledge base, named YAGO, has very high precision and is freely available. The facts are represented as RDF triples, and we have developed methods and prototype systems for querying, ranking, and exploring knowledge. Our search engine NAGA provides ranked answers to queries based on statistical models.
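To make the triple representation concrete, here is a minimal sketch of facts stored as subject-predicate-object triples and retrieved by pattern matching. The entity and relation names (`bornIn`, `locatedIn`, `type`) and the `match` helper are illustrative assumptions, not YAGO's actual vocabulary or API.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative facts only; these identifiers are assumptions,
# not names taken from the YAGO knowledge base.
FACTS = [
    Triple("Albert_Einstein", "bornIn", "Ulm"),
    Triple("Albert_Einstein", "type", "physicist"),
    Triple("Ulm", "locatedIn", "Germany"),
]


def match(
    facts: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

Calling `match(FACTS, subject="Albert_Einstein")` would return every fact about that entity, which is the simplest form of the triple-pattern queries that RDF query languages build on.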

Several interlinked sub-projects are growing on the YAGO-NAGA basis. Our vision is a confluence of Semantic Web (Ontologies), Social Web (Web 2.0), and Statistical Web (Information Extraction) assets towards a comprehensive repository of human knowledge. Our methodologies combine concepts, models, and algorithms from several fields, including database systems, information retrieval, statistical learning, and logical reasoning.


AIDA is a method, implemented in an online tool, for disambiguating mentions of named entities that occur in natural-language text or Web tables.


AMIE (Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases) is a joint project with the Ontologies group.


ANGIE is an active knowledge system for interactive exploration.


DEANNA is a framework for natural language question answering over structured knowledge bases.


HYENA is a multi-label classifier for entity types based on hierarchical taxonomies derived from YAGO2.


The Javatools are a suite of Java classes for a variety of small tasks, such as parsing, database interaction or file handling. They are used in the YAGO-NAGA project and available for download as well.

Knowledge Kaleidoscope gathers and ranks photos of named entities with high precision, high recall, and diversity.


LEILA was the predecessor of SOFIE.


NAGA is a new semantic search engine supporting keyword search for the casual user as well as graph queries with regular expressions for the expert user.
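As a rough illustration of what a graph query with a regular expression over relation labels can look like, the sketch below walks a small labeled graph and keeps the nodes whose path labels match a pattern. It is not NAGA's query language or algorithm; the edge data, the `reachable` function, and the `max_hops` bound are all assumptions made for the example.

```python
import re
from collections import deque

# Illustrative labeled edges (subject, relation, object); not real YAGO data.
EDGES = [
    ("Albert_Einstein", "bornIn", "Ulm"),
    ("Ulm", "locatedIn", "Baden-Wuerttemberg"),
    ("Baden-Wuerttemberg", "locatedIn", "Germany"),
    ("Germany", "locatedIn", "Europe"),
]


def reachable(edges, start, label_regex, max_hops=4):
    """Return nodes reachable from `start` along a path whose
    space-joined relation labels fully match `label_regex`.

    Paths are explored breadth-first and cut off at `max_hops`
    edges, so cycles in the graph cannot cause endless search.
    """
    pattern = re.compile(label_regex)
    results = set()
    queue = deque([(start, [])])
    while queue:
        node, labels = queue.popleft()
        if labels and pattern.fullmatch(" ".join(labels)):
            results.add(node)
        if len(labels) < max_hops:
            for s, p, o in edges:
                if s == node:
                    queue.append((o, labels + [p]))
    return results
```

For example, `reachable(EDGES, "Albert_Einstein", r"bornIn( locatedIn)+")` asks for every region that contains the birthplace, directly or transitively. A production engine would compile the expression into an automaton and use indexes rather than enumerate paths.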


PATTY is a large collection of relations, grouped by synonymy and arranged into a subsumption hierarchy.


PRAVDA is a system based on label propagation for knowledge harvesting, especially of temporal knowledge.
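For readers unfamiliar with label propagation, the following is a generic textbook-style sketch: a few seed nodes carry known scores, and every other node repeatedly takes the average score of its neighbors. It is not PRAVDA's actual algorithm or graph construction; the function name, graph, and seed values are assumptions for illustration.

```python
def propagate(neighbors, seeds, iterations=20):
    """Spread seed scores over an undirected graph.

    neighbors: dict mapping each node to a list of adjacent nodes
               (every adjacent node must itself be a key).
    seeds:     dict mapping labeled nodes to fixed scores in [0, 1].
    Seed nodes stay clamped; every other node takes the mean score
    of its neighbors in each iteration.
    """
    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]
            elif nbrs:
                updated[node] = sum(scores[m] for m in nbrs) / len(nbrs)
            else:
                updated[node] = scores[node]
        scores = updated
    return scores


# Hypothetical chain graph: a positive seed, an unlabeled node, a negative seed.
GRAPH = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
SCORES = propagate(GRAPH, {"a": 1.0, "c": 0.0})
```

In this toy chain the unlabeled middle node settles halfway between its two seed neighbors. In a knowledge-harvesting setting, the nodes would stand for candidate facts and textual patterns rather than letters.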


Large-scale information extraction, a continuation of the SOFIE approach.


RDF-3X is an RDF storage and retrieval system that achieves excellent performance by following a RISC-style design philosophy.


SOFIE extracts information from Web sources.


URDF provides a framework of methods and tools for managing uncertain RDF knowledge bases.


UWN is a multilingual version of WordNet, describing meanings of words in different languages and their relationships.


YAGO is a huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames. YAGO knows almost 10 million entities (e.g. persons, organizations, cities), and 120 million facts about these entities. Unlike other automatically assembled knowledge bases, YAGO has a manually confirmed accuracy of 95%. YAGO is freely available at