The Core Version of the YAGO2 ontology contains all the entities and ontological facts extracted from Wikipedia (from 2010-08-17), with categories mapped to the WordNet class hierarchy. In addition, we include facts extracted from GeoNames for all the entities we can map to Wikipedia entities (that is, we extract facts only for entities that were extracted from Wikipedia, and we do not add entities that are present only in GeoNames). It also contains multilingual data from the Universal WordNet (UWN). In total, the Core Version of YAGO2 contains 2.6 million entities and about 124 million facts.
The Full Version of the YAGO2 ontology contains everything that is in the Core Version. Additionally, it contains all the entities and facts from GeoNames, that is, a complete import of the GeoNames data (from a dump of August 2010). It also contains textual and structural data from Wikipedia: all links between the YAGO2 entities together with their anchor texts, all Wikipedia category names (even those that are not used as a type in YAGO2), and the titles of references. In total, it contains nearly 10 million entities and more than 447 million facts.
We provide a Java converter tool for the following purposes:
To use these tools, please download YAGO in its native format, in either version. Then download the Java tool (11 MB).
We also provide a tool that decodes the native backslash encoding of entity names to UTF-8: download the de-/encoding tool.
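To illustrate what such a decoding step involves, here is a minimal Python sketch. It is not the tool offered above. It assumes that the backslash encoding writes each non-ASCII character as `\uXXXX` with exactly four hexadecimal digits; that format, the function name `decode_backslash` and the sample entity names are assumptions made for illustration, and characters outside the Basic Multilingual Plane are not handled.

```python
import re

# Assumption: escapes look like \u00e7 (backslash, 'u', four hex digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_backslash(name: str) -> str:
    """Replace every \\uXXXX escape in `name` with the character it denotes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


if __name__ == "__main__":
    # Hypothetical entity names, written as they might appear in an encoded file.
    for raw in ["Fran\\u00e7ois_Mitterrand", "K\\u00f6ln", "Berlin"]:
        decoded = decode_backslash(raw)
        # Encoding the decoded string yields the UTF-8 bytes to write to disk.
        print(raw, "->", decoded, decoded.encode("utf-8"))
```

Names that contain no escapes pass through unchanged, so a function like this can be applied to every entity name in a file without first checking whether it is encoded.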
The YAGO2 ontology is licensed under a Creative Commons Attribution 3.0 License by the YAGO team of the Max Planck Institute for Informatics.
You can still download the original YAGO(1) ontology here.