YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
Frequently Asked Questions
What is YAGO?
YAGO is an ontology, i.e., a database with knowledge about the real world. YAGO contains both entities
(such as movies, people, cities, countries, etc.) and facts about these entities (who played in which movie,
which city is located in which country, etc.). All in all, YAGO contains 10 million entities and 120 million facts.
What is so special about YAGO?
YAGO2 is special in several ways:
- The accuracy of YAGO2 has been manually evaluated, confirming an accuracy of 95%. Every relation is annotated with its confidence value.
- YAGO2 is an ontology that is anchored in time and space. YAGO2 attaches a temporal dimension and a spatial dimension to many of its facts and entities.
- YAGO2 is particularly suited for disambiguation purposes, as it contains a large number of names for entities. It also knows the gender of people.
How is the taxonomy of YAGO structured?
YAGO classifies each entity into a taxonomy of classes. This is done in the way that RDF handles classes: Every entity is an instance of one or multiple classes.
Every class (except the root class) is a subclass of one or multiple classes. This yields a hierarchy of classes — the taxonomy. The YAGO taxonomy is the backbone of the ontology,
and is designed with much care and attention to correctness.
For those interested in the details of that taxonomy, we provide here a more in-depth explanation of the classes. The taxonomy consists of 3 layers:
- The highest layer is the class taxonomy from WordNet. Each class name is of the form wordnet_XXX_YYY,
where XXX is the name of the concept (e.g., singer), and YYY is the WordNet 3.0 synset id of the concept (e.g., 110599806).
For example, the class of singers is wordnet_singer_110599806.
Each class is connected to its more general class by the subclassof relationship.
The highest class in the taxonomy is wordnet_entity_100001740.
- The middle layer of the taxonomy consists of classes that have been derived from Wikipedia categories. For example, one class is
wikicategory_American_rock_singers, derived from the Wikipedia category American rock singers. Each of these classes is connected to
one class of the WordNet layer by a subclassof relationship. In the example, wikicategory_American_rock_singers subclassof wordnet_singer_110599806.
Not all Wikipedia categories become classes in YAGO.
- The lowest layer of the taxonomy is the layer of instances. Instances comprise individual entities such as rivers, people, or movies. For example, this layer
contains Elvis_Presley. Each instance is connected to one or multiple classes of the higher layers by the relationship type.
In the example: Elvis_Presley type wikicategory_American_rock_singers.
This way, you can walk from the instance up to its class by type, and then further up by subclassof.
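The walk described above can be sketched in a few lines of Python. This is only an illustration, not YAGO's own tooling: the facts are the two examples from this section, stored as plain (subject, relation, object) tuples, and the function name superclasses is an assumption made for the sketch.

```python
from collections import deque

# Toy facts taken from the examples in this section.
# A real YAGO file would hold millions of such triples.
FACTS = [
    ("Elvis_Presley", "type", "wikicategory_American_rock_singers"),
    ("wikicategory_American_rock_singers", "subclassof", "wordnet_singer_110599806"),
]


def superclasses(entity, facts):
    """Follow 'type' once, then 'subclassof' repeatedly, and return
    every class reached, in the order it was first visited."""
    # Step 1: the classes the instance belongs to directly.
    queue = deque(o for s, r, o in facts if s == entity and r == "type")
    seen = []
    # Step 2: climb the class hierarchy breadth-first.
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # a class can be reached along several paths
        seen.append(cls)
        queue.extend(o for s, r, o in facts if s == cls and r == "subclassof")
    return seen


print(superclasses("Elvis_Presley", FACTS))
```

With the complete taxonomy loaded, the same walk would continue through the WordNet layer up to wordnet_entity_100001740. Because an entity or class may have several parents, the sketch keeps a list of visited classes instead of assuming a single chain.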
Does YAGO have thematic domains?
YAGO provides a class hierarchy in the sense of RDF: Every subclass represents a set of instances that is a subset of the set of
instances of the super class. For example, Elvis Presley is in the class of singers (because Elvis is a singer). This class is a subclass of the class of
persons, because every singer is a person. This is different from a thematic domain hierarchy! A thematic domain hierarchy would contain items such as
"Football", "Sports", "Music" etc. In such a hierarchy, Elvis would be in the domain "Music". At the moment, YAGO does not contain such a thematic
domain hierarchy.
How do labels work in YAGO?
In line with RDF, YAGO distinguishes between the entity (Elvis_Presley) and names for that entity ("Elvis", "The King",
"Mr. Presley", etc.). The reason for this distinction is that one entity can have multiple names. Also, one name can mean multiple entities.
Consider, e.g., the name "The King", which is highly ambiguous. YAGO links a name to the entity by the relationship means. For example, YAGO contains the fact
"Elvis" means Elvis_Presley. In addition, YAGO knows, for each entity, its preferred name.
This name is designated by the relationship hasPreferredName. For example, Elvis_Presley hasPreferredName "Elvis Presley".
Even if Elvis has multiple names, his standard name is "Elvis Presley". In addition, YAGO contains for each name its preferred meaning.
This meaning is designated by hasPreferredMeaning. In the example, "Elvis" hasPreferredMeaning Elvis_Presley. Even if the word
"Elvis" can refer to multiple entities, its default meaning is Elvis Presley.
How do meta facts work?
YAGO gives a fact identifier to each fact. For example, the fact Elvis_Presley type person could have the fact identifier #42.
In the native version of YAGO, the fact identifiers are simply an additional column. In the RDF version of YAGO, the fact identifiers work through reification.
They get lost in the Jena version of YAGO.
YAGO contains facts about these fact identifiers. For example, YAGO contains
#42 occursSince 1935-01-08
#42 occursUntil 1977-08-16
#42 wasFoundIn wikipedia:Elvis Presley
These facts mean that Elvis was a person from the year 1935 to the year 1977, and that this fact was found in Wikipedia.
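The following Python sketch shows one way to read such meta facts. It is an illustration under stated assumptions, not the YAGO file format: the base fact is kept in a dictionary keyed by its identifier, the meta facts are plain tuples whose subject is that identifier, and the helper names meta and holds_on are invented for the example.

```python
from datetime import date

# Base facts, keyed by fact identifier (the "additional column" of the native version).
FACTS = {
    "#42": ("Elvis_Presley", "type", "person"),
}

# Meta facts: facts whose subject is a fact identifier.
META_FACTS = [
    ("#42", "occursSince", "1935-01-08"),
    ("#42", "occursUntil", "1977-08-16"),
    ("#42", "wasFoundIn", "wikipedia:Elvis Presley"),
]


def meta(fact_id, relation):
    """Return the object of the first meta fact about fact_id, or None."""
    for s, r, o in META_FACTS:
        if s == fact_id and r == relation:
            return o
    return None


def holds_on(fact_id, day):
    """True if day lies within the fact's occursSince/occursUntil interval.
    A missing bound is treated as open-ended (an assumption of this sketch)."""
    since = meta(fact_id, "occursSince")
    until = meta(fact_id, "occursUntil")
    if since is not None and day < date.fromisoformat(since):
        return False
    if until is not None and day > date.fromisoformat(until):
        return False
    return True


print(FACTS["#42"], "found in", meta("#42", "wasFoundIn"))
print(holds_on("#42", date(1960, 1, 1)))
print(holds_on("#42", date(1980, 1, 1)))
```

Keeping the identifier separate from the triple is what allows temporal scope and provenance to be attached without changing the base fact itself.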
What is the difference between YAGO and DBpedia?
DBpedia is a community effort to extract structured information from Wikipedia. In this sense,
both YAGO and DBpedia share the same goal of generating a structured ontology. The projects differ in their foci. In YAGO,
the focus is on precision, the taxonomic structure, and the spatial and temporal dimension. For a detailed comparison of the projects,
see Chapter 10.3 of our AI journal paper "YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia".
Where can I find more information about YAGO?
How can I access YAGO?