
AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases


Runtime information

AMIE can extract closed Horn rules from medium-sized ontologies in a few minutes.

Dataset               | # of facts | Threshold           | Latest runtime | Rules
YAGO2                 | 948,048    | Head coverage 0.01  | 3.62 min       | Sorted by: Std. Confidence, PCA Confidence
YAGO2                 | 948,048    | Support 2 facts     | 4.56 min       | All rules
YAGO2 sample          | 46,654     | Support 2 facts     | 5.41 s         | Sorted by PCA confidence
YAGO2 with constants  | 948,048    | Head coverage 0.01  | 17.76 min      | Some interesting examples
DBpedia 2.0           | 6,704,524  | Head coverage 0.01  | 2.89 min       | Rules up to 2 atoms

Knowledge bases

YAGO2

YAGO is a semantic knowledge base derived from Wikipedia, WordNet and GeoNames. The latest version, YAGO2s, contains 120M facts describing properties of 10M different entities. Since the rules output by AMIE are used for prediction, we used the previous version, YAGO2 (released in 2010), to predict facts in YAGO2s. YAGO2 contains 120M facts about 2.6M entities. For both versions of the ontology, we considered neither facts with literal objects nor any type of schema information (rdf:type statements, relation signatures and descriptions). For YAGO2s, this is equivalent to using the file yagoFacts, which contains around 4M triples. For YAGO2, we used the file yagocore, which contains 948K facts after cleaning. The clean testing versions of [YAGO2] and [YAGO2s] are available for download.
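As a rough illustration of this cleaning step, the following sketch filters a tab-separated triple file, dropping schema statements and facts with literal objects. It is not the actual AMIE tooling; the file names, the TSV layout and the convention that literals start with a quote are assumptions.

```python
# Minimal cleaning sketch (illustrative, not the script used for the experiments).
# Assumes one tab-separated triple (subject, predicate, object) per line.

def clean_triples(in_path, out_path):
    schema_relations = {"rdf:type", "rdfs:domain", "rdfs:range", "rdfs:subClassOf"}
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue                     # skip malformed lines
            subj, pred, obj = parts
            if pred in schema_relations:
                continue                     # drop schema information
            if obj.startswith('"'):
                continue                     # drop facts with literal objects
            dst.write(line)

# Hypothetical file names, shown only as a usage example.
clean_triples("yagoFacts.tsv", "yagoFacts.clean.tsv")
```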

YAGO2 sample

Our experiments included comparisons against state-of-the-art systems that could not handle even our clean version of YAGO2. For this reason, we built a sample of this KB by randomly picking 10K entities and collecting their 3-hop subgraphs. In contrast to a random sample of facts, this method preserves the original graph topology. This procedure resulted in a [47K-fact sample].
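The sketch below shows one way such a sample can be drawn: pick seed entities at random and keep every fact reachable within 3 hops of a seed. The in-memory triple representation and parameter names are assumptions, not the exact procedure used to build the published sample.

```python
# Illustrative 3-hop subgraph sampling over a set of (subject, predicate, object) triples.
import random
from collections import defaultdict, deque

def sample_subgraph(triples, num_seeds=10_000, hops=3, seed=42):
    # Index every fact by the entities it touches.
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((s, p, o))
        adj[o].append((s, p, o))

    rng = random.Random(seed)
    entities = list(adj)
    seeds = rng.sample(entities, min(num_seeds, len(entities)))

    kept, visited = set(), set(seeds)
    frontier = deque((e, 0) for e in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == hops:
            continue
        for s, p, o in adj[entity]:
            kept.add((s, p, o))              # keep every fact seen within the hop limit
            for nxt in (s, o):
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, depth + 1))
    return kept
```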

DBpedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia. The English version of DBpedia contains 1.89 billion facts about 2.45M entities. In the spirit of our data prediction endeavours, we mined rules from DBpedia 2.0 to predict facts in the latest version, 3.8 (in English). In both cases we used the person data and infoboxes datasets and removed facts with literal objects and rdf:type statements. This produced a clean subset of 6M facts for [DBpedia 2.0] and 12M facts for [DBpedia 3.8].

Data prediction

Experimental setup

In order to support the suitability of the PCA Confidence metric for the prediction of new facts, we carried out an experiment that uses the rules mined on YAGO2 (training KB) to predict facts in the newer YAGO2s (target KB). We took all rules mined by AMIE with a head coverage threshold of 0.01 and ranked them by standard and PCA confidence. Then, for every rule, we generated new facts by taking all bindings of the head variables that occur in the body of the rule but not in the head (sets B, C and D in our mining model). For instance, for the rule ?s <directed> ?o => ?s <created> ?o, we produce predictions of the form A <created> B, where A and B are bindings of people and films in the <directed> relation (the body of the rule) that do not appear in the <created> relation. These are exactly the bindings that lie beyond the training KB YAGO2.
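For a rule with a single body atom, this prediction step amounts to a set difference between the body and head bindings, as in the toy sketch below. The in-memory KB and the example facts are purely illustrative.

```python
# Toy sketch: predictions of a rule ?s <body> ?o => ?s <head> ?o are the (s, o)
# pairs present in the body relation but absent from the head relation.

def rule_predictions(kb, body_relation, head_relation):
    body_pairs = {(s, o) for s, p, o in kb if p == body_relation}
    head_pairs = {(s, o) for s, p, o in kb if p == head_relation}
    return {(s, head_relation, o) for s, o in body_pairs - head_pairs}

# Illustrative mini-KB.
kb = {
    ("Spielberg", "<directed>", "Jaws"),
    ("Spielberg", "<created>", "Jaws"),
    ("Nolan", "<directed>", "Memento"),
}
print(rule_predictions(kb, "<directed>", "<created>"))
# {('Nolan', '<created>', 'Memento')}
```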

A fact can be predicted by more than one rule if the rules share the same head relation. For this reason, we went down the ranking and, for every rule, removed all predictions already produced by higher-ranked rules. From the remaining predictions, we took a sample of 30 facts and evaluated them either automatically against YAGO2s or manually by checking the information in Wikipedia. Automatic evaluation was used if (a) the prediction is in YAGO2s or (b) it violates a functionality constraint (e.g., predicting a second death place for a person) in any of the datasets.
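The deduplication down the ranking can be sketched as follows, reusing the single-atom prediction helper from the previous sketch. Representing a rule as a (body relation, head relation) pair and the sampling parameters are assumptions made only for illustration.

```python
# Sketch: walk down the confidence ranking, keep only predictions not already
# produced by higher-ranked rules, and draw a sample of 30 per rule for evaluation.
import random

def rule_predictions(kb, body_relation, head_relation):
    # Same idea as above: body bindings missing from the head relation.
    body = {(s, o) for s, p, o in kb if p == body_relation}
    head = {(s, o) for s, p, o in kb if p == head_relation}
    return {(s, head_relation, o) for s, o in body - head}

def evaluation_samples(ranked_rules, kb, sample_size=30, seed=0):
    # ranked_rules: (body_relation, head_relation) pairs sorted by confidence (illustrative).
    rng = random.Random(seed)
    seen, samples = set(), {}
    for body_rel, head_rel in ranked_rules:
        preds = rule_predictions(kb, body_rel, head_rel) - seen   # drop facts from higher-ranked rules
        seen |= preds
        if preds:
            samples[(body_rel, head_rel)] = rng.sample(sorted(preds), min(sample_size, len(preds)))
    return samples
```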

Experiments

Downloads