LEILA is a system that extracts pairs belonging to a given relation from a set of HTML documents. For example, it can extract pairs of persons and their birthdates, pairs of companies and their headquarters, or pairs of entities and their concepts. LEILA is part of the YAGO-NAGA project at the Max Planck Institute for Informatics in Saarbrücken, Germany. LEILA is no longer actively maintained, and software support for it is no longer available.
As input, LEILA needs a set of pairs that are in the relation (the examples) and a set of pairs that are not in the relation (the counterexamples). For instance, for the birthdate relation, the examples could be:
Frederic Chopin         | 1810
Wolfgang Amadeus Mozart | 1756
...                     | ...
The counterexamples could be:
Frederic Chopin         | 1980
Wolfgang Amadeus Mozart | 2010
...                     | ...
The examples and counterexamples are specified by a Java method rather than by a fixed list, so the counterexamples need not be enumerated. For instance, the Java method can simply state that any pair consisting of a person listed in the examples and a "wrong" birthdate is a counterexample.
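The idea of defining counterexamples by a rule instead of a list can be sketched as follows. This is only a minimal illustration: the class name BirthdateRelation and the methods isExample and isCounterexample are hypothetical and are not taken from LEILA's actual interface.

```java
import java.util.Map;

// Hypothetical sketch: names are illustrative, not LEILA's actual API.
final class BirthdateRelation {
    // Known (person, birth year) examples, taken from the table above.
    private static final Map<String, String> EXAMPLES = Map.of(
        "Frederic Chopin", "1810",
        "Wolfgang Amadeus Mozart", "1756");

    // A pair is an example if the person is known and the year matches.
    static boolean isExample(String person, String year) {
        if (person == null || year == null) return false;
        return year.equals(EXAMPLES.get(person));
    }

    // A pair is a counterexample if the person is known but the year differs.
    // Pairs with unknown persons are neither examples nor counterexamples.
    static boolean isCounterexample(String person, String year) {
        if (person == null || year == null) return false;
        String known = EXAMPLES.get(person);
        return known != null && !known.equals(year);
    }
}
```

With this rule, the pair (Frederic Chopin, 1980) counts as a counterexample without being listed anywhere, while a pair whose person does not appear in the examples is left undecided.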
LEILA works in 3 phases:
The pattern matching approach is simple and widely used. Unlike previous systems, LEILA performs a deep linguistic analysis of the documents using the Link Grammar Parser. As a result, its patterns are deeper and more robust than simple surface patterns. Furthermore, LEILA constrains its patterns with counterexamples and generalizes them by machine learning.
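How counterexamples can constrain a set of patterns is sketched below. This is a deliberately simplified illustration and not LEILA's algorithm: it uses the literal text between two entities as the pattern, whereas LEILA derives its patterns from Link Grammar Parser analyses, and it omits the machine learning step entirely. The class SurfacePatternSketch and all of its methods are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified illustration only: surface strings stand in for LEILA's
// linguistic patterns, and no generalization by learning is performed.
final class SurfacePatternSketch {
    // Returns the trimmed text between x and y, or null if x is not followed by y.
    static String patternBetween(String sentence, String x, String y) {
        int start = sentence.indexOf(x);
        if (start < 0) return null;
        int end = sentence.indexOf(y, start + x.length());
        if (end < 0) return null;
        return sentence.substring(start + x.length(), end).trim();
    }

    // Collects every pattern that connects one of the given pairs in some sentence.
    static Set<String> collect(List<String> sentences,
                               List<Map.Entry<String, String>> pairs) {
        Set<String> patterns = new LinkedHashSet<>();
        for (String sentence : sentences) {
            for (Map.Entry<String, String> pair : pairs) {
                String p = patternBetween(sentence, pair.getKey(), pair.getValue());
                if (p != null) patterns.add(p);
            }
        }
        return patterns;
    }

    // Keeps only patterns seen with examples and never with counterexamples.
    static Set<String> boundedPatterns(List<String> sentences,
                                       List<Map.Entry<String, String>> examples,
                                       List<Map.Entry<String, String>> counterexamples) {
        Set<String> positive = collect(sentences, examples);
        positive.removeAll(collect(sentences, counterexamples));
        return positive;
    }
}
```

In this sketch, a pattern such as a bare comma between a name and a year is discarded as soon as it also connects a counterexample pair, while a more specific pattern that only ever connects example pairs is kept.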