LEILA: Learning to Extract Information by Linguistic Analysis

Research Downloads Corpora Publications People

What LEILA is

LEILA is a system that can extract pairs of a relation from a set of HTML documents. For example, it can extract pairs of persons with their birthdate, pairs of companies with their headquarters or pairs of entities with their concept. LEILA is part of the YAGO-NAGA project at the Max-Planck Institute for Informatics in Saarbrücken/Germany. LEILA is no longer actively maintained, so there is no more software support for it.

What LEILA needs

As input, LEILA needs a set of pairs that are in the relation (the examples) and a set of pairs that are not in the relation (the counterexamples). For instance, for the birthdate-relation, the examples could be

Frederic Chopin	1810
Wolfgang Amadeus Mozart	1756
...	...

The counterexamples could be

Frederic Chopin	1980
Wolfgang Amadeus Mozart	2010
...	...

The examples and counterexamples are given by a Java-method. This means that the counterexamples need not be enumerated. For instance, the Java-method can simply say that any pair of a person that is listed in the examples and a "wrong" birthdate is a counterexample.

How LEILA works

LEILA works in 3 phases:

It finds all sentences in which an example pair appears. It collects the pattern in which the example pair appears (the positive patterns). For instance, for the following sentence "Chopin was born in 1810", it would extract the pattern "X was born in Y". Then, LEILA runs again through the documents and finds all sentences in which a positive pattern matches (possibly approximately), but a counterexample stands in the place of X and Y. The corresponding pattern is collected as a negative pattern.
LEILA generalizes the positive patterns by machine learning techniques.
LEILA runs through the documents again and finds all sentences, in which a generalized positive pattern matches. It proposes the words in place of X and Y as a new pair. For example, if it finds the sentence "Vivaldi was born in 1678", it will propose the pair Vivaldi/1678.

What is special about LEILA

The pattern matching approach is very simple and widely used. Different from previous systems, LEILA uses a deep linguistic analysis of the documents by the Link Grammar Parser. Thus, its patterns are deeper and more robust than simple surface patterns. Furthermore, LEILA bounds its patterns by counterexamples and generalizes them by machine learning.

Please find more information about LEILA in the tabs above.