max planck institut
informatik
 

LEILA: Learning to Extract Information by Linguistic Analysis


 

Downloads

 

How to use LEILA

  1. Download the Java tools
  2. Download the Link Grammar Parser
  3. Install a recent version of Java (1.5 or later) if you don't have one
  4. Download the Java-source, the class-files and the documentation of LEILA here
  5. Run Leila.class. LEILA will tell you how to set it up.
 

How data flows in LEILA

The flow of data with LEILA is as follows:

    Corpus      ->   Proper sentences  ->  Parsed sentences  ->  Model   ->  Output Pairs
    documents        (*.LGI)               (*.LGO)               (*.MDL)     (*.TXT)
    (*.HTML)                                     '------------------------->

       '---HTML2LGI.java--'  '----LGParse.java---' '--Train.java---'  '---Test.java---'

       '--------------------------Leila.java------------------------------------------'
 

The corpus can be any set of text or HTML documents. These documents can be spread across different folders or subfolders. The class HTML2LGI.java extracts the proper sentences from the corpus documents. Each document generates one LGI-file containing its sentences. These LGI-files are given to the Link Grammar Parser (called by LGParse.java), which produces parse trees for the sentences. Each LGI-file generates one LGO-file containing the parse trees. The class Train.java tries to find patterns for the target relation in the LGO-files. It generalizes these patterns and stores them as a model in an MDL-file. The class Test.java applies the model to the LGO-files to extract output pairs for the target relation and stores them in one large plain text file. Leila.java runs all of these steps automatically in the right order.
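The one-file-per-document mapping can be illustrated with a short Java sketch. This is not LEILA's code: the class name CorpusWalkSketch, both method names, and the naming scheme (same base name with the extension swapped) are assumptions made for illustration only. See the LEILA documentation for the actual behavior of HTML2LGI.java and LGParse.java.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the corpus walk; not part of LEILA. */
class CorpusWalkSketch {

    /** Collects all regular files below root, descending into subfolders. */
    static List<File> collect(File root) {
        List<File> result = new ArrayList<File>();
        if (root.isFile()) {
            result.add(root);
        } else if (root.isDirectory()) {
            File[] children = root.listFiles();
            if (children != null) { // null if the folder cannot be read
                for (File child : children) {
                    result.addAll(collect(child));
                }
            }
        }
        return result;
    }

    /** Swaps the extension of a file name (assumed naming scheme). */
    static String withExtension(String fileName, String extension) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        return base + "." + extension;
    }

    public static void main(String[] args) {
        File corpus = new File(args.length > 0 ? args[0] : ".");
        for (File document : collect(corpus)) {
            String lgi = withExtension(document.getName(), "lgi");
            String lgo = withExtension(document.getName(), "lgo");
            System.out.println(document.getPath() + " -> " + lgi + " -> " + lgo);
        }
    }
}
```

Run on a corpus folder, the sketch prints one line per document with the assumed names of its LGI-file and LGO-file, mirroring the first two arrows of the diagram above.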

Train.java must know the target relation. The target relation is given by a function that decides whether a pair of words is an example, a counterexample or a candidate for the relation. This function should be implemented in a class that extends Relation.java. To LEILA, it does not matter how the function works internally. The most common way is to load a list of example pairs from a text file. To decide whether a pair of words is an example pair, the function can then simply check whether the pair is in the list. Often, the counterexamples need not be listed explicitly, because they can be deduced algorithmically on the fly. See the experimental section of "LEILA: Learning to Extract Information by Linguistic Analysis" (pdf, ppt, bib) for examples.
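A minimal sketch of such a decision function follows. It is a standalone illustration and deliberately does not extend Relation.java, whose actual methods are described in the LEILA documentation. The class name PairListRelationSketch, the enum Verdict, the tab-separated file format and the counterexample rule are all assumptions. The rule used here suits only relations in which each first word has exactly one correct partner (for instance a person and a birth year): a pair whose first word is known but whose second word differs from the listed one is treated as a counterexample.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

/** Hypothetical illustration of a target relation; not LEILA's Relation API. */
class PairListRelationSketch {

    enum Verdict { EXAMPLE, COUNTEREXAMPLE, CANDIDATE }

    /** Known example pairs, keyed by the first word of the pair. */
    private final Map<String, String> examples = new HashMap<String, String>();

    /** Reads one pair per line, assuming the format first<TAB>second. */
    void load(Reader source) {
        Scanner scanner = new Scanner(source);
        while (scanner.hasNextLine()) {
            String[] parts = scanner.nextLine().split("\t");
            if (parts.length == 2) { // silently skip malformed lines
                examples.put(parts[0], parts[1]);
            }
        }
        scanner.close();
    }

    /** Decides whether the pair is an example, a counterexample or a candidate. */
    Verdict classify(String first, String second) {
        String known = examples.get(first);
        if (known == null) {
            return Verdict.CANDIDATE;      // nothing known about the first word
        }
        if (known.equals(second)) {
            return Verdict.EXAMPLE;        // the pair is in the list
        }
        return Verdict.COUNTEREXAMPLE;     // deduced on the fly, never listed
    }

    public static void main(String[] args) {
        PairListRelationSketch relation = new PairListRelationSketch();
        relation.load(new StringReader("Einstein\t1879\nPlanck\t1858\n"));
        System.out.println(relation.classify("Einstein", "1879"));
        System.out.println(relation.classify("Einstein", "1955"));
        System.out.println(relation.classify("Bohr", "1885"));
    }
}
```

In practice the Reader would wrap a text file of example pairs. The counterexample list never has to be written down, since it follows from the examples and the one-partner assumption.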

 

Existing relations in LEILA

The following relations ship with LEILA: