java.lang.Object leila.parsing.HTML2LGI
public class HTML2LGI
This class is part of LEILA (http://mpii.de/yago-naga/leila). It is licensed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0) by the author Fabian M. Suchanek (http://suchanek.name).
HTML2LGI takes HTML files and produces LGI files (Link Grammar Input files) and LL files (Linear Linkage files, which hold the non-grammatical part).
Field Summary | |
---|---|
static | abbreviations: Abbreviations |
static | emphEnd: HTML tags that end the emphasis of a string |
static | emphStart: HTML tags that emphasize a string |
static java.lang.String | ignoreChars: Characters to be ignored |
static | ignores: HTML tags to be ignored |
protected static HTMLReader | in: Readers and Writers |
protected static java.io.Writer | lgiout |
protected static java.io.Writer | llout |
static int | MAXJOIN: Maximum length of a word; must be <60, the longest word that the link grammar parser accepts |
protected static int | MAXLLLENGTH: Maximum linear linkage length (in chars) |
protected static int | MINLGILENGTH: Minimum sentence length (in chars) |
protected static int | MINLLLENGTH: Minimum linear linkage length (in chars) |
static | skips: HTML tags to be skipped |
static java.lang.String | stopChars: Characters that count as sentence delimiters |
static | stops: HTML tags that count as sentence delimiters |
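The character-level fields above (stopChars, ignoreChars, MAXJOIN, MINLGILENGTH) suggest a splitting step driven by configuration. The following is a minimal sketch of how such a step could look, not LEILA's actual code: the class name `DelimiterSketch`, the method `split`, and all constant values are assumptions chosen for illustration. The page only states that MAXJOIN must be below 60.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of delimiter-driven sentence splitting; not part of LEILA. */
final class DelimiterSketch {
    // Assumed values; the real field values are not shown on this page.
    static final String STOP_CHARS = ".!?";   // stands in for stopChars
    static final String IGNORE_CHARS = "*#|"; // stands in for ignoreChars
    static final int MAX_JOIN = 59;           // stands in for MAXJOIN (must be < 60)
    static final int MIN_LENGTH = 10;         // stands in for MINLGILENGTH

    /** Splits text at stop characters, dropping ignored characters on the way. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (IGNORE_CHARS.indexOf(c) >= 0) {
                continue; // ignored characters never reach the output
            }
            current.append(c);
            if (STOP_CHARS.indexOf(c) >= 0) {
                keepIfAcceptable(current, sentences);
                current.setLength(0); // start the next sentence
            }
        }
        keepIfAcceptable(current, sentences); // trailing text without a delimiter
        return sentences;
    }

    /** Keeps a candidate only if it is long enough and contains no overlong word. */
    private static void keepIfAcceptable(StringBuilder candidate, List<String> out) {
        String s = candidate.toString().trim();
        if (s.length() < MIN_LENGTH) {
            return;
        }
        for (String word : s.split("\\s+")) {
            if (word.length() > MAX_JOIN) {
                return; // a downstream parser with a word-length limit would reject this
            }
        }
        out.add(s);
    }
}
```

Keeping the delimiters and limits in constants, as the class does with its static fields, lets the splitting rules change without touching the loop.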
Constructor Summary | |
---|---|
HTML2LGI() |
Method Summary | |
---|---|
protected static void | flush(java.lang.String headline, java.lang.StringBuilder s): Flushes data, resets s |
protected static void | flushLGI(java.lang.String s): Flushes data to the LGI file |
protected static void | flushLL(java.lang.String s): Flushes data to the LL file |
static void | main(java.io.File f): Translates HTML files to an LGI file (call from Java) |
static void | main(java.lang.String[] argv): Translates HTML files to an LGI file (call by the user) |
protected static void | parseFile(java.io.File f): Parses an HTML file |
protected static void | parseText(java.lang.String headline): Parses a text under a given headline |
protected static java.lang.StringBuilder | readSentence(java.lang.String[] delim): Returns a sequence of unproblematic characters and its delimiter |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
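readSentence returns a StringBuilder yet is documented as returning a sequence "and its delimiter", which suggests that the String[] argument serves as an output parameter. The sketch below illustrates that pattern under this assumption. It is not LEILA's implementation: the class `ReadSentenceSketch`, the explicit `Reader` argument, and the `STOP_CHARS` value are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch of returning text plus its delimiter via an array argument. */
final class ReadSentenceSketch {
    // Assumed delimiter set; the real stopChars value is not shown on this page.
    static final String STOP_CHARS = ".!?";

    /**
     * Reads characters up to (and excluding) the next stop character.
     * The delimiter that ended the sequence is stored in delim[0];
     * delim[0] is null if the input ended before a delimiter was seen.
     */
    static StringBuilder readSentence(Reader in, String[] delim) throws IOException {
        StringBuilder sentence = new StringBuilder();
        delim[0] = null;
        int c;
        while ((c = in.read()) != -1) {
            if (STOP_CHARS.indexOf(c) >= 0) {
                delim[0] = String.valueOf((char) c); // report which delimiter ended the text
                break;
            }
            sentence.append((char) c);
        }
        return sentence;
    }
}
```

A one-element array is a common Java idiom for a second return value when introducing a small result class is not wanted.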
Field Detail |
---|
public static final int MAXJOIN
protected static HTMLReader in
protected static java.io.Writer lgiout
protected static java.io.Writer llout
protected static final int MINLGILENGTH
protected static final int MINLLLENGTH
protected static final int MAXLLLENGTH
public static abbreviations
public static final java.lang.String ignoreChars
public static ignores
public static skips
public static final java.lang.String stopChars
public static stops
public static final emphStart
public static final emphEnd
Constructor Detail |
---|
public HTML2LGI()
Method Detail |
---|
protected static void flushLGI(java.lang.String s) throws java.lang.Exception
Throws: java.lang.Exception
protected static void flushLL(java.lang.String s) throws java.lang.Exception
Throws: java.lang.Exception
protected static void flush(java.lang.String headline, java.lang.StringBuilder s) throws java.lang.Exception
Throws: java.lang.Exception
protected static java.lang.StringBuilder readSentence(java.lang.String[] delim) throws java.lang.Exception
Throws: java.lang.Exception
protected static void parseText(java.lang.String headline) throws java.lang.Exception
Throws: java.lang.Exception
protected static void parseFile(java.io.File f) throws java.lang.Exception
Throws: java.lang.Exception
public static void main(java.io.File f) throws java.lang.Exception
Throws: java.lang.Exception
public static void main(java.lang.String[] argv) throws java.lang.Exception
Throws: java.lang.Exception
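flush is documented as flushing data and resetting s. A minimal sketch of that write-then-reset pattern follows. It is an illustration only: the class `FlushSketch`, the explicit `Writer` argument, and the "headline: text" output format are assumptions, since this page does not describe what the real method writes.

```java
import java.io.IOException;
import java.io.Writer;

/** Hypothetical sketch of flushing a buffer to a Writer and resetting it. */
final class FlushSketch {
    /**
     * Writes the buffered text (if any) as one line and then empties the buffer.
     * The "headline: text" layout is an assumed format, not LEILA's.
     */
    static void flush(Writer out, String headline, StringBuilder s) throws IOException {
        String text = s.toString().trim();
        if (!text.isEmpty()) {
            out.write(headline + ": " + text + "\n");
            out.flush(); // push the data through to the underlying file or stream
        }
        s.setLength(0); // reset the buffer so the caller can reuse it
    }
}
```

Resetting with setLength(0) keeps the same StringBuilder instance alive, so a caller that holds a reference to it can continue appending after the flush.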