java.lang.Object
  sofie.parsing.Token
public class Token extends java.lang.Object implements java.lang.Comparable<Token>
Class Token

This class is part of the SOFIE system (http://mpii.de/yago-naga/sofie). It is licensed under the Creative Commons Attribution-Noncommercial-Share-Alike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/) by Fabian M. Suchanek (http://suchanek.name). If you use this class for scientific purposes, please cite Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum, "SOFIE: A Self-Organizing Framework for Information Extraction" (International World Wide Web Conference 2009).

This class represents a token in a natural language document.
Nested Class Summary | |
---|---|
static class |
Token.AnyName
Superclass for tokens that are considered in patterns |
static class |
Token.CanonicProperName
Disambiguated proper names |
static class |
Token.CommonIndividual
Superclass for common individuals |
static class |
Token.Company
Companies |
static class |
Token.Date
Normalized dates |
static class |
Token.Functional
Tokens that serve purely functional purposes |
static class |
Token.Literal
Literal tokens |
static class |
Token.NameType
Types of a common name |
static class |
Token.Number
Numbers |
static class |
Token.Person
Persons |
static class |
Token.ProperName
Named entities |
static class |
Token.Punctuation
Punctuation tokens |
static class |
Token.Repeat
A token that places the title token left of itself |
static class |
Token.SemiCanonicProperName
Semi-disambiguated proper names (like Wikipedia links) |
static class |
Token.Separator
Tokens that cannot be part of a pattern |
static class |
Token.StartCommentToken
Token that starts a comment (e.g. |
static class |
Token.StopWord
Stop word tokens |
static class |
Token.Title
Token that shall be repeated before repeat tokens |
static class |
Token.URL
URLs |
static class |
Token.USState
US States |
static class |
Token.WikiLink
Non-proper-name Wikipedia links |
static class |
Token.Word
Normal word tokens |
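The relationship between Token.Title ("shall be repeated before repeat tokens") and Token.Repeat ("places the title token left of itself") can be illustrated with a small sketch. It does not use the SOFIE classes: `RepeatSketch`, `Kind`, `Tok`, and `expand` are hypothetical names, and the expansion rule shown is only one plausible reading of the two descriptions above, not the documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatSketch {
    /** Hypothetical token kinds standing in for Token.Title, Token.Repeat, and everything else. */
    enum Kind { TITLE, REPEAT, OTHER }

    /** Hypothetical minimal token: a kind plus its text. */
    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /**
     * Assumed reading: every REPEAT marker gets the most recently seen
     * TITLE text placed immediately to its left.
     */
    static List<String> expand(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        String title = null;
        for (Tok t : tokens) {
            if (t.kind == Kind.TITLE) {
                title = t.text;            // remember the title for later repeats
            } else if (t.kind == Kind.REPEAT && title != null) {
                out.add(title);            // repeat the title before the marker
            }
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> infobox = List.of(
                new Tok(Kind.TITLE, "Albert_Einstein"),
                new Tok(Kind.REPEAT, "|"),
                new Tok(Kind.OTHER, "born"),
                new Tok(Kind.REPEAT, "|"),
                new Tok(Kind.OTHER, "died"));
        // Each "|" is now preceded by the article title.
        System.out.println(expand(infobox));
    }
}
```

Under this reading, the Repeat constants INFOBOXBAR, INFOBOXHEAD, and CATEGORY would let each infobox line or category be matched together with the article title.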
Field Summary | |
---|---|
static Token.Repeat |
CATEGORY
A Wikipedia category |
static Token.StartCommentToken |
HTMLCOMMENT
HTML comment start |
static Token |
IGNORE
Token that can be ignored |
static Token.Repeat |
INFOBOXBAR
A new line of a Wikipedia infobox |
static Token.Repeat |
INFOBOXHEAD
Head of a Wikipedia infobox. |
protected java.lang.String |
original
Holds the original word |
static Token.StartCommentToken |
REF
Wikipedia references |
static Token.StartCommentToken |
REVISION
Wikipedia revisions |
static Token.Separator |
SEPARATOR
Generic STOP token |
static Token.StartCommentToken |
STARTARTICLE
Wikipedia Article |
static Token.StartCommentToken |
STARTBRACES
Wikipedia special boxes |
static Token.StartCommentToken |
STARTINFOBOX
Wikipedia Infobox |
static Token.StartCommentToken |
STARTSCRIPT
HTML script start |
protected java.lang.String |
token
Holds the token itself |
Constructor Summary | |
---|---|
Token(java.lang.String s)
Constructs a token |
Method Summary | |
---|---|
int |
compareTo(Token o)
Compares by name and type |
boolean |
equals(java.lang.Object obj)
|
int |
hashCode()
|
boolean |
isInteresting()
Tells whether the token shall be considered as a possible entity |
boolean |
isMultiWord()
TRUE for words containing '_' or ' ' |
java.lang.String |
original()
Returns the original word |
boolean |
replaceMe()
Tells whether the token shall be replaced by '@' in a pattern |
java.lang.String |
token()
Returns the token itself |
java.lang.String |
toString()
|
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
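To make the method summary concrete, here is a minimal sketch of how `isMultiWord()` and `replaceMe()` might be used when turning a token sequence into a pattern. It is a stand-in, not the SOFIE implementation: `SketchToken`, `PatternSketch`, and `toPattern` are hypothetical names, the three-argument constructor is invented for the example, and whether the real `isMultiWord()` inspects the token or the original word is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

class PatternSketch {
    /** Joins tokens into a pattern string, writing '@' for tokens that ask to be replaced. */
    static String toPattern(List<SketchToken> tokens) {
        return tokens.stream()
                .map(t -> t.replaceMe() ? "@" : t.token())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<SketchToken> sentence = List.of(
                new SketchToken("Albert_Einstein", "Albert Einstein", true),
                new SketchToken("was", "was", false),
                new SketchToken("born", "born", false),
                new SketchToken("in", "in", false),
                new SketchToken("Ulm", "Ulm", true));
        System.out.println(toPattern(sentence));            // @ was born in @
        System.out.println(sentence.get(0).isMultiWord());  // true
    }
}

/** Hypothetical stand-in for sofie.parsing.Token. */
class SketchToken {
    protected final String token;      // the token itself
    protected final String original;   // the original word
    private final boolean replaceable; // assumption: fixed per token in this sketch

    SketchToken(String token, String original, boolean replaceable) {
        this.token = token;
        this.original = original;
        this.replaceable = replaceable;
    }

    String token() { return token; }

    String original() { return original; }

    /** True for words containing '_' or ' ' (assumption: checked on the token string). */
    boolean isMultiWord() {
        return token.indexOf('_') >= 0 || token.indexOf(' ') >= 0;
    }

    /** Assumption: in SOFIE this presumably depends on the token type. */
    boolean replaceMe() { return replaceable; }
}
```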
Field Detail |
---|
protected java.lang.String token
protected java.lang.String original
public static final Token.Separator SEPARATOR
public static final Token.StartCommentToken STARTSCRIPT
public static final Token.StartCommentToken HTMLCOMMENT
public static final Token.StartCommentToken STARTBRACES
public static final Token.StartCommentToken STARTINFOBOX
public static final Token.StartCommentToken STARTARTICLE
public static final Token.StartCommentToken REVISION
public static final Token.StartCommentToken REF
public static final Token IGNORE
public static final Token.Repeat INFOBOXHEAD
public static final Token.Repeat INFOBOXBAR
public static final Token.Repeat CATEGORY
Constructor Detail |
---|
public Token(java.lang.String s)
Method Detail |
---|
public boolean isInteresting()
public boolean replaceMe()
public java.lang.String token()
public java.lang.String original()
public boolean isMultiWord()
public int compareTo(Token o)
Specified by: compareTo in interface java.lang.Comparable<Token>
public boolean equals(java.lang.Object obj)
Overrides: equals in class java.lang.Object
public int hashCode()
Overrides: hashCode in class java.lang.Object
public java.lang.String toString()
Overrides: toString in class java.lang.Object
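The method summary says that `compareTo` "compares by name and type", and the class also overrides `equals` and `hashCode`. The sketch below shows one way such an ordering can be kept consistent with `equals` and `hashCode`. It is an illustration under assumptions, not the SOFIE source: `CompareSketch`, `NamedToken`, `WordSketch`, and `PersonSketch` are hypothetical names, and treating "type" as the runtime class name is a guess.

```java
import java.util.Objects;
import java.util.TreeSet;

class CompareSketch {
    /** Hypothetical stand-in for Token. */
    static class NamedToken implements Comparable<NamedToken> {
        final String token;

        NamedToken(String token) { this.token = token; }

        @Override
        public int compareTo(NamedToken o) {
            int byName = token.compareTo(o.token);
            if (byName != 0) return byName;
            // Assumption: "type" means the runtime class, compared by its name.
            return getClass().getName().compareTo(o.getClass().getName());
        }

        @Override
        public boolean equals(Object obj) {
            // Equal exactly when compareTo reports 0, so sorted sets behave as expected.
            return obj instanceof NamedToken && compareTo((NamedToken) obj) == 0;
        }

        @Override
        public int hashCode() {
            // Uses the same two components as compareTo and equals.
            return Objects.hash(token, getClass().getName());
        }

        @Override
        public String toString() {
            return getClass().getSimpleName() + ":" + token;
        }
    }

    /** Hypothetical subtypes playing the role of Token.Word and Token.Person. */
    static class WordSketch extends NamedToken {
        WordSketch(String s) { super(s); }
    }

    static class PersonSketch extends NamedToken {
        PersonSketch(String s) { super(s); }
    }

    public static void main(String[] args) {
        TreeSet<NamedToken> set = new TreeSet<>();
        set.add(new WordSketch("Paris"));
        set.add(new PersonSketch("Paris"));
        set.add(new WordSketch("Paris")); // same name and type: treated as a duplicate
        System.out.println(set.size());   // 2
    }
}
```

With this design, two tokens with the same text but different types stay distinct in sorted and hashed collections, while true duplicates collapse.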