sofie.parsing
Class Token

java.lang.Object
  extended by sofie.parsing.Token
All Implemented Interfaces:
java.lang.Comparable<Token>
Direct Known Subclasses:
Token.AnyName, Token.CommonIndividual, Token.Functional, Token.Literal, Token.Punctuation, Token.Word

public class Token
extends java.lang.Object
implements java.lang.Comparable<Token>

This class is part of the SOFIE system (http://mpii.de/yago-naga/sofie). It is licensed under the Creative Commons Attribution-Noncommercial-Share-Alike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/) by Fabian M. Suchanek (http://suchanek.name).

If you use this class for scientific purposes, please cite: Fabian M. Suchanek, Mauro Sozio, Gerhard Weikum, "SOFIE: A Self-Organizing Framework for Information Extraction" (International World Wide Web Conference 2009).

This class represents a token in a natural language document.


Nested Class Summary
static class Token.AnyName
          Superclass for tokens that are considered in patterns
static class Token.CanonicProperName
          Disambiguated proper names
static class Token.CommonIndividual
          Superclass for common individuals
static class Token.Company
          Companies
static class Token.Date
          Normalized dates
static class Token.Functional
          Tokens that serve purely functional purposes
static class Token.Literal
          Literal tokens
static class Token.NameType
          Types of a common name
static class Token.Number
          Numbers
static class Token.Person
          Persons
static class Token.ProperName
          Named entities
static class Token.Punctuation
          Punctuation tokens
static class Token.Repeat
          A token that places the title token left of itself
static class Token.SemiCanonicProperName
          Semi-disambiguated proper names (such as Wikipedia links)
static class Token.Separator
          Tokens that cannot be part of a pattern
static class Token.StartCommentToken
          Token that starts a comment (e.g.
static class Token.StopWord
          Stop word tokens
static class Token.Title
          Token that shall be repeated before repeat tokens
static class Token.URL
          URLs
static class Token.USState
          US States
static class Token.WikiLink
          Non-proper-name Wikipedia links
static class Token.Word
          Normal word tokens
 
Field Summary
static Token.Repeat CATEGORY
          A Wikipedia category
static Token.StartCommentToken HTMLCOMMENT
          HTML comment start
static Token IGNORE
          Token that can be ignored
static Token.Repeat INFOBOXBAR
          A new line of a Wikipedia infobox
static Token.Repeat INFOBOXHEAD
          Head of a Wikipedia infobox.
protected  java.lang.String original
          Holds the original word
static Token.StartCommentToken REF
          Wikipedia references
static Token.StartCommentToken REVISION
          Wikipedia revisions
static Token.Separator SEPARATOR
          Generic STOP token
static Token.StartCommentToken STARTARTICLE
          Wikipedia Article
static Token.StartCommentToken STARTBRACES
          Wikipedia special boxes
static Token.StartCommentToken STARTINFOBOX
          Wikipedia Infobox
static Token.StartCommentToken STARTSCRIPT
          HTML script start
protected  java.lang.String token
          Holds the token itself
 
Constructor Summary
Token(java.lang.String s)
          Constructs a token
 
Method Summary
 int compareTo(Token o)
          Compares by name and type
 boolean equals(java.lang.Object obj)
           
 int hashCode()
           
 boolean isInteresting()
          Tells whether the token shall be considered as a possible entity
 boolean isMultiWord()
          Returns true for words containing '_' or ' '
 java.lang.String original()
          Returns the original word
 boolean replaceMe()
          Tells whether the token shall be replaced by '@' in a pattern
 java.lang.String token()
          Returns the token itself
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

token

protected java.lang.String token
Holds the token itself


original

protected java.lang.String original
Holds the original word


SEPARATOR

public static final Token.Separator SEPARATOR
Generic STOP token


STARTSCRIPT

public static final Token.StartCommentToken STARTSCRIPT
HTML script start


HTMLCOMMENT

public static final Token.StartCommentToken HTMLCOMMENT
HTML comment start


STARTBRACES

public static final Token.StartCommentToken STARTBRACES
Wikipedia special boxes


STARTINFOBOX

public static final Token.StartCommentToken STARTINFOBOX
Wikipedia Infobox


STARTARTICLE

public static final Token.StartCommentToken STARTARTICLE
Wikipedia Article


REVISION

public static final Token.StartCommentToken REVISION
Wikipedia revisions


REF

public static final Token.StartCommentToken REF
Wikipedia references


IGNORE

public static final Token IGNORE
Token that can be ignored


INFOBOXHEAD

public static final Token.Repeat INFOBOXHEAD
Head of a Wikipedia infobox. This is a repeat token, so that the infobox type can become a pattern.


INFOBOXBAR

public static final Token.Repeat INFOBOXBAR
A new line of a Wikipedia infobox


CATEGORY

public static final Token.Repeat CATEGORY
A Wikipedia category

Constructor Detail

Token

public Token(java.lang.String s)
Constructs a token

Method Detail

isInteresting

public boolean isInteresting()
Tells whether the token shall be considered as a possible entity


replaceMe

public boolean replaceMe()
Tells whether the token shall be replaced by '@' in a pattern
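
The role of replaceMe() can be illustrated with a minimal sketch of how a token sequence might be turned into a pattern string. This is not the SOFIE implementation: the classes `PatternSketch` and `Tok`, the method `toPattern`, and the choice of a single space as separator are all hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of sofie.parsing.
final class PatternSketch {

    // Stand-in for a token: its text plus the answer replaceMe() would give.
    static final class Tok {
        final String token;
        final boolean replaceMe;

        Tok(String token, boolean replaceMe) {
            this.token = token;
            this.replaceMe = replaceMe;
        }
    }

    // Joins the tokens with single spaces (an assumption), writing '@'
    // in place of every token that asks to be replaced.
    static String toPattern(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.replaceMe ? "@" : t.token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> sentence = Arrays.asList(
                new Tok("Einstein", true),
                new Tok("was", false),
                new Tok("born", false),
                new Tok("in", false),
                new Tok("Ulm", true));
        System.out.println(toPattern(sentence)); // prints "@ was born in @"
    }
}
```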


token

public java.lang.String token()
Returns the token itself


original

public java.lang.String original()
Returns the original word


isMultiWord

public boolean isMultiWord()
Returns true for words containing '_' or ' '
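
The documented behavior amounts to a simple character check. A minimal sketch follows, assuming the check is applied to the token text; the class `MultiWordSketch` and its static method are hypothetical stand-ins, not the SOFIE source.

```java
// Hypothetical illustration; not part of sofie.parsing.
final class MultiWordSketch {

    // True if the text contains an underscore or a space.
    static boolean isMultiWord(String token) {
        return token.indexOf('_') >= 0 || token.indexOf(' ') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isMultiWord("New_York"));        // prints "true"
        System.out.println(isMultiWord("Albert Einstein")); // prints "true"
        System.out.println(isMultiWord("physicist"));       // prints "false"
    }
}
```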


compareTo

public int compareTo(Token o)
Compares by name and type

Specified by:
compareTo in interface java.lang.Comparable<Token>
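
One way to read "compares by name and type" is: order by the token text first, then by the concrete class. The sketch below follows that reading and keeps equals and hashCode consistent with the ordering. It is an assumption about the behavior, not the SOFIE code; `SketchToken` and its nested `Word` class are hypothetical stand-ins for Token and Token.Word.

```java
import java.util.Objects;

// Hypothetical illustration; not part of sofie.parsing.
class SketchToken implements Comparable<SketchToken> {

    protected final String token;

    SketchToken(String s) {
        this.token = s;
    }

    // Orders by token text first, then by concrete class name (assumed tie-breaker).
    @Override
    public int compareTo(SketchToken o) {
        int byName = token.compareTo(o.token);
        if (byName != 0) {
            return byName;
        }
        return getClass().getName().compareTo(o.getClass().getName());
    }

    // Equal exactly when compareTo returns 0, so sorted collections
    // and hash-based collections agree on identity.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof SketchToken && compareTo((SketchToken) obj) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(token, getClass().getName());
    }

    // Stand-in subclass, mirroring the idea of Token.Word.
    static class Word extends SketchToken {
        Word(String s) {
            super(s);
        }
    }

    public static void main(String[] args) {
        SketchToken plain = new SketchToken("apple");
        SketchToken word = new SketchToken.Word("apple");
        System.out.println(plain.equals(new SketchToken("apple"))); // prints "true"
        System.out.println(plain.equals(word));                     // prints "false"
    }
}
```

With this design, two tokens with the same text but different types remain distinct keys in a TreeSet or HashMap.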

equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object