java.lang.Object javatools.parsers.PlingStemmer
public class PlingStemmer
This class is part of the Java Tools (see http://mpii.de/yago-naga/javatools). It is licensed under the Creative Commons Attribution License (see http://creativecommons.org/licenses/by/3.0) by the YAGO-NAGA team (see http://mpii.de/yago-naga).
The PlingStemmer stems an English noun (plural or singular) to its singular form. It handles irregular plurals such as "firemen"->"fireman" and classical (Latin and Greek) plurals such as "appendices"->"appendix", and yes, it was a lot of work to compile these exceptions. Examples:
    System.out.println(PlingStemmer.stem("boy"));        ----> boy
    System.out.println(PlingStemmer.stem("boys"));       ----> boy
    System.out.println(PlingStemmer.stem("biophysics")); ----> biophysics
    System.out.println(PlingStemmer.stem("automata"));   ----> automaton
    System.out.println(PlingStemmer.stem("genus"));      ----> genus
    System.out.println(PlingStemmer.stem("emus"));       ----> emu
Some word forms can be either plural or singular. Examples include "physics" (the science, or the plural of "physic", the medicine), "quarters" (the housing, or the plural of "quarter", 1/4) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to differentiate the cases.
It cannot be guaranteed that the stemmer correctly stems a plural word or correctly ignores a singular word -- let alone that it treats an ambiguous word form in the way expected by the user.
The PlingStemmer uses material from WordNet.
It requires the class FinalSet from the Java Tools.
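The exception-list approach and the handling of ambiguous forms described above can be sketched as follows. This is a toy illustration, not the PlingStemmer source: the class name ToyStemmer, its tiny word lists, the rule order, and the way the three predicates are derived from stem are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy sketch only. The word lists are tiny, hypothetical stand-ins
// for the category sets documented on this page.
class ToyStemmer {
    // Irregular plurals mapped to their singular form.
    static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("firemen", "fireman");
        IRREGULAR.put("people", "person");
    }
    // Words without a distinct plural form.
    static final Set<String> NO_PLURAL = setOf("atlas");
    // "-ics" words that do not exist without the 's'.
    static final Set<String> ICS = setOf("biophysics", "aerobics");
    // "-on" -> "-a" words, listed in their plural forms.
    static final Set<String> ON_A = setOf("automata", "phenomena");
    // "-ix" -> "-ices" words, listed in their plural forms.
    static final Set<String> IX_ICES = setOf("appendices");
    // "-u" -> "-us" words, listed in their plural forms.
    static final Set<String> U_US = setOf("emus");
    // Word forms that can be either plural or singular.
    static final Set<String> SING_AND_PLUR = setOf("physics", "quarters", "people");

    static Set<String> setOf(String... words) {
        return new HashSet<>(Arrays.asList(words));
    }

    // Removes as many trailing characters as the suffix has.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    // Exception lists are consulted first; a plain "-s" rule is the fallback.
    static String stem(String s) {
        if (IRREGULAR.containsKey(s)) return IRREGULAR.get(s);
        if (NO_PLURAL.contains(s) || ICS.contains(s)) return s;
        if (ON_A.contains(s)) return cut(s, "a") + "on";
        if (IX_ICES.contains(s)) return cut(s, "ices") + "ix";
        // Toy rule: "-us" words are left alone unless listed as "-u" plurals.
        if (s.endsWith("us")) return U_US.contains(s) ? cut(s, "s") : s;
        if (s.endsWith("s") && !s.endsWith("ss")) return cut(s, "s");
        return s;
    }

    // Assumed semantics: a form is plural if stemming changes it.
    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    static boolean isSingularAndPlural(String s) {
        return SING_AND_PLUR.contains(s);
    }

    // Assumed semantics: ambiguous forms count as singular too.
    static boolean isSingular(String s) {
        return isSingularAndPlural(s) || !isPlural(s);
    }
}
```

With these toy lists, ToyStemmer.stem reproduces the six examples above, and an ambiguous form such as "physics" is stemmed to "physic" while being reported as both plural and singular. A caller that needs to keep "physics" intact could check isSingularAndPlural before stemming.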
Field Summary | | |
---|---|---|
static java.util.Set<java.lang.String> | category00 | Words that do not have a distinct plural form (like "atlas" etc.) |
static java.util.Set<java.lang.String> | categoryCHE_CHES | Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryEX_ICES | Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryICS | Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.) |
static java.util.Set<java.lang.String> | categoryIE_IES | Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryIS_ES | Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryIX_ICES | Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryO_I | Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryOE_OES | Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryON_A | Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categorySE_SES | Words that end in "-se" in their plural forms (like "nurse" etc.) |
static java.util.Set<java.lang.String> | categorySSE_SSES | Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryU_US | Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryUM_A | Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms |
static java.util.Set<java.lang.String> | categoryUS_I | Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms |
static java.util.Map<java.lang.String,java.lang.String> | irregular | Maps irregular Germanic English plural nouns to their singular form |
static java.util.Set<java.lang.String> | singAndPlur | Contains word forms that can either be plural or singular |
Constructor Summary | |
---|---|
PlingStemmer() | |
Method Summary | | |
---|---|---|
static java.lang.String | cut(java.lang.String s, java.lang.String suffix) | Cuts a suffix from a string (that is, the number of chars given by the suffix) |
static boolean | isPlural(java.lang.String s) | Tells whether a word form is plural. |
static boolean | isSingular(java.lang.String s) | Tells whether a word form is singular. |
static boolean | isSingularAndPlural(java.lang.String s) | Tells whether a word form is the singular form of one word and at the same time the plural form of another. |
static void | main(java.lang.String[] argv) | Test routine |
static boolean | noLatin(java.lang.String s) | Returns true if a word is probably not Latin |
static java.lang.String | stem(java.lang.String s) | Stems an English noun |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static java.util.Set<java.lang.String> categorySE_SES
public static java.util.Set<java.lang.String> category00
public static java.util.Set<java.lang.String> categoryUM_A
public static java.util.Set<java.lang.String> categoryON_A
public static java.util.Set<java.lang.String> categoryO_I
public static java.util.Set<java.lang.String> categoryUS_I
public static java.util.Set<java.lang.String> categoryIX_ICES
public static java.util.Set<java.lang.String> categoryIS_ES
public static java.util.Set<java.lang.String> categoryOE_OES
public static java.util.Set<java.lang.String> categoryEX_ICES
public static java.util.Set<java.lang.String> categoryU_US
public static java.util.Set<java.lang.String> categorySSE_SSES
public static java.util.Set<java.lang.String> categoryCHE_CHES
public static java.util.Set<java.lang.String> categoryICS
public static java.util.Set<java.lang.String> categoryIE_IES
public static java.util.Map<java.lang.String,java.lang.String> irregular
public static java.util.Set<java.lang.String> singAndPlur
Constructor Detail |
---|
public PlingStemmer()
Method Detail |
---|
public static boolean isPlural(java.lang.String s)
public static boolean isSingular(java.lang.String s)
public static boolean isSingularAndPlural(java.lang.String s)
public static java.lang.String cut(java.lang.String s, java.lang.String suffix)
public static boolean noLatin(java.lang.String s)
public static java.lang.String stem(java.lang.String s)
public static void main(java.lang.String[] argv) throws java.lang.Exception
Throws: java.lang.Exception
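The summary for cut says only that it removes the number of characters given by the suffix. A minimal sketch of that behavior, assuming the method does not verify that the string actually ends with the suffix (this page does not say either way), might look like this. The class name SuffixCut is hypothetical.

```java
// Sketch of the documented behavior of cut; not the library source.
class SuffixCut {
    // Drops the last suffix.length() characters of s.
    // Assumption: the content of the suffix is not checked, only its length.
    // Throws StringIndexOutOfBoundsException if the suffix is longer than s.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }
}
```

Under that assumption, SuffixCut.cut("appendices", "ices") yields "append", and a caller is expected to have tested the ending (for example with String.endsWith) before cutting.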