javatools.parsers
Class PlingStemmer

java.lang.Object
  extended by javatools.parsers.PlingStemmer

public class PlingStemmer
extends java.lang.Object

This class is part of the Java Tools (see http://mpii.de/yago-naga/javatools). It is licensed under the Creative Commons Attribution License (see http://creativecommons.org/licenses/by/3.0) by the YAGO-NAGA team (see http://mpii.de/yago-naga).

The PlingStemmer stems an English noun (plural or singular) to its singular form. It handles irregular plurals such as "firemen"->"fireman" and classical (Greek and Latin) plurals such as "appendices"->"appendix". Compiling these exceptions was a lot of work. Examples:

      System.out.println(PlingStemmer.stem("boy"));
      ----> boy
      System.out.println(PlingStemmer.stem("boys"));
      ----> boy
      System.out.println(PlingStemmer.stem("biophysics"));
      ----> biophysics
      System.out.println(PlingStemmer.stem("automata"));
      ----> automaton
      System.out.println(PlingStemmer.stem("genus"));
      ----> genus
      System.out.println(PlingStemmer.stem("emus"));
      ----> emu
  

There are a number of word forms that can be either plural or singular. Examples include "physics" (the science, or the plural of the medicine "physic"), "quarters" (the housing, or the plural of "quarter", meaning 1/4) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to differentiate the cases.

It cannot be guaranteed that the stemmer correctly stems a plural word or correctly ignores a singular word -- let alone that it treats an ambiguous word form in the way expected by the user.

The PlingStemmer uses material from WordNet.

It requires the class FinalSet from the Java Tools.


Field Summary
static java.util.Set<java.lang.String> category00
          Words that do not have a distinct plural form (like "atlas" etc.)
static java.util.Set<java.lang.String> categoryCHE_CHES
          Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryEX_ICES
          Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryICS
          Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)
static java.util.Set<java.lang.String> categoryIE_IES
          Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryIS_ES
          Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryIX_ICES
          Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryO_I
          Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryOE_OES
          Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryON_A
          Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categorySE_SES
          Words that change from "-se" to "-ses" (like "nurse" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categorySSE_SSES
          Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryU_US
          Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryUM_A
          Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms
static java.util.Set<java.lang.String> categoryUS_I
          Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms
static java.util.Map<java.lang.String,java.lang.String> irregular
          Maps irregular Germanic English plural nouns to their singular form
static java.util.Set<java.lang.String> singAndPlur
          Contains word forms that can either be plural or singular
 
Constructor Summary
PlingStemmer()
           
 
Method Summary
static java.lang.String cut(java.lang.String s, java.lang.String suffix)
          Cuts a suffix from a string, that is, removes as many trailing characters as the suffix has
static boolean isPlural(java.lang.String s)
          Tells whether a word form is plural.
static boolean isSingular(java.lang.String s)
          Tells whether a word form is singular.
static boolean isSingularAndPlural(java.lang.String s)
          Tells whether a word form is the singular form of one word and at the same time the plural form of another.
static void main(java.lang.String[] argv)
          Test routine
static boolean noLatin(java.lang.String s)
          Returns true if a word is probably not Latin
static java.lang.String stem(java.lang.String s)
          Stems an English noun
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

categorySE_SES

public static java.util.Set<java.lang.String> categorySE_SES
Words that change from "-se" to "-ses" (like "nurse" etc.), listed in their plural forms


category00

public static java.util.Set<java.lang.String> category00
Words that do not have a distinct plural form (like "atlas" etc.)


categoryUM_A

public static java.util.Set<java.lang.String> categoryUM_A
Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms


categoryON_A

public static java.util.Set<java.lang.String> categoryON_A
Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms


categoryO_I

public static java.util.Set<java.lang.String> categoryO_I
Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms


categoryUS_I

public static java.util.Set<java.lang.String> categoryUS_I
Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms


categoryIX_ICES

public static java.util.Set<java.lang.String> categoryIX_ICES
Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms


categoryIS_ES

public static java.util.Set<java.lang.String> categoryIS_ES
Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms


categoryOE_OES

public static java.util.Set<java.lang.String> categoryOE_OES
Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms


categoryEX_ICES

public static java.util.Set<java.lang.String> categoryEX_ICES
Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms


categoryU_US

public static java.util.Set<java.lang.String> categoryU_US
Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms


categorySSE_SSES

public static java.util.Set<java.lang.String> categorySSE_SSES
Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms


categoryCHE_CHES

public static java.util.Set<java.lang.String> categoryCHE_CHES
Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms


categoryICS

public static java.util.Set<java.lang.String> categoryICS
Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.)


categoryIE_IES

public static java.util.Set<java.lang.String> categoryIE_IES
Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms


irregular

public static java.util.Map<java.lang.String,java.lang.String> irregular
Maps irregular Germanic English plural nouns to their singular form


singAndPlur

public static java.util.Set<java.lang.String> singAndPlur
Contains word forms that can either be plural or singular

Constructor Detail

PlingStemmer

public PlingStemmer()
Method Detail

isPlural

public static boolean isPlural(java.lang.String s)
Tells whether a word form is plural. This method just checks whether the stem method alters the word.
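
The check described above can be sketched as follows. This is a minimal illustration, not the library source: the class PluralCheckSketch and the method toyStem are hypothetical, and toyStem only strips a trailing "s" so that the example runs without the Java Tools.

```java
// Hypothetical sketch: a plural test defined purely in terms of a stemmer.
final class PluralCheckSketch {

    // Toy stand-in for a stemmer (assumption): strips one trailing "s".
    static String toyStem(String s) {
        if (s.length() > 1 && s.endsWith("s")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    // A word form counts as plural exactly when stemming changes it.
    static boolean isPlural(String s) {
        return !toyStem(s).equals(s);
    }

    public static void main(String[] args) {
        System.out.println(isPlural("boys"));  // true
        System.out.println(isPlural("boy"));   // false
        System.out.println(isPlural("sheep")); // false
    }
}
```

One consequence of this definition is that any word the stemmer leaves unchanged is reported as not plural, so the quality of the answer depends entirely on the stemmer.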


isSingular

public static boolean isSingular(java.lang.String s)
Tells whether a word form is singular. Note that a word can be both plural and singular.


isSingularAndPlural

public static boolean isSingularAndPlural(java.lang.String s)
Tells whether a word form is the singular form of one word and at the same time the plural form of another.
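
How the three predicates might relate for an ambiguous form can be sketched as below. Everything here is an assumption made for illustration: the class AmbiguousFormsSketch, the set AMBIGUOUS (filled with the examples from the class description) and the toy stem method are hypothetical, and the rule used for isSingular is one plausible reading of the descriptions above rather than the confirmed implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the three predicates; not the library source.
final class AmbiguousFormsSketch {

    // Sample of forms that are both singular and plural (from the class description).
    static final Set<String> AMBIGUOUS =
            new HashSet<>(Arrays.asList("physics", "quarters", "people"));

    // Toy stemmer (assumption): knows only the words used in this example.
    static String stem(String s) {
        switch (s) {
            case "physics":  return "physic";
            case "quarters": return "quarter";
            case "people":   return "person";
            case "boys":     return "boy";
            default:         return s;
        }
    }

    // Plural if stemming alters the word.
    static boolean isPlural(String s) {
        return !stem(s).equals(s);
    }

    // Ambiguous if the form is listed as both singular and plural.
    static boolean isSingularAndPlural(String s) {
        return AMBIGUOUS.contains(s.toLowerCase());
    }

    // Assumed rule: singular if ambiguous, or if the word is not plural.
    static boolean isSingular(String s) {
        return isSingularAndPlural(s) || !isPlural(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "people"}) {
            System.out.println(w + ": plural=" + isPlural(w)
                    + " singular=" + isSingular(w)
                    + " both=" + isSingularAndPlural(w));
        }
    }
}
```

Under these assumptions "people" is reported as plural, singular and ambiguous at once, while "boys" is plural only and "boy" is singular only.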


cut

public static java.lang.String cut(java.lang.String s,
                                   java.lang.String suffix)
Cuts a suffix from a string, that is, removes as many trailing characters as the suffix has
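
A minimal sketch of this behaviour, assuming the method only uses the length of the suffix and does not verify that the string actually ends with it. The class name CutSketch is hypothetical and the body is a guess based on the description, not the library source.

```java
// Hypothetical stand-in for the documented cut method.
final class CutSketch {

    // Removes as many trailing characters from s as suffix has.
    // Assumption: the caller has already checked that s ends with suffix.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(cut("boys", "s"));                 // boy
        System.out.println(cut("appendices", "ices") + "ix"); // appendix
    }
}
```

Because only the length matters in this sketch, a caller is expected to test the ending first (for example with String.endsWith) before cutting and appending the singular ending.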


noLatin

public static boolean noLatin(java.lang.String s)
Returns true if a word is probably not Latin


stem

public static java.lang.String stem(java.lang.String s)
Stems an English noun
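
One plausible way the fields listed above could work together in a stemmer is sketched below. This is a toy under stated assumptions, not the PlingStemmer algorithm: the class ToyStemmer, its tiny word lists, the UNCHANGED set and the lookup order are all hypothetical, and the real class covers far more cases.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy stemmer: exception lookups first, default rule last.
final class ToyStemmer {

    // Irregular plurals mapped to their singular (toy sample).
    static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("firemen", "fireman");
    }

    // Category sets hold PLURAL forms, mirroring "listed in their plural forms".
    static final Set<String> IX_ICES = new HashSet<>(Arrays.asList("appendices"));
    static final Set<String> ON_A = new HashSet<>(Arrays.asList("automata"));

    // Words this toy leaves untouched (assumption, for illustration only).
    static final Set<String> UNCHANGED =
            new HashSet<>(Arrays.asList("genus", "biophysics"));

    // Removes as many trailing characters as suffix has.
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    static String stem(String s) {
        String singular = IRREGULAR.get(s);
        if (singular != null) return singular;                  // firemen -> fireman
        if (UNCHANGED.contains(s)) return s;                    // genus -> genus
        if (IX_ICES.contains(s)) return cut(s, "ices") + "ix";  // appendices -> appendix
        if (ON_A.contains(s)) return cut(s, "a") + "on";        // automata -> automaton
        if (s.length() > 1 && s.endsWith("s")) return cut(s, "s"); // boys -> boy
        return s;                                               // boy -> boy
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "automata", "genus", "emus"}) {
            System.out.println(w + " ----> " + stem(w));
        }
    }
}
```

Storing the plural forms in the sets lets the stemmer decide with a single membership test whether an ending such as "-a" should become "-on" or "-um", which the ending alone cannot tell.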


main

public static void main(java.lang.String[] argv)
                 throws java.lang.Exception
Test routine

Throws:
java.lang.Exception