javatools.database
Class WordNet

java.lang.Object
  extended by javatools.database.WordNet
All Implemented Interfaces:
java.io.Serializable

public class WordNet
extends java.lang.Object
implements java.io.Serializable

This class is part of the Java Tools (see http://mpii.de/yago-naga/javatools). It is licensed under the Creative Commons Attribution License (see http://creativecommons.org/licenses/by/3.0) by the YAGO-NAGA team (see http://mpii.de/yago-naga).

This class provides a wrapper for WordNet. Each instance wraps one relation: for example, one instance can wrap meronymy and another can wrap hyponymy. To create an instance for the relation X, you need the files wn_s.pl and wn_X.pl; for hyponymy, these are wn_s.pl and wn_hyp.pl. Both files are part of the Prolog version of WordNet, which can be downloaded from the WordNet site. All strings are normalized (see Char.java).
Example:

   // WordNet synset definitions
   File synsetDef=new File("wn_s.pl");
   // Choose hyponymy
   File relationDef=new File("wn_hyp.pl");
   // Choose to store only nouns and verbs
   EnumSet<WordNet.WordType> types
     =EnumSet.of(WordNet.WordType.NOUN,WordNet.WordType.VERB);
   // Choose to store only two senses per word
   int sensesPerWord=2;
   WordNet w=new WordNet(synsetDef,relationDef,types,sensesPerWord);
   for(WordNet.Synset s : w.synsetsFor("mouse")) {
     D.p(s);              // Print synset
     D.p(" "+s.getUps()); // Print direct supersynsets
   }
 -->
   Synset #102244530 (NOUN): [mouse, ]
   [Synset #102243671 (NOUN): [rodent, gnawer, gnawing_animal, ]]
   Synset #103651364 (NOUN): [mouse, computer_mouse, ]
   [Synset #103158939 (NOUN): [electronic_device, ]]
   Synset #201175362 (VERB): [mouse, ]
   [Synset #201174946 (VERB): [manipulate, ]]
   Synset #201856050 (VERB): [sneak, mouse, creep, pussyfoot, ]
   [Synset #201849285 (VERB): [walk, ]]
 
Note that if you load only n senses per word, there may be synsets in the WordNet instance that do not have words!

See Also:
Serialized Form

Nested Class Summary
static class WordNet.Synset
          Represents a WordNet synset
static class WordNet.WordType
          Types of words in WordNet
 
Field Summary
static int CLASSGROUP
           
static int DOWNGROUP
           
static int IDGROUP
           
static java.util.regex.Pattern RELATIONPATTERN
          Pattern for relation definitions
static int SENSENUMGROUP
           
static java.util.regex.Pattern SYNSETPATTERN
          Pattern for synset definitions
static int UPGROUP
           
 java.util.Map<java.lang.String,java.util.List<WordNet.Synset>> word2synsets
          Maps words to synsets
static int WORDGROUP
           
 
Constructor Summary
WordNet(java.io.File wn_s, java.util.EnumSet<WordNet.WordType> lextypes, int sensesPerWord)
          Constructor with no relation (only synsets)
WordNet(java.io.File wn_s, java.io.File relation, java.util.EnumSet<WordNet.WordType> lextypes, int sensesPerWord)
          Constructor (main constructor)
WordNet(java.io.File wn_s, java.io.File relation, WordNet.WordType lextype, int sensesPerWord)
          Constructor
WordNet(java.io.File wn_s, WordNet.WordType lextype, int sensesPerWord)
          Constructor with no relation (only synsets)
 
Method Summary
 int ancestor(WordNet.Synset s1, WordNet.Synset s2)
          Returns the upward distance in the hierarchy from the first node to the second, or -1 in case of failure
 int distance(WordNet.Synset s1, WordNet.Synset s2)
          Returns the length of the path s1->NCA->s2, or -1 in case of failure
 java.util.Map<java.lang.Integer,WordNet.Synset> getId2SynsetMap()
          Returns the map from ids to synsets
 WordNet.Synset getSynset(int id)
          Returns a synset for a given id
 java.util.Collection<WordNet.Synset> getSynsets()
          Compiles the set of all Synsets
static void main(java.lang.String[] argv)
          Test routine. Requires the Prolog version of WordNet, with the paths adjusted accordingly.
 WordNet.Synset nca(WordNet.Synset s1, WordNet.Synset s2)
          Returns the nearest common ancestor of two synsets.
 WordNet.Synset nca(WordNet.Synset source, WordNet.Synset destination, int[] dist1, int[] dist2)
          Returns the nearest common ancestor of two synsets.
 int numSynsets()
          Returns the number of synsets
 void remove(WordNet.Synset s)
          Removes a synset
 WordNet.Synset synsetFor(java.lang.String word, java.lang.String otherWord)
          Returns the synset that contains both words
 java.util.List<WordNet.Synset> synsetsFor(java.lang.String s)
          Returns the list of synsets that contain a word
 java.lang.String toString()
          Returns a short String description
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

SYNSETPATTERN

public static java.util.regex.Pattern SYNSETPATTERN
Pattern for synset definitions
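
To make the role of such a pattern concrete, here is a minimal sketch of parsing one synset fact from wn_s.pl. It assumes the Prolog layout s(synset_id,w_num,'word',ss_type,sense_number,tag_count). and Prolog's convention of doubling a quote inside an atom. The class name SynsetLineSketch, the regular expression, and the group numbers are hypothetical; the actual SYNSETPATTERN and its group constants may differ. The sample line is illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SynsetLineSketch {
    // Hypothetical pattern, NOT the library's SYNSETPATTERN.
    static final Pattern LINE = Pattern.compile(
        "s\\((\\d+),\\d+,'(.*)',([a-z]),(\\d+),\\d+\\)\\.");
    // Group numbers chosen for this sketch only.
    static final int ID = 1, WORD = 2, TYPE = 3, SENSE = 4;

    /** Returns {id, word, type, senseNumber}, or null if the line is not a synset fact. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) return null;
        // Assumption: a quote inside a Prolog atom is written as two quotes.
        String word = m.group(WORD).replace("''", "'");
        return new String[] { m.group(ID), word, m.group(TYPE), m.group(SENSE) };
    }

    public static void main(String[] args) {
        String[] fields = parse("s(102244530,1,'mouse',n,1,0).");
        System.out.println(String.join(" | ", fields));
    }
}
```

The sense number extracted here is the kind of value a sensesPerWord limit would be compared against.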


IDGROUP

public static final int IDGROUP
See Also:
Constant Field Values

WORDGROUP

public static final int WORDGROUP
See Also:
Constant Field Values

CLASSGROUP

public static final int CLASSGROUP
See Also:
Constant Field Values

SENSENUMGROUP

public static final int SENSENUMGROUP
See Also:
Constant Field Values

RELATIONPATTERN

public static java.util.regex.Pattern RELATIONPATTERN
Pattern for relation definitions
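
As a rough illustration of what a relation pattern has to extract, the sketch below parses one binary relation fact such as a line of wn_hyp.pl. It assumes the Prolog layout hyp(synset_id,synset_id)., where the first synset is the lower one and the second is the upper one. The class name RelationLineSketch, the regular expression, and the group numbers are hypothetical; the actual RELATIONPATTERN, DOWNGROUP and UPGROUP may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelationLineSketch {
    // Hypothetical pattern, NOT the library's RELATIONPATTERN.
    // It only matches facts with exactly two numeric arguments.
    static final Pattern LINE = Pattern.compile("\\w+\\((\\d+),(\\d+)\\)\\.");
    // Group numbers chosen for this sketch only.
    static final int DOWN = 1, UP = 2;

    /** Returns {lowerId, upperId}, or null if the line is not a binary relation fact. */
    static int[] parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) return null;
        return new int[] {
            Integer.parseInt(m.group(DOWN)),
            Integer.parseInt(m.group(UP))
        };
    }

    public static void main(String[] args) {
        // Ids taken from the mouse -> rodent example in the class description.
        int[] edge = parse("hyp(102244530,102243671).");
        System.out.println(edge[0] + " -> " + edge[1]);
    }
}
```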


DOWNGROUP

public static final int DOWNGROUP
See Also:
Constant Field Values

UPGROUP

public static final int UPGROUP
See Also:
Constant Field Values

word2synsets

public java.util.Map<java.lang.String,java.util.List<WordNet.Synset>> word2synsets
Maps words to synsets

Constructor Detail

WordNet

public WordNet(java.io.File wn_s,
               java.util.EnumSet<WordNet.WordType> lextypes,
               int sensesPerWord)
        throws java.io.IOException
Constructor with no relation (only synsets)

Throws:
java.io.IOException

WordNet

public WordNet(java.io.File wn_s,
               WordNet.WordType lextype,
               int sensesPerWord)
        throws java.io.IOException
Constructor with no relation (only synsets)

Throws:
java.io.IOException

WordNet

public WordNet(java.io.File wn_s,
               java.io.File relation,
               WordNet.WordType lextype,
               int sensesPerWord)
        throws java.io.IOException
Constructor

Throws:
java.io.IOException

WordNet

public WordNet(java.io.File wn_s,
               java.io.File relation,
               java.util.EnumSet<WordNet.WordType> lextypes,
               int sensesPerWord)
        throws java.io.IOException
Constructor (main constructor)

Throws:
java.io.IOException
Method Detail

getId2SynsetMap

public java.util.Map<java.lang.Integer,WordNet.Synset> getId2SynsetMap()
Returns the map from ids to synsets


getSynset

public WordNet.Synset getSynset(int id)
Returns a synset for a given id


getSynsets

public java.util.Collection<WordNet.Synset> getSynsets()
Compiles the set of all Synsets


synsetsFor

public java.util.List<WordNet.Synset> synsetsFor(java.lang.String s)
Returns the list of synsets that contain a word


numSynsets

public int numSynsets()
Returns the number of synsets


synsetFor

public WordNet.Synset synsetFor(java.lang.String word,
                                java.lang.String otherWord)
Returns the synset that contains both words


toString

public java.lang.String toString()
Returns a short String description

Overrides:
toString in class java.lang.Object

nca

public WordNet.Synset nca(WordNet.Synset source,
                          WordNet.Synset destination,
                          int[] dist1,
                          int[] dist2)
Returns the nearest common ancestor of two synsets. The NCA (nearest common ancestor) is the ancestor node for both synsets that has the smallest distance (number of edges) to them. This need not be the lowest common ancestor! Returns the distance source->NCA in dist1[0] and the distance destination->NCA in dist2[0]. In case of failure, null is returned and dist1[0]=dist2[0]=-1.


nca

public WordNet.Synset nca(WordNet.Synset s1,
                          WordNet.Synset s2)
Returns the nearest common ancestor of two synsets. The NCA (nearest common ancestor) is the ancestor node for both synsets that has the smallest distance (number of edges) to them. This need not be the lowest common ancestor!


distance

public int distance(WordNet.Synset s1,
                    WordNet.Synset s2)
Returns the length of the path s1->NCA->s2, or -1 in case of failure


ancestor

public int ancestor(WordNet.Synset s1,
                    WordNet.Synset s2)
Returns the upward distance in the hierarchy from the first node to the second, or -1 in case of failure
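
The relationship between nca, distance and ancestor can be illustrated on a toy hierarchy. The following is a self-contained sketch, not the library's implementation: nodes are plain strings instead of WordNet.Synset objects, the class NcaSketch and all of its methods are hypothetical, "smallest distance" is assumed to mean the smallest sum of both upward distances, and a node is assumed to count as its own ancestor at distance 0. In the toy graph, L is a common ancestor lying below R, yet R is nearer to both a and b, which shows why the nearest common ancestor need not be the lowest one.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NcaSketch {
    /** Toy hierarchy: each node maps to its direct "up" nodes. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> ups = new HashMap<>();
        ups.put("a", Arrays.asList("R", "m"));
        ups.put("b", Arrays.asList("R", "n"));
        ups.put("m", Arrays.asList("L"));
        ups.put("n", Arrays.asList("L"));
        ups.put("L", Arrays.asList("k"));
        ups.put("k", Arrays.asList("R"));
        return ups;
    }

    /** Breadth-first search upwards: edge distance from start to itself and every ancestor. */
    static Map<String, Integer> upDistances(Map<String, List<String>> ups, String start) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String up : ups.getOrDefault(node, Collections.<String>emptyList())) {
                if (!dist.containsKey(up)) {
                    dist.put(up, dist.get(node) + 1);
                    queue.add(up);
                }
            }
        }
        return dist;
    }

    /** Common ancestor with the smallest summed distance, or null if there is none. */
    static String nca(Map<String, List<String>> ups, String s1, String s2) {
        Map<String, Integer> d1 = upDistances(ups, s1);
        Map<String, Integer> d2 = upDistances(ups, s2);
        String best = null;
        int bestSum = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : d1.entrySet()) {
            Integer other = d2.get(e.getKey());
            if (other != null && e.getValue() + other < bestSum) {
                bestSum = e.getValue() + other;
                best = e.getKey();
            }
        }
        return best;
    }

    /** Length of the path s1->NCA->s2, or -1 if there is no common ancestor. */
    static int distance(Map<String, List<String>> ups, String s1, String s2) {
        String n = nca(ups, s1, s2);
        if (n == null) return -1;
        return upDistances(ups, s1).get(n) + upDistances(ups, s2).get(n);
    }

    /** Upward distance from s1 to s2, or -1 if s2 is not reachable upwards from s1. */
    static int ancestor(Map<String, List<String>> ups, String s1, String s2) {
        Integer d = upDistances(ups, s1).get(s2);
        return d == null ? -1 : d;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ups = exampleGraph();
        System.out.println(nca(ups, "a", "b"));      // R
        System.out.println(distance(ups, "a", "b")); // 2
        System.out.println(ancestor(ups, "a", "L")); // 2
        System.out.println(ancestor(ups, "L", "a")); // -1
    }
}
```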


remove

public void remove(WordNet.Synset s)
Removes a synset


main

public static void main(java.lang.String[] argv)
                 throws java.lang.Exception
Test routine. Requires the Prolog version of WordNet, with the paths adjusted accordingly.

Throws:
java.lang.Exception