java.text
Class BreakDictionary

java.lang.Object
  extended byjava.text.BreakDictionary

class BreakDictionary
extends Object

This is the class that represents the list of known words used by DictionaryBasedBreakIterator. The conceptual data structure used here is a trie: there is a node hanging off the root node for every letter that can start a word. Each of these nodes has a node hanging off of it for every letter that can be the second letter of a word if this node is the first letter, and so on. The trie is represented as a two-dimensional array that can be treated as a table of state transitions. Indexes are used to compress this array, taking advantage of the fact that this array will always be very sparse.


Field Summary
private  sun.text.CompactByteArray columnMap
          Maps from characters to column numbers.
private  int numColGroups
          Columns are organized into groups of 32.
private  int numCols
          The number of actual columns in the table
private  char[] reverseColumnMap
          A map used to go from column numbers to characters.
private  short[] rowIndex
          This index maps logical row numbers to physical row numbers
private  int[] rowIndexFlags
          A bitmap is used to tell which cells in the comceptual table are populated.
private  short[] rowIndexFlagsIndex
          This index maps from a logical row number into the bitmap table above.
private  byte[] rowIndexShifts
          For each logical row, this index contains a constant that is added to the logical column number to get the physical column number
private static int supportedVersion
          The version of the dictionary that was read in.
private  short[] table
          The actual compressed state table.
private  int version
           
 
Constructor Summary
BreakDictionary(InputStream dictionaryStream)
           
 
Method Summary
 short at(int row, char ch)
          Uses the column map to map the character to a column number, then passes the row and column number to the other version of at()
 short at(int row, int col)
          Returns the value in the cell with the specified (logical) row and column numbers.
private  boolean cellIsPopulated(int row, int col)
          Given (logical) row and column numbers, returns true if the cell in that position is populated
private  short internalAt(int row, int col)
          Implementation of at() when we know the specified cell is populated.
static void main(String[] args)
           
 void printWordList(String partialWord, int state, PrintWriter out)
           
 void readDictionaryFile(DataInputStream in)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reverseColumnMap

private char[] reverseColumnMap
A map used to go from column numbers to characters. Used only for debugging right now.


supportedVersion

private static int supportedVersion
The version of the dictionary that was read in.


version

private int version

columnMap

private sun.text.CompactByteArray columnMap
Maps from characters to column numbers. The main use of this is to avoid making room in the array for empty columns.


numCols

private int numCols
The number of actual columns in the table


numColGroups

private int numColGroups
Columns are organized into groups of 32. This says how many column groups. (We could calculate this, but we store the value to avoid having to repeatedly calculate it.)


table

private short[] table
The actual compressed state table. Each conceptual row represents a state, and the cells in it contain the row numbers of the states to transition to for each possible letter. 0 is used to indicate an illegal combination of letters (i.e., the error state). The table is compressed by eliminating all the unpopulated (i.e., zero) cells. Multiple conceptual rows can then be doubled up in a single physical row by sliding them up and possibly shifting them to one side or the other so the populated cells don't collide. Indexes are used to identify unpopulated cells and to locate populated cells.


rowIndex

private short[] rowIndex
This index maps logical row numbers to physical row numbers


rowIndexFlags

private int[] rowIndexFlags
A bitmap is used to tell which cells in the comceptual table are populated. This array contains all the unique bit combinations in that bitmap. If the table is more than 32 columns wide, successive entries in this array are used for a single row.


rowIndexFlagsIndex

private short[] rowIndexFlagsIndex
This index maps from a logical row number into the bitmap table above. (This keeps us from storing duplicate bitmap combinations.) Since there are a lot of rows with only one populated cell, instead of wasting space in the bitmap table, we just store a negative number in this index for rows with one populated cell. The absolute value of that number is the column number of the populated cell.


rowIndexShifts

private byte[] rowIndexShifts
For each logical row, this index contains a constant that is added to the logical column number to get the physical column number

Constructor Detail

BreakDictionary

public BreakDictionary(InputStream dictionaryStream)
                throws IOException
Method Detail

main

public static void main(String[] args)
                 throws FileNotFoundException,
                        UnsupportedEncodingException,
                        IOException
Throws:
FileNotFoundException
UnsupportedEncodingException
IOException

printWordList

public void printWordList(String partialWord,
                          int state,
                          PrintWriter out)
                   throws IOException
Throws:
IOException

readDictionaryFile

public void readDictionaryFile(DataInputStream in)
                        throws IOException
Throws:
IOException

at

public final short at(int row,
                      char ch)
Uses the column map to map the character to a column number, then passes the row and column number to the other version of at()

Parameters:
row - The current state
ch - The character whose column we're interested in
Returns:
The new state to transition to

at

public final short at(int row,
                      int col)
Returns the value in the cell with the specified (logical) row and column numbers. In DictionaryBasedBreakIterator, the row number is a state number, the column number is an input, and the return value is the row number of the new state to transition to. (0 is the "error" state, and -1 is the "end of word" state in a dictionary)

Parameters:
row - The row number of the current state
col - The column number of the input character (0 means "not a dictionary character")
Returns:
The row number of the new state to transition to

cellIsPopulated

private final boolean cellIsPopulated(int row,
                                      int col)
Given (logical) row and column numbers, returns true if the cell in that position is populated


internalAt

private final short internalAt(int row,
                               int col)
Implementation of at() when we know the specified cell is populated.

Parameters:
row - The PHYSICAL row number of the cell
col - The PHYSICAL column number of the cell
Returns:
The value stored in the cell