java.text
Class RuleBasedBreakIterator.Builder

java.lang.Object
  extended by java.text.RuleBasedBreakIterator.Builder
Direct Known Subclasses:
DictionaryBasedBreakIterator.Builder
Enclosing class:
RuleBasedBreakIterator

protected class RuleBasedBreakIterator.Builder
extends Object

The Builder class has the job of constructing a RuleBasedBreakIterator from a textual description. A Builder is constructed by RuleBasedBreakIterator's constructor, which uses it to construct the iterator itself and then throws it away.

The construction logic is separated out into its own class for two primary reasons:

It would be nice if this could be an independent class rather than an inner class, because that would shorten the source file considerably. However, making Builder an inner class of RuleBasedBreakIterator gives it direct access to RuleBasedBreakIterator's private members, which saves us from having to provide some kind of "back door" to the Builder class that other classes could then also use.


Field Summary
protected static int ALL_FLAGS
          A bit mask representing the union of the mask values listed above.
protected  Vector categories
          A temporary holding place used for calculating the character categories.
protected  boolean clearLoopingStates
          A flag that is used to indicate when the list of looping states can be reset.
protected  Vector decisionPointList
          A list of all the states that have to be filled in with transitions to the next state that is created.
protected  Stack decisionPointStack
          A stack for holding decision point lists.
protected static int DONT_LOOP_FLAG
          A bit mask used to indicate a bit in the table's flags column that marks a state as one from which the builder shouldn't loop to any looping states.
protected static int END_STATE_FLAG
          A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.
protected  Hashtable expressions
          A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time
protected  CharSet ignoreChars
          A temporary holding place for the list of ignore characters
protected static int LOOKAHEAD_STATE_FLAG
          A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.
protected  Vector loopingStates
          A list of states that loop back on themselves.
protected  Vector mergeList
          A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination.
protected  Vector statesToBackfill
          Looping states actually have to be backfilled later in the process than everything else.
protected  Vector tempStateTable
          A temporary holding place where the forward state table is built
 
Constructor Summary
RuleBasedBreakIterator.Builder()
          No special construction is required for the Builder.
 
Method Summary
private  void backfillLoopingStates()
          This function completes the backfilling process by actually doing the backfilling on the states that are marked for it
private  void buildBackwardsStateTable(Vector tempRuleList)
          This function builds the backward state table from the forward state table and any additional rules (identified by the ! on the front) supplied in the description.
 void buildBreakIterator()
          This is the main function for setting up the BreakIterator's tables.
protected  void buildCharCategories(Vector tempRuleList)
          This function builds the character category table.
private  Vector buildRuleList(String description)
          This function has three main purposes: Perform general syntax checking on the description, so the rest of the build code can assume that it's parsing a legal description.
private  void buildStateTable(Vector tempRuleList)
          This is the function that builds the forward state table.
private  void eliminateBackfillStates(int baseState)
          This removes "ending states" and states reachable from them from the list of states to backfill.
protected  void error(String message, int position, String context)
          Throws an IllegalArgumentException representing a syntax error in the rule description.
private  void finishBuildingStateTable(boolean forward)
          This function completes the state-table-building process by doing several postprocessing steps and copying everything into its final resting place in the iterator itself
protected  void handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
          This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions.
private  void mergeStates(int rowNum, short[] newValues, Vector rowsBeingUpdated)
          The real work of making the state table deterministic happens here.
protected  void mungeExpressionList(Hashtable expressions)
           
private  void parseRule(String rule, boolean forward)
          This is where most of the work really happens.
protected  String processSubstitution(String substitutionRule, String description, int startPos)
          This function performs variable-name substitutions.
private  int searchMergeList(int a, int b)
          The merge list is a list of pairs of rows that have been merged somewhere in the process of building this state table, along with the row number of the row containing the merged state.
private  void setLoopingStates(Vector newLoopingStates, Vector endStates)
          This function is used to update the list of current looping states (i.e., states that are controlled by a *? construct).
private  void updateStateTable(Vector rows, String pendingChars, short newValue)
          Update entries in the state table, and merge states when necessary to keep the table deterministic.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

categories

protected Vector categories
A temporary holding place used for calculating the character categories. This object contains CharSet objects.


expressions

protected Hashtable expressions
A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time


ignoreChars

protected CharSet ignoreChars
A temporary holding place for the list of ignore characters


tempStateTable

protected Vector tempStateTable
A temporary holding place where the forward state table is built


decisionPointList

protected Vector decisionPointList
A list of all the states that have to be filled in with transitions to the next state that is created. Used when building the state table from the regular expressions.


decisionPointStack

protected Stack decisionPointStack
A stack for holding decision point lists. This is used to handle nested parentheses and braces in regexps.


loopingStates

protected Vector loopingStates
A list of states that loop back on themselves. Used to handle .*?


statesToBackfill

protected Vector statesToBackfill
Looping states actually have to be backfilled later in the process than everything else. This is where the list of states to backfill is accumulated. This is also used to handle .*?


mergeList

protected Vector mergeList
A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination. Used in the process of making the state table deterministic to prevent infinite recursion.


clearLoopingStates

protected boolean clearLoopingStates
A flag that is used to indicate when the list of looping states can be reset.


END_STATE_FLAG

protected static final int END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.

See Also:
Constant Field Values

DONT_LOOP_FLAG

protected static final int DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as one from which the builder shouldn't loop to any looping states.

See Also:
Constant Field Values

LOOKAHEAD_STATE_FLAG

protected static final int LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.

See Also:
Constant Field Values

ALL_FLAGS

protected static final int ALL_FLAGS
A bit mask representing the union of the mask values listed above. Used for clearing or masking off the flag bits.

See Also:
Constant Field Values
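
As a rough illustration of how masks like these are commonly used on a flags cell, the sketch below sets, tests, and clears flag bits. The FlagSketch class, its helper methods, and the numeric values are assumptions made for this example only; the real values are the ones listed under Constant Field Values.

```java
final class FlagSketch {
    // Hypothetical values, chosen only so that each flag occupies its own bit.
    static final int END_STATE_FLAG = 0x8000;
    static final int DONT_LOOP_FLAG = 0x4000;
    static final int LOOKAHEAD_STATE_FLAG = 0x2000;

    // The union of the three masks above.
    static final int ALL_FLAGS = END_STATE_FLAG | DONT_LOOP_FLAG | LOOKAHEAD_STATE_FLAG;

    // Turn on the bit(s) named by mask.
    static int mark(int flags, int mask) {
        return flags | mask;
    }

    // True if any bit named by mask is set.
    static boolean isSet(int flags, int mask) {
        return (flags & mask) != 0;
    }

    // Mask off every flag bit, leaving the non-flag bits untouched.
    static int clearAll(int flags) {
        return flags & ~ALL_FLAGS;
    }
}
```

Keeping the union in a single constant means the flag bits can be stripped in one operation without naming each mask again.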

Constructor Detail

RuleBasedBreakIterator.Builder

public RuleBasedBreakIterator.Builder()
No special construction is required for the Builder.

Method Detail

buildBreakIterator

public void buildBreakIterator()
This is the main function for setting up the BreakIterator's tables. It just vectors different parts of the job off to other functions.


buildRuleList

private Vector buildRuleList(String description)
This function has three main purposes:


processSubstitution

protected String processSubstitution(String substitutionRule,
                                     String description,
                                     int startPos)
This function performs variable-name substitutions. First it does syntax checking on the variable-name definition. If it's syntactically valid, it then goes through the remainder of the description and does a simple find-and-replace of the variable name with its text. (The variable text must be enclosed in either [] or () for this to work.)
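
The check-then-replace behavior described above can be sketched roughly as follows. This is not the actual implementation: the SubstitutionSketch class, the substitute method, and the "$letter" name syntax are hypothetical, and the real syntax checking is likely more thorough.

```java
final class SubstitutionSketch {
    /**
     * Hypothetical helper: validates a variable definition and replaces
     * every occurrence of the name in the remainder of the description.
     */
    static String substitute(String name, String text, String remainder) {
        boolean bracketed = text.length() >= 2 && text.startsWith("[") && text.endsWith("]");
        boolean parenthesized = text.length() >= 2 && text.startsWith("(") && text.endsWith(")");
        // The variable text must be enclosed in [] or () so that a plain
        // textual replacement cannot change how the surrounding rule groups.
        if (name.isEmpty() || !(bracketed || parenthesized)) {
            throw new IllegalArgumentException("Illegal substitution: " + name + "=" + text);
        }
        // String.replace does a literal (non-regex) find-and-replace.
        return remainder.replace(name, text);
    }
}
```

Under these assumptions, substitute("$letter", "[a-z]", "$letter$letter*;") yields "[a-z][a-z]*;", while unenclosed text such as "a-z" is rejected.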


handleSpecialSubstitution

protected void handleSpecialSubstitution(String replace,
                                         String replaceWith,
                                         int startPos,
                                         String description)
This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions. At the RuleBasedBreakIterator level, we have one special substitution name, "". Subclasses can override this function to add more. Any special processing that has to go on beyond that which is done by the normal substitution-processing code is done here.


buildCharCategories

protected void buildCharCategories(Vector tempRuleList)
This function builds the character category table. On entry, tempRuleList is a vector of break rules that has had variable names substituted. On exit, the charCategoryTable data member has been initialized to hold the character category table, and tempRuleList's rules have been munged to contain character category numbers everywhere a literal character or a [] expression originally occurred.
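
One common way to arrive at category numbers is to split the character sets used by the rules into disjoint groups, so that every [] expression becomes a list of whole categories. The sketch below shows that idea under stated assumptions: it uses java.util sets rather than CharSet, and the CategorySketch class and refine method are hypothetical, not the method's real logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CategorySketch {
    /** Splits possibly overlapping character sets into disjoint categories. */
    static List<Set<Character>> refine(List<Set<Character>> expressions) {
        List<Set<Character>> categories = new ArrayList<>();
        for (Set<Character> expr : expressions) {
            Set<Character> rest = new TreeSet<>(expr);   // characters not yet categorized
            List<Set<Character>> next = new ArrayList<>();
            for (Set<Character> cat : categories) {
                Set<Character> common = new TreeSet<>(cat);
                common.retainAll(rest);
                if (common.isEmpty()) {
                    next.add(cat);                       // no overlap: keep as is
                    continue;
                }
                Set<Character> only = new TreeSet<>(cat);
                only.removeAll(common);
                if (!only.isEmpty()) {
                    next.add(only);                      // part of cat outside expr
                }
                next.add(common);                        // part shared with expr
                rest.removeAll(common);
            }
            if (!rest.isEmpty()) {
                next.add(rest);                          // characters seen for the first time
            }
            categories = next;
        }
        return categories;
    }
}
```

With the two sets {a, b, c} and {b, c, d}, this produces the categories {a}, {b, c}, and {d}; the first expression then maps to categories 0 and 1 and the second to categories 1 and 2 (positions in the returned list).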


mungeExpressionList

protected void mungeExpressionList(Hashtable expressions)

buildStateTable

private void buildStateTable(Vector tempRuleList)
This is the function that builds the forward state table. Most of the real work is done in parseRule(), which is called once for each rule in the description.


parseRule

private void parseRule(String rule,
                       boolean forward)
This is where most of the work really happens. This routine parses a single rule in the rule description, adding and modifying states in the state table according to the new expression. The state table is kept deterministic throughout the whole operation, although some ugly postprocessing is needed to handle the *? token.


updateStateTable

private void updateStateTable(Vector rows,
                              String pendingChars,
                              short newValue)
Update entries in the state table, and merge states when necessary to keep the table deterministic.

Parameters:
rows - The list of rows that need updating (the decision point list)
pendingChars - A character category list, encoded in a String. This is the list of the columns that need updating.
newValue - Update the cells specified above to contain this value
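
To make the pendingChars encoding concrete, here is a minimal sketch that treats each char of the String as a column (category) index and builds a row holding newValue in those columns. The PendingCharsSketch class and rowFor method are hypothetical, and the sketch assumes the chars map directly to column indices with no offset, which may not match the real encoding.

```java
final class PendingCharsSketch {
    /**
     * Hypothetical helper: builds a state-table row with newValue in every
     * column named by pendingChars and 0 (no transition) everywhere else.
     */
    static short[] rowFor(String pendingChars, short newValue, int numCategories) {
        short[] row = new short[numCategories];
        for (int i = 0; i < pendingChars.length(); i++) {
            // Each char is read as a category number, not as text.
            row[pendingChars.charAt(i)] = newValue;
        }
        return row;
    }
}
```

A row built this way could then be merged into each row on the decision point list, which is where collisions with existing transitions would surface.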

mergeStates

private void mergeStates(int rowNum,
                         short[] newValues,
                         Vector rowsBeingUpdated)
The real work of making the state table deterministic happens here. This function merges a state in the state table (specified by rowNum) with a state that is passed in (newValues). The basic process is to copy the nonzero cells in newValues into the state in the state table (we'll call that oldValues). If there's a collision (i.e., if the same cell has a nonzero value in both states, and it's not the SAME value), then we have to reconcile the collision. We do this by creating a new state, adding it to the end of the state table, and using this function recursively to merge the original two states into a single, combined state. This process may happen recursively (i.e., each successive level may involve collisions). To prevent infinite recursion, we keep a log of merge operations. Any time we're merging two states we've merged before, we can just supply the row number for the result of that merge operation rather than creating a new state just like it.

Parameters:
rowNum - The row number in the state table of the state to be updated
newValues - The state to merge it with.
rowsBeingUpdated - A copy of the list of rows passed to updateStateTable() (itself a copy of the decision point list from parseRule()). Newly-created states get added to the decision point list if their "parents" were on it.
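
The merge process described above can be sketched in simplified form. This is an illustration under assumptions, not the actual implementation: the MergeSketch class is hypothetical, rows are plain short[] arrays with no flags column, the rowsBeingUpdated bookkeeping is omitted, and row 0 is reserved so that a cell value of 0 can mean "no transition".

```java
import java.util.ArrayList;
import java.util.List;

final class MergeSketch {
    final List<short[]> table = new ArrayList<>();    // row 0 is reserved and unused
    final List<int[]> mergeList = new ArrayList<>();  // entries are {a, b, combined}

    /** Returns the row that a and b were merged into, or 0 if they never were. */
    int searchMergeList(int a, int b) {
        for (int[] e : mergeList) {
            if ((e[0] == a && e[1] == b) || (e[0] == b && e[1] == a)) {
                return e[2];
            }
        }
        return 0;
    }

    /** Copies the nonzero cells of newValues into row rowNum, reconciling collisions. */
    void mergeStates(int rowNum, short[] newValues) {
        short[] oldValues = table.get(rowNum);
        for (int col = 0; col < oldValues.length; col++) {
            short incoming = newValues[col];
            if (incoming == 0 || incoming == oldValues[col]) {
                continue;                              // nothing to copy, or same value
            }
            if (oldValues[col] == 0) {
                oldValues[col] = incoming;             // empty cell: plain copy
                continue;
            }
            // Collision: the cell must point at a state combining both targets.
            int a = oldValues[col];
            int b = incoming;
            int combined = searchMergeList(a, b);
            if (combined == 0) {
                table.add(table.get(a).clone());       // start from a copy of state a
                combined = table.size() - 1;
                // Log the merge before recursing so that a cycle finds this entry.
                mergeList.add(new int[] {a, b, combined});
                mergeStates(combined, table.get(b));
            }
            oldValues[col] = (short) combined;
        }
    }
}
```

Recording the merge before the recursive call is what lets a later collision between the same pair of states reuse the combined row instead of creating another copy of it.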

searchMergeList

private int searchMergeList(int a,
                            int b)
The merge list is a list of pairs of rows that have been merged somewhere in the process of building this state table, along with the row number of the row containing the merged state. This function looks up a pair of row numbers and returns the row number of the row they combine into. (It returns 0 if this pair of rows isn't in the merge list.)


setLoopingStates

private void setLoopingStates(Vector newLoopingStates,
                              Vector endStates)
This function is used to update the list of current looping states (i.e., states that are controlled by a *? construct). It backfills values from the looping states into unpopulated cells of the states that are currently marked for backfilling, and then updates the list of looping states to be the new list.

Parameters:
newLoopingStates - The list of new looping states
endStates - The list of states to treat as end states (states that can exit the loop).

eliminateBackfillStates

private void eliminateBackfillStates(int baseState)
This removes "ending states" and states reachable from them from the list of states to backfill.


backfillLoopingStates

private void backfillLoopingStates()
This function completes the backfilling process by actually doing the backfilling on the states that are marked for it


finishBuildingStateTable

private void finishBuildingStateTable(boolean forward)
This function completes the state-table-building process by doing several postprocessing steps and copying everything into its final resting place in the iterator itself

Parameters:
forward - True if we're working on the forward state table

buildBackwardsStateTable

private void buildBackwardsStateTable(Vector tempRuleList)
This function builds the backward state table from the forward state table and any additional rules (identified by the ! on the front) supplied in the description


error

protected void error(String message,
                     int position,
                     String context)
Throws an IllegalArgumentException representing a syntax error in the rule description. The exception's message contains some debugging information.

Parameters:
message - A message describing the problem
position - The position in the description where the problem was discovered
context - The string containing the error
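
A minimal sketch of a reporting helper with this signature is shown below. The ErrorSketch class and the message layout are assumptions made for illustration; the debugging information in the real exception message may be formatted differently.

```java
final class ErrorSketch {
    /** Hypothetical helper: always throws, marking where in context the problem was found. */
    static void error(String message, int position, String context) {
        // Clamp the position so a bad index cannot mask the original error.
        int pos = Math.max(0, Math.min(position, context.length()));
        throw new IllegalArgumentException(
                "Parse error: " + message
                + " (position " + position + "): "
                + context.substring(0, pos) + " <HERE> " + context.substring(pos));
    }
}
```

For example, error("missing ;", 3, "abcdef") would throw an IllegalArgumentException whose message contains "abc <HERE> def".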