java.lang.Object
  java.text.RuleBasedBreakIterator.Builder
The Builder class has the job of constructing a RuleBasedBreakIterator from a textual description. A Builder is constructed by RuleBasedBreakIterator's constructor, which uses it to construct the iterator itself and then throws it away.
The construction logic is separated out into its own class for two primary reasons:
It would be nice if Builder could be an independent class rather than an inner class, because that would shorten the source file considerably. Making Builder an inner class of RuleBasedBreakIterator, however, gives it direct access to RuleBasedBreakIterator's private members. That saves us from having to provide some kind of "back door" for the Builder class, which other classes could then also use.
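The inner-class arrangement described above can be illustrated with a toy example. This is a minimal sketch and not the JDK source. The names `TinyBreakIterator`, `description`, `stateTable` and the "one state per rule" logic are all hypothetical. The sketch shows two things: a non-static inner `Builder` that writes directly to the enclosing class's private field, and an outer constructor that creates the builder, uses it once, and then discards it.

```java
public class TinyBreakIterator {
    // Hypothetical private members that only the inner Builder fills in.
    private final String description;
    private short[] stateTable;

    public TinyBreakIterator(String description) {
        this.description = description;
        // Create a Builder, let it populate the private fields, then drop it; nothing keeps
        // a reference, so it becomes eligible for garbage collection after this statement.
        new Builder().buildBreakIterator();
    }

    public int stateCount() {
        return stateTable.length;
    }

    // A non-static inner class can read and write the enclosing instance's private fields
    // directly, so no "back door" accessor has to be exposed to other classes.
    private class Builder {
        void buildBreakIterator() {
            // Placeholder logic (an assumption for this demo): one state per ';'-separated rule.
            String[] rules = description.split(";");
            stateTable = new short[rules.length];
        }
    }

    public static void main(String[] args) {
        TinyBreakIterator it = new TinyBreakIterator("a;b;c");
        System.out.println(it.stateCount()); // prints 3
    }
}
```

The trade-off is the one the paragraph names. The outer class never widens the visibility of `stateTable`, but the builder's code has to live in the same source file.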
Field Summary

| Modifier and type | Field | Description |
|---|---|---|
| protected static int | ALL_FLAGS | A bit mask representing the union of the other flag masks (END_STATE_FLAG, DONT_LOOP_FLAG, and LOOKAHEAD_STATE_FLAG). |
| protected Vector | categories | A temporary holding place used for calculating the character categories. |
| protected boolean | clearLoopingStates | A flag indicating when the list of looping states can be reset. |
| protected Vector | decisionPointList | A list of all the states that have to be filled in with transitions to the next state that is created. |
| protected Stack | decisionPointStack | A stack for holding decision point lists. |
| protected static int | DONT_LOOP_FLAG | A bit mask for the bit in the table's flags column that marks a state the builder should not loop back to any looping state. |
| protected static int | END_STATE_FLAG | A bit mask for the bit in the table's flags column that marks a state as an accepting state. |
| protected Hashtable | expressions | A table mapping parts of the regexp text to lists of character categories, so that they do not have to be worked out from scratch each time. |
| protected CharSet | ignoreChars | A temporary holding place for the list of ignore characters. |
| protected static int | LOOKAHEAD_STATE_FLAG | A bit mask for the bit in the table's flags column that marks a state as a lookahead state. |
| protected Vector | loopingStates | A list of states that loop back on themselves. |
| protected Vector | mergeList | A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination. |
| protected Vector | statesToBackfill | Looping states have to be backfilled later in the process than everything else. |
| protected Vector | tempStateTable | A temporary holding place where the forward state table is built. |
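The summary above says that each flag is a single bit in the state table's flags column and that ALL_FLAGS is their union, but it does not give the actual values. The sketch below uses hypothetical bit values (`0x8000`, `0x4000`, `0x2000`). It shows how such masks are typically combined and tested. The class and helper names are illustrative only.

```java
public class StateFlagsSketch {
    // Hypothetical bit values; the summary only says each flag occupies one bit.
    static final int END_STATE_FLAG = 0x8000;
    static final int DONT_LOOP_FLAG = 0x4000;
    static final int LOOKAHEAD_STATE_FLAG = 0x2000;
    // Union of the three masks above.
    static final int ALL_FLAGS = END_STATE_FLAG | DONT_LOOP_FLAG | LOOKAHEAD_STATE_FLAG;

    // True if the accepting-state bit is set in a flags-column value.
    static boolean isEndState(int flags) {
        return (flags & END_STATE_FLAG) != 0;
    }

    // True if the lookahead-state bit is set in a flags-column value.
    static boolean isLookaheadState(int flags) {
        return (flags & LOOKAHEAD_STATE_FLAG) != 0;
    }

    // Strips every flag bit, keeping any other bits in the value unchanged.
    static int clearFlags(int flags) {
        return flags & ~ALL_FLAGS;
    }

    public static void main(String[] args) {
        int flags = END_STATE_FLAG | LOOKAHEAD_STATE_FLAG;
        System.out.println(isEndState(flags));                // true
        System.out.println(isLookaheadState(flags));          // true
        System.out.println((flags & DONT_LOOP_FLAG) != 0);    // false
        System.out.println(Integer.toHexString(ALL_FLAGS));   // e000
        System.out.println(clearFlags(flags | 0x5));          // 5
    }
}
```

Keeping the flags in distinct bits lets a single column value describe a state that is, for example, both accepting and a lookahead state.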
Constructor Summary

| Constructor | Description |
|---|---|
| RuleBasedBreakIterator.Builder() | No special construction is required for the Builder. |
Method Summary

| Modifier and type | Method | Description |
|---|---|---|
| private void | backfillLoopingStates() | This function completes the backfilling process by actually doing the backfilling on the states that are marked for it. |
| private void | buildBackwardsStateTable(Vector tempRuleList) | This function builds the backward state table from the forward state table and any additional rules (identified by the ! |
| void | buildBreakIterator() | This is the main function for setting up the BreakIterator's tables. |
| protected void | buildCharCategories(Vector tempRuleList) | This function builds the character category table. |
| private Vector | buildRuleList(String description) | This function has three main purposes. The first is to perform general syntax checking on the description, so the rest of the build code can assume that it's parsing a legal description. |
| private void | buildStateTable(Vector tempRuleList) | This is the function that builds the forward state table. |
| private void | eliminateBackfillStates(int baseState) | This removes "ending states" and states reachable from them from the list of states to backfill. |
| protected void | error(String message, int position, String context) | Throws an IllegalArgumentException representing a syntax error in the rule description. |
| private void | finishBuildingStateTable(boolean forward) | This function completes the state-table-building process by doing several postprocessing steps and copying everything into its final resting place in the iterator itself. |
| protected void | handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description) | This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions. |
| private void | mergeStates(int rowNum, short[] newValues, Vector rowsBeingUpdated) | The real work of making the state table deterministic happens here. |
| protected void | mungeExpressionList(Hashtable expressions) | |
| private void | parseRule(String rule, boolean forward) | This is where most of the work really happens. |
| protected String | processSubstitution(String substitutionRule, String description, int startPos) | This function performs variable-name substitutions. |
| private int | searchMergeList(int a, int b) | The merge list is a list of pairs of rows that have been merged somewhere in the process of building this state table, along with the row number of the row containing the merged state. |
| private void | setLoopingStates(Vector newLoopingStates, Vector endStates) | This function is used to update the list of current looping states (i.e., states that are controlled by a *? |
| private void | updateStateTable(Vector rows, String pendingChars, short newValue) | Update entries in the state table, and merge states when necessary to keep the table deterministic. |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail
protected Vector categories
protected Hashtable expressions
protected CharSet ignoreChars
protected Vector tempStateTable
protected Vector decisionPointList
protected Stack decisionPointStack
protected Vector loopingStates
protected Vector statesToBackfill
protected Vector mergeList
protected boolean clearLoopingStates
protected static final int END_STATE_FLAG
protected static final int DONT_LOOP_FLAG
protected static final int LOOKAHEAD_STATE_FLAG
protected static final int ALL_FLAGS
Constructor Detail
public RuleBasedBreakIterator.Builder()
Method Detail
public void buildBreakIterator()
private Vector buildRuleList(String description)
protected String processSubstitution(String substitutionRule, String description, int startPos)
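The summary says only that processSubstitution "performs variable-name substitutions." Here is a minimal sketch of that idea, under several loudly stated assumptions. First, the substitution rule is assumed to have the form `name=value`. Second, the `$digit` name syntax in the demo is hypothetical. Third, every occurrence of the name from `startPos` onward is assumed to be replaced literally by the value. The real method's rule syntax and its return value are not documented here.

```java
public class SubstitutionSketch {
    /**
     * Hypothetical processSubstitution: substitutionRule is "name=value", and every
     * occurrence of name in description at or after startPos is replaced by value.
     */
    static String processSubstitution(String substitutionRule, String description, int startPos) {
        int equals = substitutionRule.indexOf('=');
        if (equals <= 0) {
            throw new IllegalArgumentException("Not a substitution rule: " + substitutionRule);
        }
        String name = substitutionRule.substring(0, equals);
        String value = substitutionRule.substring(equals + 1);
        // Text before startPos (including the definition itself) is left untouched.
        String head = description.substring(0, startPos);
        // String.replace performs literal (non-regex) replacement of every occurrence.
        String tail = description.substring(startPos).replace(name, value);
        return head + tail;
    }

    public static void main(String[] args) {
        String description = "$digit=[0-9];$digit+;";
        int startPos = description.indexOf(';') + 1; // just past the definition
        System.out.println(processSubstitution("$digit=[0-9]", description, startPos));
        // prints $digit=[0-9];[0-9]+;
    }
}
```

A purely literal replacement like this would also rewrite longer names that share a prefix (for example `$digits`). A real implementation would need to match whole names.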
protected void handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
protected void buildCharCategories(Vector tempRuleList)
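The summary says only that buildCharCategories builds the character category table. One standard way to compute such categories is alphabet partitioning, shown below as a swapped-in technique rather than as the JDK's algorithm. Characters that belong to exactly the same subset of character-class expressions are interchangeable, so they can share one column of the state table. The sketch assumes that expressions arrive as plain sets of characters. `CharCategorySketch` is a hypothetical name, and numbering the categories from 1 is an arbitrary choice.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class CharCategorySketch {
    /**
     * Gives two characters the same category number exactly when they belong to the same
     * subset of the given expressions. Numbering starts at 1 (an arbitrary choice here).
     */
    static Map<Character, Integer> buildCharCategories(List<Set<Character>> expressions) {
        // Signature of a character: bit i is set if the character appears in expression i.
        Map<Character, BitSet> signatures = new TreeMap<>();
        for (int i = 0; i < expressions.size(); i++) {
            for (char c : expressions.get(i)) {
                signatures.computeIfAbsent(c, k -> new BitSet()).set(i);
            }
        }
        // Characters with equal signatures are interchangeable, so they share a category.
        Map<BitSet, Integer> categoryOfSignature = new HashMap<>();
        Map<Character, Integer> categories = new TreeMap<>();
        for (Map.Entry<Character, BitSet> entry : signatures.entrySet()) {
            Integer category = categoryOfSignature.get(entry.getValue());
            if (category == null) {
                category = categoryOfSignature.size() + 1;
                categoryOfSignature.put(entry.getValue(), category);
            }
            categories.put(entry.getKey(), category);
        }
        return categories;
    }

    public static void main(String[] args) {
        Set<Character> letters = Set.of('a', 'b', 'c');
        Set<Character> aOrE = Set.of('a', 'e');
        System.out.println(buildCharCategories(List.of(letters, aOrE)));
        // prints {a=1, b=2, c=2, e=3}
    }
}
```

In the demo, `b` and `c` appear only in the first expression, so they collapse into a single category. That is the kind of saving that keeps the number of state-table columns small.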
protected void mungeExpressionList(Hashtable expressions)
private void buildStateTable(Vector tempRuleList)
private void parseRule(String rule, boolean forward)
private void updateStateTable(Vector rows, String pendingChars, short newValue)
Parameters:
rows - The list of rows that need updating (the decision point list).
pendingChars - A character category list, encoded in a String. This is the list of the columns that need updating.
newValue - Update the cells specified above to contain this value.
private void mergeStates(int rowNum, short[] newValues, Vector rowsBeingUpdated)
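The parameter descriptions above say that `rows` lists row numbers and that `pendingChars` encodes category numbers (that is, column indices) as the characters of a String. The sketch below applies those two ideas to a table stored as a list of `short[]` rows. It assumes that `0` marks an empty cell. It does not merge states. Instead, it reports each cell that already holds a different state, which is the situation the real method hands off to mergeStates. The class name and the conflict list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpdateStateTableSketch {
    /**
     * Simplified updateStateTable: for every row in rows and every category column encoded
     * in pendingChars, store newValue. A cell that already holds a different state is left
     * alone and reported as a {row, column} conflict instead of being merged.
     */
    static List<int[]> updateStateTable(List<short[]> table, List<Integer> rows,
                                        String pendingChars, short newValue) {
        List<int[]> conflicts = new ArrayList<>();
        for (int row : rows) {
            short[] cells = table.get(row);
            for (int i = 0; i < pendingChars.length(); i++) {
                int column = pendingChars.charAt(i); // each char encodes a category number
                if (cells[column] == 0) {
                    cells[column] = newValue; // 0 is assumed to mean "no transition yet"
                } else if (cells[column] != newValue) {
                    conflicts.add(new int[] {row, column});
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<short[]> table = new ArrayList<>();
        table.add(new short[4]);                    // row 0: empty
        table.add(new short[] {0, 2, 0, 0});        // row 1: category 1 already goes to state 2
        String pendingChars = String.valueOf(new char[] {1, 3}); // categories 1 and 3
        List<int[]> conflicts =
                updateStateTable(table, List.of(0, 1), pendingChars, (short) 5);
        System.out.println(Arrays.toString(table.get(0))); // [0, 5, 0, 5]
        System.out.println(Arrays.toString(table.get(1))); // [0, 2, 0, 5]
        System.out.println(conflicts.size());              // 1 (row 1, column 1)
    }
}
```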
Parameters:
rowNum - The row number in the state table of the state to be updated.
newValues - The state to merge it with.
rowsBeingUpdated - A copy of the list of rows passed to updateStateTable() (itself a copy of the decision point list from parseRule()). Newly created states get added to the decision point list if their "parents" were on it.
private int searchMergeList(int a, int b)
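The merge list is described as pairs of merged rows together with the row that holds their merged state. The sketch below stores each entry as an `int[] {a, b, merged}` and searches it linearly. Three details are assumptions: the entry layout, treating the pair as unordered, and returning `0` for "not found" (which presumes row 0 is never a merge result). `recordMerge` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeListSketch {
    // Each entry is {rowA, rowB, mergedRow} (layout assumed for this sketch).
    private final List<int[]> mergeList = new ArrayList<>();

    void recordMerge(int a, int b, int merged) {
        mergeList.add(new int[] {a, b, merged});
    }

    /** Returns the row holding the merge of rows a and b, or 0 if they were never merged. */
    int searchMergeList(int a, int b) {
        for (int[] entry : mergeList) {
            // Assumption: the pair is unordered, so (a, b) and (b, a) find the same entry.
            if ((entry[0] == a && entry[1] == b) || (entry[0] == b && entry[1] == a)) {
                return entry[2];
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        MergeListSketch builder = new MergeListSketch();
        builder.recordMerge(3, 7, 12);
        System.out.println(builder.searchMergeList(7, 3)); // 12
        System.out.println(builder.searchMergeList(3, 4)); // 0
    }
}
```

Looking up a pair before merging it presumably lets mergeStates reuse an existing combined row instead of creating a duplicate. This matters because merging two states can require merging the same pair again further down the table.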
private void setLoopingStates(Vector newLoopingStates, Vector endStates)
Parameters:
newLoopingStates - The list of new looping states.
endStates - The list of states to treat as end states (states that can exit the loop).
private void eliminateBackfillStates(int baseState)
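The method summary says that eliminateBackfillStates removes "ending states" and the states reachable from them from the list of states to backfill. That is a graph-reachability problem. The sketch below performs a depth-first traversal from `baseState` over the state table's transitions and then drops every visited state from the backfill list. Three points are assumptions: that `baseState` is itself the ending state, that any nonzero cell counts as a transition, and that the table is stored as `short[]` rows.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BackfillSketch {
    /**
     * Removes baseState and every state reachable from it from statesToBackfill.
     * Each table row is a short[] of next-state numbers; 0 means "no transition" (assumed).
     */
    static void eliminateBackfillStates(List<short[]> table, List<Integer> statesToBackfill,
                                        int baseState) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(baseState);
        while (!work.isEmpty()) {
            int state = work.pop();
            if (!seen.add(state)) {
                continue; // already visited
            }
            for (short next : table.get(state)) {
                if (next != 0) {
                    work.push((int) next);
                }
            }
        }
        statesToBackfill.removeIf(seen::contains);
    }

    public static void main(String[] args) {
        List<short[]> table = Arrays.asList(
                new short[] {0, 0},   // row 0: stop state
                new short[] {2, 0},   // row 1 -> 2
                new short[] {0, 3},   // row 2 -> 3
                new short[] {0, 0},   // row 3: no transitions
                new short[] {1, 0});  // row 4 -> 1
        List<Integer> statesToBackfill = new ArrayList<>(List.of(1, 2, 3, 4));
        eliminateBackfillStates(table, statesToBackfill, 2);
        System.out.println(statesToBackfill); // [1, 4]
    }
}
```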
private void backfillLoopingStates()
private void finishBuildingStateTable(boolean forward)
Parameters:
forward - True if we're working on the forward state table.
private void buildBackwardsStateTable(Vector tempRuleList)
protected void error(String message, int position, String context)
Parameters:
message - A message describing the problem.
position - The position in the description where the problem was discovered.
context - The string containing the error.
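The error method is documented as throwing an IllegalArgumentException that represents a syntax error, built from a message, a position, and the context string. The sketch below shows one plausible way to combine those three pieces. It uses a caret under the offending column. The exact message format is an assumption, and `RuleErrorSketch` is a hypothetical name. `String.repeat` requires Java 11 or later.

```java
public class RuleErrorSketch {
    /** Always throws; the message layout below is a hypothetical choice. */
    static void error(String message, int position, String context) {
        throw new IllegalArgumentException(
                "Parse error at position " + position + ": " + message + "\n"
                + context + "\n"
                + " ".repeat(Math.max(0, position)) + "^"); // caret under the bad character
    }

    public static void main(String[] args) {
        try {
            error("Unmatched parenthesis", 2, "ab(cd;");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the method never returns normally, a parser that calls it can treat the call as the end of that code path.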