org.apache.xalan.serialize
Class CharInfo

java.lang.Object
  extended by org.apache.xalan.serialize.CharInfo

public class CharInfo
extends Object

This class provides services that tell whether a character should have special treatment, such as entity reference substitution or normalization of a newline character. It also provides character-to-entity-reference lookup. DEVELOPERS: See Known Issue in the constructor.


Field Summary
static String HTML_ENTITIES_RESOURCE
          The name of the HTML entities file.
private  CharKey m_charKey
           
private  Hashtable m_charToEntityRef
          Lookup table for characters to entity references.
(package private)  BitSet m_specialsMap
          Bit map that tells if a given character should have special treatment.
private static Class[] NO_CLASSES
          A zero-length Class array used in the constructor.
private static Object[] NO_OBJS
          A zero-length Object array used in the constructor.
static char S_CARRIAGERETURN
          The carriage return character, which the parser should always normalize.
static char S_LINEFEED
          The linefeed character, which the parser should always normalize.
static String XML_ENTITIES_RESOURCE
          The name of the XML entities file.
 
Constructor Summary
CharInfo(String entitiesResource)
          Constructor that reads in a resource file that describes the mapping of characters to entity references.
 
Method Summary
protected  void defineEntity(String name, char value)
          Defines a new character reference.
 String getEntityNameForChar(char value)
          Resolve a character to an entity reference name.
 boolean isSpecial(char value)
          Tell if the character argument should have special treatment.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_specialsMap

BitSet m_specialsMap
Bit map that tells if a given character should have special treatment.


m_charToEntityRef

private Hashtable m_charToEntityRef
Lookup table for characters to entity references.


HTML_ENTITIES_RESOURCE

public static String HTML_ENTITIES_RESOURCE
The name of the HTML entities file. If specified, the file is loaded as a resource using the default class loader.


XML_ENTITIES_RESOURCE

public static String XML_ENTITIES_RESOURCE
The name of the XML entities file. If specified, the file is loaded as a resource using the default class loader.


S_LINEFEED

public static char S_LINEFEED
The linefeed character, which the parser should always normalize.


S_CARRIAGERETURN

public static char S_CARRIAGERETURN
The carriage return character, which the parser should always normalize.


NO_CLASSES

private static final Class[] NO_CLASSES
A zero-length Class array used in the constructor.


NO_OBJS

private static final Object[] NO_OBJS
A zero-length Object array used in the constructor.


m_charKey

private CharKey m_charKey


Constructor Detail

CharInfo

public CharInfo(String entitiesResource)
Constructor that reads in a resource file that describes the mapping of characters to entity references. Resource files must be encoded in UTF-8 and have a format like:
 # First char # is a comment
 Entity numericValue
 quot 34
 amp 38
 
(Note: Why don't we just switch to .properties files? Oct-01 -sc)
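
The file format described above can be read with a few lines of code. The following is only a minimal sketch, not the Xalan implementation: the class name EntityFileSketch and its parse method are hypothetical, and it assumes the caller has already decoded the UTF-8 resource into a String. Lines that do not carry a numeric value, such as the descriptive "Entity numericValue" line, are assumed to be skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Xalan.
class EntityFileSketch {

    /**
     * Parses "name numericValue" lines into a name-to-character map.
     * Blank lines, lines starting with '#', and lines without a valid
     * numeric value are skipped (an assumption of this sketch).
     */
    static Map<String, Character> parse(String text) {
        Map<String, Character> entities = new LinkedHashMap<>();
        for (String rawLine : text.split("\r?\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.charAt(0) == '#') {
                continue; // comment or empty line
            }
            String[] parts = line.split("\\s+");
            if (parts.length < 2) {
                continue; // no value column
            }
            try {
                int code = Integer.parseInt(parts[1]);
                if (code < 0 || code > Character.MAX_VALUE) {
                    continue; // does not fit in a Java char
                }
                entities.put(parts[0], (char) code);
            } catch (NumberFormatException e) {
                // Not a numeric value, e.g. a descriptive header line; skip it.
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        String sample = "# First char # is a comment\n"
                + "Entity numericValue\n"
                + "quot 34\n"
                + "amp 38\n";
        System.out.println(parse(sample).keySet()); // prints [quot, amp]
    }
}
```

Each parsed pair could then be handed to defineEntity(name, value).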

Parameters:
entitiesResource - Name of entities resource file that should be loaded, which describes the mapping of characters to entity references.


Method Detail

defineEntity

protected void defineEntity(String name,
                            char value)
Defines a new character reference. The reference's name and value are supplied. Nothing happens if the character reference is already defined.

Unlike internal entities, character references are a string to single character mapping. They are used to map non-ASCII characters both on parsing and printing, primarily for HTML documents. '&amp;' is an example of a character reference.

Parameters:
name - The entity's name
value - The entity's value
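
The documented "nothing happens if already defined" behavior can be illustrated with a first-definition-wins map. This is a hedged sketch rather than the actual method body: the class DefineEntitySketch is hypothetical, and it assumes "already defined" means that the character already has a name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the Xalan implementation.
class DefineEntitySketch {
    // Maps a character to the entity name first registered for it.
    private final Map<Character, String> charToEntityRef = new HashMap<>();

    /** Registers name for value unless value already has a name. */
    void defineEntity(String name, char value) {
        charToEntityRef.putIfAbsent(value, name);
    }

    /** Returns the entity name for value, or null if none is defined. */
    String getEntityNameForChar(char value) {
        return charToEntityRef.get(value);
    }
}
```

With this sketch, calling defineEntity("amp", '&') followed by defineEntity("other", '&') leaves '&' mapped to "amp".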

getEntityNameForChar

public String getEntityNameForChar(char value)
Resolve a character to an entity reference name. This method reuses a stored key object to avoid heap activity, which unfortunately introduces a threading risk. The simplest fix for now is to make the method synchronized, or to give up the reuse; I see very little performance difference between the two. A long-term solution would be to replace the hashtable with a sparse array keyed directly from the character's integer value; see DTM's string pool for a related solution.

Parameters:
value - character value that should be resolved to a name.
Returns:
name of character entity, or null if not found.
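
The long-term idea mentioned above, a sparse array keyed directly by the character's integer value, might look roughly like the following. This is only a sketch under stated assumptions: PagedEntityTableSketch and its two-level page layout are hypothetical and are not taken from Xalan or from DTM's string pool. Lookups need no key object, so they allocate nothing; concurrent reads are safe only if the table is fully built and safely published before it is shared.

```java
// Hypothetical illustration of a sparse, character-indexed lookup table.
class PagedEntityTableSketch {
    private static final int PAGE_BITS = 8;
    private static final int PAGE_SIZE = 1 << PAGE_BITS; // 256 entries per page
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    // 256 page slots cover all 65,536 char values; a page is allocated
    // only when the first entity in its range is defined.
    private final String[][] pages = new String[PAGE_SIZE][];

    /** Call only while building the table, before sharing it between threads. */
    void define(String name, char value) {
        int page = value >>> PAGE_BITS;
        if (pages[page] == null) {
            pages[page] = new String[PAGE_SIZE];
        }
        pages[page][value & PAGE_MASK] = name;
    }

    /** Allocation-free lookup; returns null if no entity is defined. */
    String getEntityNameForChar(char value) {
        String[] page = pages[value >>> PAGE_BITS];
        return page == null ? null : page[value & PAGE_MASK];
    }
}
```

The page size is a tunable trade-off: smaller pages waste less memory for scattered entities, at the cost of a larger top-level array.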

isSpecial

public boolean isSpecial(char value)
Tell if the character argument should have special treatment.

Parameters:
value - character value.
Returns:
true if the character should have any special treatment.
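
A bit map lookup of this kind can be sketched with java.util.BitSet. The class SpecialsMapSketch and its markSpecial method are hypothetical; the sketch assumes that the linefeed and carriage return characters are always flagged and that every character with a defined entity is flagged as well.

```java
import java.util.BitSet;

// Hypothetical illustration; not the Xalan implementation.
class SpecialsMapSketch {
    static final char S_LINEFEED = 0x0A;
    static final char S_CARRIAGERETURN = 0x0D;

    // One bit per char value; a set bit means "needs special treatment".
    private final BitSet specialsMap = new BitSet(Character.MAX_VALUE + 1);

    SpecialsMapSketch() {
        // Assumption: newline characters are always treated as special.
        specialsMap.set(S_LINEFEED);
        specialsMap.set(S_CARRIAGERETURN);
    }

    /** Flags a character, for example when an entity is defined for it. */
    void markSpecial(char value) {
        specialsMap.set(value);
    }

    /** Returns true if the character was flagged as special. */
    boolean isSpecial(char value) {
        return specialsMap.get(value);
    }
}
```

A serializer could call isSpecial on each output character and fall back to getEntityNameForChar only when it returns true.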