org.apache.crimson.parser
Class XmlReader

java.lang.Object
  extended by java.io.Reader
      extended by org.apache.crimson.parser.XmlReader

final class XmlReader
extends Reader

This class handles several XML-related tasks that ordinary java.io Readers don't support, including the use of IETF standard encoding names and automatic detection of most XML encodings. The former is needed for interoperability; the latter is needed to conform with the XML spec. This class also optimizes reading some common encodings by providing low-overhead, unsynchronized Reader support.
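The nested AsciiReader, Iso8859_1Reader and Utf8Reader classes are not documented on this page. To show roughly why ISO-8859-1 is cheap to decode without locking, here is a minimal sketch of an unsynchronized Reader in the same spirit. The class name Latin1Reader and its 4096-byte buffer are hypothetical; this is not Crimson's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

// Hypothetical sketch (not Crimson's Iso8859_1Reader): an unsynchronized
// Reader for ISO-8859-1. Each byte 0x00-0xFF equals the Unicode code point
// with the same value, so decoding is one mask-and-cast per byte.
final class Latin1Reader extends Reader {
    private final InputStream in;
    private final byte[] bytes = new byte[4096]; // hypothetical buffer size
    private boolean closed;

    Latin1Reader(InputStream in) {
        this.in = in;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("reader is closed");
        }
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        // No synchronized block and no use of the inherited lock object.
        int n = in.read(bytes, 0, Math.min(len, bytes.length));
        if (n < 0) {
            return -1; // end of stream
        }
        for (int i = 0; i < n; i++) {
            buf[off + i] = (char) (bytes[i] & 0xff); // byte value == code point
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        in.close();
    }
}
```

java.io.InputStreamReader guards its reads with a lock; skipping that is only safe while a single thread uses the reader, which is the usual situation inside a parser.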

Note that the autodetection facility should be used only on data streams which have an unknown character encoding. For example, it should never be used on MIME text/xml entities.

Note that XML processors are only required to support UTF-8 and UTF-16 character encodings. Autodetection permits the underlying Java implementation to provide support for many other encodings, such as US-ASCII, ISO-8859-5, Shift_JIS, EUC-JP, and ISO-2022-JP.

Author:
David Brownell

Nested Class Summary
(package private) static class XmlReader.AsciiReader
           
(package private) static class XmlReader.BaseReader
           
(package private) static class XmlReader.Iso8859_1Reader
           
(package private) static class XmlReader.Utf8Reader
           
 
Field Summary
private  String assignedEncoding
           
private static Hashtable charsets
           
private  boolean closed
           
private  Reader in
           
private static int MAXPUSHBACK
           
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
private XmlReader(InputStream stream)
           
 
Method Summary
 void close()
          Closes the reader.
static Reader createReader(InputStream in)
          Constructs the reader from an input stream, autodetecting the encoding to use according to the heuristic specified in the XML 1.0 recommendation.
static Reader createReader(InputStream in, String encoding)
          Creates a reader supporting the given encoding, mapping from standard encoding names to ones understood by Java where necessary.
 String getEncoding()
          Returns the standard name of the encoding in use.
 void mark(int value)
          Sets a mark allowing a limited number of characters to be "peeked", by reading and then resetting.
 boolean markSupported()
          Returns true iff the reader supports mark/reset.
 int read()
          Reads a single character.
 int read(char[] buf, int off, int len)
          Reads characters into a portion of the buffer, returning the number of characters read, or -1 on EOF.
 boolean ready()
          Returns true iff input characters are known to be ready.
 void reset()
          Resets the current position to the last marked position.
private  void setEncoding(InputStream stream, String encoding)
           
 long skip(long value)
          Skips a specified number of characters.
private static String std2java(String encoding)
           
private  void useEncodingDecl(PushbackInputStream pb, String encoding)
           
 
Methods inherited from class java.io.Reader
read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAXPUSHBACK

private static final int MAXPUSHBACK
See Also:
Constant Field Values

in

private Reader in

assignedEncoding

private String assignedEncoding

closed

private boolean closed

charsets

private static final Hashtable charsets

Constructor Detail

XmlReader

private XmlReader(InputStream stream)
           throws IOException

Method Detail

createReader

public static Reader createReader(InputStream in)
                           throws IOException
Constructs the reader from an input stream, autodetecting the encoding to use according to the heuristic specified in the XML 1.0 recommendation.

Parameters:
in - the input stream from which the reader is constructed
Throws:
IOException - on error, such as unrecognized encoding
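The heuristic referred to here is described in Appendix F of the XML 1.0 recommendation: the first four bytes of an entity, either a byte order mark or the start of "<?xml", narrow the encoding down to a family. The sketch below shows only that table-driven first guess. The class and method names are hypothetical, the returned strings are Java charset names chosen for illustration ("Cp037" is just one EBCDIC guess), and the refinement step of reading the encoding declaration, which a conforming reader still has to perform, is omitted.

```java
// Hypothetical sketch of the XML 1.0 Appendix F autodetection table.
// It inspects up to four leading bytes and returns a Java charset name.
// It does NOT read the encoding declaration, and callers decoding with
// the returned name must skip any byte order mark themselves.
final class XmlEncodingSniffer {
    private XmlEncodingSniffer() {}

    static String detect(byte[] head) {
        int len = head.length;
        int b0 = len > 0 ? head[0] & 0xff : -1;
        int b1 = len > 1 ? head[1] & 0xff : -1;
        int b2 = len > 2 ? head[2] & 0xff : -1;
        int b3 = len > 3 ? head[3] & 0xff : -1;

        // UCS-4 patterns first, so FF FE 00 00 is not taken for UTF-16LE.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE"; // BOM
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE"; // BOM
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0x00 && b3 == 0x3C) return "UTF-32BE"; // '<'
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x00 && b3 == 0x00) return "UTF-32LE"; // '<'

        // Byte order marks for UTF-16 and UTF-8.
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";

        // "<?" in UTF-16 without a byte order mark.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // "<?xm" in EBCDIC; only the declaration names the exact code page.
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "Cp037";

        // "<?xm" in an ASCII-compatible encoding, or no recognizable prefix:
        // assume UTF-8 until an encoding declaration says otherwise.
        return "UTF-8";
    }
}
```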

createReader

public static Reader createReader(InputStream in,
                                  String encoding)
                           throws IOException
Creates a reader supporting the given encoding, mapping from standard encoding names to ones understood by Java where necessary.

Parameters:
in - the input stream from which the reader is constructed
encoding - the IETF standard name of the encoding to use; if null, autodetection is used.
Throws:
IOException - on error, including unrecognized encoding
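A rough way to express the explicit-name path with the standard library is shown below. It assumes that java.nio.charset.Charset.forName resolves the IETF (IANA) name, which it does for common names such as Shift_JIS or ISO-8859-5; Crimson's private std2java table is not reproduced. The class name is hypothetical, and the null case simply falls back to UTF-8 as a stand-in for real autodetection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical sketch, not Crimson's implementation: open a Reader for an
// IETF/IANA encoding name, reporting unknown names as an IOException.
final class NamedEncodingReaders {
    private NamedEncodingReaders() {}

    static Reader open(InputStream in, String ietfName) throws IOException {
        if (ietfName == null) {
            // Stand-in for autodetection: UTF-8 is XML's default encoding.
            return new InputStreamReader(in, StandardCharsets.UTF_8);
        }
        final Charset cs;
        try {
            // Charset.forName understands IANA names and many aliases.
            cs = Charset.forName(ietfName.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Match the documented contract: unrecognized encodings are IOExceptions.
            UnsupportedEncodingException uee = new UnsupportedEncodingException(ietfName);
            uee.initCause(e);
            throw uee;
        }
        return new InputStreamReader(in, cs);
    }
}
```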

std2java

private static String std2java(String encoding)

getEncoding

public String getEncoding()
Returns the standard name of the encoding in use.


useEncodingDecl

private void useEncodingDecl(PushbackInputStream pb,
                             String encoding)
                      throws IOException
Throws:
IOException

setEncoding

private void setEncoding(InputStream stream,
                         String encoding)
                  throws IOException
Throws:
IOException

read

public int read(char[] buf,
                int off,
                int len)
         throws IOException
Reads characters into a portion of the buffer, returning the number of characters read, or -1 on EOF.

Specified by:
read in class Reader
Parameters:
buf - Destination buffer
off - Offset at which to start storing characters
len - Maximum number of characters to read
Returns:
The number of characters read, or -1 if the end of the stream has been reached
Throws:
IOException - If an I/O error occurs
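Because a single call may return fewer than len characters, callers normally loop until -1 is returned. A minimal sketch that drains any Reader, such as one returned by createReader, into a String; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.Reader;

// Illustrative helper: the standard read(char[], int, int) loop.
final class ReadAll {
    private ReadAll() {}

    static String readFully(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        // Keep reading until read() reports end of stream with -1.
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n); // append only the n characters actually read
        }
        return sb.toString();
    }
}
```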

read

public int read()
         throws IOException
Reads a single character.

Overrides:
read in class Reader
Returns:
The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached
Throws:
IOException - If an I/O error occurs

markSupported

public boolean markSupported()
Returns true iff the reader supports mark/reset.

Overrides:
markSupported in class Reader
Returns:
true if and only if this stream supports the mark operation.

mark

public void mark(int value)
          throws IOException
Sets a mark allowing a limited number of characters to be "peeked", by reading and then resetting.

Overrides:
mark in class Reader
Parameters:
value - how many characters may be "peeked".
Throws:
IOException - If the stream does not support mark(), or if some other I/O error occurs

reset

public void reset()
           throws IOException
Resets the current position to the last marked position.

Overrides:
reset in class Reader
Throws:
IOException - If the stream has not been marked, or if the mark has been invalidated, or if the stream does not support reset(), or if some other I/O error occurs
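Together, mark() and reset() support the "peek" pattern described for mark(): record the position, read ahead, then rewind. The helper below is a hypothetical sketch that works with any Reader whose markSupported() returns true, for example java.io.BufferedReader; the read-ahead limit passed to mark() is the length of the prefix being checked.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: look ahead without consuming input.
final class Peek {
    private Peek() {}

    /** Returns true if the next characters equal prefix; the position is left unchanged. */
    static boolean startsWith(Reader r, String prefix) throws IOException {
        if (!r.markSupported()) {
            throw new IOException("mark/reset not supported");
        }
        r.mark(prefix.length()); // allow exactly this much look-ahead
        try {
            for (int i = 0; i < prefix.length(); i++) {
                if (r.read() != prefix.charAt(i)) {
                    return false; // mismatch or end of stream
                }
            }
            return true;
        } finally {
            r.reset(); // rewind to the marked position in every case
        }
    }
}
```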

skip

public long skip(long value)
          throws IOException
Skips a specified number of characters.

Overrides:
skip in class Reader
Parameters:
value - The number of characters to skip
Returns:
The number of characters actually skipped
Throws:
IOException - If an I/O error occurs
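Since skip() returns the number of characters actually skipped, which may be fewer than requested, a caller that needs an exact count has to loop. A minimal sketch; the helper name and the choice to throw EOFException when the stream ends early are assumptions, not part of this class.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: skip exactly n characters or fail with EOFException.
final class SkipUtil {
    private SkipUtil() {}

    static void skipFully(Reader r, long n) throws IOException {
        while (n > 0) {
            long skipped = r.skip(n);
            if (skipped <= 0) {
                // skip() made no progress: use read() to tell EOF from a short skip.
                if (r.read() == -1) {
                    throw new EOFException();
                }
                skipped = 1; // the read() above consumed one character
            }
            n -= skipped;
        }
    }
}
```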

ready

public boolean ready()
              throws IOException
Returns true iff input characters are known to be ready.

Overrides:
ready in class Reader
Returns:
True if the next read() is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block.
Throws:
IOException - If an I/O error occurs

close

public void close()
           throws IOException
Closes the reader.

Specified by:
close in class Reader
Throws:
IOException - If an I/O error occurs