javatools.filehandlers
Class HTMLReader

java.lang.Object
  extended by java.io.Reader
      extended by javatools.filehandlers.HTMLReader
All Implemented Interfaces:
java.io.Closeable, java.lang.Readable

public class HTMLReader
extends java.io.Reader

This class is part of the Java Tools (see http://mpii.de/yago-naga/javatools). It is licensed under the Creative Commons Attribution License (see http://creativecommons.org/licenses/by/3.0) by the YAGO-NAGA team (see http://mpii.de/yago-naga). The HTMLReader reads characters from an HTML file, resolving ampersand sequences and returning -2 in place of each tag.
Example:

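         HTMLReader r=new HTMLReader(new File("index.html"));
         int c;
         while((c=r.read())!=-1) {
           // read() returns -2 in place of a tag; getTag() then names that tag
           if(c==-2) System.out.print(" TAG:"+r.getTag());
           // cast to char, otherwise the numeric character code is printed
           else System.out.print((char)c);
         }
         r.close();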

         -->
             This is the HTML-file, with resolved ampersand sequences
             and with -2 returned for tags.
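
The -2 sentinel convention used in the loop above can be tried without the library. The following is a minimal sketch under stated assumptions: TagSentinelSketch is a hypothetical stand-in, not the javatools implementation, and it mimics only the documented contract (read() returns -2 for a tag and -1 at the end of input, getTag() returns the uppercased tag name). It does not resolve ampersand sequences, and reporting closing tags with a leading slash is a choice of this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical stand-in, NOT the javatools HTMLReader: it only mimics the
// documented contract of read() and getTag().
class TagSentinelSketch {
    static final int TAG = -2;

    private final Reader in;
    private String lastTag = "";

    TagSentinelSketch(Reader in) {
        this.in = in;
    }

    /** Returns the next character, TAG for a whole tag, or -1 at end of input. */
    int read() throws IOException {
        int c = in.read();
        if (c != '<') return c;
        StringBuilder name = new StringBuilder();
        boolean inName = true;
        // Consume everything up to '>' and keep only the leading tag name.
        while ((c = in.read()) != -1 && c != '>') {
            if (inName && (Character.isLetterOrDigit(c) || c == '/')) {
                name.append((char) c);
            } else {
                inName = false; // attributes are skipped in this sketch
            }
        }
        lastTag = name.toString().toUpperCase(Locale.ROOT);
        return TAG;
    }

    /** Returns the name of the last tag seen, uppercased. */
    String getTag() {
        return lastTag;
    }

    /** Collects what the example loop would print for the given input. */
    static String render(String html) {
        TagSentinelSketch r = new TagSentinelSketch(new StringReader(html));
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) {
                if (c == TAG) out.append(" TAG:").append(r.getTag());
                else out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With this sketch, TagSentinelSketch.render("a<br>b") yields "a TAG:BRb". Because -2 lies outside the range of valid character values, a caller has to test for it before casting the result of read() to char.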
   

If the file is UTF8-encoded, consider wrapping a UTF8Reader:

     HTMLReader r=new HTMLReader(new UTF8Reader(new File("index.html")));
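
Since one constructor accepts any java.io.Reader, a standard JDK reader with an explicit charset could presumably be passed in the same position. The sketch below shows only how such a reader is built and that it decodes UTF-8; Utf8ReaderSketch and its methods are hypothetical names, and whether UTF8Reader does more than plain decoding is not stated here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: builds a UTF-8 decoding Reader from the JDK alone.
class Utf8ReaderSketch {
    /** Wraps a byte stream in a Reader that decodes UTF-8. */
    static Reader utf8(InputStream bytes) {
        return new InputStreamReader(bytes, StandardCharsets.UTF_8);
    }

    /** Drains a Reader into a String; used here only to show the decoding. */
    static String drain(Reader r) {
        try (Reader in = r) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader obtained from utf8(new FileInputStream("index.html")) would then be handed to the HTMLReader(java.io.Reader) constructor in place of the UTF8Reader.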
   


Field Summary
 boolean skipSTYLE
          TRUE to skip STYLE attributes
 
Constructor Summary
HTMLReader(java.io.File f)
          Constructs an HTMLReader from a File
HTMLReader(java.io.File f, java.lang.String message)
          Constructs an HTMLReader from a File with a progress bar
HTMLReader(java.io.Reader s)
          Constructs an HTMLReader from a Reader
HTMLReader(java.net.URL url)
          Constructs an HTMLReader for a URL
 
Method Summary
 void close()
           
 java.lang.String getTag()
          Returns the last tag (uppercased)
 java.lang.String getTagContent()
          Returns the content of the last tag
static void main(java.lang.String[] argv)
          Test routine
 int read()
          Reads a character, returns -2 for tags
 int read(char[] cbuf, int off, int len)
           
 java.lang.String readTaggedText(java.lang.String t)
          Seeks the next tag of name t and returns all text to the terminating tag /t.
 java.lang.String readTextLine(int n)
          Reads a sequence of characters up to the blank following the nth char, ignores tags
 boolean scrollTo(java.lang.String s)
          Seeks a specific string and scrolls to it, returns TRUE if found
 boolean scrollToTag(java.lang.String s)
          Seeks a specific tag and scrolls to it, returns TRUE if found
 java.lang.StringBuilder text(java.lang.String forTag)
          Returns the entire text
 
Methods inherited from class java.io.Reader
mark, markSupported, read, read, ready, reset, skip
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

skipSTYLE

public boolean skipSTYLE
TRUE to skip STYLE attributes

Constructor Detail

HTMLReader

public HTMLReader(java.io.Reader s)
Constructs an HTMLReader from a Reader


HTMLReader

public HTMLReader(java.net.URL url)
           throws java.io.IOException
Constructs an HTMLReader for a URL

Throws:
java.io.IOException

HTMLReader

public HTMLReader(java.io.File f)
           throws java.io.FileNotFoundException
Constructs an HTMLReader from a File

Throws:
java.io.FileNotFoundException

HTMLReader

public HTMLReader(java.io.File f,
                  java.lang.String message)
           throws java.io.FileNotFoundException
Constructs an HTMLReader from a File with a progress bar

Throws:
java.io.FileNotFoundException

Method Detail

readTextLine

public java.lang.String readTextLine(int n)
                              throws java.io.IOException
Reads a sequence of characters up to the blank following the nth char, ignores tags

Throws:
java.io.IOException

getTagContent

public java.lang.String getTagContent()
Returns the content of the last tag


getTag

public java.lang.String getTag()
Returns the last tag (uppercased)


read

public int read()
         throws java.io.IOException
Reads a character, returns -2 for tags

Overrides:
read in class java.io.Reader
Throws:
java.io.IOException

readTaggedText

public java.lang.String readTaggedText(java.lang.String t)
                                throws java.io.IOException
Seeks the next tag of name t and returns all text to the terminating tag /t. Nesting is not supported. Returns null if t was not found.

Throws:
java.io.IOException
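
The behaviour described for readTaggedText (seek the next tag t, return everything up to /t, no nesting, null if t is absent) can be illustrated on a plain String. This is only a sketch of those semantics, not the library's code: TaggedTextSketch and taggedText are hypothetical names, the match is case-insensitive by assumption, inner markup is returned unchanged, and returning null when the terminating tag is missing is a choice of this sketch.

```java
// Hypothetical illustration of the documented semantics; NOT the javatools code.
class TaggedTextSketch {
    /** Case-insensitive indexOf that never changes string lengths. */
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    /** Returns the text between the first tag t and the next /t, or null. */
    static String taggedText(String html, String t) {
        String openTag = "<" + t;
        String closeTag = "</" + t + ">";
        int from = 0;
        while (true) {
            int open = indexOfIgnoreCase(html, openTag, from);
            if (open < 0) return null;              // t was not found
            int after = open + openTag.length();
            from = after;
            if (after >= html.length()) return null;
            char next = html.charAt(after);
            // Skip longer names that merely start with t, e.g. <body> for "b".
            if (next != '>' && !Character.isWhitespace(next)) continue;
            int start = html.indexOf('>', after);   // end of the opening tag
            if (start < 0) return null;
            // The first terminating tag wins, so nesting is not supported.
            int end = indexOfIgnoreCase(html, closeTag, start + 1);
            return end < 0 ? null : html.substring(start + 1, end);
        }
    }
}
```

For the nested input "<div>a<div>b</div>c</div>" the sketch returns "a<div>b", which shows what the lack of nesting support means in practice: the text ends at the first terminating tag, not at the matching one.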

text

public java.lang.StringBuilder text(java.lang.String forTag)
                             throws java.io.IOException
Returns the entire text

Throws:
java.io.IOException

scrollTo

public boolean scrollTo(java.lang.String s)
                 throws java.io.IOException
Seeks a specific string and scrolls to it, returns TRUE if found

Throws:
java.io.IOException

scrollToTag

public boolean scrollToTag(java.lang.String s)
                    throws java.io.IOException
Seeks a specific tag and scrolls to it, returns TRUE if found

Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Specified by:
close in interface java.io.Closeable
Specified by:
close in class java.io.Reader
Throws:
java.io.IOException

read

public int read(char[] cbuf,
                int off,
                int len)
         throws java.io.IOException
Specified by:
read in class java.io.Reader
Throws:
java.io.IOException

main

public static void main(java.lang.String[] argv)
                 throws java.lang.Exception
Test routine

Throws:
java.lang.Exception