java.lang.Object
  java.io.Reader
    java.io.FilterReader
      leila.parsing.HTMLReader
public class HTMLReader extends java.io.FilterReader
This class is part of LEILA (http://mpii.de/yago-naga/leila). It is licensed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0) by the author Fabian M. Suchanek (http://suchanek.name).
The HTMLReader reads normalized characters from an HTML file.
See Char.java for the definition of "normalized".
Tags are returned as "TAG_tagname".
Example:
    HTMLReader r=new HTMLReader(new FileReader("index.html"));
    String s;
    while(null!=(s=r.readString())) System.out.print(s);

    --> This is the HTML file, with ampersand sequences resolved and all characters normalized, including the umlauts (ae, oe, ue). Tags appear like TAG_I this TAG_/I.
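To make the "TAG_tagname" convention concrete, here is a minimal sketch of a FilterReader that emits tag tokens. It is not the LEILA implementation: the class name TagTokenSketch, the helper render, and the rule that the tag name is the tag content up to the first whitespace are assumptions made for illustration. The sketch performs no character normalization and does not resolve ampersand sequences.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration only; NOT leila.parsing.HTMLReader.
class TagTokenSketch extends FilterReader {
    // Everything between '<' and '>' of the most recently read tag.
    private String lastTagContent = "";

    TagTokenSketch(Reader in) {
        super(in);
    }

    // Returns one character as a String, "TAG_" + name for a tag, or null at end of input.
    String readString() throws IOException {
        int c = in.read();
        if (c == -1) return null;
        if (c != '<') return String.valueOf((char) c);
        StringBuilder tag = new StringBuilder();
        while ((c = in.read()) != -1 && c != '>') tag.append((char) c);
        lastTagContent = tag.toString();
        // Assumption: the tag name is the content up to the first whitespace.
        String name = lastTagContent.trim().split("\\s+")[0];
        return "TAG_" + name;
    }

    String getLastTagContent() {
        return lastTagContent;
    }

    // Convenience wrapper: concatenates all tokens read from an in-memory string.
    static String render(String html) {
        StringBuilder out = new StringBuilder();
        try (TagTokenSketch r = new TagTokenSketch(new StringReader(html))) {
            String s;
            while ((s = r.readString()) != null) out.append(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Tags appear like <I>this</I>."));
    }
}
```

Unlike the output shown in the example above, this sketch inserts no whitespace around tag tokens; it only shows how a tag can be collapsed into a single string token while ordinary characters pass through one at a time.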
Field Summary | |
---|---|
protected java.lang.String | internalBuf |
protected java.lang.String | tagContent: Holds the content of the last tag |
protected boolean | wasWhiteSpace: State variable for readString() |
Fields inherited from class java.io.FilterReader |
---|
in |
Fields inherited from class java.io.Reader |
---|
lock |
Constructor Summary | |
---|---|
HTMLReader(java.io.File f) | Constructs an HTMLReader from a File |
HTMLReader(java.io.Reader s) | Constructs an HTMLReader from a Reader |
HTMLReader(java.net.URL url) | Constructs an HTMLReader for a URL |
Method Summary | |
---|---|
java.lang.String | getLastTagContent(): Returns the content of the last tag. |
static void | main(java.lang.String[] argv): Test routine. |
int | read(): Returns a single character. |
int | read(java.nio.CharBuffer buffi): Reads into a CharBuffer. |
java.lang.String | readString(): Reads a character; returns null at end of file and "TAG_tagname" for tags. |
java.lang.String | readTaggedText(java.lang.String t): Seeks the next tag named t and returns all text up to the terminating tag /t. |
java.lang.String | readText(int n): Reads a sequence of characters up to the blank following the nth character, ignoring tags. |
java.lang.String | readTextChar(): Reads a character, ignoring tags. |
boolean | scrollTo(java.lang.String s): Seeks a specific string and scrolls to it; returns true if found. |
Methods inherited from class java.io.FilterReader |
---|
close, mark, markSupported, read, ready, reset, skip |
Methods inherited from class java.io.Reader |
---|
read |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.lang.String internalBuf
protected java.lang.String tagContent
protected boolean wasWhiteSpace
Constructor Detail |
---|
public HTMLReader(java.io.Reader s)
public HTMLReader(java.net.URL url) throws java.io.IOException
Throws: java.io.IOException
public HTMLReader(java.io.File f) throws java.io.FileNotFoundException
Throws: java.io.FileNotFoundException
Method Detail |
---|
public java.lang.String readText(int n) throws java.io.IOException
Throws: java.io.IOException
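The summary for readText(int n) describes reading past the nth character up to the next blank while skipping tags. One possible reading of that description is sketched below. The class ReadTextSketch and the helper readTextFrom are hypothetical, and several details are assumptions rather than documented behavior: the terminating blank is consumed but not returned, any whitespace character counts as a blank, and an empty string is returned at end of input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration; NOT the LEILA implementation of readText.
class ReadTextSketch {
    // Reads at least n non-tag characters, then continues to the next whitespace.
    static String readText(Reader in, int n) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        int c;
        while ((c = in.read()) != -1) {
            if (inTag) {                 // skip everything inside <...>
                if (c == '>') inTag = false;
                continue;
            }
            if (c == '<') {
                inTag = true;
                continue;
            }
            // Stop at the first whitespace once n characters have been collected.
            if (out.length() >= n && Character.isWhitespace(c)) break;
            out.append((char) c);
        }
        return out.toString();
    }

    // Convenience wrapper for in-memory input.
    static String readTextFrom(String html, int n) {
        try (Reader in = new StringReader(html)) {
            return readText(in, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTextFrom("The <b>quick</b> brown fox", 6));
    }
}
```

With n = 6 the sketch stops at the blank after "quick", so the word containing the sixth character is returned whole rather than cut in the middle.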
public java.lang.String readTextChar() throws java.io.IOException
Throws: java.io.IOException
public int read() throws java.io.IOException
Overrides: read in class java.io.FilterReader
Throws: java.io.IOException
public int read(java.nio.CharBuffer buffi) throws java.io.IOException
Specified by: read in interface java.lang.Readable
Overrides: read in class java.io.Reader
Throws: java.io.IOException
public java.lang.String getLastTagContent()
public java.lang.String readString() throws java.io.IOException
Throws: java.io.IOException
public java.lang.String readTaggedText(java.lang.String t) throws java.io.IOException
Throws: java.io.IOException
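The behavior summarized for readTaggedText (find the next tag named t, return the text up to the closing /t) can be approximated over an in-memory string with a regular expression. This is only a sketch under stated assumptions: the class TaggedTextSketch and method taggedText are hypothetical, tag names are matched case-insensitively, nested tags are stripped from the result, and null is returned when no matching pair is found. The real method works on a stream and may differ on each of these points.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; NOT the LEILA implementation of readTaggedText.
class TaggedTextSketch {
    // Returns the text between the first <t ...> and the following </t>, or null if absent.
    static String taggedText(String html, String t) {
        String name = Pattern.quote(t);
        Pattern p = Pattern.compile(
                "<" + name + "(\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        if (!m.find()) return null;
        // Assumption: tags nested inside the element are dropped from the result.
        return m.group(2).replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<html><title>My <b>page</b></title></html>";
        System.out.println(taggedText(html, "title"));
    }
}
```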
public boolean scrollTo(java.lang.String s) throws java.io.IOException
Throws: java.io.IOException
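Seeking a string in a stream, as scrollTo is described to do, can be sketched with a sliding window over the characters read so far. The class ScrollToSketch and the helper restAfter are hypothetical names, and the sketch assumes the reader ends up positioned immediately after the match; the documentation does not say whether the real method stops before or after the matched string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration; NOT the LEILA implementation of scrollTo.
class ScrollToSketch {
    // Consumes characters until s has just been read; returns false at end of input.
    static boolean scrollTo(Reader in, String s) throws IOException {
        if (s.isEmpty()) return true;
        StringBuilder window = new StringBuilder(); // last s.length() characters read
        int c;
        while ((c = in.read()) != -1) {
            window.append((char) c);
            if (window.length() > s.length()) window.deleteCharAt(0);
            if (s.contentEquals(window)) return true;
        }
        return false;
    }

    // Returns whatever follows the first occurrence of s, or null if s is not found.
    static String restAfter(String text, String s) {
        try (Reader in = new StringReader(text)) {
            if (!scrollTo(in, s)) return null;
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) rest.append((char) c);
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(restAfter("<html><body>Hello", "<body>"));
    }
}
```

The window comparison is quadratic in the worst case, which is acceptable for short search strings; a production version might use a dedicated string-matching algorithm instead.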
public static void main(java.lang.String[] argv) throws java.lang.Exception
Throws: java.lang.Exception