ISSearch
Interface ISParserInterface

All Known Implementing Classes:
ISParser

public interface ISParserInterface

Parser interface of the Web search engine. The parser reads any given text or HTML input (represented by a Reader) and extracts all words and links from that source. The result must be returned as an object that implements ISDocumentInterface.

See Also:
String, HTMLEditorKit, StringTokenizer

Method Summary
 boolean isStopword(String who)
          Decides whether the given token is a stopword.
 ISDocumentInterface parse(Reader input)
          Performs the input analysis.
 String stem(String who)
          Applies the Porter stemming algorithm and returns the resulting word stem.
 

Method Detail

parse

public ISDocumentInterface parse(Reader input)
Performs the input analysis. Returns the container object that implements the ISDocumentInterface and contains extracted words, word stems, and links.

Parameters:
input - The parser input (e.g., a text file or HTTP connection), represented by a Reader.
Returns:
Container object with terms and links, or null if an internal error occurs.
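
A caller has to supply the input as a Reader and be prepared for a null result. The following is a minimal usage sketch, not part of the documented API. The two interface declarations are stand-ins that repeat only the signatures shown on this page so that the example compiles on its own (the methods of ISDocumentInterface are not listed here and are left out), and ParseUsageSketch and parsedSuccessfully are hypothetical names.

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in declaration: the real ISDocumentInterface methods are not shown
// on this page, so none are assumed here.
interface ISDocumentInterface { }

// Stand-in declaration repeating the three documented method signatures.
interface ISParserInterface {
    ISDocumentInterface parse(Reader input);
    boolean isStopword(String who);
    String stem(String who);
}

class ParseUsageSketch {
    // Hypothetical helper: wraps plain text in a Reader, runs the parser,
    // and reports whether a document container came back.
    static boolean parsedSuccessfully(ISParserInterface parser, String text) {
        Reader input = new StringReader(text);
        ISDocumentInterface document = parser.parse(input);
        // parse is documented to return null on an internal error.
        return document != null;
    }
}
```

Input from a file or an HTTP connection could be passed the same way, for example through a FileReader or an InputStreamReader.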

isStopword

public boolean isStopword(String who)
Decides whether the given token is a stopword. This method must apply the FreeWAIS stopword list and must be implemented case-insensitively (e.g., both 'the' and 'ThE' should be recognized as stopwords).

Parameters:
who - The String to be checked.
Returns:
true if the given string is a stopword, false otherwise.
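
One possible way to meet the case-insensitivity requirement is to store the list in lowercase and lowercase each token before the lookup. The sketch below assumes this approach. StopwordSketch is a hypothetical class name, the three entries are placeholders rather than the FreeWAIS list itself, and treating null as "not a stopword" is an assumption, since this page does not define the behavior for null.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class StopwordSketch {
    // Placeholder entries only. A real implementation would load the
    // complete FreeWAIS stopword list here, stored in lowercase.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "a", "and"));

    static boolean isStopword(String who) {
        if (who == null) {
            return false; // assumption: null is never a stopword
        }
        // Lowercase the token so that 'the' and 'ThE' hit the same entry.
        return STOPWORDS.contains(who.toLowerCase(Locale.ROOT));
    }
}
```

A hash set keeps each lookup at constant expected time, which matters because the check runs once per extracted token.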

stem

public String stem(String who)
Applies the Porter stemming algorithm and returns the resulting word stem. The output must be normalized (using String.toLowerCase() and String.trim()).

Parameters:
who - The word to be stemmed.
Returns:
The word stem, trimmed and lowercase.
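
The normalization requirement can be illustrated separately from the stemming itself. The sketch below does not implement the Porter algorithm. It accepts any stemming function as a parameter (porterStemmer, a hypothetical name) and only shows the trimming and lowercasing around it. StemNormalizationSketch is also a hypothetical name, normalizing the input before stemming is an assumption, and toLowerCase(Locale.ROOT) is used as a locale-independent variant of String.toLowerCase().

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

class StemNormalizationSketch {
    // porterStemmer is a stand-in for a real Porter stemmer implementation.
    static String stem(String who, UnaryOperator<String> porterStemmer) {
        // Assumption: normalize the input first so the stemmer sees
        // a trimmed, lowercase word.
        String word = who.trim().toLowerCase(Locale.ROOT);
        String stemmed = porterStemmer.apply(word);
        // Normalize the output as the method contract requires.
        return stemmed.trim().toLowerCase(Locale.ROOT);
    }
}
```

With an identity function in place of the stemmer, stem("  Parser ", s -> s) returns "parser".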