ISCrawlerInterface

ISSearch
Interface ISCrawlerInterface

All Superinterfaces: Runnable

All Known Subinterfaces: ISDBCrawlerInterface

All Known Implementing Classes: ISCrawler, ISDBCrawler

public interface ISCrawlerInterface
extends Runnable

Interface of the main crawler class of the Web search engine. It is used to start and stop the crawler, to reset the engine, and to control crawling parameters.

See Also: Runnable, Thread, InetAddress, URL, HttpURLConnection, InputStreamReader, BufferedReader, Exception

Field Summary
`static int`	`RUNNING` The Running state of the current thread
`static int`	`STOPPED` The Idle state of the current thread

Method Summary
`void`	`addLink(URL link)` Adds a new link to the URL queue, if the link is not yet visited.
`URL`	`getBest()` Returns the best candidate to be visited next.
`int`	`getCrawlingDepth()` Returns the current maximum allowed crawling depth.
`ISDocumentInterface`	`getCurrentDocument()` Returns the last document visited by the Crawler.
`URL`	`getCurrentURL()` Returns the last URL visited by the Crawler.
`int`	`getMaxQueueSize()` Returns the maximum allowed size of the URL queue.
`int`	`getQueueSize()` Returns the current size of the URL queue.
`int`	`getState()` Returns the current state of the crawler.
`boolean`	`isVisited(URL doc)` Checks whether the URL of the given document has already been visited by the crawler.
`void`	`reset()` Resets the crawler.
`void`	`setCrawlingDepth(int depth)` Sets the maximum allowed crawling depth.
`void`	`setQueueMaxSize(int m)` Sets the maximum allowed size of the URL queue.
`void`	`start()` Starts the thread of the crawler and changes the engine state to `RUNNING`.
`void`	`stop()` Stops the crawler.

Methods inherited from interface java.lang.Runnable

run

Field Detail

RUNNING

public static final int RUNNING

The Running state of the current thread

See Also: Constant Field Values

STOPPED

public static final int STOPPED

The Idle state of the current thread

See Also: Constant Field Values

Method Detail

start

public void start()

Starts the thread of the crawler and changes the engine state to RUNNING.

stop

public void stop()

Stops the crawler. This method stops crawling and sets the engine status to STOPPED.

reset

public void reset()

Resets the crawler. This method stops crawling and clears both the URL queue and the list of visited links. Finally, it sets the crawler state to STOPPED.
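
The lifecycle described for start, stop, and reset can be illustrated with a minimal sketch. The class name `CrawlerLifecycleSketch`, the numeric values of the two constants, and the placeholder `clearQueueAndVisited()` are assumptions made for illustration; this is not the ISCrawler implementation, and the real constant values are listed under Constant Field Values.

```java
/** Hypothetical sketch of the start/stop/reset lifecycle; not the ISCrawler source. */
class CrawlerLifecycleSketch implements Runnable {
    // Assumed values; the real ones are listed under Constant Field Values.
    static final int RUNNING = 1;
    static final int STOPPED = 0;

    private volatile int state = STOPPED;
    private Thread worker;

    /** Starts a worker thread and switches the state to RUNNING. */
    public synchronized void start() {
        if (state == RUNNING) {
            return; // already running, nothing to do
        }
        state = RUNNING;
        worker = new Thread(this, "crawler-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    /** Signals the worker to finish and switches the state to STOPPED. */
    public synchronized void stop() {
        state = STOPPED;
        if (worker != null) {
            worker.interrupt(); // wake the worker if it is sleeping
            worker = null;
        }
    }

    /** Stops crawling, then clears the queue and visited list (placeholder here). */
    public synchronized void reset() {
        stop();
        clearQueueAndVisited();
    }

    public int getState() {
        return state;
    }

    private void clearQueueAndVisited() {
        // Placeholder: a real crawler would empty its URL queue and visited set.
    }

    @Override
    public void run() {
        while (state == RUNNING) {
            try {
                Thread.sleep(10); // stands in for fetching and parsing one page
            } catch (InterruptedException e) {
                return; // interrupted by stop()
            }
        }
    }
}
```

In this sketch the state flag is `volatile` so that the worker thread sees a change made by `stop()` without further locking; whether the real implementation does the same is not stated here.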

addLink

public void addLink(URL link)

Adds a new link to the URL queue, if the link is not yet visited.

Parameters: link - The URL of the new target to be queued
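
One possible way to satisfy the addLink contract together with the queue size limit is sketched below. The class `LinkQueueSketch`, the helpers `markVisited` and `url`, the default limit of 100, and the choice to silently drop links when the queue is full are all assumptions; the interface itself does not say how a full queue is handled.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of a URL queue with a visited filter and a size limit. */
class LinkQueueSketch {
    private final Queue<URL> queue = new ArrayDeque<>();
    // Visited URLs are stored as strings, because URL.equals may resolve host names.
    private final Set<String> visited = new HashSet<>();
    private int maxQueueSize = 100; // assumed default

    /** Queues the link unless it was already visited or the queue is full. */
    public void addLink(URL link) {
        if (isVisited(link)) {
            return;
        }
        if (queue.size() >= maxQueueSize) {
            return; // assumption: links beyond the limit are dropped
        }
        queue.add(link);
    }

    public boolean isVisited(URL doc) {
        return visited.contains(doc.toExternalForm());
    }

    /** Hypothetical helper: records a URL as visited after it has been crawled. */
    void markVisited(URL link) {
        visited.add(link.toExternalForm());
    }

    public int getQueueSize() {
        return queue.size();
    }

    public void setQueueMaxSize(int m) {
        maxQueueSize = m;
    }

    public int getMaxQueueSize() {
        return maxQueueSize;
    }

    /** Hypothetical helper: builds a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```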

getState

public int getState()

Returns the current state of the crawler. Possible states are RUNNING and STOPPED.

Returns: The current state of the crawler, RUNNING or STOPPED

getQueueSize

public int getQueueSize()

Returns the current size of the URL queue.

Returns: The current size of the URL queue.

setQueueMaxSize

public void setQueueMaxSize(int m)

Sets the maximum allowed size of the URL queue.

Parameters: m - The maximum allowed queue size

getMaxQueueSize

public int getMaxQueueSize()

Returns the maximum allowed size of the URL queue.

Returns: The maximum allowed queue size

setCrawlingDepth

public void setCrawlingDepth(int depth)

Sets the maximum allowed crawling depth.

Parameters: depth - The maximum allowed crawling depth.

getCrawlingDepth

public int getCrawlingDepth()

Returns the current maximum allowed crawling depth.

Returns: The current maximum allowed crawling depth.

getBest

public URL getBest()

Returns the best candidate to be visited next. The result must have the highest priority (according to the selected ordering strategy) among all available links.

Returns: The best target for the crawler to visit next, or null if the queue is empty.
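
The getBest contract can be sketched with a priority queue. The ordering strategy is not specified here, so this sketch assumes each link carries a numeric score and that a higher score means higher priority. The class `BestFirstSketch`, the two-argument `addLink`, the `url` helper, and the decision to remove the returned link from the queue are all assumptions, not documented behavior.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch of a best-first URL queue; the scoring is an assumption. */
class BestFirstSketch {
    /** A queued link together with its assumed priority score. */
    private static final class Candidate {
        final URL url;
        final double score;

        Candidate(URL url, double score) {
            this.url = url;
            this.score = score;
        }
    }

    // Highest score first, so poll() yields the best candidate.
    private final PriorityQueue<Candidate> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Candidate c) -> c.score).reversed());

    /** Hypothetical variant of addLink that also takes a score. */
    void addLink(URL link, double score) {
        queue.add(new Candidate(link, score));
    }

    /** Returns and removes the highest-scored link, or null if the queue is empty. */
    URL getBest() {
        Candidate best = queue.poll();
        return best == null ? null : best.url;
    }

    /** Hypothetical helper: builds a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```

A different ordering strategy, such as breadth-first by crawling depth, would only require a different comparator in this sketch.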

isVisited

public boolean isVisited(URL doc)

Checks whether the URL of the given document has already been visited by the crawler.

Returns: true if the engine recognizes the given URL as already visited, false otherwise.

getCurrentDocument

public ISDocumentInterface getCurrentDocument()

Returns the last document visited by the Crawler.

Returns: The last visited document as an object that implements ISDocumentInterface (and contains all extracted links, words, and their stems); null if no documents have been crawled yet.

getCurrentURL

public URL getCurrentURL()

Returns the last URL visited by the Crawler.

Returns: The last visited URL; null if no links have been crawled yet.