ISDBCrawler

Method Summary
`void`	`addLink(URL link)` Adds a new link to the URL queue if the link has not been visited yet.
`void`	`closeDB()` Closes the database connection of the built-in database interface.
`URL`	`getBest()` Returns the best candidate to be visited next.
`int`	`getCrawlingDepth()` Returns the current maximum allowed crawling depth.
`ISDocumentInterface`	`getCurrentDocument()` Returns the last document visited by the Crawler.
`URL`	`getCurrentURL()` Returns the last URL visited by the Crawler.
`ISDBinterface`	`getDBInterface()` Returns the built-in database interface of the crawler.
`int`	`getMaxQueueSize()` Returns the maximum allowed size of the URL queue.
`int`	`getQueueSize()` Returns the current size of the URL queue.
`int`	`getState()` Returns the current state of the crawler.
`boolean`	`isVisited(URL doc)` Checks whether the given URL has already been visited by the crawler.
`boolean`	`openDB()` Initializes the internal database interface and opens its database connection.
`void`	`reset()` Resets the crawler.
`void`	`run()` When an object implementing interface `Runnable` is used to create a thread, starting the thread causes the object's `run` method to be called in that separately executing thread.
`void`	`setCrawlingDepth(int depth)` Sets the maximum allowed crawling depth.
`void`	`setQueueMaxSize(int m)` Sets the maximum allowed size of the URL queue.
`void`	`start()` Starts the thread of the crawler and changes the engine state to `RUNNING`.
`void`	`stop()` Stops the crawler.
`boolean`	`store(URL link, ISDocumentInterface doc)` Stores the crawled document and its URL in the database.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

dbinterface

private ISDBinterface dbinterface

The built-in database interface of the crawler.

Constructor Detail

ISDBCrawler

public ISDBCrawler()

Creates a new instance of ISDBCrawler.

Method Detail

store

public boolean store(URL link,
                     ISDocumentInterface doc)

Stores the crawled document and its URL in the database.

Specified by:: store in interface ISDBCrawlerInterface

Parameters:: link - the URL of the crawled document; doc - the crawled document, with its extracted terms and links
Returns:: true if the document was stored successfully; false otherwise.

openDB

public boolean openDB()

Initializes the internal database interface and opens its database connection.

Specified by:: openDB in interface ISDBCrawlerInterface

Returns:: true if the connection to the database was opened successfully; false otherwise.

closeDB

public void closeDB()

Closes the database connection of the built-in database interface.

Specified by:: closeDB in interface ISDBCrawlerInterface
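The three database methods are meant to bracket a crawl: open the connection, store each document, then close the connection. The following is a minimal sketch of that lifecycle, not the ISSearch implementation. The class `InMemoryDbCrawler`, its `storedCount()`, `url(...)` and `crawlOnce(...)` helpers, and the `HashMap` standing in for a real database are all hypothetical; documents are reduced to plain strings because `ISDocumentInterface` is not described on this page.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the database methods of ISDBCrawler (not the ISSearch code).
class InMemoryDbCrawler {
    // Keyed by the URL's string form, because java.net.URL.equals/hashCode may resolve host names.
    private Map<String, String> table;   // null while the "connection" is closed

    boolean openDB() {
        table = new HashMap<>();         // a real implementation would connect here and return false on failure
        return true;
    }

    boolean store(URL link, String doc) {
        if (table == null || link == null || doc == null) {
            return false;                // not connected, or nothing to store
        }
        table.put(link.toExternalForm(), doc);
        return true;
    }

    void closeDB() {
        table = null;
    }

    int storedCount() {                  // hypothetical inspection helper
        return table == null ? 0 : table.size();
    }

    // Hypothetical helper: builds a URL without the checked MalformedURLException.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    // Hypothetical usage: the connection is closed even if storing fails or throws.
    static boolean crawlOnce(InMemoryDbCrawler crawler, URL link, String doc) {
        if (!crawler.openDB()) {
            return false;
        }
        try {
            return crawler.store(link, doc);
        } finally {
            crawler.closeDB();
        }
    }
}
```

The `try`/`finally` in `crawlOnce` is the main design point: whatever `store` returns, `closeDB()` still runs.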

getDBInterface

public ISDBinterface getDBInterface()

Returns the built-in database interface of the crawler.

Specified by:: getDBInterface in interface ISDBCrawlerInterface

Returns:: the database interface of the crawler

addLink

public void addLink(URL link)

Adds a new link to the URL queue if the link has not been visited yet.

Specified by:: addLink in interface ISCrawlerInterface

Parameters:: link - The URL of the new target
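The documented rule (enqueue only links that have not been visited) can be sketched with a FIFO queue plus a set of visited keys. Two further behaviours in this sketch are assumptions not stated on this page: a link that is already waiting in the queue is not added twice, and a link that arrives while the queue is at its maximum size is dropped. `LinkQueue`, `markVisited` and `url` are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of addLink(): only unvisited links enter a bounded FIFO queue.
class LinkQueue {
    private final Deque<URL> queue = new ArrayDeque<>();
    // String keys avoid java.net.URL.equals/hashCode, which may resolve host names.
    private final Set<String> visited = new HashSet<>();
    private final Set<String> queued = new HashSet<>();
    private int maxQueueSize;

    LinkQueue(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
    }

    void setQueueMaxSize(int m) {
        this.maxQueueSize = m;
    }

    void addLink(URL link) {
        String key = link.toExternalForm();
        if (visited.contains(key) || queued.contains(key)) {
            return;                          // already visited or already waiting in the queue
        }
        if (queue.size() >= maxQueueSize) {
            return;                          // assumption: links beyond the limit are dropped
        }
        queue.addLast(link);
        queued.add(key);
    }

    void markVisited(URL link) {
        visited.add(link.toExternalForm());
    }

    int getQueueSize() {
        return queue.size();
    }

    // Hypothetical helper: builds a URL without the checked MalformedURLException.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```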

getBest

public URL getBest()

Returns the best candidate to be visited next. The result must have the highest priority (as defined by the selected ordering strategy) among all queued links.

Specified by:: getBest in interface ISCrawlerInterface

Returns:: The next target to be visited by the Crawler; null if the queue is empty.
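The ordering strategy is left open here. One way to honour "highest priority among all queued links, null when empty" is a `java.util.PriorityQueue` ordered by a pluggable score, since `poll()` already returns `null` on an empty queue. The class `BestFirstFrontier` and the `shallowFirst` scoring rule (fewer path segments first) are hypothetical examples, not the strategy ISSearch actually uses.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

// Hypothetical best-first frontier: getBest() returns the highest-scoring queued URL.
class BestFirstFrontier {
    private final PriorityQueue<URL> queue;

    BestFirstFrontier(ToDoubleFunction<URL> priority) {
        // PriorityQueue.poll() yields the smallest element, so reverse the score order.
        this.queue = new PriorityQueue<>(Comparator.comparingDouble(priority).reversed());
    }

    void addLink(URL link) {
        queue.add(link);
    }

    URL getBest() {
        return queue.poll();                 // null when the queue is empty, as documented
    }

    // Hypothetical strategy: shallower paths (fewer '/' characters) get a higher score.
    static double shallowFirst(URL u) {
        return -u.getPath().chars().filter(ch -> ch == '/').count();
    }

    // Hypothetical helper: builds a URL without the checked MalformedURLException.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```

Swapping the strategy only means passing a different scoring function to the constructor, for example `new BestFirstFrontier(BestFirstFrontier::shallowFirst)`.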

getCrawlingDepth

public int getCrawlingDepth()

Returns the current maximum allowed crawling depth.

Specified by:: getCrawlingDepth in interface ISCrawlerInterface

Returns:: The current maximum allowed crawling depth.

getCurrentDocument

public ISDocumentInterface getCurrentDocument()

Returns the last document visited by the Crawler.

Specified by:: getCurrentDocument in interface ISCrawlerInterface

Returns:: The last visited document as an object that implements ISDocumentInterface (containing all extracted links, words, and their stems); null if no document has been crawled yet.

getCurrentURL

public URL getCurrentURL()

Returns the last URL visited by the Crawler.

Specified by:: getCurrentURL in interface ISCrawlerInterface

Returns:: The last visited URL; null if no link has been crawled yet.

getMaxQueueSize

public int getMaxQueueSize()

Returns the maximum allowed size of the URL queue.

Specified by:: getMaxQueueSize in interface ISCrawlerInterface

Returns:: The maximum allowed queue size.

getQueueSize

public int getQueueSize()

Returns the current size of the URL queue.

Specified by:: getQueueSize in interface ISCrawlerInterface

Returns:: The current size of the URL queue.

getState

public int getState()

Returns the current state of the crawler. Possible states are RUNNING and STOPPED.

Specified by:: getState in interface ISCrawlerInterface

Returns:: The current state of the crawler, RUNNING or STOPPED.

isVisited

public boolean isVisited(URL doc)

Checks whether the given URL has already been visited by the crawler.

Specified by:: isVisited in interface ISCrawlerInterface

Parameters:: doc - the URL to check
Returns:: true if the engine recognizes the given URL as already visited; false otherwise.
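Whether a URL is "recognized" as visited depends on how URLs are compared. The sketch below canonicalizes URLs before the lookup; the normalization rules (lower-case scheme and host, default port removed, `.` and `..` path segments resolved, fragment dropped) and the `VisitedSet` class are assumptions, not documented behaviour of `ISDBCrawler`.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical visited-set that treats equivalent spellings of a URL as the same link.
class VisitedSet {
    private final Set<String> visited = new HashSet<>();

    // Hypothetical normalization rules; a real crawler may use different ones.
    static String key(URL url) {
        try {
            URI u = url.toURI().normalize();                        // resolves "." and ".." segments
            String scheme = u.getScheme().toLowerCase(Locale.ROOT);
            String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
            int port = u.getPort();
            if (port == url.getDefaultPort()) {
                port = -1;                                           // "http://h:80/" equals "http://h/"
            }
            String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
            String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
            // The fragment (#...) is dropped: it is not sent to the server.
            return scheme + "://" + host + (port == -1 ? "" : ":" + port) + path + query;
        } catch (URISyntaxException e) {
            return url.toExternalForm();                             // fall back to the raw form
        }
    }

    void markVisited(URL url) {
        visited.add(key(url));
    }

    boolean isVisited(URL url) {
        return visited.contains(key(url));
    }

    // Hypothetical helper: builds a URL without the checked MalformedURLException.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```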

setCrawlingDepth

public void setCrawlingDepth(int depth)

Sets the maximum allowed crawling depth.

Specified by:: setCrawlingDepth in interface ISCrawlerInterface

Parameters:: depth - The maximum allowed crawling depth.
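This page does not define how depth is measured. A common reading, assumed here, counts link hops from the seed URL, with the seed at depth 0, so links found on a page at the maximum depth are not followed. The breadth-first sketch below uses plain strings and an in-memory link graph instead of real fetching; `DepthLimitedCrawl` and `crawl` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical depth-limited breadth-first crawl over an in-memory link graph.
class DepthLimitedCrawl {
    // A queued URL together with its distance (in link hops) from the seed.
    private static final class Entry {
        final String url;
        final int depth;

        Entry(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    private int maxDepth;

    void setCrawlingDepth(int depth) {
        this.maxDepth = depth;
    }

    int getCrawlingDepth() {
        return maxDepth;
    }

    // Returns the URLs in the order they would be visited.
    List<String> crawl(String seed, Map<String, List<String>> links) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<Entry> queue = new ArrayDeque<>();
        queue.add(new Entry(seed, 0));
        seen.add(seed);
        while (!queue.isEmpty()) {
            Entry e = queue.remove();
            order.add(e.url);
            if (e.depth >= maxDepth) {
                continue;                    // children would exceed the allowed depth
            }
            for (String child : links.getOrDefault(e.url, List.of())) {
                if (seen.add(child)) {
                    queue.add(new Entry(child, e.depth + 1));
                }
            }
        }
        return order;
    }
}
```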

setQueueMaxSize

public void setQueueMaxSize(int m)

Sets the maximum allowed size of the URL queue.

Specified by:: setQueueMaxSize in interface ISCrawlerInterface

Parameters:: m - The maximum allowed queue size

start

public void start()

Starts the thread of the crawler and changes the engine state to RUNNING.

Specified by:: start in interface ISCrawlerInterface

stop

public void stop()

Stops the crawler. This method stops crawling and sets the engine status to STOPPED.

Specified by:: stop in interface ISCrawlerInterface
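Together, `start`, `stop` and `getState` describe a small state machine around a worker thread. The sketch below is illustrative only: `StoppableCrawler` is a hypothetical class, the numeric values of `RUNNING` and `STOPPED` are assumptions (they are not given on this page), and the worker simply polls a `volatile` state field.

```java
// Hypothetical sketch of the start/stop/getState contract around a worker thread.
class StoppableCrawler implements Runnable {
    static final int STOPPED = 0;            // assumed value; not given on this page
    static final int RUNNING = 1;            // assumed value; not given on this page

    private volatile int state = STOPPED;    // read by the worker thread, written by callers
    private Thread worker;

    synchronized void start() {
        if (state == RUNNING) {
            return;                          // already running
        }
        state = RUNNING;
        worker = new Thread(this, "crawler");
        worker.start();
    }

    void stop() {
        Thread t;
        synchronized (this) {
            state = STOPPED;                 // the worker loop checks this flag
            t = worker;
            worker = null;
        }
        if (t != null) {
            t.interrupt();                   // wake the worker if it is sleeping or blocked
            try {
                t.join();                    // wait until the worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    int getState() {
        return state;
    }

    @Override
    public void run() {
        while (state == RUNNING) {
            // One crawl step (getBest, fetch, store, addLink) would go here.
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                return;                      // interrupted by stop()
            }
        }
    }
}
```

Because this `stop()` joins the worker, the worker has finished by the time it returns; whether the real `stop()` blocks in the same way is not documented.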

reset

public void reset()

Resets the crawler. This method stops crawling and clears both the URL queue and the list of visited links. Finally, it sets the crawler status to STOPPED.

Specified by:: reset in interface ISCrawlerInterface
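The steps of `reset` can be sketched as follows. `ResettableFrontier`, its `begin()` method (a stand-in for `start()`) and the constant values are hypothetical; a threaded crawler would additionally signal and join its worker in the first step.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of reset(): stop, clear queue and visited list, end in STOPPED.
class ResettableFrontier {
    static final int STOPPED = 0;            // assumed value
    static final int RUNNING = 1;            // assumed value

    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> visited = new HashSet<>();
    private int state = STOPPED;

    synchronized void begin() {              // hypothetical stand-in for start()
        state = RUNNING;
    }

    synchronized void addLink(String url) {
        if (!visited.contains(url)) {
            queue.add(url);
        }
    }

    synchronized void markVisited(String url) {
        visited.add(url);
    }

    synchronized void reset() {
        // 1. Stop crawling (a threaded crawler would signal and join its worker here).
        queue.clear();                       // 2. reset the URL queue
        visited.clear();                     // 3. reset the list of visited links
        state = STOPPED;                     // 4. finally report STOPPED
    }

    synchronized int getQueueSize() {
        return queue.size();
    }

    synchronized boolean isVisited(String url) {
        return visited.contains(url);
    }

    synchronized int getState() {
        return state;
    }
}
```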

run

public void run()

Description copied from interface: Runnable

When an object implementing interface Runnable is used to create a thread, starting the thread causes the object's run method to be called in that separately executing thread.

The general contract of the method run is that it may take any action whatsoever.

Specified by:: run in interface Runnable

See Also:: Thread.run()
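The inherited description does not say what `run` does in this class. One plausible crawl loop, assumed here, takes the next queued link, skips it if it was already visited, fetches it, stores it and enqueues its out-links until the queue is empty or the crawler is stopped. `CrawlLoop` and its `fetchLinks` function are hypothetical; FIFO order stands in for `getBest`, and a list of URLs stands in for `store`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical crawl loop; names and structure are illustrative, not ISSearch code.
class CrawlLoop implements Runnable {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Set<String> visited = new HashSet<>();
    private final List<String> stored = new ArrayList<>();        // stands in for store(link, doc)
    private final Function<String, List<String>> fetchLinks;      // hypothetical fetch + link extraction
    private volatile boolean running = true;
    private volatile String currentURL;

    CrawlLoop(String seed, Function<String, List<String>> fetchLinks) {
        this.fetchLinks = fetchLinks;
        queue.add(seed);
    }

    void stop() {
        running = false;
    }

    @Override
    public void run() {
        String next;
        while (running && (next = queue.poll()) != null) {   // getBest(): FIFO here, null when empty
            if (!visited.add(next)) {
                continue;                                    // already crawled
            }
            currentURL = next;
            List<String> links = fetchLinks.apply(next);     // fetch and parse the document
            stored.add(next);                                // store(link, doc)
            for (String link : links) {
                if (!visited.contains(link)) {
                    queue.add(link);                         // addLink(link)
                }
            }
        }
        running = false;
    }

    List<String> stored() {
        return stored;
    }

    String getCurrentURL() {
        return currentURL;
    }
}
```

Calling `run()` directly, as below, executes the loop on the current thread; passing the object to `new Thread(...)` runs it in the background instead.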

ISSearch Class ISDBCrawler
