==================================================
CUSTOMIZATION RECOMMENDATIONS FOR BINGO! Framework
==================================================
This document contains some recommendations on possible customizations
for the BINGO! framework. It is NOT designed as a step-by-step guide
for trivial modifications; it's rather a collection of conceptual recommendations
for experienced Java developers.
THESE RECOMMENDATIONS ARE PROVIDED WITHOUT ANY EXPRESSED OR IMPLIED WARRANTIES.
IN NO EVENT SHALL THE DATABASE AND INFORMATION SYSTEMS RESEARCH
GROUP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
RECOMMENDATIONS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
IMPORTANT: We strongly recommend creating BACKUPS of all BINGO! sources
before applying any changes.
NOTE: If you use BINGO! in your scientific work, please cite as:
Sergej Sizov, Michael Biwer, Jens Graupmann, Stefan Siersdorfer,
Martin Theobald, Gerhard Weikum, Patrick Zimmer:
The BINGO! System for Information Portal Generation and Expert Web Search.
The 1st Semiannual Conference on Innovative Data Systems Research (CIDR),
Asilomar(CA), 2003
available at
http://www-db.cs.wisc.edu/cidr2003/program/p7.pdf
==================================================
1) Running BINGO! on non-Win32 platforms
2) Using the Oracle database via Oracle JDBC
3) Using other databases
4) Adding support for special document types
5) Customizations of the classifier
6) Crawler monitoring: Crawler Plug-Ins
7) Modifications of the database schema
8) Menu language customization
9) Stemmer language customization
10) Tokenizer customization
11) Performance tuning
12) Java virtual machine: memory, internal params
13) Adding support for non-http protocols.
14) Customization of the Crawler queue
15) Pattern-based excluding of URLs from crawl
16) Important BINGO! options
17) Adding new global options
18) Adding language support for new GUI menus
19) Customizations of feature spaces
==========================================================================
1) Running BINGO! on non-Win32 platforms
The current BINGO! release was tested on the Win32 platform.
With the minor changes described below, the framework can be adapted
to run on other operating systems (e.g. Linux):
- the core BINGO! implementation is 100% Java and runs without any changes on most JVMs that are compatible with Sun Java 1.4.* and higher.
- verify that execution rights for all shell scripts located in the root directory of your BINGO! installation are properly set.
On Unix systems, you can use the command 'chmod u+x filename' to enable execution of these files.
- the classifier of the engine uses the SVM modelling tool SVM*Light provided by Thorsten Joachims. The engine calls SVM*Light to build the classification model,
which is then imported into BINGO! from the SVM*Light output file. You can obtain the latest build for your OS as well as the sources of SVM*Light from
http://svmlight.joachims.org.
- compile and build the program 'svm_learn' from the SVM*Light package and place it into the /temp directory of your BINGO! installation
- modify the shell scripts '/temp/clean.bat' and '/temp/train.bat' that reside in the same directory to execute the program 'svm_learn'
- verify that execution rights for both shell scripts and the program 'svm_learn' are properly set.
- the engine uses Adobe PDF Filter to parse PDF files. The shell script /bingo/crawler/handler/FiltDump.bat
is used to call the external program /bingo/crawler/handler/FiltDump.exe. You can replace 'FiltDump.exe' by any platform-specific PDF parser
that can perform PDF filtering and produce plain ASCII output.
FiltDump.bat redirects this output into ASCII temp files that are used for further content processing. Finally, you will need to modify
FiltDump.bat and ensure that execution rights are properly set.
If there is no appropriate PDF parser (or PDF processing is not intended), you can delete files FiltDump.exe and FiltDump.bat. This will
automatically disable PDF processing.
2) Accessing the Oracle database via JDBC
The BINGO! framework was tested with both MySQL and Oracle databases. The initial configuration is set up to run with a MySQL instance.
If you want to run BINGO! with an Oracle database, you can reconfigure the framework:
- download the Oracle JDBC driver package and put the library into /lib directory of your BINGO! installation
- include this file (usually classes12.zip, oracle.jar or something similar) into your CLASSPATH setting in shell script files
r.bat, schema.bat, mini.bat and rebuild.bat that are located in the root directory of your BINGO! installation.
- create a new Oracle database instance (e.g. using Oracle management tools) and note its access parameters (hostname, port, service name, root username and password)
- verify SQL scripts /schema/schema_speed.sql and /schema/user.sql and adapt them to your database parameters (e.g. tablespace names), if necessary.
- run the script 'schema.bat' to create a new BINGO! user.
- edit the file /conf/config.xml and replace the setting mysql by oracle.
3) Using other database systems
Most components of BINGO! use SQL'92 compliant queries and
should work with any SQL'92 compatible database system (e.g. DB2 or SQLServer).
However, there are some system-dependent
differences in LOB management, database infrastructure (e.g. Views), establishing of JDBC connections,
etc. that need to be reconsidered individually. In general, the following modifications are required:
- download the appropriate JDBC driver package and put the library into /lib directory of your BINGO! installation
- include the driver file(s) in the CLASSPATH setting of the shell script files
r.bat, schema.bat, mini.bat and rebuild.bat that are located in the root directory of your BINGO! installation.
- create a new database instance (e.g. using vendor management tools) and note its access parameters (hostname, port, service name, root username and password)
- create a new database user with properly set access rights, and log in (using this account).
- execute queries from /schema/schema_speed.sql to create the BINGO! schema for this user
- edit the file /conf/config.xml and replace the setting *** by my-database-name.
- create a new class "bingo.db.MyDatabaseInterface extends bingo.db.DBInterface"
that should override all methods with non-SQL92 queries of the framework. You can use existing
classes bingo.db.MySQLInterface and bingo.db.OracleInterface as prototypes.
- edit the file /bingo/util/SessionBuffer.java. You need to modify the method getDBConnection() and add a new driver-specific
method that opens a new connection to your database. You can use existing methods getMySqlConnection() and getOracleConnection() as prototypes.
- rebuild the framework using the shell script rebuild.bat
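A driver-specific connection method for the step above might look like the following sketch. The class name MyDatabaseConnector, the method getMyDatabaseConnection(), and the choice of PostgreSQL (driver class and URL format) are illustrative assumptions and not part of the BINGO! sources; the real method should be modelled on getMySqlConnection() and getOracleConnection().

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; not part of the BINGO! sources.
class MyDatabaseConnector {

    // Builds a PostgreSQL-style JDBC URL. The URL format is vendor-specific,
    // so adapt this method to the documentation of your JDBC driver.
    static String buildJdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    // Opens a new connection to the database (assumed analogue of
    // getMySqlConnection() / getOracleConnection()).
    static Connection getMyDatabaseConnection(String host, int port, String database,
                                              String user, String password)
            throws SQLException, ClassNotFoundException {
        // Explicit driver loading is required on older JVMs (before JDBC 4).
        Class.forName("org.postgresql.Driver");
        return DriverManager.getConnection(buildJdbcUrl(host, port, database), user, password);
    }
}
```

The driver library must be on the CLASSPATH at runtime, otherwise Class.forName() throws a ClassNotFoundException.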
4) Using additional handlers for special document types
BINGO! is equipped with so-called 'handler' classes that can process documents of particular MIME types. Currently the engine supports document types
text/html, text/plain, application/pdf, and application/xml. If you want to add routines for further MIME types (e.g. PostScript or WinWord),
the following steps are required:
- carefully study the class "bingo.crawler.handler.CrawlHandler" to understand the basic handler architecture
- Create a new class "bingo.crawler.handler.MyHandler extends CrawlHandler" with the desired functionality. You can use the existing classes
bingo.crawler.handler.HtmlHandler and bingo.crawler.handler.PDFHandler as prototypes.
- Modify the class "bingo.crawler.handler.HandlerManager" and include the handling routine for your MIME type.
- Modify the file /bingo/data/allowed_mimes.dat and add the standard name of your MIME type and the maximum allowed download size.
- rebuild the framework using the shell script rebuild.bat
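The handler steps above can be illustrated by the following sketch. Since the actual API of CrawlHandler and HandlerManager is not shown in this document, the classes SketchCrawlHandler and SketchHandlerManager and the method extractText() are hypothetical stand-ins; only the general idea (one handler per MIME type, selected by the manager) is taken from the text.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in for the real base class; the actual CrawlHandler API may differ.
abstract class SketchCrawlHandler {
    // Converts the raw document bytes into plain text for further processing.
    abstract String extractText(byte[] content);
}

// Example handler for a new MIME type (here: text/csv, chosen for brevity).
class MyHandler extends SketchCrawlHandler {
    @Override
    String extractText(byte[] content) {
        // Replace field separators by blanks so that fields become separate tokens.
        return new String(content, StandardCharsets.UTF_8).replace(',', ' ');
    }
}

// Stand-in for the manager that selects a handler by MIME type.
class SketchHandlerManager {
    private final Map<String, SketchCrawlHandler> handlers = new HashMap<>();

    void register(String mimeType, SketchCrawlHandler handler) {
        handlers.put(mimeType.toLowerCase(Locale.ROOT), handler);
    }

    // Returns the handler for a Content-Type value, or null if unsupported.
    SketchCrawlHandler lookup(String contentType) {
        // Strip parameters such as "; charset=utf-8".
        int semicolon = contentType.indexOf(';');
        String mime = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        return handlers.get(mime);
    }
}
```

A document whose MIME type has no registered handler yields null here; how the real HandlerManager treats such documents has to be checked in the sources.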
5) Classifier customizations
The BINGO! framework uses a linear SVM model to classify crawled documents. Each topic of the ontology contains
its own linear SVM classifier that is based on training samples from that node and its children (positive examples)
and all other nodes on the same level (negative training samples). Furthermore, each node may contain additional
manually selected negative examples (documents with status 'J'=Junk) that are used to improve the classifier quality.
To add another type of classifier (e.g. Naive Bayes), you need to perform the following steps:
- carefully study the classes "bingo.svmlight.ModelBuilder" and "bingo.svmlight.NodeModel" to understand the basic classifier architecture.
- create new classes "bingo.svmlight.MyModelBuilder" and "bingo.svmlight.MyNodeModel" with the desired functionality.
- modify the class "bingo.util.BingoTreeNode" (represents a topic of the BINGO! ontology) and carefully replace the current SVMClassifierModel by MyClassifierModel.
- please note that classification scores returned by the classifier are also used to order links on the crawl frontier (higher scores = higher priority). Improper setting
of classification scores in the classifier may thus impair the thematic focus of the crawler.
- rebuild the framework using the shell script rebuild.bat
If you intend to tune the default SVM classifier of BINGO! rather than implement a new one, the following steps might be useful:
- verify the script "/temp/train.bat" that is used to execute the external SVM*Light training routine.
For instance, you can add or modify SVM*Light flags to influence the training procedure:
Learning options:
-z {c,r,p} -> select between classification (c), regression (r),
and preference ranking (p) (default classification)
-c float -> C: trade-off between training error
and margin (default [avg. x*x]^-1)
-w [0..] -> epsilon width of tube for regression
(default 0.1)
-j float -> Cost: cost-factor, by which training errors on
positive examples outweigh errors on negative
examples (default 1) (see [4])
-b [0,1] -> use biased hyperplane (i.e. x*w+b>0) instead
of unbiased hyperplane (i.e. x*w>0) (default 1)
-i [0,1] -> remove inconsistent training examples
and retrain (default 0)
Kernel options:
-t int -> type of kernel function:
0: linear (default)
1: polynomial (s a*b+c)^d
2: radial basis function exp(-gamma ||a-b||^2)
3: sigmoid tanh(s a*b + c)
4: user defined kernel from kernel.h
-d int -> parameter d in polynomial kernel
-g float -> parameter gamma in rbf kernel
-s float -> parameter s in sigmoid/poly kernel
-r float -> parameter c in sigmoid/poly kernel
-u string -> parameter of user defined kernel
Optimization options (see [1]):
-q [2..] -> maximum size of QP-subproblems (default 10)
-n [2..q] -> number of new variables entering the working set
in each iteration (default n = q). Set n<q to prevent
zig-zagging.
-m [5..] -> size of cache for kernel evaluations in MB (default 40)
The larger the faster...
-e float -> eps: Allow that error for termination criterion
[y [w*x+b] - 1] >= eps (default 0.001)
-h [5..] -> number of iterations a variable needs to be
optimal before considered for shrinking (default 100)
-f [0,1] -> do final optimality check for variables removed
by shrinking. Although this test is usually
positive, there is no guarantee that the optimum
was found if the test is omitted. (default 1)
- in addition, you can verify the parametrization of the resulting linear SVM classifier in the class "bingo.util.BingoTreeNode",
method "calcAvgSVMScore()". After the new SVM model has been created, the system classifies its training data using the new classifier. The scores of new incoming
documents are normalized by the average score of the training data (to make answers from the particular node classifiers of the tree comparable).
To make classifiers more restrictive, a document is considered positively classified only when its score is higher than the threshold,
currently defined as
minSVMTreshold = avgSVMScore * 0.05;
You may want to adapt this setting according to your needs.
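The normalization and threshold test described above can be sketched as follows. The class SvmScoreSketch and its method names are hypothetical; the sketch also assumes that the raw (not yet normalized) score is compared against the threshold and that the average training score is positive. The actual logic resides in "bingo.util.BingoTreeNode" and may differ in detail.

```java
// Hypothetical illustration; not the actual BingoTreeNode code.
class SvmScoreSketch {

    // Factor taken from the setting quoted above.
    static final double THRESHOLD_FACTOR = 0.05;

    // Average score that the new model assigns to its own training data.
    static double averageScore(double[] trainingScores) {
        if (trainingScores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double score : trainingScores) {
            sum += score;
        }
        return sum / trainingScores.length;
    }

    // Normalized score: makes answers of different node classifiers comparable.
    // Assumes avgScore > 0.
    static double normalize(double rawScore, double avgScore) {
        return rawScore / avgScore;
    }

    // A document counts as positive only if its score exceeds the threshold.
    static boolean isPositive(double rawScore, double avgScore) {
        double minThreshold = avgScore * THRESHOLD_FACTOR;
        return rawScore > minThreshold;
    }
}
```

Raising THRESHOLD_FACTOR makes the classifier more restrictive: fewer documents pass, which tends to improve precision at the cost of recall.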
6) Crawler monitoring: Crawler Plug-Ins
The Crawler of the BINGO! framework supports registration of so-called callback objects that can be notified about its state changes and particular crawling events
(download, classification, failures, etc.). This mechanism is used to monitor the crawl in BINGO! GUI applications and for some other purposes (e.g. import of new
training data). The following steps are required to implement new 'Listener' objects:
- create a new class "bingo/crawler/MyListener implements LinkListener" or
"bingo/crawler/MyCrawlListener implements CrawlListener".
- use the Crawler methods (class "/bingo/crawler/BINGOCrawler") addCrawlListener(), addLinkListener(), removeCrawlListener() and removeLinkListener()
to register and unregister new listeners.
- when the state of the crawler changes, the listener will be automatically notified about this event through
a call to its callback routines as declared in the respective interface definitions.
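The callback mechanism can be sketched as follows. The interface SketchLinkListener with its single method linkProcessed() and the class SketchCrawler are hypothetical stand-ins, because the method signatures of the real LinkListener and CrawlListener interfaces are not listed in this document; only the register/unregister/notify pattern is taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real interface; the actual LinkListener methods may differ.
interface SketchLinkListener {
    void linkProcessed(String url, boolean accepted);
}

// Example listener that counts accepted and rejected links.
class MyListener implements SketchLinkListener {
    int accepted;
    int rejected;

    @Override
    public void linkProcessed(String url, boolean wasAccepted) {
        if (wasAccepted) {
            accepted++;
        } else {
            rejected++;
        }
    }
}

// Minimal stand-in for the crawler side of the callback mechanism.
class SketchCrawler {
    private final List<SketchLinkListener> listeners = new ArrayList<>();

    void addLinkListener(SketchLinkListener listener) {
        listeners.add(listener);
    }

    void removeLinkListener(SketchLinkListener listener) {
        listeners.remove(listener);
    }

    // Notifies all registered listeners about one processed link.
    void fireLinkProcessed(String url, boolean accepted) {
        // Iterate over a copy so that listeners may unregister during notification.
        for (SketchLinkListener listener : new ArrayList<>(listeners)) {
            listener.linkProcessed(url, accepted);
        }
    }
}
```

If callbacks arrive from several crawler threads at once, a real listener would additionally need synchronization (for example atomic counters); the sketch omits this for brevity.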
7) Modifications to database schema and its parameters
You can modify the BINGO! database by editing SQL scripts in "/schema" directory. These scripts are used by automated
routines for MySQL and Oracle databases to create a new user. When you intend to use another database, you can add
classes for new automated schema generator to the package "schema" (directory "/schema") using existing routines for
MySQL and Oracle as prototypes. You can also create the schema "by hand" by executing the queries from the SQL scripts one by one.
- to modify the database schema, edit the SQL scripts "/schema/schema_speed_mysql.sql" (MySQL), "/schema/schema_speed.sql" (Oracle),
or create your own new script using these files as prototypes.
- to modify the indexes of the schema, edit SQL scripts "/schema/index_create.sql" and "/schema/index_drop.sql".
- to modify quick-start account information in the initial LoginDialog, edit the file "/bingo/data/accounts.dat".
This text file contains stored connection parameters for BINGO! users that were created using the automated BINGO! schema tools.
Connections are stored line by line using the following format:
"user;password;hostname;database_name" (MySQL)
"user;password;hostname;service_name" (Oracle)
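Reading one line of this format could look like the sketch below. The class AccountEntry is hypothetical and not part of the BINGO! sources; the sketch assumes that the quotation marks in the format description above are not part of the file content and that no field contains a semicolon.

```java
// Hypothetical value class for one line of accounts.dat.
class AccountEntry {
    final String user;
    final String password;
    final String hostname;
    final String database; // database name (MySQL) or service name (Oracle)

    AccountEntry(String user, String password, String hostname, String database) {
        this.user = user;
        this.password = password;
        this.hostname = hostname;
        this.database = database;
    }

    // Parses a line of the form user;password;hostname;database_name.
    static AccountEntry parse(String line) {
        // Limit -1 keeps empty trailing fields (e.g. an empty database name).
        String[] parts = line.trim().split(";", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "expected 4 fields separated by ';' but found " + parts.length);
        }
        return new AccountEntry(parts[0], parts[1], parts[2], parts[3]);
    }
}
```

Note that the file stores passwords in plain text, so its access rights should be restricted accordingly.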
8) Language customization
The GUI of the BINGO! framework supports English and German as interface languages. The stemmer of the BINGO! engine provides stopword lists and stemming routines for
English and German.
To add support for an additional GUI language, you need to perform the following steps:
- create a new file "/conf/mylanguage.xml" with formatted GUI messages translated into the language of your choice.
You can use existing files "/conf/english.xml" and "/conf/german.xml" as prototypes.
- modify the file "/conf/config.xml" and replace "***" by "mylanguage".
- modify the "cDLanguageSelect" element in files "/conf/german.xml", "/conf/english.xml", and "/conf/mylanguage.xml"
and add the name of new language: deutsch;english;mylanguage
- recompilation is not required
9) Stemmer language customization
The current version of BINGO! comes with stemming algorithms for English and German. Internally,
we use Snowball stemmers (written by Martin Porter, author of the well-known Porter stemmer)
to process tokens. Thus, the simplest way to add stemming for a new language is to call the
desired language-specific Snowball stemmer. The Snowball package included in our release supports
the following languages: danish, dutch, english, finnish, french, german, italian, norwegian, portuguese,
russian, spanish, swedish.
- the documentation and newest versions of Snowball stemmers can be found at
http://snowball.tartarus.org/
- create a new directory /bingo/data/mylanguage. The name "mylanguage" MUST exactly match the name of
corresponding Snowball stemmer. See the included library file "/lib/stemmer.jar"
and its package "net.sf.snowball.ext" for details.
- create a plain text file "/bingo/data/mylanguage/stopwords.txt" that should contain stopwords for the
desired language (you can get stopword files for supported languages from the Snowball homepage).
- modify the BINGO! configuration file "/conf/config.xml":
replace *****
by mylanguage
- modify language configuration files "/conf/english.xml" and "/conf/german.xml".
Replace the option german;english by
german;english;mylanguage
- recompilation is not required
10) Tokenizer customization
The tokenizer is used to split the text of the current document (represented by a character buffer)
into particular tokens. The current tokenization procedure of BINGO! is quite
simple and straightforward:
- Normalization. German-specific characters (umlaut characters with diaeresis and eszett) are transcribed as
(ae, oe, ue, and ss).
- Tokenization. Characters a..z,A..Z are treated as symbols, all others as delimiters.
Since the current version of the Java "StreamTokenizer" class unfortunately has small hidden bugs
(it is NOT possible to treat some special characters as delimiters), we use a simple
self-implemented tokenization routine.
- Stopword elimination. The system-wide stopword list is used to remove language-specific non-relevant words.
- Stemming. The current language-specific stemmer is applied.
To modify the tokenization algorithm (you may want to add support for numbers, which are currently treated as whitespace,
set up the stopword elimination mechanism to be applied to tokens before or after stemming, or transcribe language-specific
special characters), the following steps are required:
- carefully study and modify the class "/bingo/crawler/handler/parser/StemmerDriver" according to your requirements.
- verify the method "/bingo/util/SessionBuffer.setStopwords()". Uncomment the statement
"token=stem(token)" if the stopword elimination should be done on word stems rather than
words (this change will lead to stronger filtering of potential stopwords).
- rebuild the framework using the shell script rebuild.bat
- NOTE: String replacements use Java regular expressions that are applied to each(!) extracted word.
A large number of replacement patterns may cause performance problems.
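The first three steps of the tokenization procedure (normalization, tokenization, stopword elimination) can be sketched as follows. The class TokenizerSketch is hypothetical and is not the StemmerDriver implementation; lowercasing of tokens is an assumption of this sketch, and stemming is omitted because it depends on the Snowball library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the pipeline; not the StemmerDriver code.
class TokenizerSketch {

    // Step 1: transcribe German-specific characters (Unicode escapes keep
    // the source file independent of the compiler's default encoding).
    static String normalize(String text) {
        return text.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                   .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue")
                   .replace("\u00df", "ss");
    }

    // Step 2: a..z and A..Z form tokens, every other character is a delimiter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (letter) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Steps 1 to 3 combined; tokens are lowercased (assumption of this sketch).
    static List<String> process(String text, Set<String> stopwords) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(normalize(text))) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopwords.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }
}
```

Numbers are dropped by tokenize() because digits count as delimiters; extending the letter test to '0'..'9' would be one way to keep them.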
11) Performance tuning
To optimize crawler performance, we recommend verifying and setting the crawling parameters according
to your current demands. The most important settings that directly influence the engine performance are:
- the amount of heap memory available to the Java virtual machine for the BINGO! application. See Section 12) to
learn more about JVM parameter settings.
- the number of crawling threads. Higher numbers of crawler threads may help to increase the overall crawling speed.
Please keep in mind that each thread maintains its own database connection, so the database must be set up to allow
the expected number of parallel connections simultaneously. Furthermore, parallel in-memory processing of multiple
documents may rapidly increase the overall memory consumption of the framework. You may use the BINGO! GUI
(section "Options") or directly modify the configuration file "conf/config.xml",
option "**" to change this parameter.
- the number of allowed parallel connections to the same host. Currently, the BINGO! crawler is configured to
allow only a limited number of parallel connections to the same host. The default value is currently set to 4 to
avoid denial-of-service problems on particular Web servers. However, if you intend to completely scan one
large-scale Web service with guaranteed performance (e.g. Amazon or DBLP), this parameter can be increased for
higher processing speed. The appropriate locking mechanism resides in the class "bingo.crawler.frontier.URL_Queue".
You can modify the attribute "max_locks" of the class "bingo.crawler.frontier.URL_Queue" to change this setting.
- the number of links from each document to follow. This option is useful to avoid the queue overflow with useless
links from banner sites and faked hubs with thousands of (mostly useless) links. The default value is 5. You may use
the BINGO! GUI (section "Options") or directly modify the configuration file "conf/config.xml",
option "*" to change this parameter.
- maximum allowed crawling depth. For interconnected communities (e.g. Computer Science), the
characteristic path length is usually small (in the order of 10). In some cases, it might be useful to increase this value,
for instance when the topic of interest is widely spread over the internet and particular pages are usually not
directly connected to each other. The default value of this parameter is 10. You may use
the BINGO! GUI (section "Options") or directly modify the configuration file "conf/config.xml",
option "10" to change this parameter.
- BINGO! GUI components. In large-scale crawl experiments, running of some GUI components (in particular, animated
link structure and the overview of crawled documents) may slow down the crawling speed. These components are
designed primarily for short demos; we recommend switching off animations in long-term crawling sessions.
- data to be stored. BINGO! stores by default short document descriptions, extracted links and features into the database.
In addition, you can enable the storage of "raw" document sources (as BLOB). Although this option is useful for some
applications, it increases the database size and reduces the crawling speed. By default, the LOB storage is set to OFF.
You can enable this option using the BINGO! GUI (section "Options") or directly modify the configuration file "conf/config.xml",
setting "true".
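The per-host connection limit mentioned above can be illustrated with a simple lock counter. The class HostLockSketch and its methods are hypothetical; the real locking mechanism in "bingo.crawler.frontier.URL_Queue" (attribute "max_locks") may be implemented differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host lock counter; not the actual URL_Queue code.
class HostLockSketch {
    private final int maxLocks;
    private final Map<String, Integer> locks = new HashMap<>();

    HostLockSketch(int maxLocks) {
        this.maxLocks = maxLocks;
    }

    // Reserves a connection slot for the host; returns false if the host
    // already has maxLocks open connections.
    synchronized boolean tryAcquire(String host) {
        int current = locks.getOrDefault(host, 0);
        if (current >= maxLocks) {
            return false;
        }
        locks.put(host, current + 1);
        return true;
    }

    // Frees one slot after the download from this host has finished.
    synchronized void release(String host) {
        Integer current = locks.get(host);
        if (current == null) {
            return;
        }
        if (current <= 1) {
            locks.remove(host);
        } else {
            locks.put(host, current - 1);
        }
    }
}
```

A crawler thread would call tryAcquire() before opening a connection and release() in a finally block afterwards, so that a failed download does not block the host permanently.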
12) Java virtual machine: memory, internal params
- We recommend running the BINGO! framework with a sufficient amount of reserved JVM heap memory to avoid allocation problems
at runtime. For large-scale experiments, it is recommended to provide at least 500 MB of heap memory at startup. Using the
Sun JVM, you can set the maximum and initial heap size with the JVM flags -Xmx and -Xms:
Example: java -Xmx500M -Xms500M myClass
- BINGO! uses advanced parameters of Sun JVM to optimize DNS lookups and HTTP network connections.
Following entries in the class "bingo.crawler.BINGOCrawler" can be modified:
java.security.Security.setProperty("networkaddress.cache.ttl", "300");
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "60");
System.setProperty("networkaddress.cache.ttl", "300");
System.setProperty("networkaddress.cache.negative.ttl", "60");
System.setProperty("http.agent", "Mozilla/4.0 (compatible; MSIE 6.01; Windows NT 5.0)");
System.setProperty("http.keepAlive", "true");
System.setProperty("http.maxConnections", "4");
System.setProperty("sun.net.client.defaultConnectTimeout", "10000");
System.setProperty("sun.net.client.defaultReadTimeout", "60000");
System.setProperty("sun.net.inetaddr.ttl", "300");
System.setProperty("sun.net.inetaddr.negative.ttl", "60");
In case of other JVMs, changing these parameters will have no effect. Please refer to the vendor's documentation
on analogous parameter settings for your system.
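To check which settings are actually in effect, a small diagnostic class such as the following sketch may help. The class JvmSettingsSketch is hypothetical and not part of BINGO!; it only uses standard Java API calls (Runtime.maxMemory() and java.security.Security).

```java
import java.security.Security;

// Hypothetical diagnostic helper; not part of the BINGO! sources.
class JvmSettingsSketch {

    // Maximum heap size in MB as seen by the running JVM (influenced by -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Sets the DNS cache lifetimes (in seconds) for successful and failed lookups.
    static void configureDnsCache(int ttlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", String.valueOf(ttlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                String.valueOf(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        configureDnsCache(300, 60);
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("DNS cache TTL: "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The reported heap size can be slightly lower than the -Xmx value, depending on the JVM. The DNS cache properties should be set before the first host name lookup, since the JVM typically reads them only once.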
13) Adding support for non-http protocols.
The Crawler of the BINGO! framework is currently set up to handle URLs via the "http" protocol.
URLs that require other protocols (e.g. "ftp://myhost.net/file.pdf") will be rejected.
You can add support for additional protocols (e.g. "file"):
- carefully study the class "bingo/crawler/frontier/url2resolve".
- Modify its constructor "url2resolve()" to add the desired protocol to the list of allowed protocols
"url2resolve.allowedProtocols".
- Modify its method "url2resolve.getConnection()" to properly handle the new protocol.
- rebuild the framework using the shell script rebuild.bat
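The protocol check can be sketched as follows. The class ProtocolFilterSketch is hypothetical; it only illustrates the idea of a list of allowed protocols and does not reproduce the code of "url2resolve".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not the actual url2resolve code.
class ProtocolFilterSketch {
    private final Set<String> allowedProtocols = new HashSet<>();

    ProtocolFilterSketch(String... protocols) {
        for (String protocol : protocols) {
            allowedProtocols.add(protocol.toLowerCase(Locale.ROOT));
        }
    }

    // Returns true if the URL is well-formed and uses an allowed protocol.
    boolean isAllowed(String url) {
        try {
            String scheme = new URI(url).getScheme();
            return scheme != null
                    && allowedProtocols.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // malformed URLs are rejected
        }
    }
}
```

Accepting a protocol in this check is only the first half of the change: the connection routine must also be able to open a stream for it, as described in the getConnection() step above.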
14) Customization of the Crawler queue
The queue is an important component of the focused crawler that is responsible for proper ordering of links
on the crawl frontier. You may want to adapt the following queue parameters:
- the queue size. The size of the crawler's sorted queue is limited to avoid memory overflows. When the maximum allowed
queue size is reached, new links can still be accepted by replacing lower-rated candidates at the bottom of the queue.
Otherwise, they will be ignored. BINGO! maintains separate queues for particular ontology topics, so the selected value will
be used for every topic queue. Small values (in the order of 1000 or less) may cause premature loss of focus. High values
(1.000.000 and above) may cause system overload.
This tuning parameter can be modified in the configuration file "/conf/config.xml": myvalue
as well as from BINGO GUI (section "Options").
- the URL ordering. Basically, URLs in the queue (objects "bingo.util.BingoDocument") are ordered by their priority attribute
that can be accessed using functions BingoDocument.getPriority() and BingoDocument.setPriority(). The queue
sorts links in descending order: greater priority value means higher priority. Currently, the priority value for each new link
is assigned within routines of the class "bingo.crawler.handler.LinkHandler" according to its normalized SVM score. For
rejected documents, the priority is set to half of its predecessor's priority (to enable tunneling). You can easily modify
this policy by appropriate changes within the class "bingo.crawler.handler.LinkHandler".
- implementation of the sorted queue. The current URL queue for each class is implemented on top of a Java "ArrayList" object,
backed by an array. You can replace the queue backbone (e.g. using linked list or TreeMap objects), according to your expected
read/update pattern. In this case, you will need to modify the class "bingo.crawler.frontier.ClassQueue" and carefully adapt
particular insert/remove/lookup routines.
NOTE: the built-in Java container object with sorted access (TreeMap) may cause problems on systems with high load using Sun JVM 1.4.2.
We observed that removed objects (TreeMap.remove()) are sometimes NOT immediately deleted from the TreeMap and can be retrieved twice or even multiple
times. This would cause violations of DB integrity constraints (document IDs have to be unique).
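A bounded, sorted queue on top of an ArrayList, as described above, can be sketched as follows. The classes QueueEntry and BoundedQueueSketch are hypothetical and much simpler than "bingo.crawler.frontier.ClassQueue"; the sketch shows the replacement of the lowest-rated entry when the queue is full and the halved priority for links from rejected documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical queue entry; BINGO! itself uses bingo.util.BingoDocument.
class QueueEntry {
    final String url;
    final double priority;

    QueueEntry(String url, double priority) {
        this.url = url;
        this.priority = priority;
    }
}

// Hypothetical bounded queue, sorted in descending order of priority.
class BoundedQueueSketch {
    private final int maxSize;
    private final List<QueueEntry> entries = new ArrayList<>();

    BoundedQueueSketch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.maxSize = maxSize;
    }

    // Inserts the entry at its sorted position. If the queue is full, the entry
    // replaces the lowest-rated one, or is ignored if it does not rank higher.
    boolean offer(QueueEntry entry) {
        if (entries.size() >= maxSize) {
            QueueEntry last = entries.get(entries.size() - 1);
            if (entry.priority <= last.priority) {
                return false;
            }
            entries.remove(entries.size() - 1);
        }
        // Binary search for the first position holding a lower priority.
        int lo = 0;
        int hi = entries.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (entries.get(mid).priority >= entry.priority) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        entries.add(lo, entry);
        return true;
    }

    // Removes and returns the entry with the highest priority, or null if empty.
    QueueEntry poll() {
        return entries.isEmpty() ? null : entries.remove(0);
    }

    // Tunneling: links from a rejected document get half the predecessor's priority.
    static double priorityForRejected(double predecessorPriority) {
        return predecessorPriority / 2.0;
    }
}
```

Removing the head of an ArrayList shifts all remaining elements, so for very large queues it may be worth storing the entries in ascending order and polling from the end instead.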
15) Pattern-based excluding of URLs from crawl
In some cases it is useful to "lock" (exclude) particular domains or URL patterns from current crawl: banner servers,
irrelevant portals or private homepages might be potential candidates.
- carefully study the file "bingo/crawler/handler/UrlVerifier".
- add or remove desired "bad" patterns to the array "UrlVerifier.forbiddenMatches".
- NOTE: Pattern-based locking uses String-based Java regular expressions that are applied to each(!) target URL.
A large number of complex patterns may cause performance problems.
- the IP-based locking of particular hosts/domains is currently NOT supported. You can modify the classes
"bingo/crawler/handler/UrlVerifier" and "bingo/crawler/frontier/url2resolve"
to add the desired functionality.
- rebuild BINGO!
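Pattern-based exclusion can be sketched as follows. The class UrlFilterSketch and the example patterns are hypothetical and do not reproduce "UrlVerifier.forbiddenMatches"; the sketch compiles each pattern once and reuses it for every URL, which is one way to limit the cost mentioned in the note above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not the actual UrlVerifier code.
class UrlFilterSketch {
    private final List<Pattern> forbidden = new ArrayList<>();

    UrlFilterSketch(String... regexes) {
        // Compile every pattern once instead of on each URL check.
        for (String regex : regexes) {
            forbidden.add(Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
        }
    }

    // Returns true if any forbidden pattern occurs anywhere in the URL.
    boolean isForbidden(String url) {
        for (Pattern pattern : forbidden) {
            if (pattern.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }
}
```

For example, a filter created with the patterns "^https?://ads\\." and "/banner/" would exclude typical banner servers and banner directories while leaving other URLs untouched.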
16) Important BINGO! options
17) Adding new global options
In some cases, you will need to extend BINGO! by new global options. In general, the following steps
are required to integrate new global options into BINGO!:
- edit two XML files "/conf/config.xml" **and** "/bingo/data/default.dat" (this file replaces the customized config.xml
file when the user hits the Button "RestoreDefaults" in BINGO! Options) and add tags for new global parameter
- edit class "bingo.util.SessionBuffer" and add the new static parameter variable and
appropriate get/set access methods
- modify its methods "initSettings()" and "saveConfig" and add read/store support for the new parameter
- modify class "/bingo/crawler/ControlDialog" (BINGO! Options GUI) and add GUI support for the new parameter. Don't
forget to add action listeners for new GUI elements. Modify methods "initStandard()" and
"commitsettings()" for proper initialization and postprocessing of new GUI elements.
18) Adding language support for new GUI menus
The current BINGO! release supports menu languages English and German.
In order to add support for new languages, you need to create a new language schema
for all system messages.
Carefully study language files "conf/german.xml" and "conf/english.xml". Copy one of them
and translate all messages into desired language. Store new file as
"conf/mylanguage.xml".
Edit all "conf/language.xml" files (including the new one) and add the name of new language
into attribute 'cDLanguageSelect':
deutsch;english;mylanguage
19) Customizations of feature spaces
To verify and customize feature spaces for each topic, you can use the collection of JSP servlets
'Bingo Reviser'. The simplest way to install 'BingoReviser' is to copy JSP Files and Java classes of its distribution
into appropriate directories of an existing JSP repository (e.g., 'jsp-examples'). The root page of the Reviser
is called 'bingo_feed_start.htm' and can be accessed in our example via
http://hostname:8080/jsp-examples/bingo/bingo_feed_start.htm (depending on your custom settings, the port and the
directory of this location may be different).
The 'Administration'-page of the Reviser contains the link to 'Feature Reviser' routines. You can process feature spaces
of particular topics using a set of filtering rules and store verified results back into database. The 'positively' marked
features are used by BINGO! for restrictive filtering of new documents: every candidate must contain at least a specified number
of 'good' topic-specific features to qualify for this topic.
Appropriate settings for term-based classifier restrictivity can be accessed via BINGO! GUI
(Menu 'Global Settings->Settings->Crawler') or by editing the configuration file 'conf/config.xml',
options and .
================================================================================