Oracle Text Reference Release 9.2 Part Number A96518-01 |
|
This chapter describes the SQL statements and Oracle Text operators you use for creating and managing Text indexes and performing Text queries.
The following statements are described in this chapter:
Note: This section describes the For a complete description of the |
Use ALTER
INDEX
to perform the following maintenance tasks for a CONTEXT
, CTXCAT
, or CTXRULE
index:
CTXCAT
indextype. See REBUILD Syntax.Use the following syntax to rename an index or index partition:
ALTER INDEX [schema.]index_name RENAME TO new_index_name; ALTER INDEX [schema.]index_name RENAME PARTITION part_name TO new_part_name;
Specify the name of the index to rename.
Specify the new name for schema.index.
The new_index_name
parameter can be no more than 25 bytes. If you specify a name longer than 25 bytes, Oracle returns an error and the renamed index is no longer valid.
Specify the name of the index partition to rename.
Specify the new name for partition.
The following syntax is used to rebuild the index, rebuild an index partition, resume a failed operation, perform batch DML, add stopwords to index, add sections and stop sections to index, or optimize the index:
ALTER INDEX [schema.]index REBUILD [PARTITION partname] [ONLINE] [PARAMETERS (paramstring)][PARALLEL N];
Rebuilds the index partition partname
. Only one index partition can be built at a time.
When you rebuild a partition you can specify only sync, optimize full/fast, resume
or replace
in paramstring
. These operations work only on the partname
you specify. You cannot specify resume
when you rebuild partitions or a partitioned index.
To add a partition to the base table, use the ALTER
TABLE
SQL statement. When you add a partition to an indexed table, Oracle automatically creates the metadata for the new index partition. The new index partition has the same name as the new table partition. You can change the index partition name with ALTER
INDEX
RENAME
. To populate the new index partition, you must rebuild it with ALTER
INDEX
REBUILD
.
Splitting or merging a table partition with ALTER
TABLE
renders the index partition(s) invalid. You must rebuild them with ALTER
INDEX
REBUILD
.
Optionally specify the ONLINE
parameter for nonblocking operation, which allows the index to be queried during an ALTER
INDEX
synchronize or optimize operation.
You cannot use PARALLEL with ONLINE.
Note: You can specify replace or resume when rebuilding and index ONLINE, but you cannot specify replace or resume when rebuilding and index partition ONLINE. |
Optionally specify with n the parallel degree for parallel indexing. This parameter is supported only when you use sync, replace
, and resume in paramstring
. The actual degree of parallelism might be smaller depending on your resources.
Parallel indexing can speed up indexing when you have large amounts of data to index and when your operating system supports multiple CPUs.
You cannot use PARALLEL with ONLINE.
Optionally specify paramstring. If you do not specify paramstring, Oracle rebuilds the index with existing preference settings.
The syntax for paramstring is as follows:
paramstring = 'REPLACE [datastore datastore_pref] [filter filter_pref] [lexer lexer_pref] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] [index set index_set] | resume [memory memsize] | optimize [token index_token | fast | full [maxtime (time | unlimited)] | sync [memory memsize] | add stopword word [language language] | add zone section section_name tag tag | add field section section_name tag tag [(VISIBLE | INVISIBLE)] | add attr section section_name tag tag@attr | add stop section tag'
Rebuilds an index. You can optionally specify preferences, your own or system-defined.
You can only replace preferences that are supported for that index type. For instance, you cannot replace index set for a CONTEXT
or CTXRULE
index. Similarly, for the CTXCAT
index type, you can replace only lexer, wordlist, storage index set, and memory preferences.
See Also:
Chapter 2, "Indexing" for more information about creating and setting preferences, including information about system-defined preferences. |
Resumes a failed index operation. You can optionally specify the amount of memory to use with memsize.
Note: This |
Note: This To optimize your index, use |
Optimizes the index. Specify token
, fast,
or full
optimization. You typically optimize after you synchronize the index.
When you optimize in token
mode, Oracle optimizes only the index token index_token
in token
mode. Use this method of optimization to quickly optimize index information for specific words.
When you optimize in fast
mode, Oracle works on the entire index, compacting fragmented rows. However, in fast
mode, old data is not removed.
When you optimize in full
mode, you can optimize the whole index or a portion. This method compacts rows and removes old data (deleted rows).
Note: Optimizing in |
You use the maxtime
parameter to specify in minutes the time Oracle is to spend on the optimization operation. Oracle starts the optimization where it left off and optimizes until complete or until the time limit has been reached, whichever comes first. Specifying a time limit is useful for automating index optimization, where you set Oracle to optimize the index for a specified time on a regular basis.
When you specify maxtime unlimited
, the entire index is optimized. This is the default. When you specify 0 for maxtime
, Oracle performs minimal optimization.
Note: This |
Note: This To synchronize your index, use |
Synchronizes the index. You can optionally specify the amount of runtime memory to use with memsize. You synchronize the index when you have DML operations on your base table.
Note: This |
The memory parameter memsize specifies the amount of memory Oracle uses for the ALTER
INDEX
operation before flushing the index to disk. Specifying a large amount of memory improves indexing performance because there is less I/O and improves query performance and maintenance because there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful if you want to track indexing progress or when run-time memory is scarce.
Dynamically adds a stopword word to the index.
When your stoplist is a multi-language stoplist, you must specify language.
The index is not rebuilt by this statement.
Dynamically adds the zone section section_name identified by tag to the existing index.
The added section section_name applies only to documents indexed after this operation. For the change to take effect, you must manually re-index any existing documents that contain the tag.
The index is not rebuilt by this statement.
Note: This |
Dynamically adds the field section section_name identified by tag to the existing index.
Optionally specify VISIBLE
to make the field sections visible. The default is INVISIBLE
.
See Also:
|
The added section section_name applies only to documents indexed after this operation. For the change to affect previously indexed documents, you must explicitly re-index the documents that contain the tag.
The index is not rebuilt by this statement.
Note: This |
Dynamically adds an attribute section section_name to the existing index. You must specify the XML tag and attribute in the form tag@attr. You can add attribute sections only to XML section groups.
The added section section_name applies only to documents indexed after this operation. Thus for the change to take effect, you must manually re-index any existing documents that contain the tag.
The index is not rebuilt by this statement.
Note: This |
See Also:
"Add Section Constraints" in this section. |
Dynamically adds the stop section identified by tag to the existing index. As stop sections apply only to automatic sectioning of XML documents, the index must use the AUTO_SECTION_GROUP
section group. The tag you specify must be case sensitive and unique within the automatic section group or else ALTER
INDEX
raises an error.
The added stop section tag applies only to documents indexed after this operation. For the change to affect previously indexed documents, you must explicitly re-index the documents that contain the tag.
The text within a stop section is always searchable.
The number of stop sections you can add is unlimited.
The index is not rebuilt by this statement.
Note: This |
Before altering the index section information, Oracle checks the new section against the existing sections to ensure that all validity constraints are met. These constraints are the same for adding a section to a section group with the CTX_DDL
PL/SQL package and are as follows:
NULL_SECTION_GROUP
.The following statement resumes the indexing operation on newsindex
with 2 megabytes of memory:
ALTER INDEX newsindex REBUILD PARAMETERS('resume memory 2M');
The following statement rebuilds the index, replacing the stoplist preference with new_stop
.
ALTER INDEX newsindex REBUILD PARAMETERS('replace stoplist new_stop');
The following example creates a partitioned text table, populates it, and creates a partitioned index. It then adds a new partition to the table and then rebuilds the index with ALTER
INDEX
:
PROMPT create partitioned table and populate it create table part_tab (a int, b varchar2(40)) partition by range(a) (partition p_tab1 values less than (10), partition p_tab2 values less than (20), partition p_tab3 values less than (30)); insert into part_tab values (1,'Actinidia deliciosa'); insert into part_tab values (8,'Distictis buccinatoria'); insert into part_tab values (12,'Actinidia quinata'); insert into part_tab values (18,'Distictis Rivers'); insert into part_tab values (21,'pandorea jasminoides Lady Di'); insert into part_tab values (28,'pandorea rosea'); commit; PROMPT create partitioned index create index part_idx on part_tab(b) indextype is ctxsys.context local (partition p_idx1, partition p_idx2, partition p_idx3);
PROMPT add a partition and populate it alter table part_tab add partition p_tab4 values less than (40); insert into part_tab values (32, 'passiflora citrina'); insert into part_tab values (33, 'passiflora alatocaerulea'); commit;
The following statement rebuilds the index in the newly populated partition. In general, the index partition name for a newly added partition is the same as the table partition name, unless it is already been used. In this case, Oracle generates a new name.
alter index part_idx rebuild partition p_tab4;
The following statement queries the table for the two hits in the newly added partition:
select * from part_tab where contains(b,'passiflora') >0; The following statement queries the newly added partition directly: select * from part_tab partition (p_tab4) where contains(b,'passiflora') >0;
Optimizing your index with ALTER
INDEX
will not be supported in future releases. To optimize your index, use CTX_DDL.OPTIMIZE_INDEX
.
Synchronizing the index with ALTER
INDEX
will not be supported in future releases. To synchronize your index, use CTX_DDL.SYNC_INDEX
.
To add to the index the zone section author
identified by the tag <author>
, issue the following statement:
ALTER INDEX myindex REBUILD PARAMETERS('add zone section author tag author');
To add a stop section identified by tag <fluff>
to the index that uses the AUTO_SECTION_GROUP
, issue the following statement:
ALTER INDEX myindex REBUILD PARAMETERS('add stop section fluff');
Assume that the following text appears in an XML document:
<book title="Tale of Two Cities">It was the best of times.</book>
You want to create a separate section for the title attribute and you want to name the new attribute section booktitle
. To do so, issue the following statement:
ALTER INDEX myindex REBUILD PARAMETERS('add attr section booktitle tag title@book');
CTX_DDL.SYNC_INDEX in Chapter 7, "CTX_DDL Package"
CTX_DDL.OPTIMIZE_INDEX in Chapter 7, "CTX_DDL Package"
Note: This section describes the ALTER TABLE statement as it pertains to adding and modifying a partitioned text table with a context domain index. For a complete description of the ALTER TABLE statement, see Oracle9i SQL Reference. |
You can use ALTER TABLE to add, modify, split, merge, exchange, or drop a partitioned text table with a context domain index. The following sections describe some of the ALTER TABLE operations you can issue.
ALTER TABLE [schema.]table MODIFY PARTITION partition UNUSABLE LOCAL INDEXES
Marks the index partition corresponding to the given table partition UNUSABLE.
ALTER TABLE [schema.]table MODIFY PARTITION partition REBUILD UNUSABLE INDEXES
Rebuilds the index partition corresponding to the specified table partition that has an UNUSABLE status.
ALTER TABLE [schema.]table ADD PARTITION [partition] VALUES LESS THAN (value_list) [partition_description]
Adds a new partition to the high end of a range partitioned table.
To add a partition at the beginning or in the middle of the table, use ALTER TABLE SPLIT PARTITION.
The newly added table partition is always empty, and the context domain index (if any) status for this partition is always VALID. After doing DML, if you want to synchronize or optimize this newly added index partition, you must look up the index partition name, and then issue the ALTER INDEX REBUILD PARTITION command. For this newly added partition, index partition name is usually the same as the table partition name, but if the table partition name is already used by another index partition (as its name), system will assign a name in the form of SYS_Pn
.
By querying the USER_IND_PARTITIONS view and compare the HIGH_VALUE field, you can figure out the index partition name for the newly added partition.
ALTER TABLE [schema.]table MERGE PARTITIONS partition1, partition2 [INTO PARTITION [new_partition] [partition_description]]
Applies only to a range partition. This command merges the contents of two adjacent partitions into a new partition and then drops the original two partitions. If the resulting partition is non-empty, the corresponding local domain index partition is marked UNUSABLE. Users can use ALTER TABLE MODIFY PARTITION to rebuild the partition index.
The naming convention for the resulting index partition is the same as in ALTER TABLE ADD PARTITION.
ALTER TABLE [schema.]table SPLIT PARTITION partition_name_old AT (value_list) [into (partition_description, partition_description)] [prallel_clause]
Applies only to range partition. This command divides a table partition into two partitions, thus adding a new partition to the table. The local corresponding index partitions will be marked UNUSABLE if the corresponding table partitions are non-empty. You can use ALTER TABLE MODIFY PARTITION to rebuild the partition indexes.
The naming convention for the two resulting index partition is the same as in ALTER TABLE ADD PARTITION.
ALTER TABLE [schema.]table EXCHANGE PARTITION partition WITH TABLE table [INCLUDING|EXCLUDING INDEXES} [WITH|WITHOUT VALIDATION] [EXCEPTIONS INTO [schema.]table]
Converts a partition to a non-partitioned table and a table to a partition of a partitioned table by exchanging their data segments. Rowids are preserved.
If EXCLUDING INDEXES is specified, all the context indexes corresponding to the partition and all the indexes on the exchanged table are marked as UNUSABLE. To rebuild the new index partition this case, you can issue ALTER TABLE MODIFY PARTITION.
If INCLUDING INDEXES is specified, then for every local domain index on the partitioned table, there must be a non-partitioned domain index on the non-partitioned table. The local index partitions are exchanged with the corresponding regular indexes.
Field section queries might not work the same if the non-partitioned index and local index use different section id's for the same field section.
Storage is not changed. So if the index on the non-partitioned table $I table was in tablespace XYZ, then after the exchange partition it will still be in tablespace XYZ, but now it is the $I table for an index partition.
Storage preferences are not switched, so if you switch and then rebuild the index the table may be created in a different location.
Both indexes must be equivalent. They must use the same objects, same settings for each object. Note: we only check that they are using the same object. But they should use the same exact everything.
No index object can be partitioned, that is, when the user has used the storage object to partition the $I, $N tables.
If either index or index partition does not meet all these restrictions an error is raised and both the index and index partition will be INVALID. The user needs to manually rebuild both index and index partition using ALTER INDEX ... REBUILD and ALTER INDEX ... REBUILD.
ALTER TABLE [schema.]table TRUNCATE PARTITION [DROP|REUSE STORAGE]
Removes all rows from a partition in a table. Corresponding CONTEXT index partitions are also removed.
Use the CATSEARCH
operator to search CTXCAT
indexes. Use this operator in the WHERE
clause of a SELECT
statement.
The grammar of this operator is called CTXCAT. You can also use the CONTEXT grammar if your search criteria requires special functionality, such as thesaurus, fuzzy matching, proximity searching or stemming. To utilize the CONTEXT grammar, use the Query Template Specification in the text_query
parameter as described in this section.
You use the CATSEARCH
operator with a CTXCAT
index mainly to improve mixed query performance. You specify your text query condition with text_query
and your structured condition with structured_query
.
Internally, Oracle Text uses a combined b-tree index on text and structured columns to quickly produce results satisfying the query.
This operator does not support functional invocation.
CATSEARCH( [schema.]column, text_query VARCHAR2, structured_query VARCHAR2,RETURN NUMBER;
Specify the text column to be searched on. This column must have a CTXCAT
index associated with it.
Specify one of the following to define your search in column
.
The CATSEARCH
operator supports only the following query operations:
These operators have the following syntax:
Operation | Syntax | Description of Operation |
---|---|---|
Logical AND |
a b c |
Returns rows that contain a, b and c. |
Logical OR |
a | b | c |
Returns rows that contain a, b, or c. |
Logical NOT |
a - b |
Returns rows that contain a and not b. |
hyphen with no space |
a-b |
Hyphen treated as a regular character. For example, if the hyphen is defined as skipjoin, words such as web-site are treated as the single query term website. Likewise, if the hyphen is defined as a printjoin, words such as web-site are treated as web site in the |
" " |
"a b c" |
Returns rows that contain the phrase "a b c". For example, entering "Sony CD Player" means return all rows that contain this sequence of words. |
( ) |
(A B) | C |
Parentheses group operations. This query is equivalent to the |
(right and double truncated) |
a*b |
The wildcard character matches zero or more characters. For example, do* matches dog, and gl*s matches glass. Left truncation not supported. Note: Oracle recommends that you create a prefix index if your application uses wildcard searching. You set prefix indexing with the BASIC_WORDLIST preference. |
You specify a marked-up string that specifies a query based on the CONTEXT grammar. Use the following tags and attribute values which are case sensitive:
Specify the structured conditions and the ORDER
BY
clause. There must exist an index for any column you specify. For example, if you specify 'category_id=1 order by bid_close'
, you must have an index for 'category_id, bid_close'
as specified with CTX_DDL
.ADD_INDEX
.
With structured_query
, you can use standard SQL syntax with only the following operators:
The following statement creates the table to be indexed.
CREATE TABLE auction (category_id number primary key, title varchar2(20), bid_close date);
The following table inserts the values into the table:
INSERT INTO auction values(1, 'Sony CD Player', '20-FEB-2000'); INSERT INTO auction values(2, 'Sony CD Player', '24-FEB-2000'); INSERT INTO auction values(3, 'Pioneer DVD Player', '25-FEB-2000'); INSERT INTO auction values(4, 'Sony CD Player', '25-FEB-2000'); INSERT INTO auction values(5, 'Bose Speaker', '22-FEB-2000'); INSERT INTO auction values(6, 'Tascam CD Burner', '25-FEB-2000'); INSERT INTO auction values(7, 'Nikon digital camera', '22-FEB-2000'); INSERT INTO auction values(8, 'Canon digital camera', '26-FEB-2000');
The following statements create the CTXCAT
index:
begin ctx_ddl.create_index_set('auction_iset'); ctx_ddl.add_index('auction_iset','bid_close');end; CREATE INDEX auction_titlex ON auction(title) INDEXTYPE IS CTXCAT PARAMETERS ('index set auction_iset'); Query the Table
A typical query with CATSEARCH
might include a structured clause as follows to find all rows that contain the word camera ordered by bid_close
:
SELECT * FROM auction WHERE CATSEARCH(title, 'camera', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 8 Canon digital camera 26-FEB-00 7 Nikon digital camera 22-FEB-00
The following query finds all rows that contain the phrase Sony CD Player and that have a bid close date of February 20, 2000:
SELECT * FROM auction WHERE CATSEARCH(title, '"Sony CD Player"', 'bid_ close=''20-FEB-00''')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 1 Sony CD Player 20-FEB-00
The following query finds all rows with the terms Sony and CD and Player:
SELECT * FROM auction WHERE CATSEARCH(title, 'Sony CD Player', 'order by bid_ close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 4 Sony CD Player 25-FEB-00 2 Sony CD Player 24-FEB-00 1 Sony CD Player 20-FEB-00
The following query finds all rows with the term CD and not Player:
SELECT * FROM auction WHERE CATSEARCH(title, 'CD - Player', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 6 Tascam CD Burner 25-FEB-00
The following query finds all rows with the terms CD or DVD or Speaker:
SELECT * FROM auction WHERE CATSEARCH(title, 'CD | DVD | Speaker', 'order by bid_close desc')> 0; CATEGORY_ID TITLE BID_CLOSE ----------- -------------------- --------- 3 Pioneer DVD Player 25-FEB-00 4 Sony CD Player 25-FEB-00 6 Tascam CD Burner 25-FEB-00 2 Sony CD Player 24-FEB-00 5 Bose Speaker 22-FEB-00 1 Sony CD Player 20-FEB-00
The following query finds all rows that are about audio equipment:
SELECT * FROM auction WHERE CATSEARCH(title, 'ABOUT(audio equipment)', NULL)> 0;
PROMPT PROMPT fuzzy: query = ?test PROMPT should match all fuzzy variations of test (e.g. text) select pk||' ==> '||text from test where catsearch(text, '<query> <textquery grammar="context"> ?test </textquery> <score datatype="integer"/> </query>','')>0 order by pk; PROMPT PROMPT fuzzy: query = !sail PROMPT should match all soundex variations of bot (e.g. sell) select pk||' ==> '||text from test where catsearch(text, '<query> <textquery grammar="context"> !sail </textquery> <score datatype="integer"/> </query>','')>0 order by pk; PROMPT PROMPT theme (ABOUT) query PROMPT query: about(California) select pk||' ==> '||text from test where catsearch(text, '<query> <textquery grammar="context"> about(California) </textquery> <score datatype="integer"/> </query>','')>0 order by pk;
Syntax for CTXCAT Indextype in this chapter.
Oracle Text Application Developer's Guide
Use the CONTAINS
operator in the WHERE
clause of a SELECT
statement to specify the query expression for a Text query.
CONTAINS
returns a relevance score for every row selected. You obtain this score with the SCORE operator.
The grammar for this operator is called CONTEXT. You can also use CTXCAT grammar if your application works better with simpler syntax. To do so, use the Query Template Specification in the text_query
parameter as described in this section.
CONTAINS( [schema.]column, text_query VARCHAR2 [,label NUMBER]) RETURN NUMBER;
Specify the text column to be searched on. This column must have a Text index associated with it.
Specify one of the following:
All tags and attributes values are case-sensitive.
See Also:
Chapter 3, "CONTAINS Query Operators" for more information about the operators you can use in query expressions. |
Optionally specify the label that identifies the score generated by the CONTAINS
operator.
For each row selected, CONTAINS
returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row.
The following example searches for all documents in the in the text
column that contain the word oracle. The score for each row is selected with the SCORE
operator using a label of 1:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0;
The CONTAINS
operator must always be followed by the > 0 syntax, which specifies that the score value calculated by the CONTAINS
operator must be greater than zero for the row to be selected.
When the SCORE
operator is called (e.g. in a SELECT
clause), the CONTAINS
clause must reference the score label value as in the following example:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
The following example specifies that the query be parsed using the CATSEARCH grammar:
SELECT id FROM test WHERE CONTAINS (text, '<query> <textquery lang="ENGLISH" grammar="CATSEARCH"> cheap pokemon </textquery> <score datatype="INTEGER"/> </query>'> 0;
With the multi-lexer preference, you can create indexes from multi-language tables.
At query time, the multi-lexer examines the session's language setting and uses the sub-lexer preference for that language to parse the query. If the language setting is not mapped, then the default lexer is used.
When the language setting is mapped, the query is parsed and run as usual. The index contains tokens from multiple languages, so such a query can return documents in several languages.
To limit your query to returning document of a given language, use a structured clause on the language column.
Oracle Text supports the CONTEXT
indexing and querying of a partitioned text table.
However, for optimal performance when querying a partitioned table with an ORDER
BY
SCORE
clause, query the partition. If you query the entire table and use an ORDER
BY
SCORE
clause, the query might not perform optimally unless you include a range predicate that can limit the query to a single partition.
For example, the following statement queries the partition p_tab4 partition directly: select * from part_tab partition (p_tab4) where contains(b,'oracle') >0 ORDER BY SCORE;
The following example shows how to use the CTXCAT grammar in a CONTAINS query. The example creates CTXCAT and a CONTEXT index on the same table, and compares the query results:
PROMPT PROMPT create context and ctxcat indexes both with theme indexing on PROMPT create index tdrbqcq101x on test(text) indextype is ctxsys.context parameters ('lexer theme_lexer'); create index tdrbqcq101cx on test(text) indextype is ctxsys.ctxcat parameters ('lexer theme_lexer'); PROMPT PROMPT ***** San Diego *********** PROMPT ***** CONTEXT grammar *********** PROMPT ** should be interpreted as phrase query ** select pk||' ==> '||text from test where contains(text,'San Diego')>0 order by pk; PROMPT PROMPT ***** San Diego *********** PROMPT ***** CTXCAT grammar *********** PROMPT ** should be interpreted as AND query *** select pk||' ==> '||text from test where contains(text, '<query> <textquery grammar="CTXCAT">San Diego</textquery> <score datatype="integer"/> </query>')>0 order by pk; PROMPT PROMPT ***** Hitlist from CTXCAT index *********** select pk||' ==> '||text from test where catsearch(text,'San Diego','')>0 order by pk;
Syntax for CONTEXT Indextype in this chapter
Chapter 3, "CONTAINS Query Operators"
Oracle Text Application Developer's Guide
Note: This section describes the For a complete description of the |
Use CREATE
INDEX
to create an Oracle Text index. An Oracle Text index is an Oracle domain index of type CONTEXT
, CTXCAT,
CTXRULE or CTXXPATH
.
You must create an appropriate Oracle Text index to issue CONTAINS
, CATSEARCH
, or MATCHES
queries.
You can create the following types of Oracle Text indexes:
CONTEXT
index. This is an index on a text column. You query this index with the CONTAINS
operator in the WHERE
clause of a SELECT
statement. This index requires manual synchronization after DML. See Syntax for CONTEXT Indextype.CTXCAT
index. This is a combined index on a text column and one or more other columns.You query this index with the CATSEARCH
operator in the WHERE
clause of a SELECT
statement. This type of index is optimized for mixed queries. This index is transactional, automatically updating itself with DML to the base table. See Syntax for CTXCAT Indextype.CTXRULE
index. This is an index on a column containing a set of queries. You query this index with the MATCHES
operator in the WHERE
clause of a SELECT
statement. See Syntax for CTXRULE Indextype.CTXXPATH
index. Create this index when you need to speed up ExistsNode() queries on an XMLType column. See Syntax for CTXXPATH Indextype.You do not need the CTXAPP
role to create an Oracle Text index. If you have Oracle grants to create a b-tree index on the text column, you have sufficient permission to create a text index. The issuing owner, table owner, and index owner can all be different users, which is consistent with Oracle standards for creating regular B-tree indexes.
The creation of an Oracle Text index requires temporary tablespace belonging to the CTXSYS user. Insufficient tablespace results in the ORA-01652 error. To remedy, you extend the CTXSYS tablespace, not the tablespace of the issuing user. Generally, the size of the temporary tablespace required is between 50 and 200 percent of your data.
Use this indextype to create an index on a text column. You query this index with the CONTAINS
operator in the WHERE
clause of a SELECT
statement. This index requires manual synchronization after DML.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.context [ONLINE] LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]
[, PARTITION [partition] [PARAMETERS('paramstring')]])]
[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE];
Specify the name of the Text index to create.
Specify the name of the table and column to index.
Your table can optionally contain a primary key if you prefer to identify your rows as such when you use procedures in CTX_DOC
. When your table has no primary key, document services identifies your documents by ROWID
.
The column you specify must be one of the following types: CHAR
, VARCHAR
, VARCHAR2
, BLOB
, CLOB
, BFILE
, XMLType
, or URIType
.
DATE
, NUMBER
, and nested table columns cannot be indexed. Object columns also cannot be indexed, but their attributes can be, provided they are atomic data types.
Indexes on multiple columns are not supported with the CONTEXT
index type. You must specify only one column in the column list.
Note: With the |
Creates the index while allowing inserts/updates/deletes (DML) on the base table.
During indexing, Oracle Text enqueues DML requests in a pending queue. At the end of the index creation, Oracle Text locks the base table. During this time DML is blocked.
The following limitations apply to using ONLINE:
Specify LOCAL
to create a local partitioned context index on a partitioned table. The partitioned table must be partitioned by range. Hash, composite and list partitions are not supported.
You can specify the list of index partition names with partition. If you do not specify a partition name, the system assigns one. The order of the index partition list must correspond to the table partition by order.
The PARAMETERS clause associated with each partition specifies the parameters string specific to that partition. You can only specify memory and storage for each index partition.
You can query the views CTX_INDEX_PARTITIONS
or CTX_USER_INDEX_PARTITIONS
to find out index partition information, such as index partition name, and index partition status.
You cannot use the ONLINE parameter with this operation.
For optimal performance when querying a partitioned index with an ORDER
BY
SCORE
clause, query the partition. If you query the entire table and use an ORDER
BY
SCORE
clause, the query might not perform optimally unless you include a range predicate that can limit the query to a single partition.
See Also:
"Query Performance Limitation with a Partitioned Index" in this chapter under |
Optionally specify with n the parallel degree for parallel indexing. The actual degree of parallelism might be smaller depending on your resources.
You can use this parameter on non-partitioned tables. Creating a non-partitioned index in parallel does not turn on parallel query processing.
Using this parameter on a partitioned table results in serial indexing. To index a partitioned table in parallel, you must create an unusable index and then run the DBMS_PCLXUTIL.BUILD_PART_INDEX
utility.
See Also:
"Creating a Local Partitioned Index in Parallel" Performance Tuning chapter in Oracle Text Application Developer's Guide |
Parallel indexing can speed up indexing when you have large amounts of data to index and when your operating system supports multiple CPUs.
Note: Using PARALLEL on a local index turns on parallel queries even though the index is created serially. (Creating a non-partitioned index in parallel does not turn on parallel query processing.) Parallel querying degrades query throughput especially on heavily loaded systems. Because of this, Oracle recommends that you disable parallel querying after creating a local index. To do so, use ALTER INDEX NOPARALLEL. For more information on parallel querying, see the Performance Tuning chapter in Oracle Text Application Developer's Guide |
The following limitations apply to using PARALLEL:
See Also:
"Creating a Local Partitioned Index in Parallel" in this section to create a local partitioned index with true parallelism. |
Create an unusable index. This creates index meta data only and exists immediately.
You might create an unusable index when you need to create a local partitioned index in parallel.
Optionally specify indexing parameters in paramstring. You can specify preferences owned by another user using the user.preference notation.
The syntax for paramstring
is as follows:
paramstring = '[datastore datastore_pref] [filter filter_pref] [charset column charset_column_name] [format column format_column_name] [lexer lexer_pref] [language column language_column_name] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] [populate | nopopulate]'
You create datastore, filter, lexer, wordlist, and storage preferences with CTX_DDL.CREATE_PREFERENCE
and then specify them in the paramstring.
Note: When you specify no paramstring, Oracle uses the system defaults. For more information about these defaults, see "Default Index Parameters" in Chapter 2. |
Specify the name of your datastore preference. Use the datastore preference to specify where your text is stored.See Datastore Types in Chapter 2, "Indexing".
Specify the name of your filter preference. Use the filter preference to specify how to filter formatted documents to plain text or HTML. See Filter Types in Chapter 2, "Indexing".
Specify the name of the character set column. This column must be in the same table as the text column, and it must be of type CHAR
, VARCHAR
, or VARCHAR2
. Use this column to specify the document character set for conversion to the database character set. The value is case insensitive. You must specify an NLS character set string such as JA16EUC.
When the document is plain text or HTML, the INSO_FILTER
and CHARSET
filter use this column to convert the document character set to the database character set for indexing.
You use this column when you have plain text or HTML documents with different character sets or in a character set different from the database character set.
Note: Documents are not marked for re-indexing when only the charset column changes. The indexed column must be updated to flag the re-index. |
Specify the name of the format column. The format column must be in the same table as the text column and it must be CHAR
, VARCHAR
, or VARCHAR2
type.
The INSO_FILTER
uses the format column when filtering documents. Use this column with heterogeneous document sets to optionally bypass INSO filtering for plain text or HTML documents.
In the format column, you can specify one of the following
TEXT
indicates that the document is either plain text or HTML. When TEXT
is specified the document is not filtered, but might be character set converted.
BINARY
indicates that the document is a format supported by the INSO_FILTER
object other than plain text or HTML, such as PDF. BINARY
is the default if the format column entry cannot be mapped.
IGNORE
indicates that the row is to be ignored during indexing. Use this value when you need to bypass rows that contain data incompatible with text indexing such as image data.
Note: Documents are not marked for re-indexing when only the format column changes. The indexed column must be updated to flag the re-index. |
Specify the name of your lexer or multi-lexer preference. Use the lexer preference to identify the language of your text and how text is tokenized for indexing. See Lexer Types in Chapter 2, "Indexing".
Specify the name of the language column when using a multi-lexer preference. See MULTI_LEXER in Chapter 2, "Indexing".
This column must exist in the base table. It cannot be the same column as the indexed column. Only the first 30 bytes of the language column is examined for language identification.
Note: Documents are not marked for re-indexing when only the language column changes. The indexed column must be updated to flag the re-index. |
Specify the name of your wordlist preference. Use the wordlist preference to enable features such as fuzzy, stemming, and prefix indexing for better wildcard searching. See Wordlist Type in Chapter 2, "Indexing".
Specify the name of your storage preference for the Text index. Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2, "Indexing".
Specify the name of your stoplist. Use stoplist to identify words that are not to be indexed. See CTX_DDL.CREATE_STOPLIST
in Chapter 7, "CTX_DDL Package".
Specify the name of your section group. Use section groups to create searchable sections in structured documents. See CTX_DDL.CREATE_SECTION_GROUP
in Chapter 7, "CTX_DDL Package".
Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:
memsize = number[M|G|K]
where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.
The value you specify for memsize
must be between 1M and the value of MAX_INDEX_MEMORY
in the CTX_PARAMETERS
view. To specify a memory size larger than the MAX_INDEX_MEMORY
, you must reset this parameter with CTX_ADM.SET_PARAMETER
to be larger than or equal to memsize
.
The default is the value specified for DEFAULT_INDEX_MEMORY
in CTX_PARAMETERS
.
The memsize parameter specifies the amount of memory Oracle uses for indexing before flushing the index to disk. Specifying a large amount memory improves indexing performance because there are fewer I/O operations and improves query performance and maintenance since there is less fragmentation.
Specifying smaller amounts of memory increases disk I/O and index fragmentation, but might be useful when run-time memory is scarce.
Specify nopopulate to create an empty index. The default is populate.
Note: This is the only option whose default value cannot be set with |
Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you need to create your index incrementally or to selectively index documents in the base table. You might also create an empty index when you require only theme and Gist output from a document set.
The following sections give examples of creating a CONTEXT
index.
The following example creates a CONTEXT
index called myindex
on the docs
column in mytable
. Default preferences are used.
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context;
See Also:
For more information about default settings, see "Default Index Parameters" in Chapter 2. Also refer to Oracle Text Application Developer's Guide. |
The following example creates a CONTEXT
index called myindex
on the docs
column in mytable
. The index is created with a custom lexer preference called my_lexer
and a custom stoplist called my_stop
.
This example also assumes that these preferences were previously created with CTX_DDL.CREATE_PREFERENCE
for my_lexer
, and CTX_DDL.CREATE_STOPLIST
for my_stop
. Default preferences are used for the unspecified preferences.
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER my_lexer STOPLIST my_stop');
Any user can use any preference. To specify preferences that exist in another user's schema, add the user name to the preference name. The following example assumes that the preferences my_lexer
and my_stop
exist in the schema that belongs to user kenny
:
CREATE INDEX myindex ON mytable(docs) INDEXTYPE IS ctxsys.context PARAMETERS('LEXER kenny.my_lexer STOPLIST kenny.my_stop');
The multi-lexer decides which lexer to use for each row based on a language column. This is a character column in the table which stores the language of the document in the text column. For example, you create the table globaldoc
to hold documents of different languages:
CREATE TABLE globaldoc ( doc_id NUMBER PRIMARY KEY, lang VARCHAR2(10), text CLOB );
Assume that global_lexer
is a multi-lexer preference you created. To index the global_doc
table, you specify the multi-lexer preference and the name of the language column as follows:
CREATE INDEX globalx ON globaldoc(text) INDEXTYPE IS ctxsys.context PARAMETERS ('LEXER global_lexer LANGUAGE COLUMN lang');
See Also:
For more information about creating multi-lexer preferences, see MULTI_LEXER in Chapter 2. |
This example shows how to set up parallel indexing. Do the following:
Create the index with a parallel degree. This example uses a parallel degree of 3.
CREATE INDEX myindex ON mytab(pk) INDEXTYPE IS ctxsys.context PARALLEL 3;
The following example creates a text table partitioned into three, populates it, and then creates a partitioned index.
PROMPT create partitioned table and populate it CREATE TABLE part_tab (a int, b varchar2(40)) PARTITION BY RANGE(a) (partition p_tab1 values less than (10), partition p_tab2 values less than (20), partition p_tab3 values less than (30));PROMPT create partitioned index CREATE INDEX part_idx on part_tab(b) INDEXTYPE IS CTXSYS.CONTEXT
LOCAL (partition p_idx1, partition p_idx2, partition p_idx3);
To create a local index in parallel, create an unusable index first, then run the DBMS_PCLXUTIL.BUILD_PART_INDEX
utility.
In this example, the base table has three partitions. We create a local partitioned unusable index first, the run the DBMS_PCLUTIL.BUILD_PART_INDEX
, which builds the 3 partitions in parallel (inter-partition parallelism). Also inside each partition, index creation is done in parallel (intra-partition parallelism) with a parallel degree of 2.
create index tdrbip02bx on tdrbip02b(text) indextype is ctxsys.context local (partition tdrbip02bx1, partition tdrbip02bx2, partition tdrbip02bx3) unusable; exec dbms_pclxutil.build_part_index(3,2,'TDRBIP02B','TDRBIP02BX',TRUE);
After a CREATE
INDEX
or ALTER
INDEX
operation, you can view index errors with Oracle Text views. To view errors on your indexes, query the CTX_USER_INDEX_ERRORS view. To view errors on all indexes as CTXSYS, query the CTX_INDEX_ERRORS view.
For example, to view the most recent errors on your indexes, you can issue:
SELECT err_timestamp, err_text FROM ctx_user_index_errors ORDER BY err_timestamp DESC;
To clear the index error view, you can issue:
DELETE FROM ctx_user_index_errors;
The CTXCAT index is a combined index on a text column and one or more other columns.You query this index with the CATSEARCH
operator in the WHERE
clause of a SELECT
statement. This type of index is optimized for mixed queries. This index is transactional, automatically updating itself with DML to the base table.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxcat [PARAMETERS
('[index set index_set] [lexer lexer_pref] [storage storage_pref] [stoplist stoplist] [wordlist wordlist_pref] [memory memsize]');
Specify the name of the table and column to index.
The column you specify when you create a CTXCAT
index must be of type CHAR
or VARCHAR2
. No other types are supported for CTXCAT
.
Specify the index set preference to create the CTXCAT
index. See Creating a CTXCAT Index example in this chapter.
Although a CTXCAT
index offers query performance benefits, creating the index has its costs. The time Oracle takes to create a CTXCAT
index depends on its total size, and the total size of a CTXCAT
index is directly related to
Having many component indexes in your index set also degrades DML performance since more indexes must be updated.
Because of these added costs in creating a CTXCAT
index, carefully consider the query performance benefit each component index gives your application before adding it to your index set.
See Also:
Oracle Text Application Developer's Guide for more information about creating |
When you create an index of type CTXCAT
, you can use only the following index preferences in the parameters
string:
Preference Class | Supported Types |
---|---|
Datastore |
None. |
Filter |
None |
Lexer |
BASIC_LEXER (index_themes attribute not supported) |
Wordlist |
|
Storage |
|
Stoplist |
Supports single language stoplists only (BASIC_STOPLIST type.) |
Section Group |
None |
When you create a CTXCAT
index, you cannot specify datastore, filter and section group preferences. You also cannot specify language, format, and charset columns as with a CONTEXT
index.
This section gives a brief example for creating a CTXCAT
index. For a more complete example, see the Oracle Text Application Developer's Guide.
Consider a table called AUCTION
with the following schema:
create table auction( item_id number, title varchar2(100), category_id number, price number, bid_close date);
Assume that queries on the table involve a mandatory text query clause and optional structured conditions on category_id
. Results must be sorted based on bid_close
and category_id
.
You can create a catalog index to support the different types of structured queries a user might enter. For structured queries, a CTXCAT
index improves query performance over a context index.
To create the indexes, first create the index set preference then add the required indexes to it:
begin ctx_ddl.create_index_set('auction_iset'); ctx_ddl.add_index('auction_iset','bid_close'); ctx_ddl.add_index('auction_iset','price, bid_close');end;
Create the CTXCAT
index with CREATE
INDEX
as follows:
create index auction_titlex on AUCTION(title) indextype is CTXSYS.CTXCAT parameters ('index set auction_iset');
To query the title column for the word pokemon, you can issue regular and mixed queries as follows:
select * from AUCTION where CATSEARCH(title, 'pokemon',NULL)> 0; select * from AUCTION where CATSEARCH(title, 'pokemon', 'price < 50 order by bid_close desc')> 0;
See Also::
Oracle Text Application Developer's Guide for a complete |
This is an index on a column containing a set of queries. You query this index with the MATCHES
operator in the WHERE
clause of a SELECT
statement.
CREATE INDEX [schema.]index on [schema.]table(column) INDEXTYPE IS ctxsys.ctxrule [PARAMETERS ('[lexer lexer_pref] [storage storage_pref]
[section group section_pref] [wordlist wordlist_pref]');
[PARALLEL n];
Specify the name of the table and column to index.
The column you specify when you create a CTXRULE
index must be VARCHAR2
or CLOB
. No other types are supported for CTXRULE
.
Specify the lexer preference to be used for processing the queries and the documents to be classified with the MATCHES
function. Currently, only the BASIC_LEXER lexer type is supported.
For processing queries, this lexer supports the following operators: ABOUT
, STEM
, AND
, NEAR
, NOT
, OR
, and WITHIN
.
The thesaural operators (BT*
, NT*
, PT
, RT
, SYN
, TR
, TRSYS
, TT
etc.) are supported. However, these operators are expanded using a snapshot of the thesaurus at index time, not when the MATCHES function is issued. This means that if you change your thesaurus after you index, you must re-index your query set
The following operators are not supported: ACCUM
, EQIUV
, MINUS
, WEIGHT
, THRESHOLD
, WILDCARD
, FUZZY
, and SOUNDEX
.
Specify the storage preference for the index on the queries.Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2, "Indexing".
Specify the section group. This parameter does not affect the queries. It applies to sections in the documents to be classified. The following section groups are supported for the CTXRULE
indextype:
See Section Group Types in Chapter 2, "Indexing".
CTXRULE does not support special sections.
Specify the wordlist preferences. This is used to enable stemming operations on query terms.See Wordlist Type in Chapter 2, "Indexing".
See the Oracle Text Application Developer's Guide for a complete example of using the CTXRULE indextype in a document routing application.
Create this index when you need to speed up ExistsNode() queries on an XMLType column.
CREATE INDEX [schema.]index on [schema.]table(XMLType column) INDEXTYPE IS ctxsys.CTXXPATH [PARAMETERS ('[storage storage_pref] [memory memsize] [populate | nopopulate]')];
Specify the name of the table and column to index.
The column you specify when you create a CTXXPATH
index must be XMLType
. No other types are supported for CTXXPATH
.
Specify the storage preference for the index on the queries.Use the storage preference to specify how the index tables are stored. See Storage Types in Chapter 2, "Indexing".
Specify the amount of run-time memory to use for indexing. The syntax for memsize is as follows:
memsize = number[M|G|K]
where M stands for megabytes, G stands for gigabytes, and K stands for kilobytes.
The value you specify for memsize
must be between 1M and the value of MAX_INDEX_MEMORY
in the CTX_PARAMETERS
view. To specify a memory size larger than the MAX_INDEX_MEMORY
, you must reset this parameter with CTX_ADM.SET_PARAMETER
to be larger than or equal to memsize
.
The default is the value specified for DEFAULT_INDEX_MEMORY
in CTX_PARAMETERS
.
Specify nopopulate to create an empty index. The default is populate.
Note: This is the only option whose default value cannot be set with |
Empty indexes are populated by updates or inserts to the base table. You might create an empty index when you need to create your index incrementally or to selectively index documents in the base table.
Index creation on an XMLType column:
CREATE INDEX xml_index ON xml_tab(col_xml) indextype is ctxsys.CTXXPATH; or CREATE INDEX xml_index ON xml_tab(col_xml) indextype is ctxsys.CTXXPATH PARAMETERS('storage my_storage memory 40M');
Querying the table with ExistsNode:
select xml_id from xml_tab x where x.col_ xml.existsnode('/book/chapter[@title="XML"]') > 0;
See Also:
Oracle9i XML Developer's Guide - Oracle XML DB for information on using the CTXXPATH indextype. |
CTX_DDL.CREATE_PREFERENCE in Chapter 7, "CTX_DDL Package".
CTX_DDL.CREATE_STOPLIST in Chapter 7, "CTX_DDL Package".
CTX_DDL.CREATE_SECTION_GROUP in Chapter 7, "CTX_DDL Package".
Note: This section describes the For a complete description of the |
Use DROP
INDEX
to drop a specified Text index.
DROP INDEX [schema.]index [force];
Optionally force the index to be dropped.
The following example drops an index named doc_index
in the current user's database schema.
DROP INDEX doc_index;
Use force option when Oracle cannot determine the state of the index, such as when an indexing operation crashes.
Use this operator to find all rows in a query table that match a given document. The document must be a plain text, HTML, or XML document.
This operator requires a CTXRULE
index on your set of queries.
MATCHES
returns 1 for one or more matches and 0 for no match.
MATCHES does not support functional invocation
MATCHES( [schema.]column, document VARCHAR2 or CLOB,RETURN NUMBER;
Specify the column containing the indexed query set.
Specify the document to be classified. The document can be plain-text, HTML, or XML. Binary formats are not supported.
Assuming that a table querytable
has a CTXRULE
index associated with it, you can issue the following query that passes in a document string to be classified. The SELECT
statement returns all rows (queries) that are satisfied by the incoming document:
SELECT classification FROM querytable WHERE MATCHES(text, 'Smith is a common name in the United States') > 0;
Syntax for CTXRULE Indextype in this chapter.
Oracle Text Application Developer's Guide
Use the SCORE
operator in a SELECT
statement to return the score values produced by a CONTAINS query.
SCORE(label NUMBER)
Specify a number to identify the score produced by the query. You use this number to identify the score in the CONTAINS
clause.
The SCORE
operator can be used in a SELECT
, ORDER
BY
, or GROUP
BY
clause.
When the SCORE
operator is called (e.g. in a SELECT
clause), the CONTAINS
clause must reference the score label value as in the following example:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
Assume that a news database stores and indexes the title and body of news articles separately. The following query returns all the documents that include the words Oracle in their title and java in their body. The articles are sorted by the scores for the first CONTAINS
(Oracle) and then by the scores for the second CONTAINS
(java).
SELECT title, body, SCORE(10), SCORE(20) FROM news WHERE CONTAINS (news.title, 'Oracle', 10) > 0 OR
CONTAINS (news.body, 'java', 20) > 0 ORDER BY NVL(SCORE(10),0), NVL(SCORE(20),0);
|
Copyright © 1998, 2002 Oracle Corporation. All Rights Reserved. |
|