Oracle® Text Application Developer's Guide
10g Release 1 (10.1)
Part Number B10729-01
Contents
List of Figures
List of Tables
Title and Copyright Information
Send Us Your Comments
Preface
Audience
Organization
Related Documentation
Conventions
Documentation Accessibility
1
Oracle Text Application Development
1.1
What is Oracle Text?
1.2
Designing Your Application
1.3
Text Queries on Document Collections
1.3.1
Flowchart of Text Query Application
1.4
Queries on Catalog Information
1.4.1
Flowchart for Catalog Query Application
1.5
Document Classification
1.6
XML Searching
1.6.1
Using Oracle Text
1.6.2
Using the Oracle XML DB Framework
1.6.3 Combining Oracle Text features with Oracle XML DB
1.6.3.1
Using the Text-on-XML Method
1.6.3.2
Using the XML-on-Text Method
2
Getting Started with Oracle Text
2.1
Overview of Getting Started with Oracle Text
2.2
Creating an Oracle Text User
2.3
Query Application Quick Tour
2.3.1
Building Web Applications with the Oracle Text Wizard
2.3.1.1 Oracle JDeveloper
2.3.1.2
Oracle Text Wizard Addins
2.3.1.3
Oracle Text Wizard Instructions
2.4
Catalog Application Quick Tour
2.5
Classification Application Quick Tour
2.5.1
Steps for Creating a Classification Application
3
Indexing
3.1
About Oracle Text Indexes
3.1.1
Type of Index
3.1.2
Structure of the Oracle Text CONTEXT Index
3.1.2.1
Merged Word and Theme Index
3.1.3
The Oracle Text Indexing Process
3.1.3.1
Datastore Object
3.1.3.2
Filter Object
3.1.3.3
Sectioner Object
3.1.3.4
Lexer Object
3.1.3.5
Indexing Engine
3.1.4
Partitioned Tables and Indexes
3.1.4.1
Querying Partitioned Tables
3.1.5
Creating an Index Online
3.1.6
Parallel Indexing
3.1.7
Indexing and Views
3.2
Considerations For Indexing
3.2.1
Location of Text
3.2.1.1
Supported Column Types
3.2.1.2
Storing Text in the Text Table
3.2.1.3 Storing File Path Names
3.2.1.4 Storing URLs
3.2.1.5
Storing Associated Document Information
3.2.1.6
Format and Character Set Columns
3.2.1.7
Supported Document Formats
3.2.1.8
Summary of DATASTORE Types
3.2.2 Document Formats and Filtering
3.2.2.1
No Filtering for HTML
3.2.2.2
Filtering Mixed-Format Columns
3.2.2.3
Custom Filtering
3.2.3 Bypassing Rows for Indexing
3.2.4
Document Character Set
3.2.4.1
Mixed Character Set Columns
3.2.5
Document Language
3.2.5.1
Language Features Outside BASIC_LEXER
3.2.5.2
Indexing Multi-language Columns
3.2.6 Indexing Special Characters
3.2.6.1
Printjoins Character
3.2.6.2
Skipjoins Character
3.2.6.3
Other Characters
3.2.7
Case-Sensitive Indexing and Querying
3.2.8
Language Specific Features
3.2.8.1 Indexing Themes
3.2.8.2
Base-Letter Conversion for Characters with Diacritical Marks
3.2.8.3
Alternate Spelling
3.2.8.4
Composite Words
3.2.8.5
Korean, Japanese, and Chinese Indexing
3.2.9
Fuzzy Matching and Stemming
3.2.10
Better Wildcard Query Performance
3.2.11
Document Section Searching
3.2.12
Stopwords and Stopthemes
3.2.12.1
Multi-Language Stoplists
3.2.13
Index Performance
3.2.14
Query Performance and Storage of LOB Columns
3.3
Index Creation
3.3.1
Procedure for Creating a CONTEXT Index
3.3.2
Creating Preferences
3.3.2.1
Datastore Examples
3.3.2.2
NULL_FILTER Example: Indexing HTML Documents
3.3.2.3
PROCEDURE_FILTER Example
3.3.2.4
BASIC_LEXER Example: Setting Printjoins Characters
3.3.2.5
MULTI_LEXER Example: Indexing a Multi-Language Table
3.3.2.6
BASIC_WORDLIST Example: Enabling Substring and Prefix Indexing
3.3.3
Creating Section Groups for Section Searching
3.3.3.1
Example: Creating HTML Sections
3.3.4
Using Stopwords and Stoplists
3.3.4.1
Multi-Language Stoplists
3.3.4.2 Stopthemes and Stopclasses
3.3.4.3 PL/SQL Procedures for Managing Stoplists
3.3.5
Creating an Index
3.3.6
Creating a CONTEXT Index
3.3.6.1
CONTEXT Index and DML
3.3.6.2
Default CONTEXT Index Example
3.3.6.3
Custom CONTEXT Index Example: Indexing HTML Documents
3.3.7
Creating a CTXCAT Index
3.3.7.1
CTXCAT Index and DML
3.3.7.2
About CTXCAT Sub-Indexes and Their Costs
3.3.7.3
Creating CTXCAT Sub-indexes
3.3.7.4
Creating CTXCAT Index
3.3.8
Creating a CTXRULE Index
3.3.8.1 Create a Table of Queries
3.3.8.2
Create the CTXRULE Index
3.3.8.3
Classifying a Document
3.4
Index Maintenance
3.4.1
Viewing Index Errors
3.4.2
Dropping an Index
3.4.3
Resuming Failed Index
3.4.3.1
Example: Resuming a Failed Index
3.4.4
Rebuilding an Index
3.4.4.1
Example: Rebuilding an Index
3.4.5
Dropping a Preference
3.4.5.1
Example
3.5
Managing DML Operations for a CONTEXT Index
3.5.1
Viewing Pending DML
3.5.2
Synchronizing the Index
3.5.2.1
Setting Background DML
3.5.3
Index Optimization
3.5.3.1
CONTEXT Index Structure
3.5.3.2
Index Fragmentation
3.5.3.3
Document Invalidation and Garbage Collection
3.5.3.4
Single Token Optimization
3.5.3.5
Viewing Index Fragmentation and Garbage Data
3.5.3.6
Examples: Optimizing the Index
4
Querying
4.1
Overview of Queries
4.1.1
Querying with CONTAINS
4.1.1.1
CONTAINS SQL Example
4.1.1.2
CONTAINS PL/SQL Example
4.1.1.3
Structured Query with CONTAINS
4.1.2
Querying with CATSEARCH
4.1.2.1
CATSEARCH SQL Query
4.1.2.2
CATSEARCH Example
4.1.3
Querying with MATCHES
4.1.3.1
MATCHES SQL Query
4.1.3.2
MATCHES PL/SQL Example
4.1.4
Word and Phrase Queries
4.1.4.1
CONTAINS Phrase Queries
4.1.4.2
CATSEARCH Phrase Queries
4.1.5
Querying Stopwords
4.1.6
ABOUT Queries and Themes
4.1.6.1
Querying Stopthemes
4.1.7
Query Expressions
4.1.7.1
CONTAINS Operators
4.1.7.2 CATSEARCH Operator
4.1.7.3
MATCHES Operator
4.1.8
Case-Sensitive Searching
4.1.8.1
Word Queries
4.1.8.2
ABOUT Queries
4.1.9
Query Feedback
4.1.10
Query Explain Plan
4.1.11
Using a Thesaurus in Queries
4.1.12
Document Section Searching
4.1.13
Using Query Templating
4.1.14
Query Rewrite
4.1.15
Query Relaxation
4.1.16 Query Language
4.1.17
Alternative Scoring
4.1.18
Alternative Grammar
4.1.19
Query Analysis
4.1.20
Other Query Features
4.2
The CONTEXT Grammar
4.2.1
ABOUT Query
4.2.2
Logical Operators
4.2.3
Section Searching
4.2.4
Proximity Queries with NEAR and NEAR_ACCUM Operators
4.2.5
Fuzzy, Stem, Soundex, Wildcard and Thesaurus Expansion Operators
4.2.6
Using CTXCAT Grammar
4.2.7
Stored Query Expressions
4.2.7.1 Defining a Stored Query Expression
4.2.7.2
SQE Example
4.2.8
Calling PL/SQL Functions in CONTAINS
4.2.9
Optimizing for Response Time
4.2.9.1
Other Factors that Influence Query Response Time
4.2.10
Counting Hits
4.2.10.1
SQL Count Hits Example
4.2.10.2
Counting Hits with a Structured Predicate
4.2.10.3
PL/SQL Count Hits Example
4.3 The CTXCAT Grammar
4.3.1
Using CONTEXT Grammar with CATSEARCH
5
Document Presentation
5.1
Highlighting Query Terms
5.1.1
Text Highlighting
5.1.2
Theme Highlighting
5.1.3
CTX_DOC Highlighting Procedures
5.1.3.1 Highlight Procedure
5.1.3.2
Markup Procedure
5.1.3.3
Filter Procedure
5.1.3.4
CTX_DOC.POLICY_FILTER Procedure
5.2
Obtaining Lists of Themes, Gists, and Theme Summaries
5.2.1
Lists of Themes
5.2.1.1
In-Memory Themes
5.2.1.2
Result Table Themes
5.2.2
Gist and Theme Summary
5.2.2.1
In-Memory Gist
5.2.2.2 Result Table Gists
5.2.2.3
Theme Summary
5.3
Document Presentation and Highlighting
5.3.1
Highlighting Example
5.3.2 Document List of Themes Example
5.3.3 Gist Example
6
Document Classification
6.1
Overview
6.1.1
Classification Applications
6.2
Classification Solutions
6.3
Rule-Based Classification
6.3.1
Rule-based Classification Example
6.3.2
CTXRULE Parameters and Limitations
6.4 Supervised Classification
6.4.1
Decision Tree Supervised Classification
6.4.1.1
Decision Tree Supervised Classification Example
6.4.2
SVM-Based Supervised Classification
6.4.2.1
SVM-Based Supervised Classification Example
6.5
Unsupervised Classification (Clustering)
6.5.1
Clustering Example
7
Performance Tuning
7.1
Optimizing Queries with Statistics
7.1.1
Collecting Statistics
7.1.1.1
Example
7.1.2
Re-Collecting Statistics
7.1.3
Deleting Statistics
7.2
Optimizing Queries for Response Time
7.2.1
Other Factors that Influence Query Response Time
7.2.2
Improved Response Time with FIRST_ROWS(n) for ORDER BY Queries
7.2.2.1
About the FIRST_ROWS Hint
7.2.3
Improved Response Time using Local Partitioned CONTEXT Index
7.2.3.1
Range Search on Partition Key Column
7.2.3.2
ORDER BY Partition Key Column
7.2.4
Improved Response Time with Local Partitioned Index for Order by Score
7.3
Optimizing Queries for Throughput
7.3.1
CHOOSE and ALL ROWS Modes
7.3.2
FIRST_ROWS Mode
7.4
Tracing
7.5 Parallel Queries
7.6
Tuning Queries with Blocking Operations
7.7
Frequently Asked Questions About Query Performance
7.7.1
What is Query Performance?
7.7.2
What is the fastest type of text query?
7.7.3
Should I collect statistics on my tables?
7.7.4
How does the size of my data affect queries?
7.7.5 How does the format of my data affect queries?
7.7.6
What is a functional versus an indexed lookup?
7.7.7 What tables are involved in queries?
7.7.8
Does sorting the results slow a text-only query?
7.7.9
How do I make an ORDER BY score query faster?
7.7.10
Which Memory Settings Affect Querying?
7.7.11
Does out of line LOB storage of wide base table columns improve performance?
7.7.12 How can I make a CONTAINS query on more than one column faster?
7.7.13
Is it OK to have many expansions in a query?
7.7.14
How can local partition indexes help?
7.7.15
Should I query in parallel?
7.7.16
Should I index themes?
7.7.17
When should I use a CTXCAT index?
7.7.18
When is a CTXCAT index NOT suitable?
7.7.19 What optimizer hints are available, and what do they do?
7.8
Frequently Asked Questions About Indexing Performance
7.8.1
How long should indexing take?
7.8.2
Which index memory settings should I use?
7.8.3
How much disk overhead will indexing require?
7.8.4 How does the format of my data affect indexing?
7.8.5
Can parallel indexing improve performance?
7.8.6
How can I improve index performance for creating local partitioned index?
7.8.7 How can I tell how much indexing has completed?
7.9 Frequently Asked Questions About Updating the Index
7.9.1
How often should I index new or updated records?
7.9.2 How can I tell when my indexes are getting fragmented?
7.9.3 Does memory allocation affect index synchronization?
8
Document Section Searching
8.1
About Document Section Searching
8.1.1 Enabling Section Searching
8.1.1.1
Create a Section Group
8.1.1.2
Define Your Sections
8.1.1.3
Index your Documents
8.1.1.4
Section Searching with WITHIN Operator
8.1.1.5
Path Searching with INPATH and HASPATH Operators
8.1.2
Section Types
8.1.2.1
Zone Section
8.1.2.2
Field Section
8.1.2.3
Stop Section
8.1.2.4
MDATA Section
8.1.2.5
Attribute Section
8.1.2.6
Special Sections
8.2
HTML Section Searching
8.2.1
Creating HTML Sections
8.2.2
Searching HTML Meta Tags
8.2.2.1
Example: Creating Sections for <META> Tags
8.3
XML Section Searching
8.3.1
Automatic Sectioning
8.3.2
Attribute Searching
8.3.2.1
Creating Attribute Sections
8.3.2.2
Searching Attributes with the INPATH Operator
8.3.3
Creating Document Type Sensitive Sections
8.3.4
Path Section Searching
8.3.4.1
Creating Index with PATH_SECTION_GROUP
8.3.4.2
Top-Level Tag Searching
8.3.4.3
Any-Level Tag Searching
8.3.4.4
Direct Parentage Searching
8.3.4.5
Tag Value Testing
8.3.4.6
Attribute Searching
8.3.4.7
Attribute Value Testing
8.3.4.8
Path Testing
8.3.4.9
Section Equality Testing with HASPATH
9
Working With a Thesaurus
9.1 Overview of Thesauri
9.1.1
Thesaurus Creation and Maintenance
9.1.1.1
CTX_THES Package
9.1.1.2
Thesaurus Operators
9.1.1.3
ctxload Utility
9.1.2
Case-sensitive Thesauri
9.1.3
Case-insensitive Thesauri
9.1.4
Default Thesaurus
9.1.5
Supplied Thesaurus
9.1.5.1
Supplied Thesaurus Structure and Content
9.1.5.2
Supplied Thesaurus Location
9.2
Defining Thesaural Terms
9.2.1
Defining Synonyms
9.2.2
Defining Hierarchical Relations
9.3
Using a Thesaurus in a Query Application
9.3.1
Loading a Custom Thesaurus and Issuing Thesaural Queries
9.3.1.1
Advantage
9.3.1.2
Limitations
9.3.2
Augmenting Knowledge Base with Custom Thesaurus
9.3.2.1
Advantage
9.3.2.2
Limitations
9.3.2.3
Linking New Terms to Existing Terms
9.3.2.4 Loading a Thesaurus with ctxload
9.3.2.5
Compiling a Loaded Thesaurus
9.4
About the Supplied Knowledge Base
9.4.1
Adding a Language-Specific Knowledge Base
9.4.1.1
Limitations
10
Administration
10.1
Oracle Text Users and Roles
10.1.1
CTXSYS User
10.1.2
CTXAPP Role
10.1.3
Granting Roles and Privileges to Users
10.2
DML Queue
10.3
The CTX_OUTPUT Package
10.4
The CTX_REPORT Package
10.5
Servers
10.6
Administration Tool
11
Migrating Applications from Earlier Releases
11.1
Security Improvements in Oracle Text
11.1.1
CTXSYS No Longer Has DBA Permissions
11.1.2
Migrating CTXSYS-Owned Procedures
11.1.3
Effective User During Indexing
11.1.4
Procedures Do Not Need to Be Owned by CTXSYS
11.1.5
Synching and Optimizing of Other Users' Indexes
11.1.6
CTX Packages and Invoker's Rights
11.1.7
CREATE TABLE Permissions
11.2
Migrating Back to Previous Releases
A
CONTEXT Query Application
A.1
Web Query Application Overview
A.2
The PSP Web Application
A.2.1
Web Application Prerequisites
A.2.2
Building the Web Application
A.2.3
PSP Sample Code
A.2.3.1
loader.ctl
A.2.3.2
loader.dat
A.2.3.3 search_htmlservices.sql
A.2.3.4 search_html.psp
A.3
The JSP Web Application
A.3.1
Web Application Prerequisites
A.3.2
JSP Sample Code
A.3.2.1
search_html.jsp
B
CATSEARCH Query Application
B.1
CATSEARCH Web Query Application Overview
B.2
The JSP Web Application
B.2.1
Building the JSP Web Application
B.2.2
JSP Sample Code
B.2.2.1
loader.ctl
B.2.2.2
loader.dat
B.2.2.3
catalogSearch.jsp
Index