Steve Callan walks through some Oracle Text setup steps and working examples of different index types used by Oracle Text.
Oracle
Text - Expanding Your String Searching Capabilities in Oracle Database
discussed some of Oracle Text's features and functionality. In this
article, Steve Callan goes through some setup steps and worked examples.
First of all, where do you get
Oracle Text? In at least 10g and above, it is installed by default.
Where can you see if feature X is installed or not? One place is within
DBA_REGISTRY, query on comp_name and status.
SQL> col comp_name for a40
SQL> select comp_name, status from dba_registry;
COMP_NAME STATUS
---------------------------------------- -------
Oracle Database Catalog Views VALID
Oracle Database Packages and Types VALID
Oracle Workspace Manager VALID
JServer JAVA Virtual Machine VALID
Oracle XDK VALID
Oracle Database Java Packages VALID
Oracle Expression Filter VALID
Oracle Data Mining VALID
Oracle Text VALID
Oracle XML Database VALID
Oracle Rules Manager VALID
Oracle interMedia VALID
OLAP Analytic Workspace VALID
Oracle OLAP API VALID
OLAP Catalog VALID
Spatial VALID
Oracle Enterprise Manager VALID
Since Text is a feature, and as
we all know, upgrades can be cause for concern in terms of extra steps having
to be performed, you'll be happy to note that the upgrade of Oracle Text from
one release to another comes along as part of the overall upgrade process. In
other words, you don't have to do anything.
Second, what does it take for a
regular user to be able to use Oracle Text? Not a whole lot. Create a table and
insert data as needed, create the appropriate Oracle Text index, and execute a
query against the table.
SQL> conn scott/tiger
Connected.
SQL> create table docs (id number primary key, text varchar2(80));
Table created.
SQL> insert into docs values (1, 'first document');
1 row created.
SQL> insert into docs values (2, 'second document');
1 row created.
SQL> commit;
Commit complete.
SQL> create index doc_index on docs(text)
2 indextype is ctxsys.context;
Index created.
SQL> select id, text
2 from docs
3 where contains(text, 'first') > 0;
ID TEXT
---------- ------------------------------
1 first document
In the above example, only two
elements are different from what you would expect to see in a regular query.
The first is the CREATE INDEX statement. Note the syntax where INDEXTYPE is of
CTXSYS.CONTEXT. As mentioned in the introductory article, one of the four index
types in Oracle Text is context-based. The second is the syntax in the SELECT
statement. Other than those two new aspects, using Oracle Text, albeit in a
simple example, is pretty easy to do.
Who is the CTXSYS user? Using Toad
for its handy schema browser interface, we can see the following information
with respect to what constitutes this schema (ignoring tables and indexes for
the time being).
What becomes clear about CTXSYS
is that its main driver (what makes it tick) is the CTXAPP role, and looking
further into what the role has for grants is a collection of EXECUTE privileges
on several packages.
For the simple query I used
earlier, the amount of steps shown in a formatted trace file (using TKPROF) is
pretty amazing. The actual query appears as the 6th statement or
procedure call (one of them is dynamic sampling since the table has not been
analyzed yet). Recalling what was mentioned earlier about the indexes in Text
being domain indexes? We can see this as a fact via the execution plan.
select id, text from docs
where contains(text,'first')>0
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.03 0.03 0 145 0 0
Execute 1 0.01 0.01 0 0 0 0
Fetch 2 0.00 0.00 0 2 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.04 0.05 0 147 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 54
Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY INDEX ROWID DOCS (cr=16 pr=0 pw=0 time=70341 us)
1 DOMAIN INDEX DOC_INDEX (cr=15 pr=0 pw=0 time=70375 us)
When examining the SQL being
executed behind the scenes (specifically, DML), you'll see references to four
different items (tables) ending with a distinctive "dollar letter"
identifier. These suffixes are:
They are further identified as
being prefixed with DR, the index name, and then the table identifier (one of
the four suffixes). A white
paper at OTN on how Text processes DML explains them in more detail.
What this all leads up to is
that to use Text, a user needs to have (aside from other normal privileges)
been granted the CTXAPP role, some additional EXECUTE grants on other CTXSYS
packages, and sufficient storage space for table-like indexes.
A set of grants in the
documentation is shown below.
GRANT EXECUTE ON CTXSYS.CTX_CLS TO scott;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO scott;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO scott;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO scott;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO scott;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO scott;
GRANT EXECUTE ON CTXSYS.CTX_THES TO scott;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO scott;
The user using Text also needs
to be aware of how Text-related indexes are maintained. Going back to the
simple DOCS table, let's insert another record and then query for it.
SQL> insert into docs values (3, 'third document');
1 row created.
SQL> commit;
Commit complete.
SQL> select id, text from docs
2 where contains(text,'third')>0;
no rows selected
You can clearly find this
record via regular SQL, but the Oracle Text query fails to return the record.
Why is that? Again, this type of index requires synchronization after DML, and
the user definitely needs EXECUTE privileges on the CTX_DDL package to make
that happen. So, let's sync the index and re-try the query.
SQL> exec ctx_ddl.sync_index('doc_index');
PL/SQL procedure successfully completed.
SQL> select id, text from docs
2 where contains(text,'third')>0;
ID TEXT
---------- ------------------------------
3 third document
Depending on your application,
you can begin to see the criticality of this requirement. Suppose you were
managing a no-fly list that is updated once every 24 hours. A new person of
interest is inserted into the no-fly table, but without the index being updated
yet, there will be a window where this person may be allowed to board an
aircraft unless some other step is taken (e.g., manually checking a web page or
some other external source of information). A failure there is not good either.
In the index sync statement, I
took advantage of a default parameter (several, actually). The specification
for CTX_DDL is shown below.
PROCEDURE sync_index(
idx_name in varchar2 default NULL,
memory in varchar2 default NULL,
part_name in varchar2 default NULL,
parallel_degree in number default 1
);
Specifying the index name is
sufficient, but what comes into play with real world sized data sets is how
much memory you can afford to use to cache the index ahead of time. What does
the SYNC_INDEX procedure do? We don't know (exactly) because Oracle wrapped the
package body. As no value was given for the memory parameter, does that mean no
memory was used? The answer to that is no, and the way to see the default value
is to query a CTXSYS table.
The path to the description of
Text packages is a redirection from the PL/SQL Packages and Types Reference
guide to the Oracle Text
Reference guide. The description of CTX_DDL says that the memory parameter
uses the system value for DEFAULT_INDEX_MEMORY. Don't go looking for this via
SHOW PARAMETER as "system" does not refer to the instance
initialization parameters. Instead, query the CTX_PARAMETERS table, and in my
case, the value for this parameter is 12582912 bytes, or about 12MB. The max
setting is 1GB.
As another example of using
Oracle Text, but this time with a different index type, we can use the catalog
search example shown in the Oracle Text Application
Developer's Guide. The example is based on using more than one column for an index,
and in this case, supports searching for string and sorting at the same time.
The steps are to create a table, populate it, create a sub-index (the
additional column we'll be using in addition to the main index), create the
catalog index, and they query from the table.
Create table and populate it
CREATE TABLE auction(
item_id NUMBER,
title VARCHAR2(100),
category_id NUMBER,
price NUMBER,
bid_close DATE);
INSERT INTO AUCTION VALUES
(1, 'NIKON CAMERA', 1, 400, '24-OCT-2002');
INSERT INTO AUCTION VALUES
(2, 'OLYMPUS CAMERA', 1, 300, '25-OCT-2002');
INSERT INTO AUCTION VALUES
(3, 'PENTAX CAMERA', 1, 200, '26-OCT-2002');
INSERT INTO AUCTION VALUES
(4, 'CANON CAMERA', 1, 250, '27-OCT-2002');
commit;
Create the sub-index
EXEC CTX_DDL.CREATE_INDEX_SET('auction_iset');
EXEC CTX_DDL.ADD_INDEX('auction_iset','price');
Create the catalog index
CREATE INDEX auction_titlex ON AUCTION(title)
INDEXTYPE IS CTXSYS.CTXCAT
PARAMETERS ('index set auction_iset');
And now we're ready to query.
SQL> COLUMN title FORMAT a40;
SQL> SELECT title, price FROM auction
2 WHERE CATSEARCH(title, 'CAMERA', 'order by price')> 0;
TITLE PRICE
---------------------------------------- ----------
PENTAX CAMERA 200
CANON CAMERA 250
OLYMPUS CAMERA 300
NIKON CAMERA 400
Overall, pretty easy, except
the documentation has a couple of typographical Easter eggs to make some
statements fail. The first CTX_DDL call should be CREATE_INDEX_SET, not
CREATE.INDEXT_SET. The "sub-index A" comment at the end of an EXEC
statement does not work well either, so just omit it as I did above.
Finally, what happens when the
base table is updated (or overall, DML is applied against it)? With the catalog
index, the index is synchronized for you. Given that something (DML) is taking
place on the base table and an index is being updated, is there an implicit
commit taking place? The answer is no. You can test this by running the query
based on the two new records, and then running the same query in another
session. So, one other potential gotcha in the documentation is the absence of
a COMMIT statement.
In Closing
So far, we've seen two working
examples of different index types used by Oracle Text. The simple things are
simple to do; and likewise, the harder things are harder too. Next time, we'll
take a look at some of the more complex features and usage of this tool.
Additional Resources
Oracle Text
Previous
Back to DBAsupport.com