DBAsupport.com Forums - Powered by vBulletin
Page 1 of 2 12 LastLast
Results 1 to 10 of 19

Thread: Please help me speed this up...

  1. #1
    Join Date
    Mar 2002
    Location
    Mesa, Arizona
    Posts
    1,204

    Please help me speed this up...

    I have a process that takes a clob and checks it against a table of words. It takes about 17 seconds per clob. I need to speed it up.

    Here the core of the code:
    begin
    open c_unclean_text;
    loop
    fetch c_unclean_text into v_text_id,v_text;
    open c_words;
    loop
    fetch c_words into v_word;
    if regexp_like(v_text,'["''*() .;:,^!-]'||v_word||'["''() .;:,$!-?*]','i') then
    v_wordlist := v_wordlist||v_word||' ';
    end if;
    exit when c_words%NOTFOUND;
    end loop;
    close c_words;
    if v_wordlist||'+++' != '+++' then
    insert into vtest (id, wordlist) values (vtext_id, v_wordlist);
    commit;
    end if;
    v_wordlist := '';
    exit when c_unclean_text%NOTFOUND;
    end loop;
    close c_unclean_text;
    end;
    /

    Some notes:
    1. I used regexp_like() because LIKE would return "grASS" or "ASSociation". I just want delimited words.
    2. What about using a pl/sql table for the word table?
    3. The process works good, but it just takes too long. I may have 3k-5k of these a night. 3k takes about 13 hours to process.
    4. I considered Oracle Text but I don't think it'll do regexp_like().

    I hope you see something that can help.
    "I do not fear computers. I fear the lack of them." Isaac Asimov
    Oracle Scirpts DBA's need

  2. #2
    Join Date
    Nov 2000
    Location
    Pittsburgh, PA
    Posts
    4,166

    Re: Please help me speed this up...

    Originally posted by KenEwald
    Some notes:
    1. I used regexp_like() because LIKE would return "grASS" or "ASSociation". I just want delimited words.
    2. What about using a pl/sql table for the word table?
    3. The process works good, but it just takes too long. I may have 3k-5k of these a night. 3k takes about 13 hours to process.
    4. I considered Oracle Text but I don't think it'll do regexp_like().

    I hope you see something that can help.
    I haven't heard of regexp_like. But I haven't done much with 10g yet, I had to look it up. As a PL/SQL programmer I would have to say that you should use implicit cursors whenever possible, which includes this case.

    Second, Have you looked at the waits generated when this program is running? Do you know how much temp space this is using? I was playing with clobs in 9i a few years ago and I started to have problems with filling up the temp tablespace. Fetching a clob definitley adds overhead. Take a look at the DBMS_LOB package. There is some functionality that might be helpful, like instr.

    It has been awhile since I have looked at this, but I remember something about creating an object that points to the correct row in the table and using the dbms_lob procedure to process the data. A Pl/SQL might be helpful.

    Also are there any bug reports on this function as it relates to clobs?
    Last edited by gandolf989; 10-18-2004 at 01:22 PM.

  3. #3
    Join Date
    Mar 2002
    Location
    Mesa, Arizona
    Posts
    1,204
    Thanks for the reply...

    Finally if you are only storing 5k in the clob why don't you just make the column VARCHAR2(8000)?
    The clob's could be very long. I was referring to 3 to 5,000 records a night.

    I'll check into implicit cursors. Would that really help the performance?

    I'm thinking the bulk of the work is the full table scan on the 5,177 row word table. For each Clob, I scan all 5,177 rows.

    I'm guessing the biggest performance gain would be to put this in a memory table or array and walk through the array instead of the table. What do you think?

    No problem with temp, I had 15 of these processes running at the same time on a 4 proc box to check 100,000+ clobs. It took about 2 days. Shaving a couple of seconds from each iteration would add up.
    "I do not fear computers. I fear the lack of them." Isaac Asimov
    Oracle Scirpts DBA's need

  4. #4
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Can this not be turned into a straight insert, where you select the text_id and the matching words by joining the two table based on the regexp_like() function through a single SQL statement, then afterwards deal with concatenating the matching words into asingle record as a separate step?
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  5. #5
    Join Date
    Mar 2002
    Location
    Mesa, Arizona
    Posts
    1,204
    let's give it a try;

    insert into vtest (id, wordlist)
    select t.text_id, w.word
    from text_table t,
    words w
    where regexp_like(t.text,'["''*() .;:,^!-]'||w.word||'["''() .;:,$!-?*]','i')

    It's valid syntax, but not returning rows. Hum..

    text_table has 1 row: 1,MARTIN
    words has a row where word = 'MARTIN'

    It took 3 seconds .. guess that's because it created a cartesian product between the two tables.

    Here's the explain plan...
    ID PID Operation Name Rows Bytes Cost IN-OUT PQ Dist PStart PStop
    0 SELECT STATEMENT 259 18K 6
    1 0 NESTED LOOPS 259 18K 6
    2 1 TABLE ACCESS FULL TEXT_TABLE 1 65 3
    3 1 TABLE ACCESS FULL WORDS 259 1K 3
    "I do not fear computers. I fear the lack of them." Isaac Asimov
    Oracle Scirpts DBA's need

  6. #6
    Join Date
    Mar 2002
    Location
    Mesa, Arizona
    Posts
    1,204
    Added
    where 1=1 and regexp_...
    and it returns nothing in .5 seconds .. same execution plan.

    By the way... I rewrote it using implicit cursors and it dropped from 17 sec to 13.

    I'm working on creating words as a collection.. that's got to be faster.
    "I do not fear computers. I fear the lack of them." Isaac Asimov
    Oracle Scirpts DBA's need

  7. #7
    Join Date
    Nov 2000
    Location
    Pittsburgh, PA
    Posts
    4,166
    What about this?

    Code:
    DECLARE
    
       TYPE c_text_table IS TABLE OF text_table.text_id
          INDEX BY BINARY_INTEGER;
    
       TYPE c_words_table IS TABLE OF words%ROWTYPE
          INDEX BY BINARY_INTEGER;
    
       v_file      BFILE;
       v_wordlist  VARCHAR2(8000);
    BEGIN
    
       SELECT text_id 
         BULK COLLECT INTO c_text_table
         FROM text_table;
    
       SELECT word
         BULK COLLECT INTO c_words_table
         FROM words;
    
       FOR i IN text_id.FIRST..text_id.LAST
       LOOP
          BEGIN
             SELECT text INTO v_file FROM text_table WHERE text_id = c_text_table(i); 
             DBMS_LOB.OPEN(v_file, DBMS_LOB.LOB_READONLY);
       
             FOR j IN c_words_table.FIRST..c_words_table.LAST
             LOOP
                IF regexp_like( v_file,
                   '["''*() .;:,^!-]' || c_words_table(j) || '["''() .;:,$!-?*]', 'i' )
                THEN
                   v_wordlist  := v_wordlist || c_words_table(j) || ' ';
                END IF;
             END LOOP;
    
             DBMS_LOB.CLOSE( v_file );
    
             IF v_wordlist || '+++' != '+++'
             THEN
                INSERT INTO vtest ( ID, wordlist )
                     VALUES ( text_id(i), v_wordlist );
       
             END IF;
             v_wordlist       := '';
          EXCEPTION
             WHEN OTHERS THEN
                DBMS_LOB.CLOSE( v_file );
                RAISE;
          END;
       END LOOP;
    -- You should only need to commit once.
       COMMIT;
    END;
    /

  8. #8
    Join Date
    Mar 2002
    Location
    Mesa, Arizona
    Posts
    1,204
    After loading the cursors into collections and retesting I ended up gaining 2 seconds per text search (17-15 sec).

    I took the regexp_like out to find out exactly how much time each was consuming. It ran through 200 clobs in 4 seconds (looping through the words in a collection 5177 each time).

    I looked at the text clob and found that each has about 950 trailing blank spaces. After trimming the blanks, we're down to ~4 seconds each. That's more like it.

    This also means that we can save about 80-100mb /day of stored blanks.

    Thanks for all your help on this thread.
    "I do not fear computers. I fear the lack of them." Isaac Asimov
    Oracle Scirpts DBA's need

  9. #9
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Would it help to clean up the expression with something like the following?
    Code:
    insert into vtest (id, wordlist)
    select t.text_id, w.word
    from text_table t,
    words w
    where regexp_like(t.text,'([^:alpha:]||^)'||w.word||'([^:alpha:]||$)','i')
    I also wonder whether it would be worth "pre-parsing" the string with a LIKE operation just to reduce the amount of times that you call REGEXP_LIKE ...
    Code:
    insert into vtest (id, wordlist)
    select text_id, word
    from (
    select t.text_id, w.word, t.text
    from text_table t,
    words w
    where Upper(t.text) like '%'||Upper(w.word)||'%'
    and rownum >= 1
    )
    where regexp_like(t.text,'([^:alpha:]||^)'||w.word||'([^:alpha:]||$)','i')
    Last edited by slimdave; 10-15-2004 at 05:45 PM.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  10. #10
    Join Date
    Mar 2002
    Location
    Mesa, Arizona
    Posts
    1,204
    Perhaps you're right. What would the query look like to check 1,000 clob's against 5,177 words in a like condition? We'd most likely get all of them anyways.

    Thanks for all your smart thinking.
    "I do not fear computers. I fear the lack of them." Isaac Asimov
    Oracle Scirpts DBA's need

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width