Performance trying to do a 'best match'
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 9 of 9

Thread: Performance trying to do a 'best match'

  1. #1
    Join Date
    Sep 2005
    Location
    Phoenix AZ
    Posts
    1

    Performance trying to do a 'best match'

    I have an application that routes telephone calls based on the phone number the user dials and selecting from a list of route plans the telephone company operator creates. The complexity comes in because the route selection table will give a list of partial phone number with variable granularity. For example part of the list may look like this (numbers not real)

    4 (country zone 4)
    44 (UK)
    4424 (UK London)
    442467 (UK London Mobile)

    if the user dials 442467123456789 I need the data from the last entry, if the user dials 442498123456789 then I need the data from the 4424 entry, if the user dials 443666123456789 then I need the 44 data and so on. The problem is performance, the table contains > 200,000 entries and when I use my sql statement which is:

    SELECT digits FROM routestepent WHERE 'DEFAULT' = tblnm AND 6022241234' LIKE digits||'%' ORDER BY LENGTH(digits) DESC

    digits is the column that contains the list of numbers to match to

    tblnm is a column that specifies a group of entries that make up a "route table" form the telephone company's standpoint.

    6022241234 as an example I plugged in and used for testing, in the app we use whatever the user dials

    it works but it takes close to 1/2 second to execute which in the telephony world is wholey unacceptable. It seems oracle is not using the index on the digits field because I am using the LIKE operator.

    thanks.
    Last edited by GSimpson; 09-08-2005 at 06:48 PM.

  2. #2
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Have you confirmed that Oracle is not using an index -- I would guess that it might be performing a fast full index scan.

    A few things occur to me.

    Have you benchmarked something like ...
    Code:
    Select digits
    from
    (
    select digits
    from routestepent
    where digits in
       (
       substr(6022241234',1,1),
       substr(6022241234',1,2),
       substr(6022241234',1,3),
       substr(6022241234',1,4),
       ..
       etc
       )
       order by length(digits) desc
    )
    where rownum =1
    /
    You'd only need to go up to your longest digit string, of course.

    Another approach:
    • What is the distribution of lengths for the 'digits' column?
    • Is it possible to work out the probability of a match at a given length of digits?
    • Can you then use that information to say something like "30% of the dialled numbers get a match on a digit length of 6"?
    • Now if that were so can you query for a match at that length where there are no matches at a longer length?

    Code:
    select digits
    from routestepent
    WHERE 'DEFAULT' = tblnm AND
         substr(6022241234',1,6) = digits and
         not exists (select 1 from routestepent
    WHERE 'DEFAULT' = tblnm and digits like substr(6022241234',1,7)||'%'
       )
    In other words, can you leverage your own knowledge to try and get a quick hit on the most common one or two cases?

    You might be able to optimize that last example by placing a (technically redundant) code in the table to indicate whether there are longer matches available, say a "terminal_string" code ... so in the examples you give, '4', '44', '4424' would have a terminal_string of 'N' and '442467' would have a terminal_string of 'Y', hence the code would become:

    Code:
    select digits
    from routestepent
    WHERE 'DEFAULT' = tblnm AND
         substr(6022241234',1,6) = digits and
         (terminal_string = 'Y' or
         (terminal_string = 'N' and 
         not exists (select 1 from routestepent
    WHERE 'DEFAULT' = tblnm and digits like substr(6022241234',1,7)||'%'
       )))
    (I think i got the right number of parens on that)

    Lastly, would it be possible once a match has been found to log that for future use against the dialed number in an index-organized lookup table? How often does this routing table change?

    How many tblnm's are there? Maybe instead of indexing on tblnm you could list-partition your table on tblnm and use local indexes to reduce the index size to be scanned. A poor man's alternative would be to use multiple oracle tables for each tblnm, then use your application code to execute a query to the appropriate table.

    By the way, you're using bind variables here, right?
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  3. #3
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Another thought ... "digits" is a varchar2 column, not a number() right?
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  4. #4
    Join Date
    Feb 2005
    Posts
    158
    I agree that you need to leverage knowledge of the data.
    I'd probably look at multiple steps. First one covering the country/city codes, then if no match there, move out to the wider coverage, and so on.

    The first should use a narrow index range scan, the others would be a wider index range scan (but hopefully executed less often. If not, spread the data out into three tables according to precision, and the wider queries will have less data to process).

    SELECT distinct first_row(digits) over (order by length(digits) desc)
    FROM routestepent
    WHERE 'DEFAULT' = tblnm
    AND digits like substr(:num,1,5)||'%'
    and :num LIKE digits||'%'

    WHEN NO_DATA_FOUND then

    SELECT distinct first_row(digits) over (order by length(digits) desc)
    FROM routestepent
    WHERE 'DEFAULT' = tblnm
    AND digits like substr(:num,1,2)||'%'


    WHEN NO_DATA_FOUND then

    SELECT distinct first_row(digits) over (order by length(digits) desc)
    FROM routestepent
    WHERE 'DEFAULT' = tblnm
    AND digits like substr(:num,1,1)||'%'

  5. #5
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    How about reverse key index on the digits col?

    Tamil

  6. #6
    Join Date
    Dec 2000
    Location
    Ljubljana, Slovenia
    Posts
    4,439
    Quote Originally Posted by tamilselvan
    How about reverse key index on the digits col?
    How would reverse key index help with the *SELECT* statement? And in particular, how would reverse key index help with the query that uses *LIKE* operator in the predicate on the indexed column?
    Jurij Modic
    ASCII a stupid question, get a stupid ANSI
    24 hours in a day .... 24 beer in a case .... coincidence?

  7. #7
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    ===
    digits is the column that contains the list of numbers to match to

    tblnm is a column that specifies a group of entries that make up a "route table" form the telephone company's standpoint.

    6022241234 as an example I plugged in and used for testing, in the app we use whatever the user dials
    it works but it takes close to 1/2 second to execute which in the telephony world is wholey unacceptable
    ====


    If the digits col is used for storing the telephone number, then REVERSE KEY INDEX will help to retrive the row fast. When the user enters full digits for a telephone number, then this will work. I just give an option. The original poster concerns about performance. 0.5 second is not acceptable.

    I know we cannot use LIKE operator with the REVERSE KEY INDEX.

    Tamil

  8. #8
    Join Date
    Dec 2000
    Location
    Ljubljana, Slovenia
    Posts
    4,439
    Quote Originally Posted by tamilselvan
    If the digits col is used for storing the telephone number, then REVERSE KEY INDEX will help to retrive the row fast. When the user enters full digits for a telephone number, then this will work.
    But why a *REVERSE KEY* index? Why not an ordinar B*Tree index? You say: "When the user enters full digits for a telephone number, then reverse key index will help to retrive the row fast". But if the same index was made as a normal one (i.e. non-reverse key), then the row retrival will be even faster, no question about that. So my question remains: Why *REVERSE KEY* index? It doesn't make sence at all in the context of the original question.
    Jurij Modic
    ASCII a stupid question, get a stupid ANSI
    24 hours in a day .... 24 beer in a case .... coincidence?

  9. #9
    Join Date
    Sep 2005
    Posts
    278
    4 (country zone 4)
    44 (UK)
    4424 (UK London)
    442467 (UK London Mobile)
    The other approach is to group the data with certain category by adding a column.

    as i said u can add a column with categorized the data, for example

    U can add one column let us say category
    For international number this column can contain one flag say one
    For national number it can contain other value.

    Like that

    If u know the type of call then u can include that category in SQL statement. and u can add an index on category
    Last edited by tabreaz; 09-11-2005 at 09:16 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width