Performance trying to do a 'best match'

**GSimpson** · 09-08-2005, 06:34 PM

I have an application that routes telephone calls based on the phone number the user dials and selecting from a list of route plans the telephone company operator creates. The complexity comes in because the route selection table will give a list of partial phone number with variable granularity. For example part of the list may look like this (numbers not real)

4 (country zone 4)
44 (UK)
4424 (UK London)
442467 (UK London Mobile)

if the user dials 442467123456789 I need the data from the last entry, if the user dials 442498123456789 then I need the data from the 4424 entry, if the user dials 443666123456789 then I need the 44 data and so on. The problem is performance, the table contains > 200,000 entries and when I use my sql statement which is:

SELECT digits FROM routestepent WHERE 'DEFAULT' = tblnm AND ‘6022241234' LIKE digits||'%' ORDER BY LENGTH(digits) DESC

digits is the column that contains the list of numbers to match to

tblnm is a column that specifies a group of entries that make up a "route table" form the telephone company's standpoint.

6022241234 as an example I plugged in and used for testing, in the app we use whatever the user dials

it works but it takes close to 1/2 second to execute which in the telephony world is wholey unacceptable. It seems oracle is not using the index on the digits field because I am using the LIKE operator.

thanks.

**slimdave** · 09-09-2005, 01:03 AM

Have you confirmed that Oracle is not using an index -- I would guess that it might be performing a fast full index scan.

A few things occur to me.

Have you benchmarked something like ...

Code:

Select digits
from
(
select digits
from routestepent
where digits in
   (
   substr(‘6022241234',1,1),
   substr(‘6022241234',1,2),
   substr(‘6022241234',1,3),
   substr(‘6022241234',1,4),
   ..
   etc
   )
   order by length(digits) desc
)
where rownum =1
/

You'd only need to go up to your longest digit string, of course.

Another approach:

What is the distribution of lengths for the 'digits' column?
Is it possible to work out the probability of a match at a given length of digits?
Can you then use that information to say something like "30% of the dialled numbers get a match on a digit length of 6"?
Now if that were so can you query for a match at that length where there are no matches at a longer length?

Code:

select digits
from routestepent
WHERE 'DEFAULT' = tblnm AND
     substr(‘6022241234',1,6) = digits and
     not exists (select 1 from routestepent
WHERE 'DEFAULT' = tblnm and digits like substr(‘6022241234',1,7)||'%'
   )

In other words, can you leverage your own knowledge to try and get a quick hit on the most common one or two cases?

You might be able to optimize that last example by placing a (technically redundant) code in the table to indicate whether there are longer matches available, say a "terminal_string" code ... so in the examples you give, '4', '44', '4424' would have a terminal_string of 'N' and '442467' would have a terminal_string of 'Y', hence the code would become:

Code:

select digits
from routestepent
WHERE 'DEFAULT' = tblnm AND
     substr(‘6022241234',1,6) = digits and
     (terminal_string = 'Y' or
     (terminal_string = 'N' and 
     not exists (select 1 from routestepent
WHERE 'DEFAULT' = tblnm and digits like substr(‘6022241234',1,7)||'%'
   )))

(I think i got the right number of parens on that)

Lastly, would it be possible once a match has been found to log that for future use against the dialed number in an index-organized lookup table? How often does this routing table change?

How many tblnm's are there? Maybe instead of indexing on tblnm you could list-partition your table on tblnm and use local indexes to reduce the index size to be scanned. A poor man's alternative would be to use multiple oracle tables for each tblnm, then use your application code to execute a query to the appropriate table.

By the way, you're using bind variables here, right?

**slimdave** · 09-09-2005, 01:56 AM

Another thought ... "digits" is a varchar2 column, not a number() right?

**gamyers** · 09-09-2005, 03:37 AM

I agree that you need to leverage knowledge of the data.
I'd probably look at multiple steps. First one covering the country/city codes, then if no match there, move out to the wider coverage, and so on.

The first should use a narrow index range scan, the others would be a wider index range scan (but hopefully executed less often. If not, spread the data out into three tables according to precision, and the wider queries will have less data to process).

SELECT distinct first_row(digits) over (order by length(digits) desc)
FROM routestepent
WHERE 'DEFAULT' = tblnm
AND digits like substr(:num,1,5)||'%'
and :num LIKE digits||'%'

WHEN NO_DATA_FOUND then

SELECT distinct first_row(digits) over (order by length(digits) desc)
FROM routestepent
WHERE 'DEFAULT' = tblnm
AND digits like substr(:num,1,2)||'%'

WHEN NO_DATA_FOUND then

SELECT distinct first_row(digits) over (order by length(digits) desc)
FROM routestepent
WHERE 'DEFAULT' = tblnm
AND digits like substr(:num,1,1)||'%'

**tamilselvan** · 09-09-2005, 10:12 AM

How about reverse key index on the digits col?

Tamil

**jmodic** · 09-09-2005, 05:47 PM

Originally Posted by tamilselvan

How about reverse key index on the digits col?

How would reverse key index help with the *SELECT* statement? And in particular, how would reverse key index help with the query that uses *LIKE* operator in the predicate on the indexed column?

**tamilselvan** · 09-10-2005, 11:40 AM

===
digits is the column that contains the list of numbers to match to

tblnm is a column that specifies a group of entries that make up a "route table" form the telephone company's standpoint.

6022241234 as an example I plugged in and used for testing, in the app we use whatever the user dials
it works but it takes close to 1/2 second to execute which in the telephony world is wholey unacceptable
====

If the digits col is used for storing the telephone number, then REVERSE KEY INDEX will help to retrive the row fast. When the user enters full digits for a telephone number, then this will work. I just give an option. The original poster concerns about performance. 0.5 second is not acceptable.

I know we cannot use LIKE operator with the REVERSE KEY INDEX.

Tamil

**jmodic** · 09-11-2005, 05:31 AM

Originally Posted by tamilselvan

If the digits col is used for storing the telephone number, then REVERSE KEY INDEX will help to retrive the row fast. When the user enters full digits for a telephone number, then this will work.

But why a *REVERSE KEY* index? Why not an ordinar B*Tree index? You say: "When the user enters full digits for a telephone number, then reverse key index will help to retrive the row fast". But if the same index was made as a normal one (i.e. non-reverse key), then the row retrival will be even faster, no question about that. So my question remains: Why *REVERSE KEY* index? It doesn't make sence at all in the context of the original question.

**tabreaz** · 09-11-2005, 09:12 AM

4 (country zone 4)
44 (UK)
4424 (UK London)
442467 (UK London Mobile)

The other approach is to group the data with certain category by adding a column.

as i said u can add a column with categorized the data, for example

U can add one column let us say category
For international number this column can contain one flag say one
For national number it can contain other value.

Like that

If u know the type of call then u can include that category in SQL statement. and u can add an index on category

Thread: Performance trying to do a 'best match'

Thread Tools

Display

Performance trying to do a 'best match'

Posting Permissions