Optimizing for unpredictable selectivity

Printable View

10-04-2006, 05:30 PM
KenEwald

Optimizing for unpredictable selectivity

I have a query which uses a name column for search criteria.

It performs nicely if the name is obscure. If the name is common, selectivity suffers.

What is the solution for this problem?
10-04-2006, 06:07 PM
jhmartin

You'll need to provide more details. I find it hard to believe that a basic query against a table containing names suffers significantly based on the name. How many rows are in the table? How many rows are in the worst-case result?

I'm more inclined to think that you are joining this table with something else, and the 'something else' scales poorly based on the number of names matched. You'll have to post the query and the explain plan output for help with that.
10-04-2006, 06:21 PM
KenEwald

You are quite right, it is a question of scale and, yes, I am joining this table with several other tables.

The core driving table (that has the columns for which I have criteria) is a very large table. If I use 'SMITH' in my query, it takes f.o.r.e.v.e.r (100+ seconds), if I use 'FUZ', it returns in 3 seconds.

I apologize, but I can't post the query. Don't you think the bottom line is the issue of selectivity?

Would Oracle Text help here?
10-04-2006, 06:59 PM
jhmartin

I'm still inclined to say 'no' -- how does a query against the name table only behave, without the join?
10-04-2006, 10:31 PM
slimdave

Quote:

Originally Posted by KenEwald

You are quite right, it is a question of scale and, yes, I am joining this table with several other tables.

The core driving table (that has the columns for which I have criteria) is a very large table. If I use 'SMITH' in my query, it takes f.o.r.e.v.e.r (100+ seconds), if I use 'FUZ', it returns in 3 seconds.

I apologize, but I can't post the query. Don't you think the bottom line is the issue of selectivity?

Would Oracle Text help here?

Does the explain plan change between the SMITH and the FUZ examples? Feel free to post them :)
10-05-2006, 10:45 AM
Thomasps

A case study on this..
https://metalink.oracle.com/metalink..._Kanagaraj.pdf
10-05-2006, 12:02 PM
jhmartin

Interesting read. It sounds like a historgram on the name column is indicated here, IF 'SMITH' takes up more than 1.3% of the table. I'd still like to see the distribution of values for that table.
10-05-2006, 04:26 PM
tamilselvan

Quote:

Originally Posted by KenEwald

You are quite right, it is a question of scale and, yes, I am joining this table with several other tables.

The core driving table (that has the columns for which I have criteria) is a very large table. If I use 'SMITH' in my query, it takes f.o.r.e.v.e.r (100+ seconds), if I use 'FUZ', it returns in 3 seconds.

I apologize, but I can't post the query. Don't you think the bottom line is the issue of selectivity?

Would Oracle Text help here?

The bottom line is cardinality that oracle expects to reurn not the slectivity.

Do you use bind variable for the name column?
If yes, peeking the bind variable may also affect the performance.
For example, first time the variable gets 'FUZ" value, and the sql is parsed and optimized the execution (say NL join), the next time if it gets "SMITH' value, then Oracle will still do NL join, may be in this case FULL TABLESCAN and HASH JOIN may outperform NL join.

Atleast you can post the plan after the changing the name of the tables, if you want to hide the SQL.

Tamil
10-06-2006, 05:52 AM
mike9

I would recommand you, in this case, not to use bind variables for the name and force Oracle to use dynamic sampling.

It will probably slowdown your "fast" queries a bit but because they already take 3 seconds I don't think that you will notice it to much.
10-06-2006, 07:23 AM
waitecj

Quote:

Originally Posted by KenEwald

You are quite right, it is a question of scale and, yes, I am joining this table with several other tables.

The core driving table (that has the columns for which I have criteria) is a very large table. If I use 'SMITH' in my query, it takes f.o.r.e.v.e.r (100+ seconds), if I use 'FUZ', it returns in 3 seconds.

I apologize, but I can't post the query. Don't you think the bottom line is the issue of selectivity?

Would Oracle Text help here?

Are these "fuzzy" matches, or are you searching for the entire name with an equality operator? If fuzzy, are you using any functions such as soundex to run with this? I have found that (using soundex) searching for "SMIT%" always takes considerably longer than searching for "FUZ%" (or whatever), even with histograms.

Oracle Text may or may not help if you are fuzzy matching. It depends of how often DML is performed on columns that have Text Indexes (different to "normal indexes"), and whether the overhead of maintaining the indexes is worth your while (EXEC CTX_DDL.SYNC_INDEX('idx')). To be honest, i'm not sure if using it for a column such as surname is the right way to go anyway (more designed for columns with 100s or 1000s of character values).
10-06-2006, 08:03 AM
mike9

I understood it as:
Searching for the entire name with an equality operator
10-06-2006, 08:21 AM
waitecj

Quite possibly. I just wondered after his comment about Oracle Text and the fact we can't see the query, whether we were missing something and just making assumptions as a result. Thats why I asked.
10-06-2006, 11:37 AM
KenEwald

Thank you all for your replies. Great discussion.

Here's the idea:
All of these tables are BIG. The smallest are names and clients. The really big tables are sandwiched beteen these and join by a single key. The criteria is generally an UPPER() and a LIKE() search on one of 3 name columns and a key id search on clients.

I think I've tweaked the queries every way possible.

The row stitching seems to be the real problem here. There's just so many rows to put together to end up with so few.

I'm thinking this is the right time to migrate to a denormalized and pre-aggregated environment. So, yesterday/last night I created a de-normalized table. Seems too simple and wasteful, but queries are so much more efficient.

Ken
10-09-2006, 03:04 PM
jhmartin

Do you have a function index on UPPER() columns? That can really speed things up.
10-09-2006, 03:31 PM
pando

There is not much you can do, I was in a Peoplesoft CRM implementation a couple of years ago, the users were allowed to search by Name and Surname using LIKE, when they hit those popular Surnames (quite often since they are so popular) which returnes 250000 customers the response time was simply awful. Was full scan a better approach? Well no because the tables the query accesed had over 20 million rows.

The fix? More criteria were forced and LIKE was got rid of. Entire Name or Surname must be entered.
10-09-2006, 07:03 PM
KenEwald

Pando, sounds like you and I have walked in the same mire...

The query the app is running is limited to returning 1000 rows (rownum < 1001). That forces them to use client ID or some other criteria. The users quickly learn that 1000 Smiths does them no good.

Yes, we did try Function Based Indexes. Some help, but not really the magic bullet in this case.

Another DBA is actually implementing the denormalization solution with a materialized view. An absolutely great idea in this case. It refreshes from the mv logs and .. on demand, so we can regulate refreshes during peak usage. And, we can create all kinds of crazy indexes on the MV.

This was a good exercise for all of us. Thank you all for your interest and input.

Ken
10-10-2006, 04:16 AM
yxez

i still wish you share your query....just change the table name to EMP
or DEPT or SAL. you made us all curious and thinking of it so much
that we cannot sleep at night. :rolleyes: