DBAsupport.com Forums - Powered by vBulletin
Page 1 of 2 12 LastLast
Results 1 to 10 of 14

Thread: Indexing only DISTINCT values

  1. #1
    Join Date
    Sep 2000
    Location
    Sao Paulo,SP,Brazil, Earth, Milky Way
    Posts
    350

    Indexing only DISTINCT values

    In a 9.2.0.7 EE database, I have a big table(tens of millions rows), the table´s PK is composite (ie, field1 + field2 + DATE_PROCESS + field4). DATE_PROCESS have many thousands of possible values, and there is a lot of repetitions - say, some hundreds of records with DATE_PROCESS 01-JAN-2007, some hundreds with 02-JAN-2007...
    So far, so good, BUT I will have a new requirement, respond to queries like MAX(DATE_PROCESS), TOP-n DATE_PROCESS, etc. Of course, today DATE_PROCESS is "inside" the index, so the best what I can do is a index scan, not acceptable for performance - the answer, obviously, is create a new index with DATE_PROCESS only, ok, ** BUT ** in this case the index will be enormous, as I said I have a lot of repetitions for each DATE_PROCESS, and this is an overhead to my case, logically I need an index of ONLY the distinct values of DATE_PROCESS.... This is my question, therefore : how can I do something like that, maybe some trick with FBIs ? Of course, I know that I can compress index keys, compacting the repeated values, OR I can make a MV with distinct-only values and index that, BUT all these option implies in more space than the bare minimum, logically speaking.... IF I could have an index built only with the desired, distinct values, without repeating the data in MVs or alike, I would be a happy camper...

    regards,

    Chiappa

  2. #2
    Join Date
    Mar 2007
    Location
    Ft. Lauderdale, FL
    Posts
    3,555
    Quote Originally Posted by JChiappa
    IF I could have an index built only with the desired, distinct values
    think a little... how is going Oracle to point to rows that are not in your "magic" index but you want them to be pointed at?

    Question... how's cardinality for your DATE_PROCESS column?... low cardinality might lead you to try bitmap index and have all the fun in the world with it.
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

  3. #3
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    My understanding of your problem is:
    Table Data
    ========
    ROWID DATEPROCESS
    ---------------------
    ROW-1 10-AUG-2007
    ROW-2 10-AUG-2007
    ROW-3 11-AUG-2007
    ROW-4 12-AUG-2007
    ROW-5 13-AUG-2007
    ROW-6 13-AUG-2007

    ROW-1 and ROW-2 have the same date and ROW-5 and ROW-6 have the same date.

    You want to create an index that has ROW-3 and ROW-4 only.

    AM I RIGHT?

  4. #4
    Join Date
    Sep 2000
    Location
    Sao Paulo,SP,Brazil, Earth, Milky Way
    Posts
    350
    PAVB, think about a Function Based b*tree index, in it you can choose what rows goes to the index and what do not, just returning a NULL for the non-desired rows, nulls are not indexed, we know . What I want is something like that, I would choose to index only one of the records with DT_PROCESS=10-AUG-2007 (any of them) , only one of the records with DT_PROCESS=11-AUG-2007, and so on. This kind of index would serve to find the max/top n withou repetition DT_PROCESS quickly, and using the smallest area possible (the repeated values does not matters, to this query).
    Tamil, no : with your sample data, what I want is an index with one key entry to '10-AUG-2007' (could be row #1 or #2, does not matters, BUT notto both), one entry to '11-AUG-2007', one entry to '12-AUG-2007', and one entry with '13-AUG-2007' (pointing to row #5 or to row #6, any of them BUT not both).... To the extent of my knowledge, this can´t be done (what is possible is the options what I said in the original post and derivations of them) , but anyway I´m asking just to be sure....

    regards,

    Chiappa

  5. #5
    Join Date
    Mar 2002
    Posts
    534
    what is the problem of having an index on a table with ten of millions of rows?


    >** BUT ** in this case the index will be enormous, as I said I have a lot of
    > repetitions for each DATE_PROCESS, and this is an overhead to my case

    what kind of overhead?


    Did you have a look at bitmap indexes (as PAVB already said)?
    Are you using table compression?
    Have you considered using range partitioning?

  6. #6
    Join Date
    Nov 2006
    Location
    Sofia
    Posts
    630
    I have the feeling you are trying to use index instead of MView
    create mview about that and it can solve your problem

  7. #7
    Join Date
    Sep 2000
    Location
    Sao Paulo,SP,Brazil, Earth, Milky Way
    Posts
    350
    Bore, of course, as I said in the original message, for sure an indexed MV ** or ** an non-unique compressed index on DT_PROCESS solves my problem (ie, to find max values and/or top-n max values quickly via an indexed search), but none represents the bare minimum extra disk space usage : with the MV option, I will have a segment for the MV data - the distinct records -, and another to the MV index, AND using a "normal" b*tree index on DT_PROCESS , even compressing , due to the relatively large number of key repetitions , I will for sure end with more disk space consumption than a "selective" index containing only the records of interest. As I said, probably this kind of index can´t be done, I will have to use one of the options, but to be sure I´m asking, one never knows, I do not know everything about every possibility, maybe someone suggests some clever way to do ...
    Mike9, no problem per se, I was just trying to evitate it, in this system (a large DW-one) disk space is at a premium, each and every "space economy" if possible/viable/not performance-impacting must be done....So the "overhead" is just more disk usage than the bare minumum, maybe "overhead" was a bad choice of words... Table compression yes, and index compression is a possibility (like I said above), but NO to partition by date (the table is already partitioned by other keys) and NO to bitmap indexes (the table suffers from relatively constant DMLs) - by the way, even if via some "clever" trick I could obtaint an "selective" index like I said, the fact of DML ocurring will be a major impediment . Imagine, using some "trick" think I have an index with only the distinct values (ie, '10-AUG-2007' (could be row #1 or #2, does not matters, BUT not both), one entry to '11-AUG-2007', one entry to '12-AUG-2007', and one entry with '13-AUG-2007'), ok. What happens when the choosed record for, say, '10-AUG-2007' is deleted in the table, how Oracle would "locate" the other record for '10-AUG-2007' and "substitute" it in the index ?? yep, in each way this "requirement" begins to show really impossible, I will need to use some of the other options....

    regards,

    Chiappa

  8. #8
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    Code:
    SQL> 
    SQL> create table tamil (id number(6), rdate date);
    
    Table created.
    
    SQL> 
    SQL> 
    SQL> create or replace function tamil_date_fn
      2  ( p_date IN Date) return number
      3  DETERMINISTIC
      4  as
      5  PRAGMA AUTONOMOUS_TRANSACTION;
      6  p_num int;
      7  begin
      8  	begin
      9  	  select count(*) into p_num from tamil
     10  	   where tamil.rdate = p_date;
     11  	exception when no_data_found then
     12  	  p_num := 0 ;
     13  	end;
     14  	return p_num;
     15  end;
     16  /
    
    Function created.
    
    
    SQL> 
    SQL> create index tamil_idx on tamil
      2  ( case when tamil_date_fn(rdate) = 0 then rdate else null end);
    
    Index created.
    
    SQL> 
    SQL> validate index tamil_idx;
    
    Index analyzed.
    
    SQL> 
    SQL> select NAME, LF_ROWS , DISTINCT_KEYS from index_stats;
    TAMIL_IDX                               0             0                                 
    
    1 row selected.
    
    SQL> 
    SQL> 
    SQL> insert into tamil values (1, trunc(sysdate - 8));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> insert into tamil values (2, trunc(sysdate - 8));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> insert into tamil values (3, trunc(sysdate - 7));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> insert into tamil values (4, trunc(sysdate - 6));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> insert into tamil values (5, trunc(sysdate - 6));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> insert into tamil values (6, trunc(sysdate - 5));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> insert into tamil values (7, trunc(sysdate - 4));
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> 
    SQL> select * from tamil;
             1 2007-AUG-08 00:00:00                                                         
             2 2007-AUG-08 00:00:00                                                         
             3 2007-AUG-09 00:00:00                                                         
             4 2007-AUG-10 00:00:00                                                         
             5 2007-AUG-10 00:00:00                                                         
             6 2007-AUG-11 00:00:00                                                         
             7 2007-AUG-12 00:00:00                                                         
    
    7 rows selected.
    
    SQL> 
    SQL> validate index tamil_idx;
    
    Index analyzed.
    
    SQL> 
    SQL> select NAME, LF_ROWS , DISTINCT_KEYS from index_stats;
    TAMIL_IDX                               5             5                                 
    
    1 row selected.
    There are 7 rows in the table, but 5 entries in the index.

    Warning:

    1. Commit is needed after one row is inserted.
    2. Bulk commit may not update the index.
    3. Index rebuild may not work.

  9. #9
    Join Date
    Mar 2002
    Posts
    534
    Quote Originally Posted by JChiappa
    ...
    in this system (a large DW-one) disk space is at a premium, each and every "space economy" if possible/viable/not performance-impacting must be done....So the "overhead" is just more disk usage than the bare minumum, maybe "overhead" was a bad choice of words...
    From my point of view it's, for a DWH system, not disk space which is a premium but the performance of the disk system.

    Also I don't see the big space impact you would have creating a simple btree index over a date field (with probably less then 10000 distinct values). I think that even with 100M rows you should not use much more then 3GB.

    An alternative, without using any additional disk space, would be to recreate the PK index like ( DATE_PROCESS + field1 + field2 + field4).

  10. #10
    Join Date
    Sep 2000
    Location
    Sao Paulo,SP,Brazil, Earth, Milky Way
    Posts
    350
    Ops, sorry for the long delay, people... Mike9, yes, a very viable alternative could be put the desired filed in the "point" of the index, getting an index range-scan, yes, BUT not possible, other queries neeed filed 1 as the first field in the index... And yes, we are talking about shave a few Gbs only, and like I said, yes, performance is the first point, BUT "space economy if possible/viable/not performance-impacting must be done", this is common-sense in this kind of env....Specially when (like we do) the disk space is LOCATED from others, each and every Gb used results in a bigger invoice to be paid... So, IF POSSIBLE, I must to do any savings, IF not interfering with performance. And this will be the point determining the inviability of the whole idea - I could not to create a FBI in my tests, but reading Tamil´s post I see the use of AUTONOMOUS TRANSACTION to do it, for sure this will impact enormously in my performance, will inviablize index updatings (will demands a full index REBUILD each time), so I´m just foregtting about the whole thing, no free lunch here, no "ultra-clever" ideas....
    I will measure the consumption (and the time being spent in index refreshing after DMLs) with the compressed b*tree option and the Mv option, and choose the best one.

    Regards to all,

    Chiappa

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width