-
index clustering
Hi,
I'm working on a DWH project using 9iR2.
I was wondering how "important" the clustering factor of an index is (in a DWH environment)?
During the ETL process is it worth to order the data which are inserted into a new partition over the PK columns to reduce the clustering factor of the PK Index?
Thanks for any feedback
Mike
-
Re: index clustering
Originally posted by mike9
I was wondering how "important" the clustering factor of an index is (in a DWH environment)?
During the ETL process is it worth to order the data which are inserted into a new partition over the PK columns to reduce the clustering factor of the PK Index?
Well even if you order your data while inserts how will you gurantee that the for subseqent inserts the data will go to the blocks having same key vaules or may me keys close to it.
PS i assume you insertion table is not of flush fill kinda logic..
Abhay.
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
the data are inserted via a serial
insert /*+ append noparallel(...)*/ select ... order by pk_column
-
i dont see any point with order by in such cases..
BTW, why do you guys use APPEND by which its sure that u will waste so much of space, as APPEND will start inserting recs after HWM.. and also sapce is not reused after subseqent deletes..
Abhay.
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
hi Abhay,
Thanks for your feedback.
I dont think that we lose a lot of space because, it's a dwh and we never delete data only load, in most cases, only once a month new data. So as far as I understand we will just lose on data block for each new insert append.
But my main question was related to the clustering of an index and if it was worth to order the inserted data by the column of the pk to reduce the clustering factor. I was also wondering which is the real impact of index clustering in a DWH environment where most queries run over 5 minutes.
-
Originally posted by mike9
But my main question was related to the clustering of an index and if it was worth to order the inserted data by the column of the pk to reduce the clustering factor. I was also wondering which is the real impact of index clustering in a DWH environment where most queries run over 5 minutes.
As i said before, there is no point in ordering your data before inserts particularly in the scenario where the TAB is not of flush fill type..
Abhay.
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
-
a last question what does "flush fill type" mean?
-
Originally posted by mike9
a last question what does "flush fill type" mean?
I was wondering the same thing.
The purpose of physical row ordering is to cluster together the values of a column or columns that are common reporting conditions. For example, if you are querying on ITEM#=101 then physical ordering on ITEM# will reduce the number of blocks holding the values of 101 and therefore speed the query.
It is definately worthwhile, but the choice of columns to order by has to be made carefully.
-
Originally posted by abhaysk
As i said before, there is no point in ordering your data before inserts
Sure there is, it's a cluster.
Jeff Hunter
-
well by "flush and fill" i meant .. truncate data and load afresh for every batch..
if loading into the table is not of that kinda, then for every 'batch inserts' the keys are going to get loosely packed.. so during a course of time it makes no sense, as your clsuter factor is going to go up for every Batch Inserts..
I do aggree for that part of extent occupied by a particular 'batch insert' keys will be placed close.. but again if you take Table as a whole with N extents, the Keys are/will be placed far apart from one another.. and where in Clustering Factor can cause any dramatic change in the query plan..
More over LIO are going to remain same for any operations like 'table access by index unique scans or range scans'/'fts'/'filter with range or unique scans'..
with ordered data (not really a perfect ordered one as i mentioned above) it may help in less PIO, but again if we assume we are not selecting data from different chunks of the extents..
But is it worth, to achive small benifits, do an ordering which may cause bottleneck in your Temp TS..
Abhay.
funky...
"I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."
"Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|