DBAsupport.com Forums - Powered by vBulletin
Page 3 of 3 FirstFirst 123
Results 21 to 30 of 30

Thread: index clustering

  1. #21
    Join Date
    Mar 2002
    Posts
    534
    Hi Everybody,

    Thanks a lot for all your infromation.
    It's now pretty clear what kind of impact this index cluseting factor may have on the DB.
    I will have to do some tests to find out which are the lower/upper limit for cardinality so that it still worth to order the data.

    Slimave:
    what do you mean by:
    "there would be no point in ordering by a column that is not either heavily predicated in user queries "

    Thanks
    Mike

  2. #22
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    I mean that if the column doesn't get used pretty frequently in the user queries like MY_COLUMN=1, or MY_COLUMN BETWEEN a AND b, or MY_COLUMN IN (a,b,...x) then there would be no point in physically ordering by that MY_COLUMN.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  3. #23
    Join Date
    Dec 2002
    Location
    Bangalore ( India )
    Posts
    2,434
    Originally posted by slimdave
    Let's suppose that your table has a total of 1,000,000 rows stored in 12,500 blocks, giving 80 rows per block. You have the option of physically ordering by a column that has 360 distinct values -- lets call it COL360 -- giving about 2,778 rows per value.

    Now if the values of COL360 are perfectly unclustered, then each value will be spread over about 2,778 blocks. If you want to retrieve all rows for a single value of COL360 then you are definately in FTS (or FPS) land. This is our worst-case scenario for potential PIO's.

    If the table were physically ordered by COL360 then on average each distinct value will be contained in 35 blocks (2778/80) -- hello index range scan!

    What we are looking for here is PIO change, and we've got it. This is probably close to the perfect case for physical row ordering.
    Looks like you are limiting your thoughts beyond a single column.. Ok let me give you a simple scnaerio where in Ordering of
    physical rows can be a mess..

    Lets assume (as in any DW will be the case) you have composite column cluster/index and you order by say COL1 of that cluster
    in the Physical Rows.. Now if that table ordered by COL1 & assume it has N number of key columns its queried on..

    Now for N-1 Cols other than COL1 which are queried on will suffer as you actually are seperating from one another by ordering
    with COL1... wont that increase PIO???

    Ok even more lets assume COL1 is the only col which is queried with frequently but this is done via joins with
    other tables.. lets assume we get x% for rows from that driving table and is joined with this ordered table using
    neseted loops (as is the case with range/unique scans).. now you cant guarentee that each row from that x% rowset is
    actually ordered (which can increase your PIO as data blocks accessed previous instant may age out) or may be those rows are all together having many distinct values (and far apart) that is joined with this COL1..
    now you are actually avoiding (the chance atleast) of those random values to exist in few/single block by
    ordering your data by COL1..

    Well as far as i have seen in any DW that joins between the tables will be of many column joins... and very rarely single column join ( yes with reference tables where that column is/will be PK )..

    this was the reason i said its really doesent make sense just to order you data by your PK cols... Rather analyze your
    application/system/users as to how is this table accessed and then make a decesion...

    Originally posted by slimdave
    there would be no point in ordering by a column that is not either heavily predicated in user queries, or that has a cardinality that is too high or too low for there to be a difference.
    yup but not accessed by joining with other tables...

    Originally posted by slimdave
    Allow me to propose some theories here ...
    • The greatest benefit, in terms of the ratio of reduction in blocks over which each value is spread, is achieved when the number of distinct values is between the number of rows per block and the number of blocks per table
    • The best achievable reduction ratio in blocks-per-value is roughly equal to the number of rows per block
    • If it is more efficient to full scan a table than to index access a table in order to retrieve "1/x" of the blocks, then it is only worthwhile ordering on a column that has more than x distinct values


    Of course, these theories all apply only to single column ordering, where thevalues for that column are equi-distributed. The presence of skew in the values will improve the potential benefits of physical row ordering.
    these make sense & also more logical for date types.. as you can expect any application/user to query on date columns..

    wont it??

    Abhay.
    funky...

    "I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."

    "Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"

  4. #24
    Join Date
    Mar 2002
    Posts
    534
    Slimdave,

    If that column is not used like MY_COLUMN=1 buf like
    FACT_TABLE.MY_COLUMN_FK = DIM_TABLE.PK_ID so that it is not limited by a hard coded value but by a the PK of a dimension is it also worth ot order it? Or would it be to hard for the optimizer to estimate if it's better to do a fts or not?

  5. #25
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Originally posted by mike9
    Slimdave,

    If that column is not used like MY_COLUMN=1 buf like
    FACT_TABLE.MY_COLUMN_FK = DIM_TABLE.PK_ID so that it is not limited by a hard coded value but by a the PK of a dimension is it also worth ot order it? Or would it be to hard for the optimizer to estimate if it's better to do a fts or not?
    Yes, because your optimizer should be performing star transformations which transform the query so that the predicate is placed against the fact table instead of the dimension table.

    Even if it isn't, you'd still benefit on the join.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  6. #26
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Lets assume (as in any DW will be the case) you have composite column cluster/index and you order by say COL1 of that cluster
    in the Physical Rows.. Now if that table ordered by COL1 & assume it has N number of key columns its queried on..

    Now for N-1 Cols other than COL1 which are queried on will suffer as you actually are seperating from one another by ordering
    with COL1... wont that increase PIO???
    I'd challenge the assumption that all DW's have composite indexes -- I never use them in a straight fact table because they are not well-liked in star transformations, and it reduces the flexibility of your indexing strategy.

    Your argument against row ordering here is based on a scenario where you are not querying on the ordered column, and therefore you are querying on columns that are not physically ordered? If the alternative is to avoid ordering on COL1, how does that improve the queries that do not involve COL1? The ordering would be random for those columns, which can hardly be much better.

    Ok even more lets assume COL1 is the only col which is queried with frequently but this is done via joins with other tables.. lets assume we get x% for rows from that driving table and is joined with this ordered table using
    neseted loops (as is the case with range/unique scans).. now you cant guarentee that each row from that x% rowset is actually ordered (which can increase your PIO as data blocks accessed previous instant may age out) or may be those rows are all together having many distinct values (and far apart) that is joined with this COL1..
    now you are actually avoiding (the chance atleast) of those random values to exist in few/single block by
    ordering your data by COL1..
    I don't think that you have much data warehousing experience. A nested loop join would only be used where the number of likely join for each row in thedimension table is very few. This is usually not the case -- a dim row would join to hundreds/thousands/millions of rows in the fact table, and oracle would use a hash join to do that.

    Your point also seems to be that therange scan would access enough blocks in the fact table that they get aged out of the SGA? I think that this is a rather contrived scenario, especially in a star transforming environment (see previous post) where predicates against dimension tables get transformed to predicates against the fact table, and the fact table becomes the driver.

    Well as far as i have seen in any DW that joins between the tables will be of many column joins... and very rarely single column join ( yes with reference tables where that column is/will be PK )..
    Yup, but oracle still knows that the data is clustered.

    Forget about PK's. I already said that this is not a PK-related technique.

    these make sense & also more logical for date types.. as you can expect any application/user to query on date columns..
    Data type has got nothing to do with this at all. It applies to all data types.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  7. #27
    Join Date
    Dec 1999
    Location
    Purgatory
    Posts
    346
    Well, I've learned a great deal from this thread. Thanks.

    Not sure about all the flirting though...

  8. #28
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    Originally posted by Horace
    Not sure about all the flirting though...
    That's the MTV generation for you.
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

  9. #29
    Join Date
    Dec 2002
    Location
    Bangalore ( India )
    Posts
    2,434
    Originally posted by slimdave
    I'd challenge the assumption that all DW's have composite indexes -- I never use them in a straight fact table because they are not well-liked in star transformations, and it reduces the flexibility of your indexing strategy.
    We often use and never had any problems..

    Originally posted by slimdave
    Your argument against row ordering here is based on a scenario where you are not querying on the ordered column, and therefore you are querying on columns that are not physically ordered? If the alternative is to avoid ordering on COL1, how does that improve the queries that do not involve COL1? The ordering would be random for those columns, which can hardly be much better.
    May be its hardly better, but you are avoiding the chance of it as well..


    Originally posted by slimdave
    I don't think that you have much data warehousing experience. A nested loop join would only be used where the number of likely join for each row in thedimension table is very few. This is usually not the case -- a dim row would join to hundreds/thousands/millions of rows in the fact table, and oracle would use a hash join to do that.
    Well well i think we were debating on table access by index range/unique scans where in your data is limited by some driving condition and not just join between corresponding columns of FACT/DIM tables..And in such cases where 'Range/Unique Scans' are involved oracle is bound to use Nested Loops..

    Ordering of data will totally be useless if we were to do a FTS/HASH join kinda operation.. Does experience matter??


    Originally posted by slimdave
    Your point also seems to be that therange scan would access enough blocks in the fact table that they get aged out of the SGA? I think that this is a rather contrived scenario, especially in a star transforming environment (see previous post) where predicates against dimension tables get transformed to predicates against the fact table, and the fact table becomes the driver.
    Well in any DW environment do you expect only one operation/process/query at a given instant??...
    as far as you star transformation is concerned.. the necessary items are silgle column bit map indexes.. if instead single
    column btree indexes are present?? -- then you got one more head ache i.e conversion to bit maps from rowids ...

    Well mike's case is some what different.. he has multiple column PK index.. will this benifit in star transformations??


    Originally posted by slimdave
    Forget about PK's. I already said that this is not a PK-related technique.


    Data type has got nothing to do with this at all. It applies to all data types.
    Well i said it would make sense also... do you really bother to order data if table has mixed kinda operations (update/insert/delete)..
    Ok not really the case here.. but know what you do and its benifits before you actually implement..

    Abhay.
    funky...

    "I Dont Want To Follow A Path, I would Rather Go Where There Is No Path And Leave A Trail."

    "Ego is the worst thing many have, try to overcome it & you will be the best, if not good, person on this earth"

  10. #30
    Join Date
    Aug 2002
    Location
    Colorado Springs
    Posts
    5,253
    OK OK, you've got a million different scenarios in which you think physical row ordering is of no benefit. Do you think that there are any circumstances in which physical row ordering isbeneficial?
    David Aldridge,
    "The Oracle Sponge"

    Senior Manager, Business Intelligence Development
    XM Satellite Radio
    Washington, DC

    Oracle ACE

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width