Concatenated Index on Fact Table
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 8 of 8

Thread: Concatenated Index on Fact Table

  1. #1
    Join Date
    Apr 2001
    Location
    UK
    Posts
    137
    I was reading the performance manual section on the STAR hint and it said that this works best if you have a concatenated index on the fact table of your data warehouse.

    This surprised me as I'd always thought concatenated indexes were frowned on in data warehousing. I thought you are supposed to use lots of single (preferably bitmap) indexes on the fact table. Does anyone actually find a benefit in creating concatenated indexes ?

  2. #2
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    I agree your point. Single column indices are normally used on fact tables, because each SQL statement may contain different columns in the WHERE clause.

  3. #3
    Join Date
    Dec 2000
    Location
    Ljubljana, Slovenia
    Posts
    4,439
    AFAIK in DW environments, it is perfectly normal practice (not only a normal practice, but rather a requirenment, according to Ralph Kimbal, the "preacher of dimension modeling") for fact table to have composite PKs, consisting of all the PK columns of the dimension tables it references via FKs.

    The rules are:
    - dimension tables have one-column primary keys (strictly surrogate ******* key values)
    - fact tables have composite primary keys composed by all dimensions PK columns; fact tables don't have a single-column surrogat keys as there is no chance any query will ever reference that surrogat key, neither as a limiting condition nor as a joining condition!

    So it is very important how you order the columns in your fact's primary key. IN DW tipically every query will contain a limiting condition about the time portion, so very comonly you'll find time dimension key as a leading column in fact's PK.
    Jurij Modic
    ASCII a stupid question, get a stupid ANSI
    24 hours in a day .... 24 beer in a case .... coincidence?

  4. #4
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    The problem is that Kimbal's book pre-dates Oracle's Star Schema optimization method. Basically, you do, indeed, want each dimension table to have a single field, numeric PK. The Fact table then contains many such FK fields - 1 to each dimension. It is conceivable, then, that you make a PK of the combination of all these fields. That would be a B*Tree index. Generally, that should be the only B*Tree index on the table. The rest of the FKs should have bitmap indexes on them.

    The deal is that you have no idea what dimensions the user is going to be restricting by. Even if they 'usually' restrict by date, they may only include the second column in that index 1/10th of the time (given 10 or more FKs maybe) . The rest of the queries would then suck.

    Therefore, if using standard B*Tree indexes, one must index many permutations of FK fields in order to get as much 'coverage' of potential queries as possible.

    Now, stepping back. The fact table in most (mature) warehouses accounts for 95+% of the space. The fact table is generally made up of many FK fields and a few numeric facts. This means that a B*Tree index on all the fields is going to be nearly as big as the table itself. Now think about how many times larger your database is going to have to get to hold indexes for multiple permutations of multiple FKs in the fact table.

    Solution to all these problems: Bitmap indexes. You make a single-field, bitmap index on each single-field FK in the fact table.

    Now, you can restrict by whatever dimensions you want. The optimizer is going to do cartesian joins on all the dimensions first (never hit the fact table until end). Now, each applicable field in that result set will hit its own bitmap index (since multiple can be used to satisfy a query). Then, the optimizer ANDs all these bitmap strings together and comes up with a list of exactly which records in the fact table will satisfy the query - before ever touching the table. So, the processing is very fast. As a bonus, we always know exactly how many indexes we need. With B*Tree, building one for every permutation is often not feasible, so you always have some combinations that will perform very poorly. You sometimes have to juggle these indexes as queries change. But with bitmaps:

    - 1 per FK - done forever
    - if 5 Dimensions, only 5 Bitmap indexes, where there are 5! permutations to consider for B*Tree indexes.
    - Bitmap indexes are VERY small.

    Multiple benefits. The bitmap solution is so powerful that it should be considered the default solution for warehouses. If you want to use B*Tree indexes, you better have a very good reason.

    But I digress

    - Chris

    [Edited by chrisrlong on 06-07-2001 at 06:42 PM]

  5. #5
    Join Date
    Dec 2000
    Location
    Ljubljana, Slovenia
    Posts
    4,439
    I agree with everything Chris sais (as always ) about using bitmap indexes in fact tables. But the following is cruical in this thread:
    Originally posted by chrisrlong
    If you want to use B*Tree indexes, you better have a very good reason.
    The original question was concentrated on the isue of "having composite or single column *primary key* on the fact table". So it is about PK. Bitmaps can not be used for PK constraints, you have to have B*tree index - and this is "a very good reason" to have B*tree index.

    So there are only two options with PKs:
    a) introduce another column in a fact table and populate it with surrogat sequentially-populated values
    b) use existing dimension columns which allready *should* form unique combinations and create a composite PK

    With option a) the primary key index will *never* be used by an optimizer, its only purpose will be to satisfy the rule that each table should have a primary key. Sure, nobody forces you to have one, but I personally feel very uncomfortale having production table without a PK. And by simply having a sequencial single-column PK doesn't realy ensures the uniqness of dimension combinations of the facts, so it even does not serve the primary purpose of the primary key!

    With option b) you get very large index, probably larger than table itself. True, but even if you have PK created with option a) it will probably be bigger then a table itself and probably not so much bigger tha index under b). Remember that in both cases ROWIDs will take up 18 bytes per row, while dimension values typicaly occupy only 2 to 3 bytes. And additionaly, if you take approach a), you are adding to the table size itself, as you introduced a new column to it! Index in option a) will *never* be used, while in option b) it *might* be used by the optimizer.

    If the size of composite index is realy a matter of concern then you should simply make a fact as index organized table. I'm not sure what are additional isues with IOTs regarding the partitioning and parallelism and similar stuff, but with 8i you can have additional indexes on them, hopefully also bitmaps (if bitmap indexes are not allowed on IOT then of course the fact tables are not the right candidates for IOTs).

    One more thing. I might be wrong with this, but I seem to remember that oracle's optimizer will not properly recognized the star schema without a hint if fact table's PK is not composed of all the dimension's primary key columns!

    [Edited by jmodic on 06-08-2001 at 09:22 AM]
    Jurij Modic
    ASCII a stupid question, get a stupid ANSI
    24 hours in a day .... 24 beer in a case .... coincidence?

  6. #6
    Join Date
    Nov 2000
    Location
    Baltimore, MD USA
    Posts
    1,339
    Originally posted by chrisrlong
    you make a PK of the combination of all these fields. That would be a B*Tree index. Generally, that should be the only B*Tree index on the table.
    Sorry if I glossed over that point too quickly. Mr. Modic is absolutely correct about the need for B*Tree indexes for PK constraint. So, in the statement of mine that was quoted, I should have added the disclaimer 'except for the PK'.

    What's funny is that the question was not originally about the PK - it was about composite indexes in general on fact table. I actually went to the documentation for the STAR hint and they do, indeed, talk about the composite indexes on the fact table (no mention of PK anywhere) and was astounded. I was also somewhat intrigued. I need to do some more investigating into that hint. Anyway, Jurij made the assumption in his first reply that the thread was about PKs and answered on that basis. I then missed his PK references and answered in general terms. We crossed paths a bit .

    I agree with the points about the PKs.

    I'm interested in the IOT theory. Note that 9i has *much* better support for these. I think they will really come into their own in 9i - *finally*. Another major adjustment coming from SQLServer was the lack of these in Oracle. In SQLServer, they were so fully integrated and well supported (from back when they were still part of Sybase), that most tables were IOTs - it was just a question of which column(s) to use for the index.

    I'd be interested in getting more people's opinions and experiences on Star Schema designs and implementations. I think I'll start a new thread for that.

    Jurij - Great discussions, as always!

    - Chris


  7. #7
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    Jmodic,
    "The original question was concentrated on the isue of "having composite or single column *primary key* on the fact table". So it is about PK. Bitmaps can not be used for PK constraints, you have to have B*tree index - and this is "a very good reason" to have B*tree index. "

    The original question was not on PK. It was about just composite index.


  8. #8
    Join Date
    Dec 2000
    Location
    Ljubljana, Slovenia
    Posts
    4,439
    Yeah, I realized this when Chris pointed out this in his reply.

    However if you read the first post in this thread it should be pretty selfunderstandable that oracle documentation is talking about PK concatenated index. How in the world would any other concatenated index on the fact table be even considered to be used by the optimizer in conjunction with STAR hint? I mean, when I've read the original post it was perfectly clear to me that docs are talking about concatenated *PK* indexes.

    Chris has put it realy nicely:
    We crossed paths a bit .
    Jurij Modic
    ASCII a stupid question, get a stupid ANSI
    24 hours in a day .... 24 beer in a case .... coincidence?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width