-
time dimension: number or date as PK?
Hi everybody,
I'm creating a new data mart and was wondering what kind of datatype you would recommand to use for the time dimension PK.
Kimball recommands to allways use numbers but personaly I would prefere to use a date so that the range partitioning of the fact table would be simpler.
Regards
Mike
-
Never use numbers for dates; It messes up the optimizer.
Jeff Hunter
-
-
>Never use numbers for dates; It messes up the optimizer.
why should it mess up the optimizer. I only use the PK key of the time dimension to join with the FK of the fact table but never to limit the result.
-
Wouldn't the range partitioning be more simple if you used a date?
-
Originally Posted by mike9
why should it mess up the optimizer. I only use the PK key of the time dimension to join with the FK of the fact table but never to limit the result.
How many values are between 20060501 and 20060601? You say 31, the optimizer says 100.
Jeff Hunter
-
marist89,
Could you please be more precise on the definition of "values".
Why would there be only 31 values? The smallest unit of a date value is one second not one day.
So back to my question: why should it mess up the CBO?
Slimdave,
Yes I agree with you that it would be much simpler to range partition over a date. But the problem is that I have to convince some poeple that it would be better/easier not to take Kimballs way.
-
What arguments does Kimball make in support of his position?
-
Originally Posted by mike9
marist89,
Could you please be more precise on the definition of "values".
Why would there be only 31 values? The smallest unit of a date value is one second not one day.
So back to my question: why should it mess up the CBO?
In the simplest form of a date being an integer, (YYYYMMDD), the first of the month is 20060501. These dates continue until 20060531, or 31 values. However, the optimizer says these are numbers and 20060542 is a valid value, which, of course, we know is not possible.
By using integers for dates, you are feeding the optimizer bad information. This confusion compounds when you add in additional time components of HH, MI, and SS.
Jeff Hunter
-
Yes I agree with you that it would be much simpler to range partition over a date. But the problem is that I have to convince some poeple that it would be better/easier not to take Kimballs way.
I think you misunderstood the Kimball's theory.
The design should be :
Kimball might have said that a DIM table should be created with surrogate key as PK, and date/time col.
And the FACT table has FK referencing the DIM table.
There is a small space advantage if you use adopt this design.
A date col requires 7 bytes internal storage, where as the integer col in the FACT Table may not require 7 bytes.
The 2nd advantage is tomorrow you have the flexibility of changing the data type of the date col in the DIM table to timestamp data type.
Tamil
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|