I have to modify a Datawarehouse and adapt it to multi-company. Basically this is what I should do
a Sales fact table, the Group was one company but after some merges it becomes three companies. So now they have to do rollup and drill down analysis based on companies, product category and Category type
company ===> 3 companies
product category ===> engine oil, fuel etc
category type ===> Services, marketing product, finishing material etc
The new aggreates can be based on companies, product category and/or category type
Product category (these are codes)
Category type (these are codes)
The problem is any company can or NOT to share the product category codes and category type codes. I.E
Company_A, 10, 01 can be same as Company_B, 10, 01 or Company_C, 20, 02
My question how on earth would the dimensions defined? I was thinking to define three dimensions however since the codes by themselves EXCEPT the company code are not UNIQUE (The unique key would be the combination of all three) so there is no way to isolate product category and category type in their dimension (If my understanding of dimension is correct)
I am thinking to simply add the three columns to the fact table and byebye to the dimensions...
Do you mean that the same code values are shared among the companies, but that knowing the meaning of the codes requires that you know the company also, because the sharedcode may mean something different in each company?
The codes can have or not the same meaning in different company so basically is to get a unique code we require the company code too
For example code 01 in company_A can be Fuel and 01 in company_B can be engine oil and 01 in company_C can mean something else OR same to any in company_A or company_B
So the PK would be company_code, product_category_code and category_type_code. To make a dimension on these three columns would be pointless IMHO because these three columns have to go to the fact table, also would be pointless because the analysis can be based on any of these three columns or two of them or all of them
This is a tricky business then.
A conventional way of dealing with this would be to translate the codes to a new, common basis. So if companies A, B, and C all have different codes for the product category "fuel", you would define a single standard "fuel" code (and it need bear no relation to the existing codes -- in fact, it's probably better if it doesn't, in order to avoid confusion). It's alsways nice if you can find some industry standard set of codes to map to, though it's not always possible to do so.
If you really require that the companies be able to view their native codes then you could take one of two approaches.
Approach 1 is to make the company's native codes children of the standard code, and do the company/code concatanation thing to make them unique. Only the company native code would appear in the fact table -- the standard code would be related to the native code through the dimension table.
Approach 2 is to make the standard and native codes completely independant of each other by storing both in the fact table and maintaining separate dimensions tables for each.
Whatever approach you take i would still maintain the company as a separate column from the product category in the fact table.
It's stuff like this that means 70-80% of the DW effort is devoted to ETL coding.
Thx for the replies.
I just found out today that the codes are unique, i.e company_id is unique, product_category_id is unique and category_type_id is unique as well. Thing seems better now.
Now I have some denormalization/normalization query. After knowing these three attributes can be three dimensions by themselves I am proceeding to create three dimensions, one for company, one for product_category and one for category_type. The SALES fact table would have all these three dimensions PK and corresponding FK to these dimensions. The query is there is a customer table which supposes to have all these three dimensions PKs as well because before adding these dimensions customer_id was the PK of customer table but from now on the combination of
is the new PK of customer table, I wonder should I create relationship between customer table and the three new dimensions? In OLTP I would do it but this is DWH and I am not so sure (I am not a modeling expert nor OLTP or DWH). Any suggestions :-?
The usual DWH practice is to code the ETL procedure to guarantee referential and primary key integrity. In such cases you define the PK and FK constraints but make them ...
The last setting will tell the optimizer to assume integrity even though the constraint is neither enforced nor validated by the db.
You also save space on defining a pk index, which is often as large as the table itself.
Is there a direct relationship between product category and category type, like type is the parent of category?
No, there is no direct relationship between product category and category type...
By the way the company will only have 5 rows, product category 20 and category type 10... Not sure if they should be dimensions with so few number of rows! And simply add the ID columns and the description columns of all these three in the customer table then simply in the ETL process do the integrity checks
Last edited by pando; 11-22-2003 at 04:16 AM.
Nothing wrong with so few rows -- I have many dimensions of just two or three values. That's what them bitmap indexes are for.
Click Here to Expand Forum to Full Width