Data Warehouse Query Performance question

**gitter1226** · 11-10-2008, 11:33 AM

I'm trying to write a complicated query for a data warehouse that uses the star schema. One of the fact tables holds inventory history data for many purposes. One of those purposes is history of inventory holds. One of the dimensions for this fact table contains a flag that, when set to true indicates the inventory is put on hold, when set to false indicates the inventory is taken off of hold, and when not set indicates that this fact record doesn't pertain to a hold.

The query I'm writing is for a report that will show inventory that was placed on hold and how many of those have been taken off hold, indicating the remaining quantity that needs to be handled. To further complicate matters, it is possible to "handle" the On-Hold inventory in increments. In other words, you could place 100 parts on hold at once but then the released quantity could come back one at a time, 10 at a time, or all 100 at once. That means I need to sum up the released quantity on the fact records as they pertain to the released holds.

Here is the solution I've come up with:

I created one view called the init_hold_view that fetches the fact records pertaining to initialized holds (*note that I've left out all but the relevant names*)

Code:

select [dimension_keys],
       [needed_dimension_data]
       fact.qty init_qty
  from fact_table fact
  left join [all_dimension_tables]
       on [dimension.dimension_key] = [fact.dimension_key]
 where hold_disposition = 'T'

I then created a second view called the released_hold_view that fetches the fact records pertaining to released holds.

Code:

select [dimension_keys],
       fact.qty rels_qty
  from fact_table fact
  left join [disposition_dimension]
    on [dim_key] = [fact_key]
 where hold_disposition = 'F'

My report query then uses the two views to find the results:

Code:

select init.init_qty,
       sum(rlsd.rels_qty) rels_qty,
       [needed_dimension_data]
  from init_hold_view init
 inner join released_hold_view rlsd
    on [matching dimension keys]
 where ...
 group by [dimension_data and init_qty]

My worry is that I'm performing too many joins and I need to find an optimal solution that can scale to millions of rows. I was hoping that by joining simple integer keys across views into the same table that I was saving myself some work. Looking back at it, I'm not so sure there isn't a better way. Does anyone have any suggestions on a better way to do something like this?

Thank you in advance,
J

Thread: Data Warehouse Query Performance question

Thread Tools

Display

Data Warehouse Query Performance question

Posting Permissions