One of our clients has a DW environment where 300 MB of incremental data is applied to the data warehouse every week.
Every week they do the below steps:
1. Download the data, which comes in the form of a dump.
2. Run the ETL (17 procedures), which takes 35 hours. (The experts who were supporting the project say this time is correct, but we doubt it.)
3. Build the cube -- this runs for 12 hours.
Database -- Oracle 8i.
Since we are new to data warehousing and were asked to look into this, we need advice from experts about these jobs running for such long hours. We think they are rebuilding everything instead of applying just the incremental data in the required format.
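To make the suspicion concrete: a minimal sketch (using SQLite and an invented sales table, not the client's actual Oracle 8i schema) of what applying only the weekly increment looks like, as opposed to truncating and rebuilding the whole fact table:

```python
# Hypothetical illustration: apply a weekly increment to an existing
# fact table via upsert instead of a full truncate-and-reload.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Existing warehouse fact table with previously loaded data.
cur.execute("CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                [(1, 100.0), (2, 250.0), (3, 75.0)])

# This week's 300 MB increment would arrive as a dump; here it is a small list.
weekly_increment = [(3, 80.0),   # an updated row
                    (4, 120.0)]  # a brand-new row

# Upsert only the increment -- existing rows stay in place.
cur.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?)",
                weekly_increment)
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0])  # prints 4
```

The cost of this approach scales with the size of the increment, not with the size of the whole warehouse, which is why a 300 MB weekly delta taking 35 hours looks suspicious.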
I am sure I will get an immediate reply from Slimdave / Tamil, if they are available.
I have designed a DW setup before that sped up the usual end-to-end process of loading new data onto the DW server and regenerating the cube by up to 400%.
Here is what I did: I created a dummy database whose only purpose is to load new data whenever it becomes available (whether a dump, data from another DB server, etc.). After the new data is loaded, and when the DW DB server is free, I retrieve all the new data from the dummy server over a dblink, then update the date range of the cubes (Discoverer in my case at the time) so they are regenerated/updated for the new range. Then I truncate all the data from the dummy server again.
So users can still use the DW server while the dummy database takes on loading the new data. And I do this every night.
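The staging pattern above can be sketched roughly as follows, again with SQLite and made-up table/file names purely for illustration; here ATTACH plays the role the Oracle dblink plays in the real setup:

```python
# Rough sketch of the dummy/staging database pattern: load new data into a
# separate DB while the warehouse stays online, then copy it across and
# empty the staging tables. Names are invented for this example.
import sqlite3
import os

for f in ("staging.db", "warehouse.db"):
    if os.path.exists(f):
        os.remove(f)

# 1. Load new data into the staging DB whenever it arrives;
#    the warehouse is untouched and stays available to users.
staging = sqlite3.connect("staging.db")
staging.execute("CREATE TABLE new_sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
staging.executemany("INSERT INTO new_sales VALUES (?, ?)",
                    [(10, 99.0), (11, 45.5)])
staging.commit()
staging.close()

# 2. When the warehouse is free (e.g. at night), pull the staged rows across.
wh = sqlite3.connect("warehouse.db")
wh.execute("CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)")
wh.execute("ATTACH DATABASE 'staging.db' AS stg")  # stand-in for a dblink
wh.execute("INSERT INTO sales_fact SELECT * FROM stg.new_sales")

# 3. Truncate the staging table, ready for the next batch of new data.
wh.execute("DELETE FROM stg.new_sales")
wh.commit()
wh.close()
```

After step 3 the cube's date range would be extended to cover the new rows, so only the new range needs regenerating rather than the whole cube.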