I have a PeopleSoft/Oracle batch SQR being tested right now, but the initial performance results are not acceptable. Ideas have been thrown around in the shop, and the prevailing one is to give each of 5-6 processes a range of EMPLIDs and run them all in parallel at night. Estimates have it we'd finish in 3 hours versus 12.

The program goes against the person table (450,000 employee and dependent rows) joined one-to-many to their benefits (900,000 rows). The join returns a 760-byte row and drives the process. The program also selects the employment row (300,000 rows) for each employee. All it has to do is gather all of this (with some data editing) and insert a row into an output table (which will end up with about 5 million rows). The person and benefits tables are indexed on EMPLID and dependent number, the employment table on EMPLID. The tables are not partitioned, and the optimizer is cost-based. The program is restartable and commits every 10,000 persons.

My concerns with the approach we seem to be taking are:

1) The aggregate run time of the parallel processes will obviously exceed that of the single process, and they will degrade other processes running at the same time as well. To me that is not an improvement, just a reduction in wall-clock time.

2) When one of the splits goes down and a rollback occurs, it seems likely the rest will also go down on timeouts.
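For what it's worth, here is a minimal sketch of how the 5-6 EMPLID ranges could be carved up so each process gets a contiguous, non-overlapping slice to bind into its driving query's WHERE clause. The function name and splitting rule are mine, not anything delivered with PeopleSoft or SQR, and the toy EMPLID values are made up:

```python
# Sketch: carve a sorted list of EMPLIDs into N contiguous, non-overlapping
# (low, high) bounds, one per parallel process.  Hypothetical helper --
# not a PeopleSoft/SQR delivered routine.

def emplid_ranges(sorted_emplids, n_procs):
    """Return n_procs inclusive (low, high) bounds covering every EMPLID."""
    total = len(sorted_emplids)
    base, extra = divmod(total, n_procs)   # spread the remainder evenly
    ranges = []
    start = 0
    for i in range(n_procs):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((sorted_emplids[start], sorted_emplids[end]))
        start = end + 1
    return ranges

if __name__ == "__main__":
    # Stand-in for the 450,000 EMPLIDs; each process would get one pair
    # to use as "WHERE EMPLID BETWEEN :low AND :high" in its driving select.
    ids = [f"K{n:07d}" for n in range(1, 451)]
    for low, high in emplid_ranges(ids, 6):
        print(low, high)
```

Splitting by row count rather than by alphabetic EMPLID prefix keeps the slices balanced, which matters since the slowest slice sets the wall-clock time.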

What do you guys think? Would partitioning the tables help? Would splitting the join (read each person, then read the corresponding benefits) help as well? I'm not sure what other indexes would help. I'm just not comfortable with the approach and would like to hear alternatives or suggestions. Thanks.
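To make the "splitting the join" question concrete, here is a toy comparison of the two access patterns on a tiny SQLite stand-in. The table and column names (person, benefits, emplid, dep_nbr, plan) are illustrative only, not the actual PS record names:

```python
# Tiny SQLite stand-in for the person/benefits tables, comparing the
# single driving join against a per-person nested read.  Schema and data
# are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE person (emplid TEXT, dep_nbr TEXT, name TEXT, "
            "PRIMARY KEY (emplid, dep_nbr))")
cur.execute("CREATE TABLE benefits (emplid TEXT, dep_nbr TEXT, plan TEXT)")
cur.execute("CREATE INDEX ben_idx ON benefits (emplid, dep_nbr)")
cur.executemany("INSERT INTO person VALUES (?,?,?)",
                [("K1", "0", "Ann"), ("K1", "1", "Ann Jr"), ("K2", "0", "Bob")])
cur.executemany("INSERT INTO benefits VALUES (?,?,?)",
                [("K1", "0", "MED"), ("K1", "0", "DEN"),
                 ("K1", "1", "MED"), ("K2", "0", "MED")])

# Pattern 1: one driving join, as the SQR does today.
joined = cur.execute(
    "SELECT p.emplid, p.dep_nbr, b.plan FROM person p "
    "JOIN benefits b ON b.emplid = p.emplid AND b.dep_nbr = p.dep_nbr "
    "ORDER BY p.emplid, p.dep_nbr, b.plan").fetchall()

# Pattern 2: the proposed split -- read each person, then its benefits.
split = []
for emplid, dep_nbr, _name in cur.execute(
        "SELECT emplid, dep_nbr, name FROM person "
        "ORDER BY emplid, dep_nbr").fetchall():
    for (plan,) in cur.execute(
            "SELECT plan FROM benefits WHERE emplid = ? AND dep_nbr = ? "
            "ORDER BY plan", (emplid, dep_nbr)):
        split.append((emplid, dep_nbr, plan))

assert joined == split  # same rows either way
```

The results are identical; the difference is round trips. In the split pattern the benefits select is executed once per person (450,000 opens of a second cursor in our case), so it usually only wins if the single join is picking a bad plan that per-key indexed lookups would avoid.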