Processing massive data without impacting users
Need your expertise to avoid heavy rollback. Also I dont want to impact online users.
We have a table with more than 10Million rows. Need to scan through all the rows to find the qualified records on certain criteria every month. I will not be updating this table but writing the qualified IDs into different table to process further down the line. Except Primary Key (Num 8) I dont have any other fields to separate the data in to groups. The processor can take 6-10 hours.( Rough guess). What is the best way to read all records and process them. One way (May not be efficient) I can think of is to break the PK IDs into various ranges and have another create table created with ranges. Then process each range one after another so that you will not be holding all 10M records in ur cursor.
Is this the only way or any other efficient way we can handle it?
Thanks alot in advance and I appreciate your help.
The lowest impact way of doing this would be something like ...
This will limit logging for you, and the noparalel hint will help to avoid swamping the i/o subsystem -- you could modify that up if you wish, to something like /*+ parallel(t 2) */ etc.
Nologging PctFree 0
Select /*+ noparallel(s) */
btw, there's no such concept as "holding records in a cursor".
I had EXACTLY this situation - in my case the production table was holding more than 60 millions of rows, and I needed search in all of them, finding the smaller resultset and putting it in another table (GTT, to reduce log/rollback) - and read consistency was not a issue in my case (the table was VERY static in nature).
Of course, read ALL this enormous table directly in o one-off way WILL stay processing for a while, put a heavy load in I/O sub-system, burn a lot of CPU... So , my goal was : read a number of lines (say, 100000), put them in a temporary table, close cursors, release resources and WAIT for some minutes (giving chance to the machine work for my users), then read the ** next ** 100000 rows, and so on..
I was successfull in this approach using Tom Kyte´s method as showed in http://asktom.oracle.com/pls/ask/f?p...D:127412348064
You might also look at implementing a Resource Manager solution that would prioritise other sessions above your own.
Yeah, RM is a very good option - personally I prefer (where & when possible) to do things by hand (this way I´m in control, I can reduce/raise the time interval, I can resume/suspend the session, etc, at my will and easily), BUT there are no reason to not think in RM .
I can do it faster with global temporary table with parallel insert and parallel read on the BIG table.
With the above approach, I had tuned many batch jobs.
Click Here to Expand Forum to Full Width