Click to See Complete Forum and Search --> : SQL Loader - Parallel Load with Muiltiple input files


balajiyes
11-15-2005, 05:09 AM
Hi All,

I have an SQL Loader job which loads data into my target table using direct path (Control file has 100 input files).

Now, I am planning to ADD Parallel Load option to that.

I will have 100 input files. How the PARALLEL option work here?

Will it try to load all the files PARALLELY or will load file by file PARALLELY?

Thanks

tamilselvan
11-15-2005, 12:32 PM
Hi All,

I have an SQL Loader job which loads data into my target table using direct path (Control file has 100 input files).

Now, I am planning to ADD Parallel Load option to that.

I will have 100 input files. How the PARALLEL option work here?

Will it try to load all the files PARALLELY or will load file by file PARALLELY?

Thanks

Group 100 files into 10 files (concatenate 10 files into 1) , run 10 loader processes for 10 files in parallel.

Tamil

slimdave
11-15-2005, 12:33 PM
I believe that the PARALLEL option just modifies the way that the data is inserted to the table. Instead of being inserted above the high water mark in space already allocated to the data segment, a new extent is created. This allows multiple SQL*Loader "parallel" sessions to be run at the same time, each allocating its own extents. So you actually would run multiple SQL*Loader sessions, and the PARALLEL option is just an enabling mechanism for this. If you have a lot of free space above the HWM then this is less space efficient of course.

balajiyes
11-15-2005, 02:38 PM
Hi,

This is my SQL Loader parameter.

sqlldr control=JJ_EDWLoadCDR.ctl direct=true log=JJ_EDWLoadCDR_BIND_READSIZE_ARRAY.log READSIZE=20000000 COLUMNARRAYROWS=5000

This is my LOG...

Date conversion cache disabled due to overflow (default size: 1000)

Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes:20000000

Total logical records skipped: 0
Total logical records read: 77829
Total logical records rejected: 0
Total logical records discarded: 0
Total stream buffers loaded by SQL*Loader main thread: 16
Total stream buffers loaded by SQL*Loader load thread: 124

Run began on Tue Nov 15 18:31:48 2005
Run ended on Tue Nov 15 18:32:31 2005

Elapsed time was: 00:00:43.13
CPU time was: 00:00:36.93Date conversion cache disabled due to overflow (default size: 1000)

Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes:20000000

Total logical records skipped: 0
Total logical records read: 77829
Total logical records rejected: 0
Total logical records discarded: 0
Total stream buffers loaded by SQL*Loader main thread: 16
Total stream buffers loaded by SQL*Loader load thread: 124

Run began on Tue Nov 15 18:31:48 2005
Run ended on Tue Nov 15 18:32:31 2005

Elapsed time was: 00:00:43.13
CPU time was: 00:00:36.93

I want this loader job to complete under 20 seconds.

Which is the best approach?

Please guide.

Thanks,
Jegan

tamilselvan
11-15-2005, 02:55 PM
Where is the control file?

Tamil

balajiyes
11-15-2005, 03:16 PM
Hi Tamil,

I have 300 columns.

Please find the attached ctl file.

LOAD DATA
INFILE "/var/opt/ascential/data/jj/CDRNOK_MSCHEMCHU1_20051106010701_2547.dat.bman.1219332"
APPEND
INTO TABLE CDR_TMPJJ
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(...
)


Thanks

tamilselvan
11-15-2005, 06:03 PM
I think READSIZE=20000000 is killing the performance.

For direct load option, I use the following three parameters carefully.
streamsize=1048576 readsize=1048576 columnarrayrows=8000

Those 3 values are good for my system. They may not be good for your environment.

You need to find out after experimenting with many values.

But one thing you must remember:
You must ensure that the loader process seemlessly reads records and writes them into table blocks. Since, the number of columns are more than 300, I recommend to a low value for columnarrayrows, (may be 1000).

When you are in doubt, decrease the values.

Tamil

WilliamR
11-15-2005, 08:31 PM
The parallel option is documented here (http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10825/ldr_modes.htm#i1008225).

balajiyes
11-16-2005, 04:58 AM
Thank You guys.

Tamilselvan: "For direct load option, I use the following three parameters carefully.
streamsize=1048576 readsize=1048576 columnarrayrows=8000"
How to decide which is the best option?

William: I did raise parallel mode, but our DBAs don't want to tamper the current applications.

I dont hv any other option other than tunning this job in DIRECT MODE.

Cheers

slimdave
11-16-2005, 11:56 AM
William: I did raise parallel mode, but our DBAs don't want to tamper the current applications.

They'd have to come up with a better reason than that for me. Under what conditions would they be prepared to "tamper" ... how about if you could load the data five times faster?

balajiyes
11-17-2005, 05:22 AM
Hi,

We have to load 2200+ files per day having 77,000 records per file.

We 3 more jobs running parallely which loads the same amount of records into a different tables.

This job is running in production for the last 2 years taking 11-12 hours to just load into the staging table.

DBA's fear that if I add parallel option to my script it will degrade the other jobs performance n blow the machine.

I tried my best and proved that if I execute parallely (5 parallel streams), the job will complete within in 4 hours.

They asked to to tune my SQL Loader files. I dont have any option.

Just to test.

Thanks guys.

slimdave
11-17-2005, 09:38 AM
Only adding the parallel option will do nothing except change the way that free space is used. If there is spare i/o and cpu capacity on the machine then adding more jobs (with parallel set) is a valid approach that they ought not to dismiss without testing

tamilselvan
11-17-2005, 10:24 AM
Hi,


They asked to to tune my SQL Loader files. I dont have any option.


Thanks guys.

You need to replace the DBA ASAP.

Tamil