Slow OLTP on 14CPU SunFire
We have a large (3TB) Oracle 8.1.7 OLTP database running on a 14CPU SunFire server (48GB RAM). This drives a website with around 100 users requests per second (transactional commits per second). We use MTS to manage web server connection requests.
Generally the system runs fine - easy sub second responses. However, this can be very easily upset if we need to do maintenance tasks (data loads and index rebuilds) and we find that this brings the system to a virtual halt.
We are planning a large upgrade and data migration on the system - which will have to include work during working hours (it's a 6 month project). As the DBA, I'm concerned that the system this powerful cannot support OLTP users and our data management tasks.
Is this typical? Do other large OLTP Oracle system have the same "feature" where a 5 minute "CREATE TABLE AS SELECT" command in SQLPLUS turns millisecond web responses into 5+ second responses.
Example - During normal working day, a look at V$SESSION will show 100 sessions, with only 5 or so "ACTIVE" at that very moment. (Each web request is sub second so you don't see them!) However, if I run a "CREATE TABLE newtable AS SELECT * FROM table_with_a_million_rows" then V$SESSION will show up all the active users (60+) and their active session time goes into many seconds. User response becomes very poor (the phone starts ringing off the hook!).
If I run the same in "NOLOGGING" mode, then there is no noticable problem. Is this a LGWR problem?
I know this post does not have many "facts" - I will be willing to supply as requred.
Thanks in advance - Guy
what are your users waiting?
and yes when you do CTAS you should do it in nologging
you might consider that your archiver process are not fast enough or the archive destination is not fast enough or your log files are too small or have you checked your alert?
I mean being a Senior DBA (I wouldnt let a junior manage a 3TB OLTP system) you cannot just based on v$session to perform your tuning tasks correct?
Only been an Oracle DBA for 15 years - so yes, I'm junior.
But thanks for the reply!
I could have posted hundreds of lines of statspack values, and will if applicable. But before I get deep in diagnositic mode (and I have been for the last few weeks) I want to make sure I actually have a problem!
The CTAS was an example. It's a way I can reproduce the "problem" in an repeatable manner. It is indicative of the problem when we have to do a lot of updates to the system. It would be easy to use NOLOGGING all the time - but it would rather negate the point of having Oracle! This would not be an option in production.
The V$SESSION example shows that usually, it's empty. But when I give the system a little work to do - then suddenly these 10ms sessions take 5000ms to complete and we can measurably see the system is slow for them, which is why we they come up on V$SESSION.
Alert shows nothing (just log switches).
The question remains - if you have a hight OLTP system with 100's of commits a second - and you run a "high redo" operation (create index / copy table, update many rows, insert many rows, CTAS) - would you EXPECT at 14CPU machine to stop processing the very light OLTP requests?
I honestly think no. I think a correctly configured 4 CPU intel box would easily cope with the same task. The point is "correctly configured" - which this system appears not to be.
My "weakness" is that I don't know much about the specific application, nor did I set up the server. To be honest, it's not well written (external company) and it uses no stored procedures, RI, views etc. The company who wrote the code have never implemented it on such a scale before. Unfortunatly we have no way of changing the code or the SQL (which is why we are migrating to the newer version which is a little more Oracle friendly) and 9r2.
We have a 2.5MB redo buffer - we are not getting redo buffer waits but we do get high 'log sync waits'. I feel this is a cumulative problem - when the session commits, it waits for LGWR to flush the redo buffer. As we have 100 commits a second, it means we have 100 times the "wait time" recorded - so this may not be indicative of the problem per say.
Yes - I think REDO speed is a part of this - we have dedicated REDO drives (currently a pair of JBODs multiplexed) with 6 members in 2 groups each 256MB. We switch every 15 mins or so during the day.
From what I see, ARCHIVE has little impact. It takes 10 to 20 seconds to copy the redo to archive - there is no difference between that 20 second copy remaining 14.5 minutes when it's just filling up redo logs.
We have experimented on a test system (without the same load) and get acceptable CTAS performance on it with single drives. It's a bit more difficult on the prod system to test.
Here is the top of the stats pack for a 10 minute snapshot:
STATSPACK report for
DB Name DB Id Instance Inst Num Release OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
XXXXXX 532351126 XXXXXX 1 18.104.22.168.0 NO XXX-DATASRV
Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 7575 03-Jun-05 10:24:32 115
End Snap: 7576 03-Jun-05 10:34:48 115
Elapsed: 10.27 (mins)
db_block_buffers: 200000 log_buffer: 2621440
db_block_size: 8192 shared_pool_size: 200000000
~~~~~~~~~~~~ Per Second Per Transaction
Redo size: 236,471.74 2,580.18
Logical reads: 16,930.78 184.73
Block changes: 710.96 7.76
Physical reads: 2,810.45 30.67
Physical writes: 69.84 0.76
User calls: 5,009.34 54.66
Parses: 243.37 2.66
Hard parses: 11.26 0.12
Sorts: 9.51 0.10
Logons: 0.13 0.00
Executes: 2,677.66 29.22
% Blocks changed per Read: 4.20 Recursive Call %: 3.58
Rollback per transaction %: 0.00 Rows per Sort: 37.45
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 83.40 In-memory Sort %: 100.00
Library Hit %: 98.98 Soft Parse %: 95.37
Execute to Parse %: 90.91 Latch Hit %: 99.93
Parse CPU to Parse Elapsd %: 96.01 % Non-Parse CPU: 98.03
Shared Pool Statistics Begin End
Memory Usage %: 43.71 43.73
% SQL with executions>1: 86.17 86.17
% Memory for SQL w/exec>1: 82.55 82.56
Top 15 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
log file sync 56,757 225,427 63.60
db file sequential read 1,729,916 93,853 26.48
log file parallel write 39,154 29,790 8.40
db file parallel write 576 2,203 .62
SQL*Net more data to client 540,735 2,161 .61
control file parallel write 206 508 .14
log file sequential read 4,194 163 .05
log file switch completion 5 75 .02
latch free 875 70 .02
SQL*Net break/reset to client 14,042 69 .02
file open 4,455 65 .02
db file scattered read 239 35 .01
enqueue 26 21 .01
buffer busy waits 103 13 .00
refresh controlfile command 110 4 .00
When you do CTAS do you check what are your users waiting for?
Log file sync is cause by commits, if your application must do vast amount of commits then the way to improve would be putting your redo logs on fast disks. Since you are suffering
log file sync
log file parallel write
log file switch completion
log file sequential read
You might as well check your redo log speed (all members and all groups), you should check you arch process as well, the arch destination must be fast.
I suspect the disks where the redo logs are located are not fast enough.
Also I see you suffer 5 log file switch completion in 10 minutes. When the logfile are switching and the switch is not complete all the sessions are hanged therefore you see the 10ms operations take 5000ms.
Dont you see messages such as, cannot allocate new log in your alert?
And I didnt mean you are Junior, I just thought that one would check first what are the users waiting for when the system hangs! And that info is certainly not in v$session!
Last edited by pando; 06-03-2005 at 09:15 AM.
And just curious, doesnt the App/Web Server have a connection pool?
In the past I have seen MTS + Connection Pooling combinations dont perform very well
Most of your wait time is log file sync 225,427
Perhaps you should convert the JBOD to striped.
All those CPU's wont help when your problem is I/O bottleneck..here is a look at CPU bottleneck and notice how quick all the log file synchs are:
Top 5 Timed Events
Event Waits Time (s) Ela Time
---------------------------------- ------------ ----------- --------
CPU time 4,851 4,042 55.76
db file sequential read 1,968 1,997 27.55
log file sync 299,097 369 5.08
db file scattered read 53,031 330 4.55
log file parallel write 302,680 190 2.62
Last edited by roadwarriorDBA; 06-03-2005 at 09:21 AM.
And what size of redo log buffer are you using?
He is using 2.5Mb redo buffer.
Don't know about connection pooling - will ask. What would be a better alternative?
How can I tell how many "sessions" or users passed thru the system in the 10 mins of the snap shot when in MTS mode?
I *think* the 'log sync wait' can be a red herring - if you follow this logic...
I have 91 transactions per second (from the statspack report) - and in the 10 minutes I waited 225,427 cs.
91 commits per second * 60 seconds * 10 minutes = 54600 commits in 10 minutes.
225,427 cs waited / 54,600 = 4.13 cs per commit. Each commit waits, on average 4 hundreths of a second.
I waited 56,757 times - each time 0.04 of a second. Sounds not too bad like that?
Or tell me I'm wrong!
From your stats, you can see that your log sync waits are caused by slow log file parallel writes.
I've basically got the following options (if redo speed is at fault)
1/ I have 6 JBOD drives on the system - I can create 3 pairs of redo drives - each "pair" would be hold one redo set. After a log switch, LGWR would move to the next drive and archiver would use the previous drive. No contention there. But is it fast enough?
2/ I stripe 3 drives and mirror them. Will this be faster that dedicated drives?
3/ I reclaim one of the R5 arrays as redo. (not nice)
4/ I spend many $$$ on solid state REDO drives. BUT NOT before I PROVE redo speed is at the root cause.
BTW - Our ARCH space is a 9 x 72GB RAID5 fibre channel array. We will soon be using a remote STANDBY database - so archives will be shipped over the WAN too.
One way to test this would be to do a statspack report while the system is running a CTAS with a snap before and after. I'll do this later today when no-one is looking
Thanks for your feedback so far guy's (and gal's) It's giving me plenty of idea's.
Now that we have the whole system view, I would kind of look at it from two different times. The most critical part to your system seems to be the web sessions, that's where I would start.
First, I would want to get a trace of a session during "Normal" time to see what my waits are. You may find that you are still waiting on "log file sync", just not as much. Since you have multiple connections, you might have to trace for an hour or so.
I would then do the same trace while I was doing a CTAS to see where my application thinks it is spending most of it's time.
I would then compare the two traces and see where the major differences are.
You may also want to monitor your disk I/O using iostat. If your log disks are really busy striping and raw disks may be in order.
When I see the "log file sync" event as the top wait event, I know it means my developers are over committing. However, in your case, you really have two different types of sessions, OLTP and long running transactions. When a "log file sync" happens, LGWR is working really hard. The condition may be exacerbated by your big redo log buffer.
Click Here to Expand Forum to Full Width