Only been an Oracle DBA for 15 years - so yes, I'm junior.

But thanks for the reply!

I could have posted hundreds of lines of statspack values, and will if applicable. But before I get deep in diagnositic mode (and I have been for the last few weeks) I want to make sure I actually have a problem!

The CTAS was an example. It's a way I can reproduce the "problem" in an repeatable manner. It is indicative of the problem when we have to do a lot of updates to the system. It would be easy to use NOLOGGING all the time - but it would rather negate the point of having Oracle! This would not be an option in production.


The V$SESSION example shows that usually, it's empty. But when I give the system a little work to do - then suddenly these 10ms sessions take 5000ms to complete and we can measurably see the system is slow for them, which is why we they come up on V$SESSION.

Alert shows nothing (just log switches).


The question remains - if you have a hight OLTP system with 100's of commits a second - and you run a "high redo" operation (create index / copy table, update many rows, insert many rows, CTAS) - would you EXPECT at 14CPU machine to stop processing the very light OLTP requests?


I honestly think no. I think a correctly configured 4 CPU intel box would easily cope with the same task. The point is "correctly configured" - which this system appears not to be.


My "weakness" is that I don't know much about the specific application, nor did I set up the server. To be honest, it's not well written (external company) and it uses no stored procedures, RI, views etc. The company who wrote the code have never implemented it on such a scale before. Unfortunatly we have no way of changing the code or the SQL (which is why we are migrating to the newer version which is a little more Oracle friendly) and 9r2.


Specifics:

We have a 2.5MB redo buffer - we are not getting redo buffer waits but we do get high 'log sync waits'. I feel this is a cumulative problem - when the session commits, it waits for LGWR to flush the redo buffer. As we have 100 commits a second, it means we have 100 times the "wait time" recorded - so this may not be indicative of the problem per say.

Yes - I think REDO speed is a part of this - we have dedicated REDO drives (currently a pair of JBODs multiplexed) with 6 members in 2 groups each 256MB. We switch every 15 mins or so during the day.

From what I see, ARCHIVE has little impact. It takes 10 to 20 seconds to copy the redo to archive - there is no difference between that 20 second copy remaining 14.5 minutes when it's just filling up redo logs.


We have experimented on a test system (without the same load) and get acceptable CTAS performance on it with single drives. It's a bit more difficult on the prod system to test.




Here is the top of the stats pack for a 10 minute snapshot:

Code:
STATSPACK report for

DB Name         DB Id    Instance     Inst Num Release     OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
XXXXXX         532351126 XXXXXX 1 8.1.7.4.0   NO  XXX-DATASRV

                Snap Id     Snap Time      Sessions
                ------- ------------------ --------
 Begin Snap:       7575 03-Jun-05 10:24:32      115
   End Snap:       7576 03-Jun-05 10:34:48      115
    Elapsed:                  10.27 (mins)

Cache Sizes
~~~~~~~~~~~
           db_block_buffers:     200000          log_buffer:    2621440
              db_block_size:       8192    shared_pool_size:  200000000

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:            236,471.74              2,580.18
              Logical reads:             16,930.78                184.73
              Block changes:                710.96                  7.76
             Physical reads:              2,810.45                 30.67
            Physical writes:                 69.84                  0.76
                 User calls:              5,009.34                 54.66
                     Parses:                243.37                  2.66
                Hard parses:                 11.26                  0.12
                      Sorts:                  9.51                  0.10
                     Logons:                  0.13                  0.00
                   Executes:              2,677.66                 29.22
               Transactions:                 91.65

  % Blocks changed per Read:    4.20    Recursive Call %:    3.58
 Rollback per transaction %:    0.00       Rows per Sort:   37.45

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:  100.00
            Buffer  Hit   %:   83.40    In-memory Sort %:  100.00
            Library Hit   %:   98.98        Soft Parse %:   95.37
         Execute to Parse %:   90.91         Latch Hit %:   99.93
Parse CPU to Parse Elapsd %:   96.01     % Non-Parse CPU:   98.03

 Shared Pool Statistics        Begin   End
                               ------  ------
             Memory Usage %:   43.71   43.73
    % SQL with executions>1:   86.17   86.17
  % Memory for SQL w/exec>1:   82.55   82.56

Top 15 Wait Events
~~~~~~~~~~~~~~~~~                                             Wait     % Total
Event                                               Waits  Time (cs)   Wt Time
-------------------------------------------- ------------ ------------ -------
log file sync                                      56,757      225,427   63.60
db file sequential read                         1,729,916       93,853   26.48
log file parallel write                            39,154       29,790    8.40
db file parallel write                                576        2,203     .62
SQL*Net more data to client                       540,735        2,161     .61
control file parallel write                           206          508     .14
log file sequential read                            4,194          163     .05
log file switch completion                              5           75     .02
latch free                                            875           70     .02
SQL*Net break/reset to client                      14,042           69     .02
file open                                           4,455           65     .02
db file scattered read                                239           35     .01
enqueue                                                26           21     .01
buffer busy waits                                     103           13     .00
refresh controlfile command                           110            4     .00