Slow OLTP on 14CPU SunFire - Page 3
DBAsupport.com Forums - Powered by vBulletin

Thread: Slow OLTP on 14CPU SunFire

  1. #21
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    Currently we have 2 x 140GB drives reserved for REDO directly attached.

    I'm very much considering moving to raw for the redo logs. We have 6 independent drives (each 140GB) on a fibre backplane - direct attached, not SAN. I don't think the array controller is a bottleneck - it wouldn't even break into a sweat with 6 drives.

    Q. How would I configure them? My thoughts are:

    Three sets of mirrored drives. Each mirror contains a log member.

    i.e.
    Drive 1 - REDO1A
    Drive 2 - REDO1B

    Drive 3 - REDO2A
    Drive 4 - REDO2B

    Drive 5 - REDO3A
    Drive 6 - REDO3B

    and then Drives 1 & 2 would also hold REDO4A/B, drives 3 & 4 REDO5A/B, and drives 5 & 6 REDO6A/B - giving six groups of two members, each 256MB in size to keep the 15 minute switch rate.
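    The layout above could be scripted roughly as follows. This is only a sketch - the /redoN mount points and file names are hypothetical placeholders for wherever the six drives end up mounted:

    ```sql
    -- Six groups of two 256MB members, mirrored pairs alternating
    -- across the three drive pairs (mount points are illustrative).
    ALTER DATABASE ADD LOGFILE GROUP 1
      ('/redo1/redo1a.log', '/redo2/redo1b.log') SIZE 256M;
    ALTER DATABASE ADD LOGFILE GROUP 2
      ('/redo3/redo2a.log', '/redo4/redo2b.log') SIZE 256M;
    ALTER DATABASE ADD LOGFILE GROUP 3
      ('/redo5/redo3a.log', '/redo6/redo3b.log') SIZE 256M;
    ALTER DATABASE ADD LOGFILE GROUP 4
      ('/redo1/redo4a.log', '/redo2/redo4b.log') SIZE 256M;
    ALTER DATABASE ADD LOGFILE GROUP 5
      ('/redo3/redo5a.log', '/redo4/redo5b.log') SIZE 256M;
    ALTER DATABASE ADD LOGFILE GROUP 6
      ('/redo5/redo6a.log', '/redo6/redo6b.log') SIZE 256M;
    ```

    Old groups would need to be dropped as they go INACTIVE, one at a time, after a switch.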

    Archiving goes to a 470GB RAID5 array (9 x 72GB drives), and soon to a standby server over the WAN.

    Afraid my Sun hardware knowledge is a bit basic - I have experienced engineers to do the physical stuff, but they don't know Oracle very well!


    So - do we think this REDO configuration will allow us to do DML during the working day without killing the OLTP users?


    __________________
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library
    Last edited by hrishy; 06-06-2005 at 07:17 AM.

  2. #22
    Join Date
    Jan 2001
    Posts
    2,828
    Hi

    I think it's okay to have 6 groups with two members each, and your redo log layout also seems fine to me.

    Now, whether this will solve your problem can only be answered by actually doing it. :-)

    You could also clone your prod DB to a test instance and see whether you can really reproduce the problem there.

    regards
    Hrishy

  3. #23
    Join Date
    Nov 2000
    Location
    greenwich.ct.us
    Posts
    9,092
    Originally posted by blguy

    Q. How would I configure them? My thoughts are:

    Three sets of mirrored drives. Each mirror contains a log member.

    i.e.
    Drive 1 - REDO1A
    Drive 2 - REDO1B

    Drive 3 - REDO2A
    Drive 4 - REDO2B

    Drive 5 - REDO3A
    Drive 6 - REDO3B

    and then Drives 1 & 2 would also hold REDO4A/B, drives 3 & 4 REDO5A/B, and drives 5 & 6 REDO6A/B - giving six groups of two members, each 256MB in size to keep the 15 minute switch rate.

    I don't see the point of spreading over three sets of drives. LGWR will write to one pair (on separate controllers, right?). Then ARCH will read the logfile and send it to log_archive_dest_n and/or the standby while LGWR is working on another pair. Since the ARCH process is so quick (most likely), the time spent reading the log that was just switched is minimal. Eight disks would be ideal: you could set up two 0+1 filesystems and write one member to each filesystem (as long as you're using hardware RAID).

    For what it's worth, I think you're masking your real problem with hardware.
    Jeff Hunter
    marist89@yahoo.com
    http://marist89.blogspot.com/
    Get Firefox!
    "I pledge to stop eating sharks fin soup and will not do so under any circumstances."

  4. #24
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    Jeff,

    I think you're right - the fact the system runs OK during the day is just brute force, a massive overspec of the base system. Generally the system bumbles along with 4 to 6 CPUs active (using top) and the rest idle/sleeping. It may not be pretty, but it works.

    (This is basically a government procurement - of course we got screwed, eh?)

    Will arrange for the redo drives to be reconfigured at the next scheduled downtime (need to go through change control etc.) and I'll see whether it improves the flexibility of the system.

    In answer to my original question - would you expect a CTAS to cripple this system? - I think the answer is a definite no, and I'm glad everyone else agrees. I thought I might be going mad!

    Anyway - got to get onto setting up a SQLServer cluster with replication. Oh Joy!
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library

  5. #25
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    Have you done the stress test and the new statspack report? I wouldn't suggest changing anything without finding out the root cause!

    Your CTAS can hang your system because of redo log issues. I say that because you said that with NOLOGGING the system runs fine, without your users complaining.

  6. #26
    Join Date
    Jul 2000
    Posts
    521
    I feel there is some problem even when operations like CTAS are NOT running. Response time comprises roughly 15% service time and 85% wait time. Almost every commit is waiting on "log file sync".

    I sincerely suggest you check everything about the disks where your online redo logs are hosted. Do you ever see at least one of the redo groups with status INACTIVE? How big are the online redo logs? How many archive logs are generated in a day, and during peak times?
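    Those checks can be run straight from SQL*Plus against the standard V$ views - something along these lines:

    ```sql
    -- Current redo groups: member count, size, and status
    -- (ACTIVE/CURRENT all the time suggests switches outpace checkpoints)
    SELECT group#, members, bytes/1024/1024 AS size_mb, status
    FROM   v$log;

    -- Log switches per day, as a proxy for archive generation
    SELECT TRUNC(first_time) AS day, COUNT(*) AS switches
    FROM   v$log_history
    GROUP  BY TRUNC(first_time)
    ORDER  BY day;
    ```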
    svk

  7. #27
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    svk
    I've never seen INACTIVE redo groups.
    We have 5 members per group, each member 256MB. Average log switch time every 10 to 15 minutes during the day.


    Here are the results of the CTAS test just before the users arrive (the reading rooms open at 9:30 BST) - this would have upset a few internet web users though!


    This is the SQLPLUS script

    Code:
    SQL> @lf0
    Connected.
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv> SELECT MAX( snap_id ) FROM perfstat.stats$snapshot;
    
    MAX(SNAP_ID)
    ------------
            8128
    
    Elapsed: 00:00:00.00
    gwicks@aleph0.ils-datasrv> DROP TABLE gwicks.z02_full3;
    
    Table dropped.
    
    Elapsed: 00:00:00.62
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv> exec perfstat.statspack.snap
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:01.19
    gwicks@aleph0.ils-datasrv> CREATE TABLE gwicks.z02_full3
      2  NOLOGGING
      3  AS
      4  SELECT     *
      5  FROM       bll10.z02
      6  WHERE      z02_doc_number LIKE '005%';
    
    Table created.
    
    Elapsed: 00:02:166.40
    gwicks@aleph0.ils-datasrv> exec perfstat.statspack.snap
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:01.72
    gwicks@aleph0.ils-datasrv> SELECT MAX( snap_id ) FROM perfstat.stats$snapshot;
    
    MAX(SNAP_ID)
    ------------
            8130
    
    Elapsed: 00:00:00.00
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv> SELECT MAX( snap_id ) FROM perfstat.stats$snapshot;
    
    MAX(SNAP_ID)
    ------------
            8130
    
    Elapsed: 00:00:00.00
    gwicks@aleph0.ils-datasrv> DROP TABLE gwicks.z02_full3;
    
    Table dropped.
    
    Elapsed: 00:00:00.81
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv> exec perfstat.statspack.snap
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:01.41
    gwicks@aleph0.ils-datasrv> CREATE TABLE gwicks.z02_full3
      2  --NOLOGGING
      3  AS
      4  SELECT     *
      5  FROM       bll10.z02
      6  WHERE      z02_doc_number LIKE '005%';
    
    Table created.
    
    Elapsed: 00:02:167.59
    gwicks@aleph0.ils-datasrv> exec perfstat.statspack.snap
    
    PL/SQL procedure successfully completed.
    
    Elapsed: 00:00:01.88
    gwicks@aleph0.ils-datasrv> SELECT MAX( snap_id ) FROM perfstat.stats$snapshot;
    
    MAX(SNAP_ID)
    ------------
            8133
    
    Elapsed: 00:00:00.00
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    gwicks@aleph0.ils-datasrv>
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library

  8. #28
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    Stats pack report for processing during CTAS logging exercise

    sp_8131_8133.txt

    And a couple of Spotlight captures:



    Notice the RED on the log writer - the average log write time can go over 1,000 ms (1 sec) - so is the drive being swamped? At over 2,500 blocks per second (8K blocks) that's over 20MB/sec transferred. That's high for any single drive.



    See the orange block grow as the number of active sessions rises. This is how I know I'm affecting the online users.

    Why would the CTAS saturate the REDO logs and prevent the OLTP sessions from processing? Can it not prioritise the small OLTP sessions?



    I feel that the CTAS exacerbates an existing problem - the system is 'out of balance': very fast CPU/datafile but slow REDO. However, this is all proper Sun kit!
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library

  9. #29
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    Man, you didn't look at what the users were waiting on in v$session_wait when you ran the CTAS?!
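    For next time: a query like this, run while the CTAS is going, shows exactly which events the OLTP sessions are stuck on (standard v$session_wait columns; the SQL*Net filter just hides idle client waits):

    ```sql
    -- What are active sessions waiting on right now?
    SELECT event, COUNT(*) AS sessions
    FROM   v$session_wait
    WHERE  wait_time = 0               -- currently in a wait
    AND    event NOT LIKE 'SQL*Net%'   -- skip idle client waits
    GROUP  BY event
    ORDER  BY sessions DESC;
    ```

    If "log file sync" dominates that list during the CTAS, the redo subsystem is confirmed as the choke point.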

  10. #30
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    Also, the first snap interval was 10 minutes; this second one is only 4 minutes!

    However, if you look at the log file sync waits:

    Code:
    First SNAP without CTAS:
    log file sync                                      56,757      225,427   63.60 
    
    Second SNAP with CTAS:
    log file sync                                       7,697      686,518   93.81
    The wait time of the second snap, with only around 13% of the first snap's wait count, is MUCH MUCH longer than the first snap's. That clearly shows you have serious I/O problems with your redo log subsystem.

    Code:
    SECOND SNAP 
    
    Event                               Waits   Timeouts  Time (cs)    (ms)   /txn
    ---------------------------- ------------ ---------- ----------- ------ ------
    log file sync                       7,697      3,637     686,518    892    1.9
    db file sequential read            87,472          0      28,547      3   21.4
    log file parallel write               349          0       9,263    265    0.1
    db file parallel write                256          0       2,547     99    0.1
    log file switch completion             26          3       1,179    453    0.0
    log buffer space                      108          0       1,043     97    0.0
    buffer busy waits                     264          2         898     34    0.1
    control file parallel write            82          0         677     83    0.0
    direct path write                     161          0         372     23    0.0
    
    
    FIRST SNAP
    
    Event                               Waits   Timeouts  Time (cs)    (ms)   /txn
    ---------------------------- ------------ ---------- ----------- ------ ------
    log file sync                      56,757          0     225,427     40    1.0
    db file sequential read         1,729,916          0      93,853      1   30.6
    log file parallel write            39,154          0      29,790      8    0.7
    db file parallel write                576          0       2,203     38    0.0
    SQL*Net more data to client       540,735          0       2,161      0    9.6
    control file parallel write           206          0         508     25    0.0
    log file sequential read            4,194          0         163      0    0.1
    log file switch completion              5          0          75    150    0.0
