Slow OLTP on 14CPU SunFire - Page 2
DBAsupport.com Forums - Powered by vBulletin
Page 2 of 5 FirstFirst 1234 ... LastLast
Results 11 to 20 of 44

Thread: Slow OLTP on 14CPU SunFire

  1. #11
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    Originally posted by marist89
    The condition may be exacerbated by your big redo log buffer. [/B]
    I hear conflicting information on REDO BUFFER SIZE. Unfortunatly it's not something you can change on the fly!

    Why would you consider it big? Is it too big? And what effect is that having?



    Disk IO is quite high - I've got Spotlight so can produce purdy pictures... These are BEFORE during normal operations. I'll snap some more when I run a CTAS test.






    I would love to put a trace on the main system, just a little wary of the impact it would have. It's not something you can switch on and off quickly when using MTS. From my testing on the test machine, the traces will keep going while MTS keeps the connection and only creates the trace file(s) several hours later.

    It's an option - but I'll want to think about it...
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library

  2. #12
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    you dont have to snap, just run a CTAS with LOGGING and see what are your users waiting for

    I bet smth like log file switch completion

    also, do this and paste us the output

    grep -i "cannot" alert_SID.log

  3. #13
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    Just wondering is your database 3TB or 559GB...?

    Application Servers connection pool is like MTS but managed by the App Server, what App Sevrer do you use?

  4. #14
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    In the interests of full disclosure...

    Code:
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-00321: log 1 of thread 1, cannot update log file header
    ORA-00321: log 2 of thread 1, cannot update log file header
    ORA-00321: log 3 of thread 1, cannot update log file header
    ORA-00321: log 4 of thread 1, cannot update log file header
    ORA-00321: log 5 of thread 1, cannot update log file header
    ORA-16038: log 3 sequence# 97112 cannot be archived
     ORA-16038: log 3 sequence# 97112 cannot be archived
    Thread 1 cannot allocate new log, sequence 97117
    Found these - but the last one was several weeks ago when we ran out of ARCHIVE space (human error - we did not check the file system before we started a massive update - and we zoomed up 500GB of ARCH space overnight - old files did not get deleted - and before you ask, we do nightly hots and archives go to tape every 2 hours.)

    Code:
    Sat May  7 05:13:45 2005
    ORACLE Instance aleph0 - Archival Error
    ARCH: Connecting to console port...
    Sat May  7 05:13:45 2005
    ORA-16038: log 3 sequence# 97112 cannot be archived
    ORA-19502: write error on file "", blockno  (blocksize=)
    ORA-00312: online log 3 thread 1: '/exlibris8/oradata/aleph0/aleph0_redo_03.log'
    ORA-00312: online log 3 thread 1: '/exlibris9/oradata/aleph0/aleph0_redo_03.log'
    ARCH: Connecting to console port...
    ARCH:
     ORA-16038: log 3 sequence# 97112 cannot be archived
    ORA-19502: write error on file "", blockno  (blocksize=)
    ORA-00312: online log 3 thread 1: '/exlibris8/oradata/aleph0/aleph0_redo_03.log'
    ORA-00312: online log 3 thread 1: '/exlibris9/oradata/aleph0/aleph0_redo_03.log'
    Sat May  7 05:15:49 2005
    ARC0: Beginning to archive log# 3 seq# 97112
    ARC0: Archiving not possible: No primary destinations
    ARC0: Failed to archive log# 3 seq# 97112
    ARCH: Archival stopped, error occurred. Will continue retrying
    Sat May  7 05:15:50 2005
    ORACLE Instance aleph0 - Archival Error
    ARCH: Connecting to console port...
    Sat May  7 05:15:50 2005
    ORA-16014: log 3 sequence# 97112 not archived, no available destinations
    ORA-00312: online log 3 thread 1: '/exlibris8/oradata/aleph0/aleph0_redo_03.log'
    ORA-00312: online log 3 thread 1: '/exlibris9/oradata/aleph0/aleph0_redo_03.log'
    ARCH: Connecting to console port...

    Yes - its a 700GB DB today, was 1.5TB last week (we restructured and archived) but will be back upto 2 to 3 TB when we back load other catalogues.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library

  5. #15
    Join Date
    Jul 2000
    Posts
    521
    I must say this is a "best asked" question on this forum in quite some time. Good info to start with, details made available as and when needed. All responses have been equally sane, no humor, no jabs, no nothing...Good work Guy !! You are anyway a manager. I propose making you a Sr. Manager.

    Question about the statspack snapshot : The one you have posted reflects pure OLTP activity or does it cover a time space when you were doing some of you maintenance tasks ?

    Just last week I went though this for our OLTP system. System was as good as dead. "Log file sync" wait had shot through the roof. Rediculous "change write time" stats. And there was absolutely NOTHING unusual going on in the syatem. No app changes, no process changes, nothing...

    After coule of days it all boiled down to a loose cable that connected our SAN disks and fiber switches and servers and what not...Point is there can be something hiding under that waut event.
    svk

  6. #16
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    I guess yuo have not placed your redo logs in the RAID 5 array?

  7. #17
    Join Date
    Jan 2001
    Posts
    2,828
    Hi Blguy

    At present whats evident to me is something strange with redologs..as pando mentions is your redo logs on raid5 ?

    The redo logs seem to be definately on slower disk please investigate that.

    I concurr this is one of the best threads here on DBA support.

    Code:
    Yes - I think REDO speed is a part of this -
    we have dedicated REDO drives 
    (currently a pair of JBODs multiplexed) with 
    6 members in 2 groups each 256MB. 
    We switch every 15 mins or so during the day.
    I would alter the redo groups to 2 members per group
    with 3-4 groups.

    Also i would switch in a 30 minutes or so .

    Hope it helps.

    regards
    Hrishy
    Last edited by hrishy; 06-04-2005 at 02:44 AM.

  8. #18
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    Originally posted by blguy

    I've basically got the following options (if redo speed is at fault)

    1/ I have 6 JBOD drives on the system - I can create 3 pairs of redo drives - each "pair" would be hold one redo set. After a log switch, LGWR would move to the next drive and archiver would use the previous drive. No contention there. But is it fast enough?

    2/ I stripe 3 drives and mirror them. Will this be faster that dedicated drives?

    3/ I reclaim one of the R5 arrays as redo. (not nice)

    4/ I spend many $$$ on solid state REDO drives. BUT NOT before I PROVE redo speed is at the root cause.

    BTW - Our ARCH space is a 9 x 72GB RAID5 fibre channel array. We will soon be using a remote STANDBY database - so archives will be shipped over the WAN too.

    Your best option would be probably use dedicated drives (but not part of JBOD!) or use JBOD drives in different controllers. You dont need to stripe them, redo logs are written sequentially.

    Letīs say you have 4 Disk Arrays and 8 groups of redo logs then you should store your redo logs this way to avoid arch and lgwr contention:

    Code:
         CONTROLLER 1                    CONTROLLER 2
               /\                             /\
        -------  -------               -------  -------
       |                |             |                |
    Disk1	        Disk2           Disk3           Disk4
    -----------     -----------     -----------     -----------
    rdog1f1         rdog3f1         rdog2f1         rdog4f1
    rdog5f1         rdog7f1         rdog6f1         rdog8f1
    Last edited by pando; 06-04-2005 at 05:00 AM.

  9. #19
    Join Date
    May 2000
    Location
    ATLANTA, GA, USA
    Posts
    3,135
    Blguy,

    Interesting Topic...........

    The following suggestions may help you:

    1. Use RAW devices for all oracle files.
    For > 1 TB, I would always use RAW DEVICES with (RAID1+0).

    2. Use the DB server's internal disks for the redo logs and reduce the redo buffer size to 1 or 1.2 MB.
    Or use raw devices atleast for REDO LOGS. Defintely this will reduce "log fule sync" waits.

    3. Ratio pf Phy.Read to Log. Read
    From you statspack report:
    Logical reads: 16,930.78 184.73
    Physical reads: 2,810.45 30.67

    The ratio of phy to log is 16.59 percent.
    This seems to be high (how high I leave it to you).
    In general I expect this value should be around 3 to 5 percent.
    This indicates the more physical IO calls are issued by the system because of the data blocks not available in the SGA.
    Try to increase db_block_buffers parameter.

    How many "user commits" were executed during this 10.27 min?
    This statistic is in the statspack report.

    If possible, you can attach the entire statspack report in this thread.
    That may be helpful for all.

    By the way, did you tune all TOP-N SQL statements?

    If you say the SQL code cannot be modified, I will not accept it.

    I have modifed Siebel/Peoplesoft/Oracle CRM codes in the past 5 years.
    Those big companies all accepted my changed codes.

    Tamil

  10. #20
    Join Date
    Jun 2005
    Location
    Yorkshire, UK
    Posts
    12
    Here is the full statspack during NORMAL operations. I will arrange with users to run a stress test today (Monday) and take a new snapshot during that.

    sp_7575_7576.txt

    Thanks for your help.

    P.S. I apologise for the SQL - it's not mine! As I have said, the "developers" did not read the PL/SQL manual when they wrote the system - only using Oracle as a 'bit bucket'. I have it under oath that they have now read the manual and the next version will be more sympathetic.

    Also, the app is written in COBAL. If you think I'm going anywhere near that then you can...

    Tamil, we have a good relationship with the developers and they do take on board our observations. But we won't be able to do any SQL code changes to this version of the system.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Guy Wicks
    Database Manager
    The British Library

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width