Odd Database issue with sudden I/O wait
DBAsupport.com Forums - Powered by vBulletin
Results 1 to 6 of 6

Thread: Odd Database issue with sudden I/O wait

  1. #1
    Join Date
    May 2005
    Posts
    129

    Odd Database issue with sudden I/O wait

    This DB is the backend to Oracles AS/Portal, its the MDR/Infrastructure.

    The tuning has always been about CPU reduction, always been CPU bound. Twice Oracle recommended using less memory and I/O wait has never been the issue it usually is inside a Datawarehouse. We successfully load tested 2200 simulated concurrent users on the stack. We were all ready to head home when friday and launch sunday night when the application engineers were re-deploying their application and the Database went to 99% I/O Wait. Nothing could be identified as the cause. Eventually we restored our backups from the night before and to our astonishment the high I/O wait resumed immediately.

    I got a Sev 1 Tar opened up and worked it through the nite with support in India. They never turned anything up with the DB. Hours later it dropped from 99% I/O Wait to much lower but still higher than at any time in the past, much too high for an idle DB connected to 2 AS servers.

    I still think this is an I/O subsystem degradation due to a broken mirror or something, semi-failed controller. Hardware guys dismiss it immediately.

    Any ideas ? Statspack, Enterprise manager have been utilized / uploaded / pondered but to no avail.

    I say hardware.

  2. #2
    Join Date
    Jun 2000
    Location
    Madrid, Spain
    Posts
    7,447
    Well what process was haging the I/O system? Did you check what process at DB level is doing I/O activity?

  3. #3
    Join Date
    May 2005
    Posts
    129
    DBW0, we took an Oracle debug trace and sent it up to metalink. They said after an hour of web conferencing that all was normal. We got transferrred to the portal group at that point where we will likely go nowhere on this issue.

    Also its interesting its writing 903 blocks per second on a database that is not active...earlier we had

    Now its more normal..

    Sat Feb 4 12:13:39 2006
    Private_strands 7 at log switch
    Thread 1 advanced to log sequence 340
    Current log# 3 seq# 340 mem# 0: /oldoradata/ONLINE_LOGS/redo03.log
    Sat Feb 4 12:17:23 2006

    Earlier it was switching rapidly second apart, I would recreate larger online redo logs but this thing has been fine in this configuration all week under load test.

    ARCH: Connecting to console port...
    Committing creation of archivelog '/opt/backup/ARCHIVE_LOGS/1_40_581465171.dbf'
    Fri Feb 3 22:32:18 2006
    ARC1: Completed archiving thread 1 sequence 40 (19129030-19135397) (asdb)
    ARCH: Connecting to console port...
    Fri Feb 3 22:32:21 2006
    Thread 1 cannot allocate new log, sequence 42
    Checkpoint not complete


    I have disabled archive logging to reduce the contention, I am turning it back on now to see how it responds.

    Also system tablespace was seen growing around 2:00 am last night and had increased by 10MB, seems stable now and I cannot find any users set to use its temp tablespace.

    USERNAME TEMPORARY_TABLESPACE
    ------------------------------ ------------------------------
    PERFSTAT TEMP
    OWF_MGR TEMP
    ORASSO_PA TEMP
    ORASSO_PS TEMP
    ORASSO_DS TEMP
    ORASSO TEMP
    WIRELESS TEMP
    ORASSO_PUBLIC TEMP
    ODS TEMP
    ORABPEL TEMP
    BAM TEMP

    USERNAME TEMPORARY_TABLESPACE
    ------------------------------ ------------------------------
    B2B TEMP
    UDDISYS TEMP
    WCRSYS TEMP
    DSGATEWAY TEMP
    DISCOVERER5 TEMP
    ORAOCA_PUBLIC TEMP
    OCA TEMP
    PORTAL TEMP
    PORTAL_APP TEMP
    PORTAL_DEMO TEMP
    PORTAL_PUBLIC TEMP

    USERNAME TEMPORARY_TABLESPACE
    ------------------------------ ------------------------------
    DCM TEMP
    WEBDOMAIN TEMP
    RESQUEST TEMP
    PREFSTORE TEMP
    SPOTLIGHT TEMP
    SCOTT TEMP
    IP TEMP
    MDDATA TEMP
    DIP TEMP
    DBSNMP TEMP
    WKSYS TEMP

    USERNAME TEMPORARY_TABLESPACE
    ------------------------------ ------------------------------
    SYSMAN TEMP
    WKPROXY TEMP
    MDSYS TEMP
    WMSYS TEMP
    CTXSYS TEMP
    ANONYMOUS TEMP
    SI_INFORMTN_SCHEMA TEMP
    ORDSYS TEMP
    EXFSYS TEMP
    DMSYS TEMP
    WK_TEST TEMP

    USERNAME TEMPORARY_TABLESPACE
    ------------------------------ ------------------------------
    XDB TEMP
    ORDPLUGINS TEMP
    OLAPSYS TEMP
    MGMT_VIEW TEMP
    SYS TEMP
    SYSTEM TEMP
    OUTLN TEMP

    51 rows selected.
    Last edited by roadwarriorDBA; 02-04-2006 at 02:30 PM.

  4. #4
    Join Date
    May 2005
    Posts
    129
    TOP process is oracleasdb ( sid is asdb )

    Under cd /proc/6199 we have

    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 0 -> /dev/null
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 1 -> /dev/null
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 10 -> /oradata/asdb/ptli dx.dbf
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 11 -> /oradata/asdb/port al.dbf
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 12 -> /oradata/asdb/temp 01.dbf
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 13 -> /oradata/asdb/ptld oc.dbf
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 2 -> /dev/null
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 27 -> socket:[10326829]
    lr-x------ 1 oracle oinstall 64 Feb 4 12:32 3 -> /dev/null
    lr-x------ 1 oracle oinstall 64 Feb 4 12:32 4 -> /dev/null
    lr-x------ 1 oracle oinstall 64 Feb 4 12:32 5 -> /dev/null
    lr-x------ 1 oracle oinstall 64 Feb 4 12:32 6 -> /dev/null
    lr-x------ 1 oracle oinstall 64 Feb 4 12:32 7 -> /dev/zero
    lr-x------ 1 oracle oinstall 64 Feb 4 12:32 8 -> /u01/app/oracle/pro duct/10.1.2.0.2/infra/rdbms/mesg/oraus.msb
    lrwx------ 1 oracle oinstall 64 Feb 4 12:32 9 -> /oradata/asdb/undot bs01.dbf



    iostat shows 900k/s second written and then this process dies and another replaces it and the song remains the same. This is generating the extra redo and I/O wait.

    cciss/c0d1 43.21 83.44 900.20 2445474 26383248
    cciss/c0d1p1 43.21 83.43 900.20 2445210 26383248

    Portal is doing the activity.

  5. #5
    Join Date
    Nov 2000
    Location
    greenwich.ct.us
    Posts
    9,092
    oracleasdb
    than that's the process that's doing all the work. I'd run logminer on the archived redo logs and see who's doing what...
    Jeff Hunter
    marist89@yahoo.com
    http://marist89.blogspot.com/
    Get Firefox!
    "I pledge to stop eating sharks fin soup and will not do so under any circumstances."

  6. #6
    Join Date
    May 2005
    Posts
    129
    The phantom load ( twin 99% IO WAIT on 2 cpus ) that returned immediately after restore is still a mystery. It concluded after 3 hours with archive logmode turned off.

    It likely had to do with this nasty problem.

    HUGE PROBLEM IDENTIFIED:

    Code:
    SUM(BYTES) SEGMENT_NAME
    ---------- --------------
    1539309568 WWCTX_SSOS_IX2
     134414336 WWCTX_SSOS_PK
    1213202432 WWCTX_SSO_SESSION$-->JOB NOT RUNNING TO CLEANUP TABLE
      64028672 WWSTO_SESSION$
     491847680 WWSTO_SESSION_DATA$
    RUNNING THE CLEANUP JOB IS NOT GOING TO WORK IT TOOK 1 HOUR TO DELETE JUST 1000 BLOCKS AND IT HAS 3 MILLION ROWS IN 140,000 BLOCKS.

    Contacting Oracle to see if they have a script for dropping recreating this session table as delete is taking too long so I killed it.

    The Portal session cleanup job not running is a common problem from what I am reading.
    Last edited by gandolf989; 02-06-2006 at 09:06 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width