Root cause for node eviction needed
DBAsupport.com Forums

  1. #1
    Join Date
    Aug 2007
    Location
    Cyberjaya, Kuala Lumpur
    Posts
    339

    Root cause for node eviction needed

    Hi,
    Could you please help us find the root cause of why the orinoco1 server rebooted (CRS node eviction)?

    Below are the error messages:

    Jun 27 17:51:59 orinoco1 logger: Oracle CSSD failure 134.
    Jun 27 17:51:59 orinoco1 logger: Oracle CRS failure. Rebooting for cluster integrity.
    Jun 27 17:52:00 orinoco1 logger: Oracle clsomon failed with fatal status 12.
    Jun 27 17:52:00 orinoco1 logger: Oracle CRS failure. Rebooting for cluster integrity.



    ====== OCSSD LOG ==============================================================
    [ CSSD]2011-06-27 17:43:21.978 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7ac510) proc(0x777f60) pid() proto(10:2:1:1)
    [ CSSD]2011-06-27 17:43:45.328 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7af170) proc(0x773c10) pid() proto(10:2:1:1)
    [ CSSD]2011-06-27 17:44:45.678 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7af170) proc(0x773c10) pid() proto(10:2:1:1)
    [ CSSD]2011-06-27 17:45:02.940 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7af170) proc(0x773c10) pid(11998) proto(10:2:1:1)
    [ CSSD]2011-06-27 17:45:16.233 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a2d80) proc(0x77a900) pid(12822) proto(10:2:1:1)
    [ CSSD]2011-06-27 17:45:45.970 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x77abf0) proc(0x777e60) pid() proto(10:2:1:1)
    [ CSSD]2011-06-27 17:46:46.330 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x77abf0) proc(0x777e60) pid() proto(10:2:1:1)
    [ CSSD]2011-06-27 17:50:21.821 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 50% heartbeat fatal, eviction in 29.560 seconds
    [ CSSD]2011-06-27 17:50:22.823 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 50% heartbeat fatal, eviction in 28.560 seconds
    [ CSSD]2011-06-27 17:50:36.831 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 75% heartbeat fatal, eviction in 14.550 seconds
    [ CSSD]2011-06-27 17:50:37.823 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 75% heartbeat fatal, eviction in 13.560 seconds
    [ CSSD]2011-06-27 17:50:45.829 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 5.560 seconds
    [ CSSD]2011-06-27 17:50:46.831 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 4.560 seconds
    [ CSSD]2011-06-27 17:50:47.833 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
    [ CSSD]2011-06-27 17:50:47.833 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 3.550 seconds
    [ CSSD]2011-06-27 17:50:48.825 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
    [ CSSD]2011-06-27 17:50:48.825 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 2.560 seconds
    [ CSSD]2011-06-27 17:50:49.827 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
    [ CSSD]2011-06-27 17:50:49.827 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 1.560 seconds
    [ CSSD]2011-06-27 17:50:50.829 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
    [ CSSD]2011-06-27 17:50:50.829 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 0.560 seconds
    ===============================================================================
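    To line up the CSSD countdown with the /var/log/messages timestamps below, a minimal sketch can add each "eviction in N seconds" value to the timestamp of its own line. It assumes the countdown is relative to that line's timestamp. The regular expression is written only against the clssnmPollingThread warning format shown above, and the function name projected_evictions is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Three of the clssnmPollingThread warnings copied from the OCSSD log above.
LOG = """\
[ CSSD]2011-06-27 17:50:21.821 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 50% heartbeat fatal, eviction in 29.560 seconds
[ CSSD]2011-06-27 17:50:36.831 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 75% heartbeat fatal, eviction in 14.550 seconds
[ CSSD]2011-06-27 17:50:50.829 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 0.560 seconds
"""

# Captures: line timestamp, node name, percentage, remaining seconds.
PATTERN = re.compile(
    r"\[ CSSD\](\S+ \S+) .*node (\S+) \(\d+\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)


def projected_evictions(text):
    """Return (node, percent, projected eviction time) for each warning line."""
    results = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if not m:
            continue  # skip TRACE lines and anything else that is not a countdown
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        remaining = timedelta(seconds=float(m.group(4)))
        results.append((m.group(2), int(m.group(3)), stamp + remaining))
    return results


for node, pct, when in projected_evictions(LOG):
    print(f"{node} at {pct}%: eviction projected at {when:%H:%M:%S.%f}")
```

    For these three lines every projection lands at about 17:50:51.38 to 17:50:51.39, roughly three minutes after the "bnx2: eth0 NIC Link is Down" entry at 17:47:40; the other countdown lines in the log project to the same instant.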


    ====== /var/log/messages ======================================================
    Jun 27 17:45:01 orinoco1 su(pam_unix)[11911]: session opened for user oracle by (uid=0)
    Jun 27 17:45:01 orinoco1 su(pam_unix)[11911]: session closed for user oracle
    Jun 27 17:47:40 orinoco1 kernel: bnx2: eth0 NIC Link is Down
    Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 0 in trouble
    Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 2 in trouble
    Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 5 in trouble
    Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 1 in trouble
    Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 3 in trouble
    Jun 27 17:47:43 orinoco1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
    Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 0 active
    Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 2 active
    Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 5 active
    Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 1 active
    Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 3 active
    Jun 27 17:47:45 orinoco1 kernel: o2net: connection to node orinoco2 (num 1) at 199.40.40.234:7777 has been idle for 10.0 seconds, shutting it down.
    Jun 27 17:47:45 orinoco1 kernel: (0,0):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1309168055.597322 now 1309168065.596662 dr 1309168055.597308 adv 1309168055.597329:1309168055.597330 func (d5542a8e:504) 1309168035.598570:1309168035.598693)
    Jun 27 17:47:45 orinoco1 kernel: o2net: no longer connected to node orinoco2 (num 1) at 199.40.40.234:7777
    ===============================================================================


    From the above messages we concluded that the server was rebooted to preserve cluster integrity after the network interface failure logged in /var/log/messages.

    Can somebody confirm whether this was caused by:
    i) a private interconnect network failure, or
    ii) a voting disk issue?

    Please also confirm that this is not due to the glibc bug that causes random evictions. The OS is Red Hat Enterprise Linux AS release 4 (Nahant Update 4), kernel 2.6.9-42.ELsmp.

    glibc: glibc-2.3.4-2.25


    Thanks/Gopu

  2. #2
    Join Date
    Mar 2007
    Location
    Ft. Lauderdale, FL
    Posts
    3,554
    Please post the log section that shows the actual eviction error message.

    At first sight it looks like an interconnect issue.
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration, available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.

  3. #3
    Join Date
    Aug 2007
    Location
    Cyberjaya, Kuala Lumpur
    Posts
    339
    Quote (originally posted by PAVB):
    Please post log section that shows the actual eviction error message.

    PAVB, I am not clear on what you are asking. I can upload whichever logs you need.
    Thanks/Gopu

  4. #4
    Join Date
    Mar 2007
    Location
    Ft. Lauderdale, FL
    Posts
    3,554
    Quote (originally posted by gopu_g):
    PAVB, I am not clear on what you are asking. I can upload whichever logs you need.
    Search the logs for the word "evicted", then copy and paste the twenty preceding lines, the offending line, and the twenty following lines.
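    For anyone collecting that excerpt, a minimal Python sketch of the same extraction follows (on GNU/Linux, "grep -B 20 -A 20 evicted <logfile>" does the equivalent). The function names are hypothetical, the match is case-sensitive like a plain grep, and overlapping matches are printed as separate blocks rather than merged.

```python
def eviction_context(lines, word="evicted", before=20, after=20):
    """Return one list per match: up to `before` lines, the match, up to `after` lines."""
    blocks = []
    for i, line in enumerate(lines):
        if word in line:  # case-sensitive substring match, like plain grep
            start = max(0, i - before)  # clamp at the top of the file
            blocks.append(lines[start:i + after + 1])  # slicing clamps at the end
    return blocks


def print_eviction_context(path, word="evicted", before=20, after=20):
    """Print each context block from the log at `path`, separated by a dashed line."""
    with open(path, errors="replace") as f:  # tolerate stray non-UTF-8 bytes
        lines = f.read().splitlines()
    for block in eviction_context(lines, word, before, after):
        print("\n".join(block))
        print("-" * 40)
```

    For example, print_eviction_context("ocssd.log") run from the CSSD log directory; the file name is an assumption, so point it at whichever log you are searching. Reading the whole file into memory is fine for logs of this size.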
    Pablo (Paul) Berzukov


  5. #5
    Join Date
    Aug 2007
    Location
    Cyberjaya, Kuala Lumpur
    Posts
    339
    Root cause of the node eviction:


    === Update ===
    The glibc package version meets the prerequisites for 10.2.0.2 RAC and RDBMS.

    The reason the server was evicted from the cluster is visible in /var/log/messages:
    Jun 27 17:47:40 orinoco1 kernel: bnx2: eth0 NIC Link is Down

    OCFS stopped itself and the node was evicted. This is expected behavior, intended to avoid data corruption.

    We cannot say why the NIC link went down. Possible causes include a NIC bonding bug in RHEL, a network switch outage, or a disconnected cable.

    Neither Oracle RAC, OCFS, nor the RDBMS manages network devices, so this is not an Oracle-related issue.
    Thanks/Gopu

  6. #6
    Join Date
    Sep 2002
    Location
    England
    Posts
    7,333
    Well, that's extremely clear.
