-
Root cause for node eviction needed
Hi,
Could you please help us find the root cause of why the orinoco1 server rebooted (CRS node eviction)?
Below is the error message:
Jun 27 17:51:59 orinoco1 logger: Oracle CSSD failure 134.
Jun 27 17:51:59 orinoco1 logger: Oracle CRS failure. Rebooting for cluster integrity.
Jun 27 17:52:00 orinoco1 logger: Oracle clsomon failed with fatal status 12.
Jun 27 17:52:00 orinoco1 logger: Oracle CRS failure. Rebooting for cluster integrity.
====== OCSSD LOG ==================================================================================================== ========
[ CSSD]2011-06-27 17:43:21.978 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7ac510) proc(0x777f60) pid() proto(10:2:1:1)
[ CSSD]2011-06-27 17:43:45.328 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7af170) proc(0x773c10) pid() proto(10:2:1:1)
[ CSSD]2011-06-27 17:44:45.678 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7af170) proc(0x773c10) pid() proto(10:2:1:1)
[ CSSD]2011-06-27 17:45:02.940 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7af170) proc(0x773c10) pid(11998) proto(10:2:1:1)
[ CSSD]2011-06-27 17:45:16.233 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x7a2d80) proc(0x77a900) pid(12822) proto(10:2:1:1)
[ CSSD]2011-06-27 17:45:45.970 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x77abf0) proc(0x777e60) pid() proto(10:2:1:1)
[ CSSD]2011-06-27 17:46:46.330 [1199618400] >TRACE: clssgmClientConnectMsg: Connect from con(0x77abf0) proc(0x777e60) pid() proto(10:2:1:1)
[ CSSD]2011-06-27 17:50:21.821 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 50% heartbeat fatal, eviction in 29.560 seconds
[ CSSD]2011-06-27 17:50:22.823 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 50% heartbeat fatal, eviction in 28.560 seconds
[ CSSD]2011-06-27 17:50:36.831 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 75% heartbeat fatal, eviction in 14.550 seconds
[ CSSD]2011-06-27 17:50:37.823 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 75% heartbeat fatal, eviction in 13.560 seconds
[ CSSD]2011-06-27 17:50:45.829 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 5.560 seconds
[ CSSD]2011-06-27 17:50:46.831 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 4.560 seconds
[ CSSD]2011-06-27 17:50:47.833 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
[ CSSD]2011-06-27 17:50:47.833 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 3.550 seconds
[ CSSD]2011-06-27 17:50:48.825 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
[ CSSD]2011-06-27 17:50:48.825 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 2.560 seconds
[ CSSD]2011-06-27 17:50:49.827 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
[ CSSD]2011-06-27 17:50:49.827 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 1.560 seconds
[ CSSD]2011-06-27 17:50:50.829 [1241577824] >TRACE: clssnmPollingThread: node orinoco2 (2) is impending reconfig
[ CSSD]2011-06-27 17:50:50.829 [1241577824] >WARNING: clssnmPollingThread: node orinoco2 (2) at 90% heartbeat fatal, eviction in 0.560 seconds
==================================================================================================== ==========================
====== /var/log/messages ==================================================================================================== ========
Jun 27 17:45:01 orinoco1 su(pam_unix)[11911]: session opened for user oracle by (uid=0)
Jun 27 17:45:01 orinoco1 su(pam_unix)[11911]: session closed for user oracle
Jun 27 17:47:40 orinoco1 kernel: bnx2: eth0 NIC Link is Down
Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 0 in trouble
Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 2 in trouble
Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 5 in trouble
Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 1 in trouble
Jun 27 17:47:41 orinoco1 kernel: LLT INFO V-14-1-10205 link 2 (eth0) node 3 in trouble
Jun 27 17:47:43 orinoco1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 0 active
Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 2 active
Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 5 active
Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 1 active
Jun 27 17:47:44 orinoco1 kernel: LLT INFO V-14-1-10024 link 2 (eth0) node 3 active
Jun 27 17:47:45 orinoco1 kernel: o2net: connection to node orinoco2 (num 1) at 199.40.40.234:7777 has been idle for 10.0 seconds, shutting it down.
Jun 27 17:47:45 orinoco1 kernel: (0,0)2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1309168055.597322 now 1309168065.596662 dr 1309168055.597308 adv 1309168055.597329:1309168055.597330 func (d5542a8e:504) 1309168035.598570:1309168035.598693)
Jun 27 17:47:45 orinoco1 kernel: o2net: no longer connected to node orinoco2 (num 1) at 199.40.40.234:7777
==================================================================================================== ==========================
From the above messages we confirmed that the server was rebooted to preserve cluster integrity, following the network interface failure logged in /var/log/messages.
But can somebody confirm whether this is due to:
i) private interconnect network failure
or
ii) vote disk issue
Please also confirm that this is not due to the glibc bug that causes random evictions. Note that the O/S is Red Hat Enterprise Linux AS release 4 (Nahant Update 4) with kernel 2.6.9-42.ELsmp.
Glibc : glibc-2.3.4-2.25
thanks
Thanks/Gopu
-
Please post log section that shows the actual eviction error message.
At first sight it looks like an interconnect issue.
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
-
Originally Posted by PAVB
Please post log section that shows the actual eviction error message.
PAVB, I am not clear on the question. I can upload whichever logs you need.
Thanks/Gopu
-
Originally Posted by gopu_g
PAVB, I am not clear on the question. I can upload whichever logs you need.
Search the logs for the word "evicted", then copy and paste the twenty preceding lines, the offending line, and the next twenty lines.
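If it helps, here is a minimal Python sketch of that search. It assumes the log has already been read into a list of lines; the function name `context_around` and its parameters are made up for illustration and are not part of any Oracle tooling. It returns each matching line together with up to twenty lines on either side.

```python
def context_around(lines, needle="evicted", before=20, after=20):
    """Return one block of lines per match of `needle` (case-insensitive).

    Each block holds up to `before` lines ahead of the match, the matching
    line itself, and up to `after` lines following it.
    """
    blocks = []
    for i, line in enumerate(lines):
        if needle.lower() in line.lower():
            start = max(0, i - before)             # do not run past the top
            end = min(len(lines), i + after + 1)   # do not run past the bottom
            blocks.append(lines[start:end])
    return blocks


# Hypothetical usage; replace the path with the real location of your log:
#
#   with open("/path/to/ocssd.log", errors="replace") as fh:
#       for block in context_around(fh.read().splitlines()):
#           print("\n".join(block))
#           print("-" * 40)
```

Matches that sit close together produce overlapping blocks, which is acceptable for a quick copy/paste into a forum post.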
Pablo (Paul) Berzukov
-
Root cause of the node eviction:
=== Update ===
The glibc package versions meet the prerequisites for 10.2.0.2 RAC and RDBMS.
The reason the server was evicted from the cluster is visible in /var/log/messages:
Jun 27 17:47:40 orinoco1 kernel: bnx2: eth0 NIC Link is Down
OCFS stopped itself and the node was evicted. This is expected behavior to avoid data corruption.
We cannot say why the NIC link went down. It could be a NIC bonding bug in RHEL, a network switch outage, or a disconnected cable.
Neither Oracle RAC, OCFS, nor the RDBMS manages network devices, so this is not an Oracle-related issue.
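For anyone who wants to check the order of events, here is a rough Python sketch that builds a timeline from syslog lines copied from the first post. The year 2011 is an assumption taken from the CSSD log timestamps, because syslog lines do not carry a year. The helper names are made up for illustration, and month parsing with `%b` assumes an English/C locale. It only shows the ordering and spacing of the messages; it does not prove causality on its own.

```python
from datetime import datetime

# Lines copied from the /var/log/messages excerpts quoted in this thread.
SYSLOG_LINES = [
    "Jun 27 17:47:40 orinoco1 kernel: bnx2: eth0 NIC Link is Down",
    "Jun 27 17:47:43 orinoco1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex",
    "Jun 27 17:47:45 orinoco1 kernel: o2net: no longer connected to node orinoco2 (num 1) at 199.40.40.234:7777",
    "Jun 27 17:51:59 orinoco1 logger: Oracle CRS failure. Rebooting for cluster integrity.",
]


def parse_syslog_time(line, year=2011):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def timeline(lines, year=2011):
    """Return (seconds since first event, message) pairs in time order."""
    events = sorted(
        (parse_syslog_time(line, year), line.split(None, 4)[4]) for line in lines
    )
    first = events[0][0]
    return [(int((t - first).total_seconds()), msg) for t, msg in events]


for offset, msg in timeline(SYSLOG_LINES):
    print(f"+{offset:>4}s  {msg}")
```

With these four lines the reboot message comes 259 seconds after the link-down message, and the o2net disconnect 5 seconds after it.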
Thanks/Gopu
-
Well, that's extremely clear.