We had an incident on our 3-node production RAC cluster. The primary node went down for an unknown reason and could not bring itself back up, so we had to reboot it. The crsd.log on that node contains multiple occurrences of these lines:
2009-03-31 05:04:41.763: [ CRSEVT][1487063392]0CAAMonitorHandler :: 0:Action Script /u01/crs/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.jumla.vip! (timeout=60)
2009-03-31 05:04:41.763: [ CRSAPP][1487063392]0CheckResource error for ora.jumla.vip error code = -2
2009-03-31 05:06:14.792: [ CRSEVT][1487063392]0CAAMonitorHandler :: 0:Could not join /u01/crs/oracle/product/10.2.0/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
Attached is imon.txt, taken from imon.log. Can anybody suggest why this node was not brought up by the other two nodes?
crsd.log in node3 says:
2009-03-31 02:45:35.578: [ CRSRES][1522735456]0Attempting to start `ora.node1.vip` on member `node3`
2009-03-31 02:45:57.961: [ CRSAPP][1522735456]0StartResource error for ora.node1.vip error code = 1
2009-03-31 02:45:58.319: [ CRSRES][1522735456]0Start of `ora.node1.vip` on member `node3` failed.
2009-03-31 02:45:58.501: [ CRSRES][1522735456]0Attempting to start `ora.node1.vip` on member `node2' failed.
crsd.log in node1 says:
2009-03-31 05:04:41.763: [ CRSEVT][1487063392]0CAAMonitorHandler :: 0:Action Script /u01/crs/oracle/product/10.2.0/crs/bin/racgwrap(check) timed out for ora.node1.vip! (timeout=60)
2009-03-31 05:04:41.763: [ CRSAPP][1487063392]0CheckResource error for ora.node1.vip error code = -2
2009-03-31 05:06:14.792: [ CRSEVT][1487063392]0CAAMonitorHandler :: 0:Could not join /u01/crs/oracle/product/10.2.0/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
Is it that crsd couldn't start automatically? It seems node1 couldn't acquire its VIP and was therefore evicted from the cluster.
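In case it helps anyone line up the events across the three crsd.log files, here is a rough Python sketch that pulls the check timeouts and resource errors out of log lines shaped like the ones quoted above. The regular expressions are assumptions based only on these excerpts, not on any documented crsd.log format, and the `summarize` helper is a hypothetical name of my own.

```python
import re

# Assumed line layout, inferred only from the excerpts in this post:
# "<timestamp>: [ <COMPONENT>][<thread id>]<digit><message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<component>\w+)\]\[\d+\]\d*(?P<msg>.*)$"
)
# Action script check that exceeded its timeout.
TIMEOUT_RE = re.compile(r"timed out for (?P<res>\S+?)! \(timeout=(?P<secs>\d+)\)")
# CheckResource / StartResource errors with their error code.
ERROR_RE = re.compile(
    r"(?P<action>\w+Resource) error for (?P<res>\S+) error code = (?P<code>-?\d+)"
)


def summarize(lines):
    """Return (timestamp, kind, resource, detail) tuples for matching lines."""
    events = []
    for raw in lines:
        line = LINE_RE.match(raw.strip())
        if line is None:
            # Continuation lines (e.g. "category: 1234, ...") are skipped.
            continue
        msg = line.group("msg")
        timeout = TIMEOUT_RE.search(msg)
        if timeout:
            events.append((line.group("ts"), "timeout",
                           timeout.group("res"), timeout.group("secs")))
            continue
        error = ERROR_RE.search(msg)
        if error:
            events.append((line.group("ts"), error.group("action"),
                           error.group("res"), error.group("code")))
    return events


if __name__ == "__main__":
    import sys
    # Usage: python summarize_crsd.py crsd.log
    with open(sys.argv[1], errors="replace") as handle:
        for event in summarize(handle):
            print("\t".join(event))
```

Running it against the crsd.log from each node and sorting the combined output by timestamp should make it easier to see whether the failed VIP starts on node3 came before or after the check timeouts on node1.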