Hi, I'm not too familiar with Linux from a sysadmin point of view. We have a box (RH 3.0) that crashes about once a month. I can't see anything wrong on the database/application side, and the sysadmin can't see anything at their end either.
You could also try searching for core files which might be produced when the system crashes, and use strings or similar to try to discover what caused the problem:
find / -name core - to find any core files.
strings /path/to/core | pg - to read any readable text in the core file.
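As a rough sketch of the two commands above, the search can be narrowed so it stays on one filesystem and also matches the core.<pid> naming that some systems use. The function name find_cores is made up for illustration, and the core.<pid> pattern is an assumption about how the box names its core files:

```shell
# Hypothetical helper: list regular files named "core" or "core.<digits>"
# under the given directory (defaults to /).
find_cores() {
    dir="${1:-/}"
    # -xdev keeps find on one filesystem, so /proc and other mounts are skipped.
    # Permission errors are discarded so the output is only file names.
    find "$dir" -xdev -type f \( -name core -o -name 'core.[0-9]*' \) 2>/dev/null
}

# Example use (not run here):
#   find_cores /                     # list candidate core files
#   strings /path/to/core | less     # page through the readable text in one
```

Note that a box which hangs completely, rather than a single process dying, may not leave any core file behind at all.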
Hope this helps.
If I have to choose between two evils, I always like to choose the one I haven't tried yet.
I have spotted something odd: Grid Control thinks OPEN_MAX has changed from 1024 to 65536, then to 1024, and then back to 65536 over a period of 3 weeks. The changes tie in with the reboots/crashes, but the sysadmins don't know what is causing this and say they didn't change anything. Yet if I run getconf OPEN_MAX it returns 1024. It may just be a problem/bug with Grid Control, and yes, I'm clutching at straws!
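One way to check whether the value really changes is to record the limits from the box itself and compare them with what Grid Control reports. The sketch below is only an illustration: the function name snapshot_open_limits is made up, and it assumes getconf, the shell's ulimit builtin and /proc/sys/fs/file-max are available. The soft and hard per-process limits can differ from each other, and both are separate from the system-wide file-max, so different tools may legitimately show different numbers.

```shell
# Hypothetical helper: print one timestamped line with the open-file limits
# as seen by the current shell.
snapshot_open_limits() {
    printf '%s getconf=%s soft=%s hard=%s file-max=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(getconf OPEN_MAX 2>/dev/null)" \
        "$(ulimit -S -n)" \
        "$(ulimit -H -n)" \
        "$(cat /proc/sys/fs/file-max 2>/dev/null || echo unknown)"
}

# Example use (not run here): append a line after each reboot or from cron.
#   snapshot_open_limits >> /path/to/open_limits.log
```

These limits are per session, so the values seen from cron or root may not match those in the login shell of the account that runs the databases; it is worth running it as that account too.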
We run 3 databases, a couple of agents and Grid Control on this box. We also see the same pattern on the DR server (and that also has the mysterious, apparently changing OPEN_MAX value).
It crashes in the sense that you can no longer connect to the box or even ping it. Every time it has happened, we have had to power the box back on directly. The DR server was down for a matter of hours the last time it happened there.
No core dumps.
Red Hat bugs... perhaps that's where I should look next.