Shell Script: Timeout and return error if Sqlplus hangs
This morning our production database hung with lots of ORA-600 and ORA-7445 errors. The issue has been escalated to Oracle Tech Support, but my monitoring script, which tries to connect to the database every 5 minutes to see if it's up, did not alert me. The reason is that it connected to the database and then hung; it never came back to report an error, so I was not alerted until a user called me.
Can anyone tell me how I can exit from the sqlplus block if I don't get a response in x seconds? This sqlplus block is called from within a shell script.
Any help is highly appreciated.
If your monitoring script is actually hanging, as in doing nothing, what you need is something to monitor your monitoring script.
This is what I would do.
1- In your shell script, create a flag.txt file; this way the host time at which the check started is recorded in the file's timestamp.
2- Immediately after that, run your sqlplus monitoring script; you want to run it with nohup as a secondary shell script in the background, so your initial shell script can go on to the next step.
3- Your sqlplus monitoring script should write some kind of log or status into your flag.txt file.
4- Your initial shell script can now monitor the status and time of the flag.txt file; if nothing happens in "x" seconds, as you say, the shell script can either send a page or email you.
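The steps above can be sketched roughly as follows. This is only a minimal sketch, not a tested production script: the function name run_with_watchdog, the flag file location, and the return code 124 for a hang (borrowed from the convention used by GNU timeout) are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the flag-file watchdog described in steps 1-4.
# Assumptions (not from the original post): the function name, the
# flag file path, and the use of 124 as the "hung" return code.

# Usage: run_with_watchdog TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns the command's exit status if it finishes in time, 124 otherwise.
run_with_watchdog() {
    limit=$1
    shift
    flag="${TMPDIR:-/tmp}/db_check_flag.$$"

    # Step 1: create the flag file; it records when the check started.
    date > "$flag"

    # Step 2: run the check in the background so this script can move on.
    # Step 3: the background job appends its status to the flag file.
    (
        "$@" > /dev/null 2>&1
        echo "DONE $?" >> "$flag"
    ) &
    child=$!

    # Step 4: poll the flag file once a second, for up to $limit seconds.
    waited=0
    while :; do
        if grep -q '^DONE ' "$flag"; then
            check_rc=$(sed -n 's/^DONE //p' "$flag")
            rm -f "$flag"
            return "$check_rc"
        fi
        [ "$waited" -ge "$limit" ] && break
        sleep 1
        waited=$((waited + 1))
    done

    # No status arrived in time: treat the check as hung.
    # Note: this only stops the background subshell; a hung command
    # started by it may linger and need separate cleanup.
    kill "$child" 2>/dev/null
    rm -f "$flag"
    return 124
}
```

A call might look like `run_with_watchdog 30 sqlplus -S -L "$DB_CONNECT" @check.sql`, followed by a page or email when the return code is non-zero. Here DB_CONNECT and check.sql are hypothetical placeholders for your own connect string and check script.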
Pablo (Paul) Berzukov
Author of Understanding Database Administration
available at amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.