Because the Veritas backup has no Oracle Agent, I run a simple SQL file at 11pm that connects as sysdba and issues a shutdown immediate for both DBs. This works fine.
At 2am I use another simple SQL script to restart them.
Both DBs begin to open but hang at the MOUNT stage with:
ORA-01102: cannot mount database in EXCLUSIVE mode
ORA-09341: scumnt: unable to mount database
OSD-04400: unable to acquire internal semaphore for process
I've tracked down a bug (855721) that describes this functionality (!) perfectly. It advises restarting the services as the only way to start an instance that has previously failed at the MOUNT stage.
Okay - it's a bug, but how can I trace what is causing the DBs to fail to MOUNT in the first place?
Until very recently the SQL file just stopped and started 1 DB. This worked without problem. I added a startup and shutdown for a 2nd DB and the problem has begun to occur, albeit intermittently.
The scripts are very simple:
CONNECT sys/pwd@db1 as sysdba
shutdown immediate
CONNECT sys/pwd@db2 as sysdba
shutdown immediate
and the same for startup. Could the nature of the scripts be causing the DBs to hang at MOUNT stage?
Tue May 25 02:00:11 2004
ARC0: changing ARC0 KCRRSTART->KCRRACTIVE
ARC0 started with pid=8
Tue May 25 02:00:11 2004
ARCH: Initializing ARC0
ARCH: ARC0 invoked
Tue May 25 02:00:11 2004
ARCH: STARTING ARCH PROCESSES COMPLETE
Tue May 25 02:00:11 2004
ALTER DATABASE MOUNT
Tue May 25 02:00:11 2004
ORA-09341: scumnt: unable to mount database
OSD-04400: unable to acquire internal semaphore for process
O/S-Error: (OS 183) Cannot create a file when that file already exists.
Tue May 25 02:00:11 2004
ORA-1102 signalled during: ALTER DATABASE MOUNT...
Tue May 25 02:00:12 2004
ARC0: Archival started
and it hangs here. A brute-force reboot of the server lets both DBs start fine, with this alert log:
Tue May 25 09:50:28 2004
ARC0: changing ARC0 KCRRSTART->KCRRACTIVE
ARC0 started with pid=8
Tue May 25 09:50:28 2004
ARCH: Initializing ARC0
ARCH: ARC0 invoked
Tue May 25 09:50:28 2004
ARCH: STARTING ARCH PROCESSES COMPLETE
Tue May 25 09:50:28 2004
alter database mount exclusive
Tue May 25 09:50:29 2004
ARC0: Archival started
Tue May 25 09:50:34 2004
Successful mount of redo thread 1, with mount id 1461560362.
Tue May 25 09:50:34 2004
Database mounted in Exclusive Mode.
Completed: alter database mount exclusive
Tue May 25 09:50:34 2004
alter database open
Picked broadcast on commit scheme to generate SCNs
Tue May 25 09:50:36 2004
Thread 1 opened at log sequence 6439
Current log# 1 seq# 6439 mem# 0: I:\ALPS\ALPSLIVE\ALPL_REDO01.LOG
Successful open of redo thread 1.
Tue May 25 09:50:36 2004
sql: prodding the archiver
Tue May 25 09:50:36 2004
SMON: enabling cache recovery
Tue May 25 09:50:36 2004
ARC0: received prod
Tue May 25 09:50:39 2004
SMON: enabling tx recovery
Tue May 25 09:50:39 2004
Completed: alter database open
Tue May 25 10:00:59 2004
LGWR: prodding the archiver
Tue May 25 10:00:59 2004
Windows is still holding the semaphores for this database.
Reboot the machine.
This problem also occurs if two instances with the same db name try to run on the same machine, which is not possible.
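On the question of tracing the failed MOUNT: here is a minimal sketch, assuming the alert log keeps the layout shown in the excerpts above (a timestamp line followed by message lines). It pulls out every ORA-, OSD- and O/S-Error line together with its timestamp, so failures can be lined up against the 11pm and 2am jobs. It is plain Python, not anything Oracle ships, and the names `find_errors` and `SAMPLE` are made up for illustration.

```python
import re

# Timestamp lines in the excerpt look like "Tue May 25 02:00:11 2004".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)
# Error codes as they appear in the excerpt, e.g. "ORA-09341:" or "ORA-1102 signalled".
ERROR = re.compile(r"\b(?:ORA|OSD)-\d+\b")


def find_errors(lines):
    """Return (timestamp, line) pairs for lines carrying an ORA-/OSD- code
    or starting with "O/S-Error". The timestamp is the most recent
    timestamp line seen, or None if there was none yet."""
    current = None
    hits = []
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = line
        elif ERROR.search(line) or line.startswith("O/S-Error"):
            hits.append((current, line))
    return hits


# Sample copied from the failing 2am startup in the alert log above.
SAMPLE = [
    "Tue May 25 02:00:11 2004",
    "ALTER DATABASE MOUNT",
    "Tue May 25 02:00:11 2004",
    "ORA-09341: scumnt: unable to mount database",
    "OSD-04400: unable to acquire internal semaphore for process",
    "O/S-Error: (OS 183) Cannot create a file when that file already exists.",
    "Tue May 25 02:00:11 2004",
    "ORA-1102 signalled during: ALTER DATABASE MOUNT...",
    "Tue May 25 02:00:12 2004",
    "ARC0: Archival started",
]

if __name__ == "__main__":
    for stamp, message in find_errors(SAMPLE):
        print(stamp, "|", message)
```

To run it against the real file, pass an open file handle instead of `SAMPLE`, e.g. `find_errors(open(path))`, where `path` is wherever the alert log lives on that server.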
Nope - we have it, but the IT Manager and the Net Admins say they are too busy to fully test the impact of the Agent (I think, in reality, they don't want to spend a week documenting it and writing up the validation docs and change controls). In the meantime the system crashes and has to be rebooted by support.
It's frustrating, but the ball's firmly in their court. Management know what's happening, as I've made it clear what's going on.