This might be a dumb question, but I can't find explicit information about it.
The question is this:
I have my controlfiles multiplexed across different disks (or in this case across a number of mounted NFS volumes). If one of those volumes became unavailable and took one of the controlfile copies with it, would that cause the database to crash?
I have never had an experience where one of the volumes hosting a copy of the controlfile became unavailable. I was always under the assumption that multiplexing controlfiles, redo, and archive logs would prevent a crash as long as one copy of each file was available.
Your database will not be available, but you can recover easily from a lost controlfile if the controlfiles are multiplexed. You can either remove the unavailable controlfile from your pfile/spfile and start up the database, or shut down the database, copy a surviving controlfile to another location, register the change in your pfile/spfile, and then start up the database.
Ok... so you guys are telling me that...
"...once any copy (multiplexed) control file goes missin'... your db will come down!"
So.. I've tried to put this to the test. On a sandbox, I started the instance. It's got 3 redo log groups, each with 3 members multiplexed across 3 different volumes. Its controlfile is multiplexed 3-way across 3 different volumes.
I deleted ALL the datafiles, one of the controlfiles, and one redo member from each group.
The database is still running.
I've done some DML since then. I've also done a "logfile switch" and forced a "checkpoint". And then I did some more DML, but the database is still running.
What am I not simulating correctly to prove your point that the database will come down?
Also, I see errors in the alert log about missing redo logs when I do a logfile switch, but I see no errors when I force a checkpoint. Shouldn't I see errors regarding missing control files in one or both of these commands?
I found (I think) my answer at Metalink...
It has something to do with being on a Unix box: Oracle still has a hold on the inode even after I "rm" one of the controlfiles.
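The inode behaviour can be reproduced outside Oracle. The following is a minimal Python sketch, assuming a POSIX system and a local filesystem (NFS may handle this differently); the function name `read_after_unlink` is made up for illustration and has nothing to do with Oracle itself. It shows that a process holding a file open can keep reading it after the directory entry has been removed.

```python
import os
import tempfile


def read_after_unlink(payload: bytes):
    """Write payload to a temp file, unlink it while it is still open,
    then read the data back through the open file descriptor.

    Returns (path_still_exists, data_read_back).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Equivalent of `rm`: removes the directory entry only.
        # The inode stays allocated while a descriptor is open.
        os.unlink(path)
        path_still_exists = os.path.exists(path)
        # Rewind and read through the descriptor we still hold.
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return path_still_exists, data
    finally:
        # Only now is the inode actually released.
        os.close(fd)


if __name__ == "__main__":
    print(read_after_unlink(b"pretend controlfile contents"))
```

So an "rm" against a file that a running instance already has open probably does not simulate a volume really going away, which would explain why the sandbox kept running.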
I guess my real quest in this thread is to find a way to provide point-of-failure recovery using new hardware I just received. And I think I found my answer, but please confirm....
I thought I needed the following to be able to have complete (point-of-failure) recovery in the event of media failure:
- Data Files (from last backup)
- Archived Redo Logs (since last backup)
- At least one member of each online redo log group
- AND... a copy of the controlfile
But as it turns out (when I tested this) I don't need a copy of the controlfile at all. As long as I have a trace of the control file that reflects all the current datafiles, I can run a create controlfile script to recreate the control file, and having the online redo logs (along with the archived ones) will allow the database to recover to point-of-failure.
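To make that checklist concrete, here is a small Python sketch of the rule as I understand it from my test. It is only a model of the reasoning, not anything Oracle provides; the `RecoveryInventory` class, its fields, and `missing_components` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RecoveryInventory:
    """Hypothetical inventory of what survived a media failure."""
    datafile_backup: bool                    # datafiles from last backup
    archived_logs_since_backup: bool         # archived redo since that backup
    redo_members_available: Dict[int, int]   # group number -> surviving members
    controlfile_copy: bool                   # any multiplexed copy survived
    controlfile_trace_current: bool          # trace reflects current datafiles


def missing_components(inv: RecoveryInventory) -> List[str]:
    """Return what is missing for point-of-failure recovery (empty = OK)."""
    missing = []
    if not inv.datafile_backup:
        missing.append("datafile backup")
    if not inv.archived_logs_since_backup:
        missing.append("archived redo logs since last backup")
    # Every online redo group needs at least one surviving member.
    dead = sorted(g for g, n in inv.redo_members_available.items() if n < 1)
    if dead or not inv.redo_members_available:
        missing.append("online redo member for group(s) %s" % (dead or "all"))
    # Either a controlfile copy or an up-to-date trace script will do.
    if not (inv.controlfile_copy or inv.controlfile_trace_current):
        missing.append("controlfile copy or current controlfile trace")
    return missing


if __name__ == "__main__":
    # My test case: no controlfile copy, but a current trace script.
    inv = RecoveryInventory(True, True, {1: 2, 2: 2, 3: 2}, False, True)
    print(missing_components(inv))
```

The point of the last check is that the controlfile copy and the trace script are interchangeable for this purpose, as long as the trace is current.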
I ran DMLs to insert 5 records.
I removed all files in /u01.
I aborted the database.
I replaced the datafile in /u01 from the backup in /u03.
Ran the create controlfile script, using the controlfile trace that was taken with the previous backup.
Performed recovery.
Opened the database.
All previously-inserted five records are still in the database.
If all this is true, then I guess the control file matters less when you have a copy of the online redo logs.
What if you added a datafile after the last controlfile-to-trace backup? Or added more redo groups, or changed any paths?
Understood. Those changes wouldn't be captured in the trace of the controlfile, but that's not impossible to overcome. And this is really why one should perform a backup after any major changes such as these.
At any rate, I think my point still holds: controlfiles aren't a huge factor in point-of-failure recovery... at least NOT if you have the other components I previously listed.
This thread really started because I thought I needed to multiplex controlfiles to a remote location to ensure point-of-failure recoverability. But there could be connectivity issues with that remote location. I wanted to multiplex the control file to another building, but didn't want any hiccups in the network to cause a database outage due to a missing control file.
But now, I think I've confirmed that I don't need to multiplex the control file to the remote location - I mean, I still need to multiplex it, but only on local storage where connectivity is not an issue. I only need to multiplex the online redo logs and the archive destination to the remote location. Should a connectivity issue arise, the database can tolerate not finding one of the members of a redo group or one of the archive destinations.
Yeah.. I think the complexity introduced would be unjustifiable. Besides, I have existing infrastructure that I can take advantage of that would meet the same requirements with a much simpler implementation.