Non Oracle failover software solution - Advice needed

**JMac** · 05-19-2006, 06:57 AM

Hi Davey.

Maybe the product is dodgy, but I still think my theory holds water ... no?

FAIL2 - the target failover DB - is always shut down on the server whilst the data is being replicated to it, including any changes to the controlfiles that are flushed to disk from the buffers.

What if the replication stops - because the source server falls out of the network - before the controlfile(s) can be updated with whatever is in the buffers? It's not data, so COMMITs etc. don't come into it.

The startup on FAIL2 after a failover uses whatever datafiles were copied over or updated from the Source system. So the state of the physical controlfile(s) on disk might not be consistent with a database that's shut down and is about to open (as they were last updated whilst the Source DB was open).
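
To make the idea concrete, here is a minimal Python sketch of a sanity check that could run on FAIL2 before attempting the mount. It assumes the multiplexed controlfile copies are expected to be byte-identical once replication has settled; that is an assumption, not something Support confirmed. The function names are hypothetical and this is not part of Double-Take or Oracle.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def controlfile_copies_match(paths):
    """True if every file in paths has the same size and SHA-256 digest.

    Assumption: multiplexed controlfile copies should be identical on disk.
    An empty list of paths returns False.
    """
    digests = {(Path(p).stat().st_size, file_digest(p)) for p in paths}
    return len(digests) == 1
```

A mismatch would suggest replication stopped partway through a controlfile update. A match proves much less: the copies could all be equally stale relative to the datafiles, so this only catches one kind of damage.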

Here's a section of the TAR notes I have from Support - they missed the point a bit but kind of support my thinking:

ISSUE CLARIFICATION
====================

+ Platform : Windows 2003 server.
+ DB Server software 9.2.0.6
+ Are attempting to implement a hardware failover, high-availability solution for a production system using
'Sunbelt Software Double Take'.
+ SQL attempted is the MOUNT statement at startup.
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [1943107820], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE MOUNT

UPDATE
=======
1/ As Oracle support doesn't give support on 'Sunbelt Software Double Take' , Please could you clarify the issue
as it seems you are using a High availability system environment without using our HA system named RAC
( Real Application cluster ) ?

2/ I have found the following article 243549.1 ( ONLY USE IN A RAC ENV.)
This issue could be related, as the main problem sounds like a NIC issue and Microsoft Windows.
Comments :
---------------------
The Windows OS disables the internal NIC after experiencing the communications failure on
the interconnect, causing all the locally bound sockets to fail. This causes the clusterware (CM Service) to exist in a dead state
although the database instance is actually up and running.

Please could you check this article , there is a link to microsoft support web site.

Thank you,
Regards.
PAUL GAMEIRO.

DATA COLLECTED
===============
ALERT LOG
-----------
The alert_vald33.log file shows:

Wed Apr 19 17:25:28 2006
Current log# 1 seq# 2560 mem# 0: E:\ORADATA\PMX33\VALD33\REDO01AVALD33.LOG
Current log# 1 seq# 2560 mem# 1: F:\ORADATA\PMX33\VALD33\REDO01BVALD33.LOG
Current log# 1 seq# 2560 mem# 2: G:\ORADATA\PMX33\VALD33\REDO01CVALD33.LOG
Wed Apr 19 17:25:28 2006
ARC0: Beginning to archive log 3 thread 1 sequence 2559
Creating archive destination LOG_ARCHIVE_DEST_1: 'J:\ORADATA\PMX33\VALD33\ARCHIVE\VALD332559.ARCLOG'
ARC0: Completed archiving log 3 thread 1 sequence 2559
Dump file e:\oradata\pmx33\vald33\admin\bdump\alert_vald33.log
Wed Apr 19 17:29:26 2006
ORACLE V9.2.0.6.0 - Production vsnsta=0
vsnsql=12 vsnxtr=3
Windows 2000 Version 5.2 Service Pack 1, CPU type 586
Wed Apr 19 17:29:26 2006
Starting ORACLE instance (normal)
Wed Apr 19 17:29:26 2006
Running with 1 strand for Non-Enterprise Edition
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 2
Running with 1 strand for Non-Enterprise Edition
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.6.0.
System parameters with non-default values:
processes = 500
timed_statistics = TRUE
shared_pool_size = 159383552
sga_max_size = 1074341756
java_pool_size = 159383552
control_files = E:\oradata\pmx33\VALD33\controlAVALD33.ctl, F:\oradata\pmx33\VALD33\controlBVALD33.ctl, G:\oradata\pmx33\VALD33\controlCVALD33.ctl
db_block_size = 8192
db_cache_size = 67108864
compatible = 9.2.0
log_archive_start = TRUE
log_archive_dest = J:\oradata\pmx33\VALD33\archive
log_archive_format = VALD33%s.arclog
log_buffer = 8192
log_checkpoint_interval = 40000
db_files = 20
db_file_multiblock_read_count= 8
fast_start_mttr_target = 0
dml_locks = 100
undo_management = AUTO
undo_tablespace = UNDOTBS
remote_login_passwordfile= EXCLUSIVE
db_domain =
instance_name = VALD33
background_dump_dest = E:\oradata\pmx33\VALD33\admin\bdump
user_dump_dest = E:\oradata\pmx33\VALD33\admin\udump
max_dump_file_size = 10240
core_dump_dest = E:\oradata\pmx33\VALD33\admin\cdump
sort_area_size = 524288
db_name = valid33
open_cursors = 300
PMON started with pid=2
DBW0 started with pid=3
LGWR started with pid=4
CKPT started with pid=5
SMON started with pid=6
RECO started with pid=7
Wed Apr 19 17:29:29 2006
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=8
ARC0: Archival started
ARC1 started with pid=9
Wed Apr 19 17:29:30 2006
ARCH: STARTING ARCH PROCESSES COMPLETE
Wed Apr 19 17:29:30 2006
Oracle Data Guard is not available in this edition of Oracle.
Wed Apr 19 17:29:30 2006
ARC0: Thread not mounted
Wed Apr 19 17:29:30 2006
alter database mount exclusive
Wed Apr 19 17:29:31 2006
ARC1: Archival started
Wed Apr 19 17:29:31 2006
ARC1: Thread not mounted
Wed Apr 19 17:29:35 2006
Errors in file e:\oradata\pmx33\vald33\admin\udump\vald33_ora_5600.trc:
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [1943107820], [], [], [], [], []
Wed Apr 19 17:29:37 2006
ORA-600 signalled during: alter database mount exclusive...
Starting ORACLE instance (normal)
Shutting down instance: further logons disabled
Shutting down instance (immediate)
...

TRACE FILE
------------
The vald33_ora_636.trc trace file shows:

*** SESSION ID:(9.1) 2006-04-19 17:31:24.961
*** 2006-04-19 17:31:24.961
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kccsbck_first], [1], [1943107820], [], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE MOUNT
----- Call Stack Trace -----
ksedmp ksfdmp kgerinv kgesinv ksesin kccsbck kccocf kcfcmb kcfmdb adbdrv opiexe opiosq0 kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai OracleThreadStart@4
======

The process state dump shows:

Process global information:
process: 45E617EC, call: 46077C5C, xact: 468BAB74, curses: 45EE23B4, usrses: 45EE23B4
----------------------------------------
SO: 45E617EC, type: 2, owner: 00000000, flag: INIT/-/-/0x00
(process) Oracle pid=10, calls cur/top: 46077C5C/46077C5C, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
...
SO: 45EE23B4, type: 4, owner: 45E617EC, flag: INIT/-/-/0x00
(session) trans: 468BAB74, creator: 45E617EC, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0000-000A-00000008, short-term DID: 0000-0000-00000000
txn branch: 00000000
oct: 35, prv: 0, sql: 47FDCD00, psql: 47FDCD00, user: 0/SYS
O/S info: user: ad, term: CA1971, ospid: 4900:4232, machine: AEL\CA1971
program: sqlplus.exe
application name: sqlplus.exe, hash value=0
last wait for 'control file sequential read' blocking sess=0x0 seq=29 wait_time=42808
file#=2, block#=3, blocks=1
temporary object counter: 0
...
SO: 46077C5C, type: 3, owner: 45E617EC, flag: INIT/-/-/0x00
(call) sess: cur 45ee23b4, rec 45ee2d24, usr 45ee23b4; depth: 0
----------------------------------------
SO: 460E0510, type: 6, owner: 46077C5C, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000004 DID: 0000-000A-00000008
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res: 4615b960, mode: S, prv: 4615b968, sess: 45ee23b4, proc: 45e617ec
----------------------------------------
SO: 460E04C4, type: 6, owner: 46077C5C, flag: INIT/-/-/0x00
(enqueue) CF-00000000-00000000 DID: 0000-000A-00000007
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res: 4615b90c, mode: X, prv: 4615b914, sess: 45ee23b4, proc: 45e617ec
----------------------------------------
SO: 460E0478, type: 6, owner: 46077C5C, flag: INIT/-/-/0x00
(enqueue) IS-00000000-00000000 DID: 0000-000A-00000004
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res: 4615b864, mode: X, prv: 4615b86c, sess: 45ee23b4, proc: 45e617ec
----------------------------------------
SO: 45EE2D24, type: 4, owner: 46077C5C, flag: INIT/-/-/0x00
(session) trans: 00000000, creator: 00000000, flag: (2) -/REC -/-/-/-/-/-
DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
txn branch: 00000000
oct: 0, prv: 0, sql: 00000000, psql: 00000000, user: 0/SYS
temporary object counter: 0
===================================================

RESEARCH
=========
(Note: This is INTERNAL ONLY research. No action should be taken by the customer on this information.
This is research only, and may NOT be applicable to your specific situation.)

The ORA-600 [kccsbck_first] error here is described in Note:139013.1 as:

We receive this error because we are attempting to be the first thread/instance to mount the database and cannot because it appears that
at least one other thread has mounted the database already.

We therefore abort the mount attempt and log this error.

This could certainly be due to the Microsoft network configuration setting as described in Note:243549.1 due to
Microsoft bug Q239924. Basically, as the instance restarts successfully on the original server, it looks like not all the data has been flushed to the controlfiles before they are copied to the new server, hence causing the ORA-600 [kccsbck_first] as we get a different mount Id and find the controlfiles already marked as mounted by the previous instance.

21-APR-06 08:16:49 GMT

UPDATE
=======
Called +44 1244 845700.
Left a message for John advising him of the above updates. Informed him that the error here is occurring because the original instance is not fully shutdown correctly with ALL buffers flushed to disk, and so when the controlfiles are copied, they still show the instance mounted, which then conflicts when trying to mount the instance on the failover server.
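
For spotting this failure automatically after a failover attempt, a minimal Python sketch along these lines could scan the alert log for ORA-00600 entries and pull out their arguments. The function name is hypothetical, and the line-joining rule is an assumption based only on how the error is wrapped in the log excerpts above. It is not an Oracle-supplied tool.

```python
import re

# Matches the ORA-00600 line and captures everything after "arguments:".
ORA600 = re.compile(r"ORA-00600: internal error code, arguments:(.*)")


def ora600_arguments(log_text):
    """Return one list of argument strings per ORA-00600 entry in log_text."""
    lines = log_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        m = ORA600.search(line)
        if not m:
            continue
        tail = m.group(1)
        # Pasted logs may wrap the argument list onto the next line.
        # Assumption: a wrap leaves an open bracket or a trailing comma.
        unbalanced = tail.count("[") > tail.count("]")
        dangling = tail.rstrip().endswith(",")
        if (unbalanced or dangling) and i + 1 < len(lines):
            tail += lines[i + 1]
        found.append([arg.strip() for arg in re.findall(r"\[([^\]]*)\]", tail)])
    return found
```

Checking whether the first argument of any entry is `kccsbck_first` would flag the mount conflict described in the notes above.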
