I have some problems with an HA database application on an Oracle cluster.
------- 1 Environment -------------
Cluster (2 nodes)
---- Solaris 8.0, SUN Cluster 3.0, Veritas Vol Manager.
DB: 2 instances (1 per node)
---- Oracle 9i Real Application Clusters
Clients (2 types)
---- 1. Solaris 8.0, SUN Box, Java Application server (java application)
---- 2. Linux Red Hat 7.x (sqlplus, java application)
--- 2 Net Config --------------------
On each node we use the same listener configuration:
-- dynamic instance registration and load balancing
On both types of clients and on the servers we tuned the system TCP (kernel) parameter:
TCP_KEEPALIVE_TIMEOUT (~30 sec)
The Java client application uses the OCI driver's Transparent Application Failover callback (public int callbackFn()
from the Oracle OCI JDBC driver) to check the state of the network connection, instances ...
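
For readers who have not seen one, here is a minimal sketch of what such a callback might look like. It is not the developers' code. To keep it compilable without the Oracle driver, the class does not implement oracle.jdbc.OracleOCIFailover, and the FO_* constants are local placeholders with assumed values; real code would implement that interface and use its own constants. The retry limit (maxRetries) is also an assumption.

```java
import java.sql.Connection;

// Sketch only: real code would add "implements oracle.jdbc.OracleOCIFailover"
// and use that interface's FO_* constants instead of these placeholders.
final class TafCallbackSketch {
    // Local placeholder event codes (assumed values, for illustration only).
    static final int FO_BEGIN = 1;
    static final int FO_END = 2;
    static final int FO_ABORT = 3;
    static final int FO_REAUTH = 4;
    static final int FO_ERROR = 5;
    static final int FO_RETRY = 6;

    private final int maxRetries; // hypothetical limit chosen by the application
    private int retries = 0;

    TafCallbackSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Same shape as the driver callback: only logs and counts, no JDBC calls.
    public int callbackFn(Connection conn, Object ctxt, int type, int event) {
        switch (event) {
            case FO_BEGIN:
                retries = 0;
                System.out.println("failover started");
                return 0;
            case FO_END:
                System.out.println("failover finished");
                return 0;
            case FO_ERROR:
                if (retries < maxRetries) {
                    retries++;
                    System.out.println("failover attempt failed, requesting retry " + retries);
                    return FO_RETRY; // ask the driver to try again
                }
                System.out.println("giving up after " + retries + " retries");
                return 0;
            case FO_ABORT:
                System.out.println("failover aborted");
                return 0;
            default:
                return 0;
        }
    }
}
```

With the real driver, the callback object would be registered on the connection (registerTAFCallback on OracleConnection). Keeping the body this small may also help to tell whether a JVM crash comes from the driver or from application code inside the callback.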
----- Report from developers (with my comments) ---------------------
Here’s the status of TAF as of Tues, 02/12/02 :
Managed to get a TAF callback from the OCI driver, but it crashes the JVM. An internal error in the JVM is
experienced with JDK 1.3.1 and 1.4rc; a segmentation violation is encountered with JDK 1.2.
This applies to both pooled and non-pooled connections (OracleConnection object).
A TAF callback wasn’t possible at all before, because of problems with JDK 1.4 beta3.
Interestingly enough, Oracle only supports the TAF callback from OCI driver
version 901 and JDK 1.2 – but we all know it doesn’t work anyway.
The test uses OCI driver version 901 with different versions of the JDK, from 1.2 to 1.4.
TAF works only on non-pooled connections for select queries (without registering a callback)
for all versions of the JDK tested.
Comment: the Java program got -- ORA-3113 - End of communication channel
SQLPLUS got -- ORA-3113 - End of communication channel (if the client box lost its cable connection)
and stayed alive in all other cases (I killed user processes and listeners, and did SHUTDOWN ABORT on the instances).
In both applications I can use (METHOD=preconnect) and it works well
in every case, but I can't use (TYPE=select) if I lose the cable connection: I lose the last transaction.
The problem still exists for TAF using non-pooled resources for update queries.
Gene of Quadrix is investigating this. There still remains a glimmer of hope that if the rollback
problem is resolved, TAF might work as in select queries.
Comment: the Java program got Oracle errors 25402 and 25425 (Oracle can't use the rollback segment for the transaction,
because only 1 instance in the cluster can write to a given RBS; every other instance can only read from it).
Java client simulated preConnect still works for select. I’ll test to see if it works for insert, though
it might be a little tricky because we’re trying to roll back on a segment that sits on the dead instance.
------------ end of report ------------------------------------------
My questions are:
1) If I am executing an UPDATE, INSERT or DELETE and I lose the instance, can I solve the problem with the RBS?
2) If I lose the cable connection, can I solve the problem with (ORA-3113 - End of communication channel)
using ((METHOD=preconnect) (TYPE=select))?
Thanks a lot, Julian, for your answers.
They weren't simple questions.
About the first question:
I knew that only 1 instance can own a given RBS (and it doesn't matter whether it is PUBLIC or not).
But I had a very small hope that I could still keep the "update.." transaction alive
(for example, using something like ROLLBACK FORCE ...).
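
Since the in-flight transaction itself cannot be kept alive, one common application-side pattern is to roll back and run the whole transaction again when ORA-25402 (transaction must roll back) is raised after a failover. The sketch below shows only that control flow. The names TafReplay, TxBody, TxAction and runWithReplay are hypothetical, and this has not been checked against the driver and JDK versions discussed in this thread.

```java
import java.sql.SQLException;

final class TafReplay {
    // ORA-25402: transaction must roll back (raised after failover with open DML).
    static final int ORA_TXN_MUST_ROLL_BACK = 25402;

    // Hypothetical callback types: the transaction body and the rollback action.
    interface TxBody<T> { T run() throws SQLException; }
    interface TxAction { void run() throws SQLException; }

    static boolean isReplayable(SQLException e) {
        return e.getErrorCode() == ORA_TXN_MUST_ROLL_BACK;
    }

    // Runs the body; on ORA-25402 rolls back and runs the whole body again,
    // up to maxAttempts times. Any other error is passed through unchanged.
    static <T> T runWithReplay(TxBody<T> body, TxAction rollback, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.run();
            } catch (SQLException e) {
                if (!isReplayable(e)) {
                    throw e;
                }
                last = e;
                rollback.run(); // may itself fail, e.g. if the connection is gone
            }
        }
        throw last;
    }
}
```

In real use the body would execute the UPDATE/INSERT/DELETE statements and the commit, and the rollback action would be conn::rollback on the same connection. If the rollback itself fails (the report mentions error 25425), the exception is simply propagated and the application has to reconnect.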
About the second question:
I tested both connection methods (PRECONNECT and BASIC), with some results.
In all cases except losing the cable connection, the result was positive
(..(backup=sun21) switched the "select" to the next listener, and after a short timeout,
~2-4 sec, the "select" resumed).
When I lost the cable connection I saw the following:
- On the server side, Solaris wrote "...problem with cable connection" to the TCP log and
started a timeout of ~30 sec (tcp_keepalive_timeout).
After this timeout the local listener sent the message ".. we lost session's connection" to the user process,
and the user process (on the primary instance) killed the user session.
- On the client side, Linux started a timeout of ~30 sec (tcp_keepalive_timeout) and
reported nothing to sqlnet.
sqlnet waited 30 sec and then switched the connection to the BACKUP instance
(I saw that when I analyzed the sqlnet and listener log files in admin and support modes),
but by that time the server side (the Oracle instance) had already killed the user session.