I have some problems with an HA database application on an Oracle cluster.

------- 1 Environment -------------
Cluster (2 nodes)
---- Solaris 8.0, Sun Cluster 3.0, Veritas Volume Manager.
DB 2 instances (1 per node)
---- Oracle 9i Real Application Clusters
Clients (2 types)
---- 1. Solaris 8.0, SUN Box, Java Application server (java application)
---- 2. Linux Red Hat 7.x (sqlplus, java application)

--- 2 Net Config --------------------
SERVER SIDE:
On each node we use the same listener configuration:
-- dynamic instance registration and load balancing



----listener.ora------------------------------------------
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = sun-rac01)(PORT = 1521))
)
)
)

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /opt/oracle/product/9.0.1)
(PROGRAM = extproc)
)
)



----tnsnames.ora------------------------------------------

EXTPROC_CONNECTION_DATA =
(DESCRIPTION =
(ADDRESS = (PROTOCOL=IPC) (KEY=EXTPROC))
(CONNECT_DATA =
(SID=PLSExtProc) (PRESENTATION=RO)
)
)
sun1 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac01) (PORT=1521))
(CONNECT_DATA =
(SERVICE_NAME=base) (INSTANCE_NAME=base1)
)
)
sun2 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac02) (PORT=1521))
(CONNECT_DATA =
(SERVICE_NAME=base) (INSTANCE_NAME=base2)
)
)

----init.ora------------------------------------------

cluster_database_instances=2
local_listener=sun1
remote_listener=sun2

and on the other node:

cluster_database_instances=1
local_listener=sun2
remote_listener=sun1
....
undo_management=AUTO
base2.undo_tablespace=ROLL_02
base1.undo_tablespace=ROLL_01

CLIENT SIDE:

----tnsnames.ora------------------------------------------
sun12 =
(DESCRIPTION=
(ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac01) (PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=base) (INSTANCE_NAME=base1)
(FAILOVER_MODE=
(BACKUP=sun21) (TYPE=select) (METHOD=preconnect)
)
)
)
sun21 =
(DESCRIPTION=
(ADDRESS = (PROTOCOL=TCP) (HOST=sun-ps-rac02) (PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=base) (INSTANCE_NAME=base2)
(FAILOVER_MODE=
(BACKUP=sun12) (TYPE=select) (METHOD=preconnect)
)
)
)

...
sun_g =
(DESCRIPTION=
(LOAD_BALANCE = OFF)
(FAILOVER = ON)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac01) (PORT=1521))
(ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac02) (PORT=1521))
)
(CONNECT_DATA=
(SERVICE_NAME=base)
(FAILOVER_MODE=
(BACKUP=sun21) (TYPE=select) (METHOD=preconnect)
)
)
)
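
For the Java clients, a descriptor like sun_g could also be passed inline in the JDBC URL instead of through a tnsnames.ora alias. The following is only a minimal sketch of that idea: the class TafDescriptor and its method names are hypothetical, and the "jdbc:oracle:oci8:@" prefix is assumed to be the OCI-driver URL form for this driver generation. The BACKUP alias (sun21 here) would still have to resolve in the client's tnsnames.ora.

```java
// Hypothetical helper: assembles a TAF connect descriptor shaped like the
// sun_g entry above. Nothing here talks to Oracle; it only builds strings.
final class TafDescriptor {

    // Builds (DESCRIPTION=...) with failover enabled across the given hosts.
    static String build(String[] hosts, int port, String serviceName, String backupAlias) {
        StringBuilder sb = new StringBuilder();
        sb.append("(DESCRIPTION=(LOAD_BALANCE=OFF)(FAILOVER=ON)(ADDRESS_LIST=");
        for (String host : hosts) {
            sb.append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
              .append(")(PORT=").append(port).append("))");
        }
        sb.append(")(CONNECT_DATA=(SERVICE_NAME=").append(serviceName)
          .append(")(FAILOVER_MODE=(BACKUP=").append(backupAlias)
          .append(")(TYPE=select)(METHOD=preconnect))))");
        return sb.toString();
    }

    // Assumption: OCI-driver URL prefix followed by the full descriptor.
    static String ociUrl(String descriptor) {
        return "jdbc:oracle:oci8:@" + descriptor;
    }
}
```

Calling TafDescriptor.build(new String[] {"sun-rac01", "sun-rac02"}, 1521, "base", "sun21") gives a single-line equivalent of the sun_g entry.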

On both types of clients and on the servers we tuned the system TCP (kernel) parameter:
TCP_KEEPALIVE_TIMEOUT (~30 sec)

The Java client application uses the OCI driver's Transparent Application Failover callback
(public int callbackFn() from the Oracle OCI JDBC driver) to check the condition of the
network connection, instances ...
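
The decision logic that such a callback usually carries can be sketched without the driver on the classpath. This is only an illustration: FailoverPolicy, Event and Action are hypothetical stand-ins, with names modeled on the driver's FO_BEGIN / FO_END / FO_ABORT / FO_REAUTH / FO_ERROR events and its FO_RETRY return value. The real callbackFn() receives integer codes from the driver, and nothing below reproduces those values.

```java
// Hypothetical, driver-independent model of a TAF callback policy:
// ask for another failover attempt on ERROR, up to a fixed limit.
final class FailoverPolicy {

    // Stand-ins for the failover events the driver reports to the callback.
    enum Event { BEGIN, END, ABORT, REAUTH, ERROR }

    // Stand-ins for what the callback tells the driver to do next.
    enum Action { CONTINUE, RETRY }

    private final int maxRetries;
    private int errorsSeen;

    FailoverPolicy(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    Action onEvent(Event event) {
        switch (event) {
            case BEGIN:
                // A new failover started: reset the retry budget.
                errorsSeen = 0;
                return Action.CONTINUE;
            case ERROR:
                // The failover attempt failed: retry while budget remains.
                errorsSeen++;
                return errorsSeen <= maxRetries ? Action.RETRY : Action.CONTINUE;
            default:
                // END, ABORT, REAUTH: nothing to retry.
                return Action.CONTINUE;
        }
    }
}
```

In a real callback the body of callbackFn() would map the driver's event code to a decision of this kind, and would typically sleep briefly before asking for a retry.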

----- Report from developers (with my comments) ---------------------

All,

Here’s the status of TAF as of Tues, 02/12/02 :

Managed to get a TAF callback from the OCI driver, but it crashes the JVM. An internal error in the JVM is
experienced with JDK 1.3.1 and 1.4rc; a segmentation violation is encountered with JDK 1.2.
This applies to both pooled and non-pooled connections (OracleConnection object).
A TAF callback wasn’t possible at all before, because of problems with JDK 1.4 beta3.
Interestingly enough, Oracle only supports the TAF callback from OCI driver
version 901 and JDK 1.2, but we all know it doesn’t work anyway.
The test uses OCI driver version 901 with different versions of the JDK, from 1.2 to 1.4.

TAF works only on non-pooled connections for select queries (without registering a callback),
for all versions of the JDK tested.

****
Comment: the Java program got ORA-3113 (end-of-file on communication channel).
sqlplus got ORA-3113 only when the client box lost its cable connection;
it stayed alive in all other cases (I killed user processes, killed listeners, and shut down instances with SHUTDOWN ABORT)
!!!
In both applications I can use (METHOD=preconnect) and it works well in every case,
but (TYPE=select) does not help if the cable connection is lost: I lose the last transaction.
****

The problem still exists for TAF using non-pooled resources for update queries.
Gene of Quadrix is looking into this. There still remains a glimmer of hope that if the rollback
problem is resolved, TAF might work as it does for select queries.

***
Comment: the Java program got Oracle errors ORA-25402 and ORA-25425 (Oracle can't use the rollback segment
for the transaction, because only one instance in the cluster can write to a given RBS, while every instance can read from any RBS)
***
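
ORA-25402 is the "transaction must roll back" error: after a failover the session survives, but the open transaction does not, so the application has to issue a rollback and run the DML again itself. Below is a minimal sketch of that pattern, assuming the work can safely be replayed from the start. TxReplay, Work and runWithReplay are hypothetical names, and only java.sql.SQLException from the JDK is used, so no driver is needed to compile it.

```java
import java.sql.SQLException;

// Hypothetical helper: replays a unit of work when the driver reports
// ORA-25402 (transaction must roll back) after a failover.
final class TxReplay {

    // A step that may fail with a SQLException (DML plus commit, or a rollback).
    interface Work {
        void run() throws SQLException;
    }

    static final int ORA_TRANSACTION_MUST_ROLL_BACK = 25402;

    // Runs body; on ORA-25402 calls rollback and tries again, at most
    // maxAttempts times in total. Returns the attempt number that succeeded.
    static int runWithReplay(Work body, Work rollback, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                body.run();
                return attempt;
            } catch (SQLException e) {
                boolean replayable = e.getErrorCode() == ORA_TRANSACTION_MUST_ROLL_BACK;
                if (!replayable || attempt >= maxAttempts) {
                    throw e; // other errors, or budget exhausted: give up
                }
                rollback.run(); // clear the failed transaction before replaying
            }
        }
    }
}
```

With a real connection, body would execute the UPDATE/INSERT/DELETE statements and commit, and rollback would call the connection's rollback() method. This does not recover the lost transaction; it only re-submits it on the surviving instance.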

The Java client's simulated preconnect still works for select. I’ll test to see if it works for insert, though
it might be a little tricky because we’re trying to roll back on a segment that sits on the dead instance.

------------ end of report ------------------------------------------

My questions are:

1) If I am executing UPDATE, INSERT, or DELETE, can I solve the problem with the RBS if I lose an instance?

2) If I lose the cable connection, can I solve the problem with ORA-3113 (end-of-file on communication channel)
using ((METHOD=preconnect) (TYPE=select))?