9i RAC problems with Transparent Application Failover

  1. #1
    Join Date
    Sep 2001
    Location
    NJ, USA
    Posts
    1,287
    I have some problems with an HA database application on an Oracle cluster.

    ------- 1 Environment -------------
    Cluster (2 nodes)
    ---- Solaris 8.0, SUN Cluster 3.0, Veritas Vol Manager.
    DB 2 instances (1 per node)
    ---- Oracle 9i Real Application Clusters
    Clients (2 types)
    ---- 1. Solaris 8.0, SUN Box, Java Application server (java application)
    ---- 2. Linux Red Hat 7.x (sqlplus, java application)

    --- 2 Net Config --------------------
    SERVER SIDE:
    On each node we use the same listener configuration:
    -- dynamic instance registration and load balancing



    ----listener.ora------------------------------------------
    LISTENER =
    (DESCRIPTION_LIST =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
    )
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = sun-rac01)(PORT = 1521))
    )
    )
    )

    SID_LIST_LISTENER =
    (SID_LIST =
    (SID_DESC =
    (SID_NAME = PLSExtProc)
    (ORACLE_HOME = /opt/oracle/product/9.0.1)
    (PROGRAM = extproc)
    )
    )



    ----tnsnames.ora------------------------------------------

    EXTPROC_CONNECTION_DATA =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL=IPC) (KEY=EXTPROC))
    (CONNECT_DATA =
    (SID=PLSExtProc) (PRESENTATION=RO)
    )
    )
    sun1 =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac01) (PORT=1521))
    (CONNECT_DATA =
    (SERVICE_NAME=base) (INSTANCE_NAME=base1)
    )
    )
    sun2 =
    (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac02) (PORT=1521))
    (CONNECT_DATA =
    (SERVICE_NAME=base) (INSTANCE_NAME=base2)
    )
    )

    ----init.ora------------------------------------------

    cluster_database_instances=2
    local_listener=sun1
    remote_listener=sun2

    and

    cluster_database_instances=1
    local_listener=sun2
    remote_listener=sun1
    ....
    undo_management=AUTO
    base2.undo_tablespace=ROLL_02
    base1.undo_tablespace=ROLL_01

    CLIENT SIDE:

    ----tnsnames.ora------------------------------------------
    sun12 =
    (DESCRIPTION=
    (ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac01) (PORT=1521))
    (CONNECT_DATA=
    (SERVICE_NAME=base) (INSTANCE_NAME=base1)
    (FAILOVER_MODE=
    (BACKUP=sun21) (TYPE=select) (METHOD=preconnect)
    )
    )
    )
    sun21 =
    (DESCRIPTION=
    (ADDRESS = (PROTOCOL=TCP) (HOST=sun-ps-rac02) (PORT=1521))
    (CONNECT_DATA=
    (SERVICE_NAME=base) (INSTANCE_NAME=base2)
    (FAILOVER_MODE=
    (BACKUP=sun12) (TYPE=select) (METHOD=preconnect)
    )
    )
    )

    ...
    sun_g =
    (DESCRIPTION=
    (LOAD_BALANCE = OFF)
    (FAILOVER = ON)
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac01) (PORT=1521))
    (ADDRESS = (PROTOCOL=TCP) (HOST=sun-rac02) (PORT=1521))
    )
    (CONNECT_DATA=
    (SERVICE_NAME=base)
    (FAILOVER_MODE=
    (BACKUP=sun21) (TYPE=select) (METHOD=preconnect)
    )
    )
    )

    On both types of clients and on the servers we tuned the system TCP (kernel) parameter:
    TCP_KEEPALIVE_TIMEOUT (~30 sec)

    The Java client application uses the OCI driver's Transparent Application Failover callback (public int callbackFn()
    from the Oracle OCI JDBC driver) to check the state of the network connection, instances ...
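
    For readers who have not seen a TAF callback, here is a minimal sketch of the decision logic such a callback usually carries: note when failover begins, ask for a retry on error up to a limit, and re-apply session settings when failover ends. The FoEvent and FoAction enums and the class name are hypothetical stand-ins so the sketch runs without the Oracle driver; the real callback receives int codes from the driver, and those should be checked against the driver documentation.

```java
import java.util.ArrayList;
import java.util.List;

class TafCallbackSketch {
    // Hypothetical stand-ins for the driver's failover event and return codes.
    enum FoEvent { BEGIN, END, ABORT, ERROR, REAUTH }
    enum FoAction { CONTINUE, RETRY }

    private final int maxRetries;
    private int retries = 0;
    final List<String> log = new ArrayList<>();

    TafCallbackSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Decide what to do for one failover event reported by the driver.
    FoAction onFailoverEvent(FoEvent event) {
        switch (event) {
            case BEGIN:
                retries = 0;
                log.add("failover started; pause new work");
                return FoAction.CONTINUE;
            case ERROR:
                // The attempt to reach the backup failed; ask for another try
                // until the retry budget is used up.
                if (retries < maxRetries) {
                    retries++;
                    log.add("failover attempt failed; retry " + retries);
                    return FoAction.RETRY;
                }
                log.add("retries exhausted; giving up");
                return FoAction.CONTINUE;
            case END:
                log.add("failover complete; re-apply session settings");
                return FoAction.CONTINUE;
            case ABORT:
                log.add("failover aborted; connection is unusable");
                return FoAction.CONTINUE;
            default:
                log.add("event " + event + " ignored");
                return FoAction.CONTINUE;
        }
    }
}
```

    In a real application the body of onFailoverEvent would sit inside the driver's callbackFn(), with a short sleep before returning the retry code so the backup instance has time to become reachable.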

    ----- Report from developers (with my comments) ---------------------

    All,

    Here’s the status of TAF as of Tues, 02/12/02 :

    Managed to get a TAF callback from the OCI driver, but it crashes the JVM. An internal JVM error is
    experienced with JDK 1.3.1 and 1.4rc; a segmentation violation is encountered with JDK 1.2.
    This applies to both pooled and non-pooled connections (OracleConnection object).
    A TAF callback wasn’t possible at all before, because of problems with JDK 1.4 beta3.
    Interestingly enough, Oracle only supports the TAF callback from OCI driver
    version 901 and JDK 1.2 – but we all know it doesn’t work anyway.
    The test uses OCI driver version 901 with different versions of JDK from 1.2 to 1.4.

    TAF works only on non-pooled connections for select queries (without registering a callback)
    for all versions of JDK tested.

    ****
    Comment: the Java program got ORA-3113 (end-of-file on communication channel).
    SQL*Plus got ORA-3113 only if the client box lost its cable connection,
    and stayed alive in all other cases (I killed user processes and listeners, and shut down instances with SHUTDOWN ABORT).
    !!!
    In both applications I can use (METHOD=preconnect) and it works well
    in every case, but I can't rely on (TYPE=select): if I lose the cable connection, I lose the last transaction.
    ****

    The problem still exists for TAF using non-pooled resources for update queries.
    Gene of Quadrix is looking into this. There still remains a glimmer of hope that if the rollback
    problem is resolved, TAF might work as it does for select queries.

    ***
    Comment: the Java program got Oracle errors 25402 and 25425 (Oracle can't use the rollback segment for the transaction,
    because only one instance in the cluster can write to a given RBS; every other instance can only read from it).
    ***

    The Java client's simulated preconnect still works for select. I’ll test to see if it works for insert, though
    it might be a little tricky because we’re trying to roll back on a segment that sits on the dead instance.

    ------------ end of report ------------------------------------------

    My questions are:

    1) If I am executing an UPDATE, INSERT or DELETE, can I solve the problem with the RBS if I lose the instance?

    2) If I lose the cable connection, can I solve the problem with ORA-3113 (end-of-file on communication channel)
    using ((METHOD=preconnect) (TYPE=select))?


  2. #2
    Join Date
    Jun 2001
    Location
    Helsinki. Finland
    Posts
    3,938
    My questions are:

    1) If I am executing an UPDATE, INSERT or DELETE, can I solve the problem with the RBS if I lose the instance?
    Every instance has its own RBSs, so if you lose an instance with an uncommitted transaction you lose the transaction. The select will continue, but only if TYPE=SELECT is set in the client's tnsnames.ora.
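
    Since the uncommitted work is gone once the instance dies, the usual application-side pattern is to roll back on the surviving connection and run the whole transaction again. A minimal sketch follows, assuming the driver reports ORA-25402 ("transaction must roll back") through SQLException.getErrorCode(); the class, interface and method names are hypothetical, and the unit of work must be safe to run more than once.

```java
import java.sql.SQLException;

class TxReplaySketch {
    // ORA-25402: transaction must roll back (raised after failover
    // when the session had an open transaction).
    static final int ORA_TRANSACTION_MUST_ROLL_BACK = 25402;

    // Hypothetical helper types: a unit of work and a rollback step.
    interface SqlWork<T> { T run() throws SQLException; }
    interface SqlStep { void run() throws SQLException; }

    // Runs the work; on ORA-25402 rolls back and replays it, up to maxAttempts times.
    static <T> T runWithReplay(SqlWork<T> work, SqlStep rollback, int maxAttempts)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (e.getErrorCode() != ORA_TRANSACTION_MUST_ROLL_BACK) {
                    throw e; // not a failover case; let the caller handle it
                }
                last = e;
                rollback.run(); // clear the dead transaction before replaying
            }
        }
        throw last;
    }
}
```

    With a JDBC Connection named conn, the work would execute the DML and call conn.commit(), and the rollback step would be conn::rollback. An error during the rollback itself (such as ORA-25425) is not retried in this sketch and is passed to the caller.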

    2) If I lose the cable connection, can I solve the problem with ORA-3113 (end-of-file on communication channel)
    using ((METHOD=preconnect) (TYPE=select))?
    Yes, but what do you mean by cable connection? I would not suggest preconnect (unless you can afford the memory).

    Use this for TAF:

    Code:
    OPS1_ADS4.WORLD =
     (DESCRIPTION_LIST = (LOAD_BALANCE=OFF)
      (DESCRIPTION = 
        (ADDRESS = (PROTOCOL = TCP)(HOST = 199.55.227.55)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = ADS4)(INSTANCE_NAME=ADS01)
                        (FAILOVER_MODE = (TYPE = SESSION)(METHOD = BASIC)(BACKUP=OPS2_ADS4))
        )
      )
     )
    
    OPS2_ADS4.WORLD =
     (DESCRIPTION_LIST = (LOAD_BALANCE=OFF)
      (DESCRIPTION = 
        (ADDRESS = (PROTOCOL = TCP)(HOST = 199.55.227.55)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = ADS4)(INSTANCE_NAME=ADS02)
                        (FAILOVER_MODE = (TYPE = SESSION)(METHOD = BASIC)(BACKUP=OPS1_ADS4))
        )
      )
     )
    You may replace SESSION with SELECT.
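
    Nested descriptors like these are easy to unbalance by one parenthesis when editing by hand. A small check, sketched here under the assumption that the entry contains no quoted values with parentheses inside them (the class and method names are hypothetical), can be run over an entry before it goes into tnsnames.ora.

```java
class TnsParenCheck {
    // Returns -1 if every '(' has a matching ')'.
    // Returns the offset of the first ')' that has no matching '(',
    // or the entry length if one or more '(' are still open at the end.
    static int firstUnbalancedOffset(String entry) {
        int depth = 0;
        for (int i = 0; i < entry.length(); i++) {
            char c = entry.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    return i; // closing parenthesis with nothing open
                }
                depth--;
            }
        }
        return depth == 0 ? -1 : entry.length();
    }
}
```

    The check only counts parentheses; it does not validate keyword names or that FAILOVER_MODE sits inside CONNECT_DATA.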

  3. #3
    Join Date
    Sep 2001
    Location
    NJ, USA
    Posts
    1,287
    Thanks a lot, Julian, for your answers.
    These weren't simple questions.

    About the first question:
    I knew that only one instance can own a given RBS (and it doesn't matter whether it is PUBLIC or not).
    But I had a very small hope that I could still keep the "update ..." transaction alive
    (for example, using something like ROLLBACK FORCE ...).

    About the second question:
    I tested both connection methods (PRECONNECT and BASIC) with some results.
    In all cases except losing the cable connection I had a positive result.
    (..(backup=sun21) switched the "select" to the next listener, and after a short timeout,
    ~2-4 sec, the "select" resumed.)

    When I lose the cable connection I see the following:
    1.
    - On the server side, Solaris wrote "...problem with cable connection" in the TCP log and
    started a timeout of ~30 sec (tcp_keepalive_timeout).
    After this timeout the local listener sent the user process the message ".. we lost session's connection",
    and the user process (on the primary instance) killed the user session.
    2.
    - On the client side, Linux started a timeout of ~30 sec (tcp_keepalive_timeout) and
    didn't report anything to SQL*Net.
    SQL*Net waited 30 sec and then switched the connection to the BACKUP instance
    (I saw that when I analyzed the SQL*Net and listener log files, in admin and support modes),
    but by that time the server side (Oracle instance) had already killed the user session.

    And this situation is my problem.



    [Edited by Shestakov on 02-14-2002 at 01:41 PM]
