Oracle Dataguard errors/confusion
DBAsupport.com Forums - Powered by vBulletin
Page 1 of 2 12 LastLast
Results 1 to 10 of 12

Thread: Oracle Dataguard errors/confusion

  1. #1
    Join Date
    Dec 2001
    Posts
    337

    Oracle Dataguard errors/confusion

    Hi all,

    I am trying to build a single standby db for a 2 node RAC db on RHEL 4 using Oracle 10.2.0.1

    I am having the following errors:

    In node1 of the LIVE RAC db:

    Code:
    select message from v$dataguard_status;
    Error 1017 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
    Error 1017 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
    Error 1017 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
    Error 1017 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
    Error 1017 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
    Error 1017 received logging on to the standby
    I understand one of the causes for the the above is not having the same password for sys in the password file. I have ensured that this is the same. When I ty a remote connect from node 1 to DRS i can occasionally log in but sometimes I get SQL connection timeout errors!

    On node 2 I am seeing the following errors:

    Code:
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
    Error 1031 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
    Error 1031 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
    
    MESSAGE
    --------------------------------------------------------------------------------
    Error 1031 received logging on to the standby
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
    PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16058.
    Now for the strangest thing: In the standby alert log i see:

    Code:
    Wed May  7 00:23:32 2008
    RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6938.1128.654049215'
    Wed May  7 00:26:49 2008
    RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6939.1127.654049413'
    Wed May  7 00:30:07 2008
    RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6940.1130.654049611'
    Wed May  7 00:33:23 2008
    RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6941.1120.654049807'
    Wed May  7 00:36:38 2008
    RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6942.1103.654050003'
    It seems to be fetching the logs from somewhere, but only for thread 2! However its not applying them. I am posting the Dataguard specific parameters :

    LIVE RAC:
    Code:
    SQL> show parameter log_archive
    
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    log_archive_config                   string      dg_config=(mmgdrs,BOF2)
    log_archive_dest                     string
    log_archive_dest_1                   string      LOCATION=+LOG/BOF2
    log_archive_dest_10                  string
    log_archive_dest_2                   string      service=mmgdrs valid_for=(ALL_
                                                     LOGFILES,PRIMARY_ROLE) db_uniq
                                                     ue_name=mmgdrs
    DRS:

    Code:
    SQL> show parameter log_archive
    
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    log_archive_config                   string      dg_config=(mmgdrs,bof2)
    log_archive_dest                     string
    log_archive_dest_1                   string      LOCATION=+MMGDRS1/
    log_archive_dest_10                  string
    log_archive_dest_2                   string      service=bof2 valid_for=(online
                                                     _logfiles,primary_role) db_uni
                                                     que_name=bof2
    Any help will be highy appreciated
    Chucks

  2. #2
    Join Date
    Jul 2002
    Location
    Lake Worth, FL
    Posts
    1,483

    Talking

    It may be a password issue.

    Check that remote_login_passwordfile is set to SHARED or EXCLUSIVE.

    Recreate the password file of standby database with the same password of primary database password file.

    "The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb

  3. #3
    Join Date
    Dec 2001
    Posts
    337
    Hi there,

    This is what i initially thought. The password sharing is set to EXCLUSIVE and the sys password is the same for all nodes. What i cannot understand is how the archive logs are the standby when there are errors on both primary nodes!

    Any advice will be highly appreciated,

    Thanks in advance,
    Chucks

  4. #4
    Join Date
    Jan 2001
    Posts
    2,828
    Hi

    Try recreating the password file on the standby database and see with the same database as the primary and give it a try.

    Your alert log seems to point to a password mismtach only

    regards
    Hrishy

  5. #5
    Join Date
    Dec 2001
    Posts
    337
    Hi Hrishy,

    Thanks for the reply. I did recreate the password file but to no avail. To confirm i can log on the standby remotely from the live nodes like:

    sys/pass@mmgdrs as sysdba <- where pass is the same for all nodes.

    One thing I have noticed is that if a do the above from the first node i get:

    Code:
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    
    ERROR:
    ORA-12170: TNS:Connect timeout occurred
    If i try again, i am able to connect. I am able to connect from the second node.

    Just to confirm the sys user password is the same as the password in the password file?

    Thanks in advance,
    Chucks

  6. #6
    Join Date
    Jan 2001
    Posts
    2,828
    Hi Chucks

    I have been thinking about this issue.

    Delete the password file from first node and recreate it and then give it a try.

    regards
    Hrishy

  7. #7
    Join Date
    Dec 2001
    Posts
    337
    Hi Hrishy,

    Thanks for that,

    As this is a LIVE db, will recreating the password file have any problems ? Why do you suggest only only on node 1 ? Is it because no logs are going through from this node?

    Thanks again

  8. #8
    Join Date
    Jan 2001
    Posts
    2,828
    Hi

    At the next oppurtunity you should again try recreating the password file when the database is shutdown on node 1.

    and then restart the database and attempt a sysdba connection from both nodes to the datagaurd database

    Going by the symptoms i am having a feeling that there is a problem with the password file of node1 .
    (i am not 100% sure though)

    regards
    Hrishy

  9. #9
    Join Date
    Dec 2001
    Posts
    337
    Hi Hrishy,

    I just noticed something on the db environment (as you can probably tell, I did not set up this RAC cluster). On both nodes in $ORACLE_HOME/dbs

    Node 1
    lrwxrwxrwx 1 oracle dba 9 Dec 14 11:18 orapwbof212 -> orapwbof2
    lrwxrwxrwx 1 oracle dba 9 Dec 14 11:18 orapwbof21 -> orapwbof2

    Node 2
    -rw-r----- 2 oracle dba 1536 Mar 12 15:57 orapwbof212
    -rw-r----- 2 oracle dba 1536 Mar 12 15:57 orapwbof2

    Now as you can see obviously the set up is a bit weird and i not sure why on node 1 the password file is called bof212 when the actual instance is bof211. I also need to say that the application has been designed so that it accesses node bof212 ( to be honest there is no need for the RAC config as node 1 is redundant but i need its logs in order to synch the standby!). How can i tell which password file is being used?

    Thanks in advance

  10. #10
    Join Date
    Jul 2002
    Location
    Lake Worth, FL
    Posts
    1,483

    Talking

    That is the problem! password files should be named: orapwbof211 and orapwbof212 (like orapw${ORACLE_SID})
    "The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Click Here to Expand Forum to Full Width