-
Oracle Dataguard errors/confusion
Hi all,
I am trying to build a single standby db for a 2 node RAC db on RHEL 4 using Oracle 10.2.0.1
I am having the following errors:
In node1 of the LIVE RAC db:
Code:
select message from v$dataguard_status;
Error 1017 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
Error 1017 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
Error 1017 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
Error 1017 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
Error 1017 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16191.
Error 1017 received logging on to the standby
I understand one of the causes for the the above is not having the same password for sys in the password file. I have ensured that this is the same. When I ty a remote connect from node 1 to DRS i can occasionally log in but sometimes I get SQL connection timeout errors!
On node 2 I am seeing the following errors:
Code:
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
Error 1031 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
Error 1031 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
MESSAGE
--------------------------------------------------------------------------------
Error 1031 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 1031.
PING[ARC0]: Heartbeat failed to connect to standby 'mmgdrs'. Error is 16058.
Now for the strangest thing: In the standby alert log i see:
Code:
Wed May 7 00:23:32 2008
RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6938.1128.654049215'
Wed May 7 00:26:49 2008
RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6939.1127.654049413'
Wed May 7 00:30:07 2008
RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6940.1130.654049611'
Wed May 7 00:33:23 2008
RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6941.1120.654049807'
Wed May 7 00:36:38 2008
RFS[1]: Archived Log: '+MMGDRS1/mmgdrs/archivelog/2008_05_07/thread_2_seq_6942.1103.654050003'
It seems to be fetching the logs from somewhere, but only for thread 2! However its not applying them. I am posting the Dataguard specific parameters :
LIVE RAC:
Code:
SQL> show parameter log_archive
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_config string dg_config=(mmgdrs,BOF2)
log_archive_dest string
log_archive_dest_1 string LOCATION=+LOG/BOF2
log_archive_dest_10 string
log_archive_dest_2 string service=mmgdrs valid_for=(ALL_
LOGFILES,PRIMARY_ROLE) db_uniq
ue_name=mmgdrs
DRS:
Code:
SQL> show parameter log_archive
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_config string dg_config=(mmgdrs,bof2)
log_archive_dest string
log_archive_dest_1 string LOCATION=+MMGDRS1/
log_archive_dest_10 string
log_archive_dest_2 string service=bof2 valid_for=(online
_logfiles,primary_role) db_uni
que_name=bof2
Any help will be highy appreciated
Chucks
-
It may be a password issue.
Check that remote_login_passwordfile is set to SHARED or EXCLUSIVE.
Recreate the password file of standby database with the same password of primary database password file.
"The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb
-
Hi there,
This is what i initially thought. The password sharing is set to EXCLUSIVE and the sys password is the same for all nodes. What i cannot understand is how the archive logs are the standby when there are errors on both primary nodes!
Any advice will be highly appreciated,
Thanks in advance,
Chucks
-
Hi
Try recreating the password file on the standby database and see with the same database as the primary and give it a try.
Your alert log seems to point to a password mismtach only
regards
Hrishy
-
Hi Hrishy,
Thanks for the reply. I did recreate the password file but to no avail. To confirm i can log on the standby remotely from the live nodes like:
sys/pass@mmgdrs as sysdba <- where pass is the same for all nodes.
One thing I have noticed is that if a do the above from the first node i get:
Code:
Copyright (c) 1982, 2005, Oracle. All rights reserved.
ERROR:
ORA-12170: TNS:Connect timeout occurred
If i try again, i am able to connect. I am able to connect from the second node.
Just to confirm the sys user password is the same as the password in the password file?
Thanks in advance,
Chucks
-
Hi Chucks
I have been thinking about this issue.
Delete the password file from first node and recreate it and then give it a try.
regards
Hrishy
-
Hi Hrishy,
Thanks for that,
As this is a LIVE db, will recreating the password file have any problems ? Why do you suggest only only on node 1 ? Is it because no logs are going through from this node?
Thanks again
-
Hi
At the next oppurtunity you should again try recreating the password file when the database is shutdown on node 1.
and then restart the database and attempt a sysdba connection from both nodes to the datagaurd database
Going by the symptoms i am having a feeling that there is a problem with the password file of node1 .
(i am not 100% sure though)
regards
Hrishy
-
Hi Hrishy,
I just noticed something on the db environment (as you can probably tell, I did not set up this RAC cluster). On both nodes in $ORACLE_HOME/dbs
Node 1
lrwxrwxrwx 1 oracle dba 9 Dec 14 11:18 orapwbof212 -> orapwbof2
lrwxrwxrwx 1 oracle dba 9 Dec 14 11:18 orapwbof21 -> orapwbof2
Node 2
-rw-r----- 2 oracle dba 1536 Mar 12 15:57 orapwbof212
-rw-r----- 2 oracle dba 1536 Mar 12 15:57 orapwbof2
Now as you can see obviously the set up is a bit weird and i not sure why on node 1 the password file is called bof212 when the actual instance is bof211. I also need to say that the application has been designed so that it accesses node bof212 ( to be honest there is no need for the RAC config as node 1 is redundant but i need its logs in order to synch the standby!). How can i tell which password file is being used?
Thanks in advance
-
That is the problem! password files should be named: orapwbof211 and orapwbof212 (like orapw${ORACLE_SID})
"The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|