-
Hello , I need help despartly.
I was running a stored preceedure and it came back with an error then I ran a commit by accident. Then the error I got was ORA-01591 lock held by in doubt distributed transaction.
So I check my error books and it told me to match the transaction number in the mesasge with the GLOBAL_TRAN_ID column of the dba_2pc_pending table, so I look and there was only one row, the transaction that it had locked by me. Our DBA is gone on vaction and stupid me delete that row from the dba_2pc_pending table thinking that would fix my problem, but it never I still get the same error and now I have nothing i the dba_2pc_pending table to even try to fix, I tried a rollback but that did not work. Does Anybody have any ideas??? I am really desparate.
Scott
-
Take a look a note 100664.1 on Metalink.
Tittle : HOW TO TROUBLESHOOT DISTRIBUTED TRANSACTIONS
Hope this helps
Gert
-
Our DBA is the only person who knows the login info, he is gone and I can't get a hold of him, I found some info on forcing a commit but I think where I delete the record form the dba_2pc_pending table, that it does not find the transaction to force the commit on. I think I am gonning up the river without a paddle.
I have shut down the DB and started it back up but the lock is still there.
Got any more ideas??
Thanks
Scott
-
Somethink interseting.
Do you have another server?
Type OS?
-
Here it is :
HOW TO TROUBLESHOOT DISTRIBUTED TRANSACTIONS
=============================================
You need to resolve a distributed transaction erorr because of
errors received. But before you try to implement any other steps,
please diagnose that it is or it is not a REAL distributed
transaction.
STEPS TO TROUBLESHOOT DISTRIBUTED TRANSACTIONS.
1. Get errors from alert.log.
2. Make sure that network is up and all db_links are valid (V$DBLINK AND
GV$DBLINK show the db_links currently used by distributed transactions)
SVRMGR> desc v$dblink
Column Name Null? Type
------------------------------ -------- ----
DB_LINK VARCHAR2(128)
OWNER_ID NUMBER
LOGGED_ON VARCHAR2(3)
HETEROGENEOUS VARCHAR2(3)
PROTOCOL VARCHAR2(6)
OPEN_CURSORS NUMBER
IN_TRANSACTION VARCHAR2(3)
UPDATE_SENT VARCHAR2(3)
COMMIT_POINT_STRENGTH NUMBER
SVRMGR> desc gv$dblink
Column Name Null? Type
------------------------------ -------- ----
INST_ID NUMBER
DB_LINK VARCHAR2(128)
OWNER_ID NUMBER
LOGGED_ON VARCHAR2(3)
HETEROGENEOUS VARCHAR2(3)
PROTOCOL VARCHAR2(6)
OPEN_CURSORS NUMBER
IN_TRANSACTION VARCHAR2(3)
UPDATE_SENT VARCHAR2(3)
COMMIT_POINT_STRENGTH NUMBER
3. Run query against DBA_2PC_PENDING:
SELECT
LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE, MIXED, HOST, COMMIT#
FROM
DBA_2PC_PENDING
WHERE
LOCAL _TRAN_ID = '??.';
LOCAL _TRAN_ID is Local transactionID (number that is given when error is
reported). If LOCAL _TRAN_ID = GLOBAL_TRAN_ID that means that it means this
site is Global Coordinator, i.e. where transaction originated from.
You can also get LOCAL _TRAN_ID from the alert.log.
4. Run query against DBA_2PC_NEIGHBORS view on all Nodes:
SELECT LOCAL_TRAN_ID, IN_OUT, DATABASE, INTERFACE
FROM DBA_2PC_NEIGHBORS;
5. Get values for COMMIT_POINT_STRENGTH init.ora parameter on all Nodes.
If you are not aware of what other Nodes are participating in the distributed
transaction, query DBA_2PC_NEIGHBORS.
The Node with COMMIT_POINT_STRENGTH that has highest value in init.ora
has to commit first.
6. Look for LOCAL_TRAN_ID and GLOBAL_TRAN_ID from DBA_2PC_PENDING.
If LOCAL_TRAN_ID = GLOBAL_TRAN_ID it means this site is Global Coordinator,
i.e. where transaction originated from. Please note this site for recovery
issues.
7. Check STATE column from DBA_2PC_PENDING.
If STATE column states COMMIT, it means local database has successfully
committed.
That also means you most likely do not have to force anything on that Node.
Or if other Nodes are not commited and this particular Node happens to be a
Global Coordinator - you need to note COMMIT# i.e. SCN number from
DBA_2PC_PENDING view as it is THE SCN# you will need to use if you need to
perform COMMIT FORCE.
8. Take the GLOBAL_TRAN_ID and COMMIT# (SCN number from DBA_2PC_PENDING) and
compare it to the value on the other nodes.
On Nodes where there are no entries for this COMMIT# and transactionID in
DBA_2PC_PENDING, RECO process has resolved the problem and we don't have to
do anything.
Note: RECO is always automatically started whenever Distributed Transactions
are enabled.
If you issue a UNIX command
ps - ef | grep -i reco
and see RECO running, distributed transactions
were correctly enabled.
If you combine this output with values of init.ora parameter
DISTRIBUTED_TRANSACTONS (which should be higher than 0 for distributed
environment), you may safely say that customer is running in a distributed
environment even if your particular customer has not set it up.
Database links have to also be valid and operational. If they seem not to be
operational, please check V$DBLINK AND GV$DBLINK (if active transaction is
currently using db_links) because you are operating in a distributed
environment we have to apply distributed environment principles
to troubleshoot the problem.
9. If the STATE (from DBA_2PC_PENDING view) is PREPARED then this Node has not
completed transaction.
Take GLOBAL_TRAN_ID (from DBA_2PC_PENDING) and COMMIT# (SCN number) and
compare it to the value on the other nodes.
You need to also check DBA_2PC_NEIGHBORS and see whether there are any other
Nodes that reference same transactionID in their DBA_2PC_PENDING.
If you see it referenced, that means there are children and they need to be
resolved (either committed or rollback) as well. Please take 'children' i.e.
Nodes that other Nodes (not Global Coordinator) into account.
Note: for clarification on terminilogy please see · Note:13229.1
10. If there are no children, it is safe to manually interfere and either
commit force or rollback force. For global intergrity, use the same COMMIT#
(SCN) when you force the transaction. To force the transaction please see
below.
11. After transaction is forced, make sure that both DBA_2PC_PENDING and
DBA_2PC_NEIGHBORS data dictionary views are empty.
If for some reason it is necessary to clean those views, use packaged
procedure DBMS_TRANSACTION.purge_lost_db_entry.
See [NOTE:1012842.102] ORA-2019 ORA-2058 ORA-2068 ORA-2050: FAILED
DISTRIBUTED TRANSACTIONS
Note: Be very careful with that procedure as it is always more reliable and
more consistent to let RECO resolve distributed transaction conflict.
Please attempt to use DBMS_TRANSACTION.purge_lost_db_entry only as a last
resort after every other measure was implemented including restarting the
database to resolve locking conflict.
12. If even then there are entries in the views and errors still appear, please
check for bug 684157.
SYNTAX NOTES:
==============
COMMIT FORCE command
--------------------
Example of syntax when highest committed SCN is 88123887
(from dba_2pc_pending) and local transaction ID is 1.13.5197
(from either dba_2pc_pending or alert.log):
SVRMGR> COMMIT FORCE 'your local transactionID on this node',
'highest SCN from already committed site';
SVRMGR> COMMIT FORCE '1.13.5197', '88123887';
ROLLBACK FORCE command
----------------------
Example of syntax local transaction ID is 1.13.5197
from either dba_2pc_pending or alert.log):
SVRMGR> ROLLBACK FORCE 'your local transactionID on this node';
SVRMGR> ROLLBACK FORCE '1.13.5197';
PURGING VIEWS:
--------------
Example of syntax for lost local transaction ID 1.13.5197 (from either
dba_2pc_pending or alert.log):
SVRMGR> Execute
DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY ('1.13.5197');
NOTE: Run as SYS where 1.13.5197 is the local_tran_id
Example of syntax for mixed local transaction ID 1.13.5197
(from either dba_2pc_pending or alert.log):
SVRMGR> Execute
DBMS_TRANSACTION.PURGE_MIXED ('1.13.5197');
NOTE: Run as SYS
Run PURGE_MIXED Procedure only if significant reconfiguration has occurred so
that automatic recover (RECO) can not resolve transaction.
· Example of appropriate cases when it can be used:
· Total loss of the remote database, reconfiguration of software resulting in
loss of two-phase commit capability.
SCN RECOVERY STEPS
------------------omplete Recovery:
1. At the down site recover completely if possible (treat as regular recovery).
2. SCN will appear in alert.log after recover is done on the crashed Node.
Incomplete Recovery:
1. If time-based or cancel-based recovery was used on crashed node, other
sites must be placed back to the same point in time for global consistency.
Get last SCN from crashed node alert.log.
2. At each node shut down normal or immediate.
3. Perform a cold backup.
4. Restore control file if necessary.
5. Restore last backup of all datafiles along with archived redo logs.
6. Choose which tool to use to perform SCN recovery - either Server Manager
line mode or RMAN.
7. Connect internal, startup mount, select * from v$datafile to make sure all
datafiles are online. Alter database datafile '?/?/?' online; is several
datafiles are offline.
8. Issue the following command using latest SCN from alert.log on the Node
that had to be recovered:
RECOVER DATABASE UNTIL CHANGE '1.13.5197';
NOTE: If for some reason (e.g.when issuing commit force command) automatic
recovery (RECO process) needs to be disabled, use the following command:
ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;
(will put RECO to sleep).
Make sure to wake RECO up after that:
ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;
-
I have tried all the above, but I guess where I deleted the file from the dba_2pc_pending table there svrmgr does see any pending jobs. I have also restarted the entire unix box, and then when I go to do a dbstart, it starts the instances and says 'instance name' warm start. Would a differrent kind of start work, like a cold start if so how do you do this. Every time I run the commands you told me it says did not find any pending job with the local tan id that I speciifed, there is nothing in my dba_2pc_pending table, thus it can not force a commit on somehting that it can't see.
What do I do?
I am starting to get worried now!
THanks
Scott