-
ASM diskgroup problems
O/S: RHEL 3
DB: 10.2.0.3
Hi all,
Our SAN storage suffered a zoning failure whilst the db was running, which caused ASM to panic. This is how the ASM storage looked before the problem occurred (the ASM dg is set up as normal redundancy):
Code:
GROUP_NUMBER DISK_NUMBER COMPOUND_INDEX INCARNATION MOUNT_S HEADER_STATU MODE_ST STATE REDUNDA LIBRARY TOTAL_MB FREE_MB NAME FAILGROUP LABEL PATH UDID PRODUCT CREATE_DA MOUNT_DAT REPAIR_TIMER READS WRITES READ_ERRS WRITE_ERRS READ_TIME WRITE_TIME BYTES_READ BYTES_WRITTEN
1 0 16777216 4042741803 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23255 DGGROUP_0000 PRIMARY /dev/raw/raw1 30-NOV-07 28-OCT-08 0 36804 46018 0 0 786.1 582.06 6755644416 6763955200
1 1 16777217 4042741804 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23370 DGGROUP_0001 PRIMARY /dev/raw/raw2 30-NOV-07 28-OCT-08 0 44119 42268 0 0 867.22 603.79 7411873280 5409548288
1 2 16777218 4042741805 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23271 DGGROUP_0002 PRIMARY /dev/raw/raw3 30-NOV-07 28-OCT-08 0 32283 45450 0 0 869.36 649.66 7197868032 6651507712
1 3 16777219 4042741809 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23398 DGGROUP_0003 PRIMARY /dev/raw/raw7 25-FEB-08 28-OCT-08 0 34146 45752 0 0 711.88 687.74 9106022400 6570381824
1 4 16777220 4042741810 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23272 DGGROUP_0004 FAILURE /dev/raw/raw8 25-FEB-08 28-OCT-08 0 50672 42625 0 0 420.74 627.37 6999826432 5928142336
1 5 16777221 4042741813 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1126399 21329 DGGROUP_0005 PRIMARY /dev/raw/raw11 26-APR-08 28-OCT-08 0 43892 54946 0 0 815.75 880.13 8084622848 8770541056
1 6 16777222 4042741806 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23342 DGGROUP_0006 FAILURE /dev/raw/raw4 04-JAN-08 28-OCT-08 0 31169 45148 0 0 856.28 703.76 6300344320 5618552832
1 7 16777223 4042741807 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23309 DGGROUP_0007 FAILURE /dev/raw/raw5 04-JAN-08 28-OCT-08 0 37962 47718 0 0 1243.8 784.62 8271555072 9542723584
1 8 16777224 4042741808 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 23362 DGGROUP_0008 FAILURE /dev/raw/raw6 04-JAN-08 28-OCT-08 0 35696 47765 0 0 1063.55 810.58 7211213312 8064837120
1 9 16777225 4042741814 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1126399 21366 DGGROUP_0009 PRIMARY /dev/raw/raw12 26-APR-08 28-OCT-08 0 42290 52654 0 0 695.94 2057.64 8111098880 8949030912
1 10 16777226 4042741811 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1126399 21371 DGGROUP_0010 FAILURE /dev/raw/raw9 26-APR-08 28-OCT-08 0 57631 53387 0 0 973.89 947.48 8225914880 7936991232
1 11 16777227 4042741812 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1126399 21308 DGGROUP_0011 FAILURE /dev/raw/raw10 26-APR-08 28-OCT-08 0 45823 52156 0 0 926.72 2098.44 7714097152 7713911808
1 12 16777228 4042741815 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 1034652 DGGROUP_0012 DGGROUP_0012 /dev/raw/raw13 28-OCT-08 28-OCT-08 0 2700 48486 0 0 159.19 469.68 918773760 1.0007E+10
1 13 16777229 4042741816 CACHED MEMBER ONLINE NORMAL UNKNOWN System 1048575 1034661 DGGROUP_0013 DGGROUP_0013 /dev/raw/raw14 28-OCT-08 28-OCT-08 0 2629 45224 0 0 72.31 1566.82 782823424 9782869504
14 rows selected.
SQL>
The above was taken whilst a rebalance was going on, as we had just added disks.
This is how the ASM diskgroup looks now:
Code:
GROUP_NUMBER DISK_NUMBER HEADER_STATU MOUNT_S PATH FAILGROUP
------------ ----------- ------------ ------- -------------- --------------------
0 0 MEMBER CLOSED /dev/raw/raw3
0 1 MEMBER CLOSED /dev/raw/raw7
0 2 MEMBER CLOSED /dev/raw/raw11
1 0 MEMBER CACHED /dev/raw/raw1 PRIMARY
1 1 MEMBER CACHED /dev/raw/raw2 PRIMARY
1 2 CANDIDATE MISSING should be raw3
1 6 MEMBER CACHED /dev/raw/raw4 FAILURE
1 7 MEMBER CACHED /dev/raw/raw5 FAILURE
1 8 MEMBER CACHED /dev/raw/raw6 FAILURE
1 3 CANDIDATE MISSING should be raw7
1 4 MEMBER CACHED /dev/raw/raw8 FAILURE
1 10 MEMBER CACHED /dev/raw/raw9 FAILURE
1 11 MEMBER CACHED /dev/raw/raw10 FAILURE
1 5 CANDIDATE MISSING should be raw11
1 9 MEMBER CACHED /dev/raw/raw12 PRIMARY
1 12 MEMBER CACHED /dev/raw/raw13 DGGROUP_0012
1 13 MEMBER CACHED /dev/raw/raw14 DGGROUP_0013
As can be seen, raw3, raw7 and raw11 have gone missing. Can anyone shed any light on how we get these disks back into the diskgroup? We tried:
1/ A repair; this succeeded but didn't touch the offline disks.
2/ Dropping the disks, but this resulted in an error as well, although it did start a rebalance, which we stopped as we don't have enough space.
Please can anyone let us know how we can get these disks back?
Thanks in advance,
Chucks
-
Hi
What are the ownership and permissions on
Code:
/dev/raw/raw3
/dev/raw/raw7
/dev/raw/raw11
Is there anything in the alert log of the ASM instance regarding this?
You can also use the kfed utility to find out which diskgroup the disks belong to:
Code:
make -f ins_rdbms.mk ikfed
kfed read devicename
kfed should be able to tell you which diskgroup the disks belonged to.
http://askdba.org/weblog/?p=104
regards
Hrishy
-
Hi Hrishy,
Thanks for that
We checked the ownerships and they are fine.
We used od -c to check the devices, and the output shows that those missing disks belong to the diskgroup, yet they are not part of the dg in ASM. We have raised this with Oracle. Let's see what they say!
-
Hi Chucks
What is the od -c command?
Please post the solution here when you get a response from Oracle.
Did you try running the kfed utility?
I'm also curious to know whether the rebalance from the operation before the SAN failure is still going on.
regards
Hrishy
-
Hi Hrishy,
od -c is similar to kfed, but I feel it gives more info:
Code:
$ od -c /dev/raw/raw3 | head
0000000 001 202 001 001 \0 \0 \0 \0 002 \0 \0 200 323 g ) 215
0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 O R C L D I S K \0 \0 \0 \0 \0 \0 \0 \0
0000060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000100 \0 \0 020 \n 002 \0 002 003 D G G R O U P _
0000120 0 0 0 2 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000140 \0 \0 \0 \0 \0 \0 \0 \0 D G G R O U P \0
0000160 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000200 \0 \0 \0 \0 \0 \0 \0 \0 P R I M A R Y \0
0000220 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
As can be seen above, the disk is recognised as an Oracle disk and is part of the DGGROUP diskgroup in the PRIMARY failgroup.
The previous rebalance finished on the 1st of November. As this dg is normal redundancy, I was wondering whether Oracle would use the mirrors (the FAILURE failgroup) if it can't find the PRIMARY failgroup?
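For anyone following along, one way to confirm that no rebalance is still in flight is to query V$ASM_OPERATION from the ASM instance (an empty result means nothing is running); a minimal sketch using the standard 10g view columns:
Code:

```sql
-- Any in-flight ASM operation; a rebalance shows OPERATION = 'REBAL'
SELECT group_number, operation, state, sofar, est_work, est_minutes
FROM   v$asm_operation;
```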
Thanks,
Chucks
-
Hi Chucks
I beg to differ: od is a Unix command and just gives you an octal dump of a raw device.
kfed is a C program from Oracle that reads an ASM disk header and shows its contents; see the link above for sample output.
Yes, in this case, since the PRIMARY failgroup is not available, ASM will start reading from the FAILURE failgroup.
regards
Hrishy
-
Hi Hrishy,
We have sorted the problem by adding the disks using the force clause:
Code:
ALTER DISKGROUP DGGROUP ADD FAILGROUP PRIMARY DISK
'/dev/raw/raw3' FORCE,'/dev/raw/raw7' FORCE,
'/dev/raw/raw11' FORCE;
There is a rebalance going on now, but we have our system back.
Regarding od -c: sorry, yes, I know it's a Unix command; I found it before I knew about kfed.
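To watch the rebalance kicked off by the forced ADD and to confirm the disks came back, something like the following should work (a sketch against the standard v$ views; the group number 1 is taken from the listings above):
Code:

```sql
-- Progress of the rebalance started by ADD ... FORCE
SELECT operation, state, power, sofar, est_work, est_minutes
FROM   v$asm_operation;

-- The re-added disks should show MOUNT_STATUS = 'CACHED' again
SELECT disk_number, path, mount_status, header_status, failgroup
FROM   v$asm_disk
WHERE  group_number = 1;
```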
Thanks again,
Chucks
-
Hi Chucks
You might want to investigate ASM in 11g.
I know for sure that it has a fast mirror resync (resilvering) option for exactly the case you outlined above: ASM tracks the extents that changed while the disks were missing and applies only those changes back once the disks are rediscovered, so there wouldn't be a full rebalance.
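In 11g this is driven by the DISK_REPAIR_TIME diskgroup attribute (it requires the COMPATIBLE.ASM attribute at 11.1 or higher); a sketch of how it would look for this diskgroup:
Code:

```sql
-- Let transiently-failed disks stay offline for 12 hours before
-- ASM drops them and forces a rebalance (the default is 3.6h)
ALTER DISKGROUP dggroup SET ATTRIBUTE 'disk_repair_time' = '12h';

-- After the SAN/zoning problem is fixed, bring the disks back online;
-- ASM resyncs only the extents that changed while they were offline
ALTER DISKGROUP dggroup ONLINE ALL;
```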
regards
Hrishy