3PAR - Volumes going into a Failed (Preserved) State
Posted: Sun May 16, 2021 12:54 pm
Hi All,
I'm fairly new to 3PAR, so please forgive me if this turns out to be an easy one.
I have been asked to look at an old F400 (sadly with no vendor support) that ran into trouble after a recent power failure. The array came back up, but shortly afterwards 14 virtual volumes went into the failed (preserved) state.
I logged on and noticed a few issues, not least 4 failed disks, 3 degraded disks, expired power supply batteries on all 4 nodes, and a cache DIMM reporting errors.
The failed disks, the batteries, and the DIMM had all errored before the power loss.
The 3 degraded disks errored after the power failure.
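For context, the initial checks were along these lines (all standard CLI; exact invocations from memory, outputs trimmed):

cli% checkhealth -detail        # overall system health summary
cli% showpd -failed -degraded   # list the failed and degraded physical disks
cli% showbattery                # node battery status (all showing expired)
cli% showalert                  # open alerts, including the cache DIMM errors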
The 4 failed disks have been replaced with no issues.
We tried to remove the 3 degraded disks gracefully using servicemag start -pdid, but all 3 attempts failed.
One of the failures was due to a chunklet mapped to a failed LD. I checked that the LD was not mapped to a VV (thankfully it wasn't), deleted the LD, and reran the servicemag operation. This time it worked for that one disk.
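The sequence for that LD was roughly the following (LD name and PD id replaced with placeholders):

cli% showldmap <failed_LD>         # confirm the LD is not mapped to any VV
cli% removeld <failed_LD>          # remove the LD once confirmed unmapped
cli% servicemag start -pdid <id>   # retry the graceful disk removal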
The failed disk was replaced and the rebuild started, but after a couple of hours it failed again with chunklet move errors and an "invalid device" message.
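When it failed I checked the state of the operation and what was still sitting on the disk, along these lines (full servicemag output further down):

cli% servicemag status -d   # detailed status of the failed servicemag
cli% showpdch <pdid>        # chunklet status on the problem disk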
It was left at that point while options were discussed with the customer. In the meantime, however, 5 more volumes went into a failed state.
I have scoured Google and elsewhere for similar issues but can't find any, so I thought someone here might know.
Below are a few outputs:
cli% showvv -failed
Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize
12 CSV01_HUHYPERVCLUS_VR5 tpvv base --- 12 RW preserved 2816 0 1671296 4194304
13 CSV02_HUHYPERVCLUS_VR5 tpvv base --- 13 RW preserved 3072 0 2106624 4194304
14 CSV03_HUHYPERVCLUS_VR5 tpvv base --- 14 RW preserved 3072 0 2336000 4194304
21 CSV03_HYPERVDT_VR5 tpvv base --- 21 RW preserved 2816 0 1881600 4194304
15 CSV04_HUHYPERVCLUS_VR5 tpvv base --- 15 RW preserved 2816 0 1961088 4194304
25 CSV05_HUHYPERVCLUS_VR5 tpvv base --- 25 RW preserved 2816 0 2670976 4194304
26 CSV06_HUHYPERVCLUS_VR5 tpvv base --- 26 RW preserved 2560 0 1951104 4194304
28 CSV08_HUHYPERVCLUS_VR5 tpvv base --- 28 RW preserved 2560 0 1529216 4194304
62 CSV10_HUHYPERVCLUS_VR5 tpvv base --- 62 RW preserved 2304 0 1239552 4194304
63 CSV11_HUHYPERVCLUS_VR5 tpvv base --- 63 RW preserved 2304 0 2211328 4194304
31 PS02_VMCA_VR5_HU3PAR01 tpvv base --- 31 RW preserved 1280 0 1426688 1996800
32 PS03_VMCA_VR5_HU3PAR01 tpvv base --- 32 RW preserved 1536 0 1678976 1996800
33 PS04_VMCA_VR5_HU3PAR01 tpvv base --- 33 RW preserved 1536 0 1731584 1996800
35 PS06_VMCA_VR5_HU3PAR01 tpvv base --- 35 RW preserved 1280 0 1551360 1996800
36 PS07_VMCA_VR5_HU3PAR01 tpvv base --- 36 RW preserved 1536 0 1907968 1996800
37 PS08_VMCA_VR5_HU3PAR01 tpvv base --- 37 RW preserved 1280 0 1664000 1996800
40 PS11_VMCA_VR5_HU3PAR01 tpvv base --- 40 RW preserved 768 0 651264 1996800
42 PS13_VMCA_VR5_HU3PAR01 tpvv base --- 42 RW preserved 768 0 706560 1996800
47 PS18_VMCA_VR5_HU3PAR01 tpvv base --- 47 RW preserved 1536 0 1897472 1996800
-----------------------------------------------------------------------------------------------
19 total 38656 0 32774656 59914240

Events from around the time the volumes failed:

Time : 2021-05-16 10:57:29 BST
Severity : Minor
Type : LD rset I/O error
Message : Ldsk 170 RAID set number 80 new state rs_pinned

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 13(CSV02_HUHYPERVCLUS_VR5) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 40(PS11_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 47(PS18_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 42(PS13_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 33(PS04_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

cli% servicemag status -d
Cage 2, magazine 7:
A servicemag start command failed on this magazine.
The command completed at Sun May 16 02:40:13 2021.
The output of the servicemag start was:
servicemag start -pdid 33
... servicing disks in mag: 2 7
... normal disks:
... not normal disks: WWN [5000CCA0298340F8] Id [33] diskpos [0]
... relocating chunklets to spare space...
... chunklet 33:120 - move_error,src_set_invalid, will not move
... chunklet 33:809 - move_error,src_set_invalid, will not move
... chunklet 33:1148 - move_error,src_set_invalid, will not move
... chunklet 33:1297 - move_error,src_set_invalid, will not move
... chunklet 33:1775 - move_error,src_set_invalid, will not move
... chunklet 33:2079 - move_error,src_set_invalid, will not move
... chunklet 33:2089 - move_error,src_set_invalid, will not move
servicemag start -pdid 33 -- Failed

Cage 7, magazine 7:
A servicemag start command failed on this magazine.
The command completed at Sun May 16 02:40:49 2021.
The output of the servicemag start was:
servicemag start -pdid 101
... servicing disks in mag: 7 7
... normal disks:
... not normal disks: WWN [5000CCA029839D50] Id [101] diskpos [0]
... relocating chunklets to spare space...
... chunklet 101:99 - move_error,src_set_invalid, will not move
... chunklet 101:714 - move_error,src_set_invalid, will not move
... chunklet 101:1018 - move_error,src_set_invalid, will not move
... chunklet 101:1146 - move_error,src_set_invalid, will not move
... chunklet 101:1571 - move_error,src_set_invalid, will not move
... chunklet 101:1816 - move_error,src_set_invalid, will not move
... chunklet 101:1824 - move_error,src_set_invalid, will not move
servicemag start -pdid 101 -- Failed

Cage 9, magazine 6:
A servicemag resume command failed on this magazine.
The command completed at Sun May 16 08:23:16 2021.
The output of the servicemag resume was:
servicemag resume 9 6
... onlooping mag 9 6
... firmware is current on pd WWN [5000CCA02982EE50] Id [126]
... checking for valid disks...
... checking for valid disks...
... disks not normal yet..trying admit/onloop again
... onlooping mag 9 6
... checking for valid disks...
... checking for valid disks...
... disks in mag : 9 6
... normal disks: WWN [5000CCA029B07110] Id [23] diskpos [0]
... not normal disks: WWN [5000CCA02982EE50] Id [126]
... verifying spare space for disks 126 and 23
... playback chunklets from pd WWN [5000CCA029B07110] Id [23]
... All chunklets played back / relocated.
... cleared logging mode for cage 9 mag 6
... relocating chunklets from spare space
... chunklet 26:917 - move_error,move_failed, failed move
... chunklet 26:918 - move_error,move_failed, failed move
... chunklet 26:919 - move_error,move_failed, failed move
... chunklet 26:920 - move_error,move_failed, failed move
... chunklet 26:921 - move_error,move_failed, failed move
... chunklet 26:922 - move_error,move_failed, failed move
... chunklet 26:923 - move_error,move_failed, failed move
... chunklet 26:924 - move_error,move_failed, failed move
... chunklet 26:925 - move_error,move_failed, failed move
... chunklet 26:926 - move_error,move_failed, failed move
... chunklet 26:927 - move_error,move_failed, failed move
... chunklet 26:928 - move_error,move_failed, failed move
... chunklet 26:929 - move_error,move_failed, failed move
... chunklet 26:930 - move_error,move_failed, failed move
... chunklet 26:931 - move_error,move_failed, failed move
Failed --
PD 23 is not valid
servicemag resume 9 6 -- Failed
(I deleted a lot of the move errors above so it would fit in the post.)
Any help appreciated.