3Par - Volumes going into a failed (Preserved) State

njdgomez23
Posts: 2
Joined: Sun May 16, 2021 12:31 pm

3Par - Volumes going into a failed (Preserved) State

Post by njdgomez23 »

Hi All,

Fairly new to 3PAR, so please forgive me if my issue is an easy one.

I have been asked to look at an old F400 (sadly no vendor support) that was having issues after a recent power failure. The array came back up; however, shortly afterwards 14 VVs appeared in the failed (preserved) state.

I logged on and noticed a few issues, not least 4 failed disks and 3 degraded ones, expired power supply batteries on all 4 nodes, and a cache DIMM reporting errors.
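
(Side note for anyone searching this later: that kind of overview should be visible with the usual health commands, roughly along these lines.)

cli% checkhealth
cli% showpd -failed -degraded
cli% showbattery
cli% shownode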

The failed disks, the batteries and the DIMM all errored before the power loss.

The 3 disks that are in a degraded state errored after the power failure.

The 4 failed disks have been replaced with no issues.

We tried to remove the 3 degraded ones gracefully using servicemag start -pdid; however, all 3 failed.

One of the failures was due to a chunklet mapped to a failed LD. I checked the LD was not mapped to a VV (which thankfully it wasn't), deleted the LD and reran the servicemag operation. This time it worked for that one disk.
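
In case it helps, the rough sequence for that orphaned LD was along these lines (the LD name below is just a placeholder, not the real one from this array):

cli% showldmap <LD_name>            (confirm nothing maps the LD to a VV)
cli% removeld <LD_name>
cli% servicemag start -pdid <pdid>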

The failed disk was replaced and the disk rebuild started. After a couple of hours it failed again with chunklet move errors and something about an invalid device?

It was then left at this point so we could discuss options with the customer. However, in the meantime, 5 more volumes went into a failed state.

I have scoured Google etc. looking for similar issues but can't find any, so I thought someone on here may know.

Below are a few outputs:

cli% showvv -failed
Id Name Prov Type CopyOf BsId Rd -Detailed_State- Adm Snp Usr VSize
12 CSV01_HUHYPERVCLUS_VR5 tpvv base --- 12 RW preserved 2816 0 1671296 4194304
13 CSV02_HUHYPERVCLUS_VR5 tpvv base --- 13 RW preserved 3072 0 2106624 4194304
14 CSV03_HUHYPERVCLUS_VR5 tpvv base --- 14 RW preserved 3072 0 2336000 4194304
21 CSV03_HYPERVDT_VR5 tpvv base --- 21 RW preserved 2816 0 1881600 4194304
15 CSV04_HUHYPERVCLUS_VR5 tpvv base --- 15 RW preserved 2816 0 1961088 4194304
25 CSV05_HUHYPERVCLUS_VR5 tpvv base --- 25 RW preserved 2816 0 2670976 4194304
26 CSV06_HUHYPERVCLUS_VR5 tpvv base --- 26 RW preserved 2560 0 1951104 4194304
28 CSV08_HUHYPERVCLUS_VR5 tpvv base --- 28 RW preserved 2560 0 1529216 4194304
62 CSV10_HUHYPERVCLUS_VR5 tpvv base --- 62 RW preserved 2304 0 1239552 4194304
63 CSV11_HUHYPERVCLUS_VR5 tpvv base --- 63 RW preserved 2304 0 2211328 4194304
31 PS02_VMCA_VR5_HU3PAR01 tpvv base --- 31 RW preserved 1280 0 1426688 1996800
32 PS03_VMCA_VR5_HU3PAR01 tpvv base --- 32 RW preserved 1536 0 1678976 1996800
33 PS04_VMCA_VR5_HU3PAR01 tpvv base --- 33 RW preserved 1536 0 1731584 1996800
35 PS06_VMCA_VR5_HU3PAR01 tpvv base --- 35 RW preserved 1280 0 1551360 1996800
36 PS07_VMCA_VR5_HU3PAR01 tpvv base --- 36 RW preserved 1536 0 1907968 1996800
37 PS08_VMCA_VR5_HU3PAR01 tpvv base --- 37 RW preserved 1280 0 1664000 1996800
40 PS11_VMCA_VR5_HU3PAR01 tpvv base --- 40 RW preserved 768 0 651264 1996800
42 PS13_VMCA_VR5_HU3PAR01 tpvv base --- 42 RW preserved 768 0 706560 1996800
47 PS18_VMCA_VR5_HU3PAR01 tpvv base --- 47 RW preserved 1536 0 1897472 1996800
-----------------------------------------------------------------------------------------------
19 total 38656 0 32774656 59914240
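
To tie the preserved volumes back to the underlying LDs and disks, something like the following should work, taking the first volume above as an example (the LD name is a placeholder for whatever showvvmap returns):

cli% showvvmap CSV01_HUHYPERVCLUS_VR5     (lists the LDs behind the VV)
cli% showld -d <LD_name>                  (state and RAID characteristics of the LD)
cli% showldch <LD_name>                   (which PDs/chunklets back the LD)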



Time : 2021-05-16 10:57:29 BST
Severity : Minor
Type : LD rset I/O error
Message : Ldsk 170 RAID set number 80 new state rs_pinned

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 13(CSV02_HUHYPERVCLUS_VR5) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 40(PS11_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 47(PS18_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 42(PS13_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})

Time : 2021-05-16 10:57:29 BST
Severity : Major
Type : Component state change
Message : Virtual Volume 33(PS04_VMCA_VR5_HU3PAR01) Failed (Preserved {0x4})


cli% servicemag status -d
Cage 2, magazine 7:
A servicemag start command failed on this magazine.
The command completed at Sun May 16 02:40:13 2021.
The output of the servicemag start was:
servicemag start -pdid 33
... servicing disks in mag: 2 7
... normal disks:
... not normal disks: WWN [5000CCA0298340F8] Id [33] diskpos [0]
... relocating chunklets to spare space...
... chunklet 33:120 - move_error,src_set_invalid, will not move
... chunklet 33:809 - move_error,src_set_invalid, will not move
... chunklet 33:1148 - move_error,src_set_invalid, will not move
... chunklet 33:1297 - move_error,src_set_invalid, will not move
... chunklet 33:1775 - move_error,src_set_invalid, will not move
... chunklet 33:2079 - move_error,src_set_invalid, will not move
... chunklet 33:2089 - move_error,src_set_invalid, will not move
servicemag start -pdid 33 -- Failed

Cage 7, magazine 7:
A servicemag start command failed on this magazine.
The command completed at Sun May 16 02:40:49 2021.
The output of the servicemag start was:
servicemag start -pdid 101
... servicing disks in mag: 7 7
... normal disks:
... not normal disks: WWN [5000CCA029839D50] Id [101] diskpos [0]
... relocating chunklets to spare space...
... chunklet 101:99 - move_error,src_set_invalid, will not move
... chunklet 101:714 - move_error,src_set_invalid, will not move
... chunklet 101:1018 - move_error,src_set_invalid, will not move
... chunklet 101:1146 - move_error,src_set_invalid, will not move
... chunklet 101:1571 - move_error,src_set_invalid, will not move
... chunklet 101:1816 - move_error,src_set_invalid, will not move
... chunklet 101:1824 - move_error,src_set_invalid, will not move
servicemag start -pdid 101 -- Failed

Cage 9, magazine 6:
A servicemag resume command failed on this magazine.
The command completed at Sun May 16 08:23:16 2021.
The output of the servicemag resume was:
servicemag resume 9 6
... onlooping mag 9 6
... firmware is current on pd WWN [5000CCA02982EE50] Id [126]
... checking for valid disks...
... checking for valid disks...
... disks not normal yet..trying admit/onloop again
... onlooping mag 9 6
... checking for valid disks...
... checking for valid disks...
... disks in mag : 9 6
... normal disks: WWN [5000CCA029B07110] Id [23] diskpos [0]
... not normal disks: WWN [5000CCA02982EE50] Id [126]
... verifying spare space for disks 126 and 23
... playback chunklets from pd WWN [5000CCA029B07110] Id [23]
... All chunklets played back / relocated.
... cleared logging mode for cage 9 mag 6
... relocating chunklets from spare space
... chunklet 26:917 - move_error,move_failed, failed move
... chunklet 26:918 - move_error,move_failed, failed move
... chunklet 26:919 - move_error,move_failed, failed move
... chunklet 26:920 - move_error,move_failed, failed move
... chunklet 26:921 - move_error,move_failed, failed move
... chunklet 26:922 - move_error,move_failed, failed move
... chunklet 26:923 - move_error,move_failed, failed move
... chunklet 26:924 - move_error,move_failed, failed move
... chunklet 26:925 - move_error,move_failed, failed move
... chunklet 26:926 - move_error,move_failed, failed move
... chunklet 26:927 - move_error,move_failed, failed move
... chunklet 26:928 - move_error,move_failed, failed move
... chunklet 26:929 - move_error,move_failed, failed move
... chunklet 26:930 - move_error,move_failed, failed move
... chunklet 26:931 - move_error,move_failed, failed move

Failed --
PD 23 is not valid
servicemag resume 9 6 -- Failed

<I deleted a lot of the move errors above so it would fit in the post>
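
If it's useful to anyone, I believe the stuck chunklets can be inspected per disk and then traced back to the LD/VV they belong to, roughly:

cli% showpdch 33               (chunklet state/usage on PD 33)
cli% showldmap <LD_name>       (which VV a given LD serves)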

Any help appreciated
Last edited by njdgomez23 on Mon May 17, 2021 4:02 am, edited 1 time in total.
MammaGutt
Posts: 1578
Joined: Mon Sep 21, 2015 2:11 pm
Location: Europe

Re: 3Par - Volumes going into a failed (Preserved) State

Post by MammaGutt »

I'm taking a stab that the 3 degraded drives all contained data for one or more LDs that were part of the VVs that are now failed. The VVs go into preserved mode to protect the remaining data, in case you can "unfail" the drives (say you simply lost a lot of drives due to a cage failure or something and can get them back up), or until you accept the data loss and zero out the missing data on the volumes.

I'm guessing most of this is RAID5, so once you lose 2 drives on the same LD you're out of luck.

The failed drives shouldn't be an issue, as those drives most likely copied their data to spare chunklets, or were rebuilt to spare chunklets, a long time ago. The degraded ones are your problem... If my memory serves me right, they stay degraded until it is safe to replace them, even if the disk is dead.

For the servicemags that are failing, I would try to move those chunklets manually (see below)... for the rest... this is where I pay the vendor the big bucks.
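
Something along these lines, using one of the chunklets from your servicemag output (no promises it gets further than servicemag did, and I'd start with a single chunklet):

cli% movechtospare 33:120      (relocate that chunklet to spare space)
cli% movech 33:120             (or let the system pick a normal destination)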
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.
njdgomez23
Posts: 2
Joined: Sun May 16, 2021 12:31 pm

Re: 3Par - Volumes going into a failed (Preserved) State

Post by njdgomez23 »

Hi, thanks for the reply.

Yeah, 2 of the disks that failed in servicemag have 7 chunklets remaining each, sitting in preserved LDs that are mapped to those volumes.

As for vendor support: exactly, always a must.
n4ch0
Posts: 1
Joined: Tue May 10, 2022 3:47 am

Re: 3Par - Volumes going into a failed (Preserved) State

Post by n4ch0 »

Hi njdgomez23,

Did you solve the situation? I have the same scenario and a lot of data is inaccessible! I have one disk in a degraded state and the other is normal, but for those chunklets I get move_error.

Thanks in advance!!