3Par Disks errors

Posted: Tue Aug 30, 2022 3:31 am
by modi
Hi All

I have a 3PAR 7200 (support and license expired). Some disks were replaced by simply pulling the faulty ones and inserting new ones without following the correct procedure, which has left a lot of failed disks showing on the 3PAR console.

Can anyone help me clear the errors? I’m really new to this and not very confident with the commands I’ve seen online (forums, HP 3PAR documentation).
I’d be very thankful if anyone could help me with removing one failed chunklet, and I’ll do the rest.
Regards,

Re: 3Par Disks errors

Posted: Tue Aug 30, 2022 5:34 am
by MammaGutt
Not sure where to start .... As long as the system isn't running one of the very first OS versions, servicemag should auto-trigger upon disk failure. And as long as the drives were replaced afterwards, everything "should" have been good.....

As a first step, the output of these:
"checkhealth -svc detail"
"servicemag status"
"showpd -c"
"showpd -s"

would probably give a better picture of where you are....

Re: 3Par Disks errors

Posted: Tue Aug 30, 2022 7:08 am
by modi
Hi MammaGutt,

Thanks for your reply.

cli% checkhealth -svc detail
checkhealth : Permission denied

"It is a privileges issue."

The output of the other commands is in the attachment.

Re: 3Par Disks errors

Posted: Tue Aug 30, 2022 9:16 am
by MammaGutt
My bad.... "checkhealth -svc -detail"

And yeah ... there is no quick fix.... You would have to manually empty the "ghost" PDs, which are the drives you simply pulled without a working servicemag process having completed....

You're also in a pretty bad situation, with the FC drives being full and the system probably using NL drives in the FC CPG....

It also looks like you've run out of disk licenses, as the old drives haven't been removed correctly...

0:2:0
0:20:0
0:22:0
1:1:0
1:10:0
1:11:0
1:14:0
1:18:0
1:19:0
1:22:0
1:23:0
2:8:0

are slots with "ghost PDs"... They have a total of ~7TB of data "attached" to them... most are NL which probably will take 2-3 minutes per chunklet (GB) to move/fix....

So you probably have about 350 hours or just over 14 days of consistent TLC to clean up this mess.....
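
The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes roughly 7000 chunklets (about 7TB at 1GB per chunklet) and takes the 2-3 minutes per chunklet from the post; the function name is made up for illustration.

```python
def estimate_hours(chunklets, minutes_per_chunklet):
    """Total wall-clock hours if chunklets are moved one after another."""
    return chunklets * minutes_per_chunklet / 60

# Assumed inputs: ~7000 chunklets, 2 to 3 minutes each.
best_case = estimate_hours(7000, 2)
worst_case = estimate_hours(7000, 3)

print(f"best case:  {best_case:.0f} hours ({best_case / 24:.1f} days)")
print(f"worst case: {worst_case:.0f} hours ({worst_case / 24:.1f} days)")
```

The worst case is where the 350 hours (just over 14 days) comes from; at 2 minutes per chunklet it would be closer to 10 days.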

And after everything is done, you probably need to do tunesys, to clean up the NL usage in FC CPG ... assuming that you actually have enough capacity in FC tier.


That means doing

"showpdch -mov"
With that list, you'll do:
"movech -perm -ovrd <PDID>:<ChkID>" for all lines in that output (which should be just above 7000 lines).

When you've completed that in a month or two, you should be able to do:
"dismisspd <PDID>" for PD 2,13,32,44,46,52,53,54,56,57,58 and 59.

Once that is done, you should do another recap on the status of the system to prepare a plan to clean up the rest :)
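
With around 7000 lines, typing each movech by hand is error-prone, so the command lines could be generated from a saved copy of the "showpdch -mov" output. The Python sketch below is only an illustration: it assumes the first two columns of each data row are the PD ID and the chunklet number, and the sample rows are made up, so check the real output layout before relying on it. It only builds the command text and does not run anything against the array.

```python
def build_movech_commands(showpdch_output):
    """Turn saved 'showpdch -mov' text into one movech command per data row."""
    commands = []
    for line in showpdch_output.splitlines():
        fields = line.split()
        # Assumption: data rows start with two integers (PD ID, chunklet number).
        # Header, separator and summary lines do not, so they are skipped.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        commands.append(f"movech -perm -ovrd {fields[0]}:{fields[1]}")
    return commands

# Made-up sample in an assumed layout, not real array output.
sample = """Pdid Chnk LdName LdCh State Usage
   2    5 usr.0    12 normal ld
  13  101 usr.1     7 normal ld
-----------------------------------
Total chunklets: 2
"""

for command in build_movech_commands(sample):
    print(command)
```

Review the generated list and try a single command first before working through the rest.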

Re: 3Par Disks errors

Posted: Tue Aug 30, 2022 10:42 am
by MammaGutt
And if you haven't done it in a while and you've deleted stuff ... Run a compact cpg first. That might save you some work.

Re: 3Par Disks errors

Posted: Tue Aug 30, 2022 1:09 pm
by modi
Hello MammaGutt,


"You're also pretty much in a bad situation with FC drives being full and probably using NL drives in the FC CPG...."
You are totally right, and that explains the slowness on some VMs.

"It also looks like you've run out of disk licenses as the old ones haven't been removed correctly..."
Yes, that is what the Management Console shows.

"And after everything is done, you probably need to do tunesys, to clean up the NL usage in FC CPG ... assuming that you actually have enough capacity in FC tier."
Do you think I need to unmount some datastores to free up more FC space?

My view is that removing the "ghost PDs" is the first step; correct me if I'm wrong.

Is it only possible to remove the chunklets one by one? I understand from you that each chunklet takes about 3 minutes.

Your advice is much appreciated.

Regards

Re: 3Par Disks errors

Posted: Wed Aug 31, 2022 12:16 am
by MammaGutt
Let me re-phrase :)

If you have anything that you are planning on deleting, or if you have another storage system you can temporarily store some data on, you should do that first as that would help. Any data you can move out of the system is data you don't have to shuffle around to fix your problem.

If you're running VMware and not using VMFS6 (with thin VMs) with reclamation/unmap on, you should do an unmap on every datastore after doing a clean-up.

https://kb.vmware.com/s/article/2057513 (basically "esxcli storage vmfs unmap -l <casesensitive name of datastore>")
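
If there are several datastores, the unmap command above can be generated for each one instead of being typed by hand. A minimal Python sketch; the datastore names are placeholders and the helper name is hypothetical. It only prints the commands, which would still have to be run on an ESXi host.

```python
import shlex

def unmap_commands(datastores):
    """Return one 'esxcli storage vmfs unmap' command line per datastore label."""
    # shlex.quote keeps labels containing spaces intact; labels are case sensitive.
    return [f"esxcli storage vmfs unmap -l {shlex.quote(name)}" for name in datastores]

# Placeholder labels, replace with the real datastore names.
for command in unmap_commands(["DS_Prod_01", "DS Test 02"]):
    print(command)
```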

You should look at the output from showvv -s after unmap is done, to see if something can be released. You will see that in the difference between Usr Used and Usr Rsvd columns. If you need help, share the output here after cleanup and unmap.
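
To illustrate the comparison: the space that might be released per volume is simply Usr Rsvd minus Usr Used. The Python sketch below uses made-up volume names and sizes (assumed to be in MB) typed in by hand; it does not parse real showvv output.

```python
def reclaimable(volumes):
    """Return (volume, rsvd - used) pairs, largest gap first."""
    gaps = [(name, rsvd - used) for name, (used, rsvd) in volumes.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)

# Hypothetical values: volume name -> (Usr Used, Usr Rsvd), both in MB.
sample_volumes = {
    "vv_vmfs_02": (204800, 215040),
    "vv_vmfs_01": (512000, 819200),
}

for name, gap in reclaimable(sample_volumes):
    print(f"{name}: {gap} MB reserved but not used")
```

Volumes with a large gap are the ones where a compact is most likely to give space back.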

Once that is done, you should do a compact cpg ("compactcpg") on all CPGs to release the free space in the CPGs back to the system-wide pool.


You probably also need to do "setpd ldalloc on 73", "setpd ldalloc on 76" and "setpd ldalloc on 77" .. Not sure if the license issues will prevent you from doing this, but maybe not.


After this, you've hopefully reduced the amount of data you need to move ... and should proceed to move (not remove) chunklets one by one, to remove the reference of the removed disks.

Re: 3Par Disks errors

Posted: Thu Sep 01, 2022 1:37 am
by modi
Hello MammaGutt,

Attached the output.

Cross fingers ;)

Thanks

Re: 3Par Disks errors

Posted: Thu Sep 01, 2022 2:05 am
by MammaGutt
modi wrote:Hello MammaGutt,

Attached the output.

Cross fingers ;)

Thanks


Ouch....

You have a few more problems here.....

What 3PAR OS version are you running (showversion in CLI) ?

It seems like your write cache is degraded due to problems with reading the power supply and battery status... That will have a big impact on performance.
The system has found drives with degraded performance, but can't remove them from the storage system as you have too many failed/degraded disks in the system.
It has found a chunk of data that might be corrupted (but it's unknown whether there was any actual data there).
A few tasks have failed due to a node being down at some point in time...
It seems like some updates have not been installed correctly (or completed).
There seem to be temperature issues with at least one disk.
You have used NL capacity to write data to what is assumed to be FC.