3PAR 7200 replacement disk marked as Slow Drive and fails

Posted: Thu Feb 17, 2022 3:16 am
by sivah
Hi all,

I have a weird issue with a 3PAR 7200. The failed disk is a 900GB FC 10K 6G Encrypted HDD.

Each time I replace it, servicemag resume succeeds.
However, after a couple of hours the disk fails again, and the subsequent servicemag start also succeeds. I have already tried 3 disks, each with a different DOM (2013, 2014, 2015), and the result is still the same.

I then dug further into the logs. For each replacement, I noticed that after servicemag completes, the replacement disk is always marked as a candidate by the check_slow_disk task.

The IOPS for the replacement disk ranges from 105 to 135, while the target ("Good IOPS" in the log) is 140 for a 10K HDD.

This is the last extract of check_slow_disk before the disk failed for the 4th time.

2022-02-05 20:07:01 +08 Updated Executing "check_slow_disk" as 0:29843
2022-02-05 20:07:01 +08 Updated RPM 100 -> Good IOPS 2000
2022-02-05 20:07:01 +08 Updated RPM 10 -> Good IOPS 140
2022-02-05 20:07:01 +08 Updated RPM 150 -> Good IOPS 2000
2022-02-05 20:07:01 +08 Updated RPM 15 -> Good IOPS 180
2022-02-05 20:07:01 +08 Updated RPM 7 -> Good IOPS 60
2022-02-05 20:07:01 +08 Updated Running at interval 840 for 3360 seconds
2022-02-05 20:21:01 +08 Updated
2022-02-05 20:21:01 +08 Updated Starting next iteration
2022-02-05 20:21:01 +08 Updated
2022-02-05 20:21:01 +08 Updated Checking speed 7 drives
2022-02-05 20:21:01 +08 Updated Candidate:PDID: 27, adj_svct: 7.0, idle%: 99.7, iops: 0.5, kbps: 15.4, svct: 7.2
2022-02-05 20:21:01 +08 Updated Next:PDID: 19, adj_svct: 6.6, idle%: 99.8, iops: 0.4, kbps: 12.6, svct: 6.8
2022-02-05 20:21:01 +08 Updated Checking speed 10 drives
2022-02-05 20:21:01 +08 Updated Candidate:PDID: 64, adj_svct: 59.4, idle%: 7.6, iops: 109.7, kbps: 3027.2, svct: 98.3
2022-02-05 20:21:01 +08 Updated Next:PDID: 11, adj_svct: 15.3, idle%: 19.6, iops: 122.1, kbps: 3355.2, svct: 58.6
2022-02-05 20:35:01 +08 Updated
2022-02-05 20:35:01 +08 Updated Starting next iteration
2022-02-05 20:35:01 +08 Updated
2022-02-05 20:35:01 +08 Updated Checking speed 7 drives
2022-02-05 20:35:01 +08 Updated Candidate:PDID: 26, adj_svct: 4.2, idle%: 99.7, iops: 0.8, kbps: 33.2, svct: 4.6
2022-02-05 20:35:01 +08 Updated Next:PDID: 19, adj_svct: 3.9, idle%: 99.9, iops: 0.3, kbps: 11.6, svct: 4.1
2022-02-05 20:35:01 +08 Updated Checking speed 10 drives
2022-02-05 20:35:01 +08 Updated Candidate:PDID: 64, adj_svct: 113.1, idle%: 1.8, iops: 129.2, kbps: 3842.5, svct: 159.6
2022-02-05 20:35:01 +08 Updated Next:PDID: 36, adj_svct: 45.9, idle%: 10.5, iops: 143.5, kbps: 3871.3, svct: 96.7
2022-02-05 20:49:02 +08 Updated
2022-02-05 20:49:02 +08 Updated Starting next iteration
2022-02-05 20:49:02 +08 Updated
2022-02-05 20:49:02 +08 Updated Checking speed 7 drives
2022-02-05 20:49:02 +08 Updated Candidate:PDID: 19, adj_svct: 4.9, idle%: 99.8, iops: 0.4, kbps: 13.5, svct: 5.1
2022-02-05 20:49:02 +08 Updated Next:PDID: 27, adj_svct: 4.1, idle%: 99.8, iops: 0.5, kbps: 17.3, svct: 4.4
2022-02-05 20:49:02 +08 Updated Checking speed 10 drives
2022-02-05 20:49:02 +08 Updated Candidate:PDID: 64, adj_svct: 96.6, idle%: 2.0, iops: 128.5, kbps: 3936.1, svct: 143.0
2022-02-05 20:49:02 +08 Updated Next:PDID: 36, adj_svct: 29.2, idle%: 11.7, iops: 136.4, kbps: 3825.2, svct: 77.8
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Updated Starting next iteration
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Updated Checking speed 7 drives
2022-02-05 21:03:02 +08 Updated Candidate:PDID: 29, adj_svct: 12.5, idle%: 99.2, iops: 1.6, kbps: 289.4, svct: 13.7
2022-02-05 21:03:02 +08 Updated Next:PDID: 21, adj_svct: 12.5, idle%: 99.2, iops: 1.5, kbps: 282.6, svct: 13.6
2022-02-05 21:03:02 +08 Updated Checking speed 10 drives
2022-02-05 21:03:02 +08 Updated Candidate:PDID: 64, adj_svct: 105.8, idle%: 1.6, iops: 130.9, kbps: 4184.9, svct: 153.4
2022-02-05 21:03:02 +08 Updated Next:PDID: 35, adj_svct: 22.6, idle%: 12.0, iops: 142.0, kbps: 4152.3, svct: 73.5
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Updated FOUND SLOW DRIVE: PDID: 64, adj_svct: 105.8, idle%: 1.6, iops: 130.9, kbps: 4184.9, svct: 153.4
2022-02-05 21:03:02 +08 Updated Marking slow disk 64 failed
2022-02-05 21:03:02 +08 Updated Failed PDID 64
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Completed.
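
For anyone who wants to compare these lines without reading them by eye, here is a minimal Python sketch that pulls the fields out of the Candidate/Next lines and counts how often each PD is picked as the candidate per speed class. It assumes the line layout is exactly as in the extract above. The names parse_pd_line and candidates_by_speed are my own and hypothetical; this is not an HPE tool, and it only reads saved log text.

```python
import re
from collections import Counter

# Patterns assume the log layout shown in the extract above.
SPEED_RE = re.compile(r"Checking speed (\d+) drives")
PD_RE = re.compile(r"(Candidate|Next):PDID:\s*(\d+),\s*(.*)$")


def parse_pd_line(line):
    """Return a dict of fields for a Candidate/Next line, or None."""
    m = PD_RE.search(line)
    if not m:
        return None
    role, pdid, rest = m.groups()
    record = {"role": role, "pdid": int(pdid)}
    for part in rest.split(","):
        key, _, value = part.partition(":")
        record[key.strip()] = float(value)
    return record


def candidates_by_speed(lines):
    """Count how often each (speed class, PDID) pair appears as Candidate."""
    counts = Counter()
    speed = None
    for line in lines:
        m = SPEED_RE.search(line)
        if m:
            speed = int(m.group(1))
            continue
        record = parse_pd_line(line)
        if record and record["role"] == "Candidate":
            counts[(speed, record["pdid"])] += 1
    return counts


# A few lines copied from the extract above.
SAMPLE = [
    '2022-02-05 20:21:01 +08 Updated Checking speed 10 drives',
    '2022-02-05 20:21:01 +08 Updated Candidate:PDID: 64, adj_svct: 59.4, idle%: 7.6, iops: 109.7, kbps: 3027.2, svct: 98.3',
    '2022-02-05 20:21:01 +08 Updated Next:PDID: 11, adj_svct: 15.3, idle%: 19.6, iops: 122.1, kbps: 3355.2, svct: 58.6',
    '2022-02-05 20:35:01 +08 Updated Checking speed 10 drives',
    '2022-02-05 20:35:01 +08 Updated Candidate:PDID: 64, adj_svct: 113.1, idle%: 1.8, iops: 129.2, kbps: 3842.5, svct: 159.6',
    '2022-02-05 20:35:01 +08 Updated Next:PDID: 36, adj_svct: 45.9, idle%: 10.5, iops: 143.5, kbps: 3871.3, svct: 96.7',
]

if __name__ == "__main__":
    for (speed, pdid), n in candidates_by_speed(SAMPLE).items():
        print(f"speed class {speed}, PDID {pdid}: candidate {n} time(s)")
```

Feeding it the full task output (one line per list item) makes it easy to see whether the same PD is the candidate in every iteration, and how its adj_svct compares with the "Next" drive.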


The latest servicemag start

2022-02-06 00:30:36 +08 Updated Executing "sstart_pd_64" as 1:15777
2022-02-06 00:30:36 +08 Updated servicemag start -wait -pdid 64
2022-02-06 00:30:36 +08 Updated ... servicing disks in mag: 3 0
2022-02-06 00:30:36 +08 Updated ... normal disks:
2022-02-06 00:30:36 +08 Updated ... not normal disks: WWN [5000C5007F6EFABC] Id [64] diskpos [0]
2022-02-06 00:30:36 +08 Updated ... relocating chunklets to spare space...
2022-02-06 00:30:47 +08 Updated ... bypassing mag 3 0
2022-02-06 00:31:27 +08 Updated ... bypassed mag 3 0
2022-02-06 00:31:27 +08 Updated servicemag start -wait -pdid 64 -- Succeeded
2022-02-06 00:31:27 +08 Completed scheduled task.


I noticed that the replacement disk is picked as the candidate 10 consecutive times, after which the system marks it as Failed.

Has anyone experienced the same issue? Is there a way to stop the disk in this specific slot from being flagged as slow?

Re: 3PAR 7200 replacement disk marked as Slow Drive and fail

Posted: Thu Feb 17, 2022 4:50 pm
by MammaGutt
Just asking, could the issue be the cage slot and not the PDs? Are you seeing SAS errors or similar on the slot?

From what I see, the drive has a very high svct (service time, or latency in plain English), which is probably why it is always a candidate.

Re: 3PAR 7200 replacement disk marked as Slow Drive and fail

Posted: Wed Mar 09, 2022 12:35 am
by sivah
Hi,

Just an update on this.
I searched and found that HPE has phased out the 900GB Encrypted HDDs we are currently using and issued an advisory recommending 1.2TB Encrypted HDDs instead:

Advisory: (Revised) HPE 3PAR StoreServ 7000 Storage And HPE 3PAR StoreServ 10000 Storage - Transitioning From HCBRE, HCEP, And Certain SLTN HDD Spare Parts To Alternate Replacement HDD Spare Parts

https://support.hpe.com/hpesc/public/do ... 28695en_us

I finally ordered the 1.2TB disk instead, which has a DOM of 2018. It has now been working for 5 days after replacement with no signs of being a "slow drive".

It seems the 900GB Encrypted HDDs we were using as replacements were just old and bad, even though they were bought from multiple suppliers.

Re: 3PAR 7200 replacement disk marked as Slow Drive and fail

Posted: Wed Mar 09, 2022 3:12 am
by MammaGutt
I was told back in the day that the 900 GB drives were discontinued because no vendor kept making them once the new series of drives was released.