3PAR 7200 replacement disk marked as Slow Drive and fails
Posted: Thu Feb 17, 2022 3:16 am
Hi all,
I have a weird issue with a 3PAR 7200. One disk has failed; it is a 900GB FC 10K 6G Encrypted HDD.
Each time I replace it, servicemag resume succeeds.
However, after a couple of hours the disk fails again, and servicemag start then also succeeds. I have tried 3 disks already, each with a different DOM (2013, 2014, 2015), and the result is still the same.
I then dug further into the logs. After every replacement, once servicemag completes, the replacement disk is always picked as a candidate by the check_slow_disk task.
The IOPS for the replacement disk range from 105 to 135, while the task log lists 140 as the Good IOPS value for a 10K HDD.
This is the last extract of the check_slow_disk log before the disk failed for the 4th time.
2022-02-05 20:07:01 +08 Updated Executing "check_slow_disk" as 0:29843
2022-02-05 20:07:01 +08 Updated RPM 100 -> Good IOPS 2000
2022-02-05 20:07:01 +08 Updated RPM 10 -> Good IOPS 140
2022-02-05 20:07:01 +08 Updated RPM 150 -> Good IOPS 2000
2022-02-05 20:07:01 +08 Updated RPM 15 -> Good IOPS 180
2022-02-05 20:07:01 +08 Updated RPM 7 -> Good IOPS 60
2022-02-05 20:07:01 +08 Updated Running at interval 840 for 3360 seconds
2022-02-05 20:21:01 +08 Updated
2022-02-05 20:21:01 +08 Updated Starting next iteration
2022-02-05 20:21:01 +08 Updated
2022-02-05 20:21:01 +08 Updated Checking speed 7 drives
2022-02-05 20:21:01 +08 Updated Candidate:PDID: 27, adj_svct: 7.0, idle%: 99.7, iops: 0.5, kbps: 15.4, svct: 7.2
2022-02-05 20:21:01 +08 Updated Next:PDID: 19, adj_svct: 6.6, idle%: 99.8, iops: 0.4, kbps: 12.6, svct: 6.8
2022-02-05 20:21:01 +08 Updated Checking speed 10 drives
2022-02-05 20:21:01 +08 Updated Candidate:PDID: 64, adj_svct: 59.4, idle%: 7.6, iops: 109.7, kbps: 3027.2, svct: 98.3
2022-02-05 20:21:01 +08 Updated Next:PDID: 11, adj_svct: 15.3, idle%: 19.6, iops: 122.1, kbps: 3355.2, svct: 58.6
2022-02-05 20:35:01 +08 Updated
2022-02-05 20:35:01 +08 Updated Starting next iteration
2022-02-05 20:35:01 +08 Updated
2022-02-05 20:35:01 +08 Updated Checking speed 7 drives
2022-02-05 20:35:01 +08 Updated Candidate:PDID: 26, adj_svct: 4.2, idle%: 99.7, iops: 0.8, kbps: 33.2, svct: 4.6
2022-02-05 20:35:01 +08 Updated Next:PDID: 19, adj_svct: 3.9, idle%: 99.9, iops: 0.3, kbps: 11.6, svct: 4.1
2022-02-05 20:35:01 +08 Updated Checking speed 10 drives
2022-02-05 20:35:01 +08 Updated Candidate:PDID: 64, adj_svct: 113.1, idle%: 1.8, iops: 129.2, kbps: 3842.5, svct: 159.6
2022-02-05 20:35:01 +08 Updated Next:PDID: 36, adj_svct: 45.9, idle%: 10.5, iops: 143.5, kbps: 3871.3, svct: 96.7
2022-02-05 20:49:02 +08 Updated
2022-02-05 20:49:02 +08 Updated Starting next iteration
2022-02-05 20:49:02 +08 Updated
2022-02-05 20:49:02 +08 Updated Checking speed 7 drives
2022-02-05 20:49:02 +08 Updated Candidate:PDID: 19, adj_svct: 4.9, idle%: 99.8, iops: 0.4, kbps: 13.5, svct: 5.1
2022-02-05 20:49:02 +08 Updated Next:PDID: 27, adj_svct: 4.1, idle%: 99.8, iops: 0.5, kbps: 17.3, svct: 4.4
2022-02-05 20:49:02 +08 Updated Checking speed 10 drives
2022-02-05 20:49:02 +08 Updated Candidate:PDID: 64, adj_svct: 96.6, idle%: 2.0, iops: 128.5, kbps: 3936.1, svct: 143.0
2022-02-05 20:49:02 +08 Updated Next:PDID: 36, adj_svct: 29.2, idle%: 11.7, iops: 136.4, kbps: 3825.2, svct: 77.8
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Updated Starting next iteration
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Updated Checking speed 7 drives
2022-02-05 21:03:02 +08 Updated Candidate:PDID: 29, adj_svct: 12.5, idle%: 99.2, iops: 1.6, kbps: 289.4, svct: 13.7
2022-02-05 21:03:02 +08 Updated Next:PDID: 21, adj_svct: 12.5, idle%: 99.2, iops: 1.5, kbps: 282.6, svct: 13.6
2022-02-05 21:03:02 +08 Updated Checking speed 10 drives
2022-02-05 21:03:02 +08 Updated Candidate:PDID: 64, adj_svct: 105.8, idle%: 1.6, iops: 130.9, kbps: 4184.9, svct: 153.4
2022-02-05 21:03:02 +08 Updated Next:PDID: 35, adj_svct: 22.6, idle%: 12.0, iops: 142.0, kbps: 4152.3, svct: 73.5
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Updated FOUND SLOW DRIVE: PDID: 64, adj_svct: 105.8, idle%: 1.6, iops: 130.9, kbps: 4184.9, svct: 153.4
2022-02-05 21:03:02 +08 Updated Marking slow disk 64 failed
2022-02-05 21:03:02 +08 Updated Failed PDID 64
2022-02-05 21:03:02 +08 Updated
2022-02-05 21:03:02 +08 Completed.
This is the latest servicemag start:
2022-02-06 00:30:36 +08 Updated Executing "sstart_pd_64" as 1:15777
2022-02-06 00:30:36 +08 Updated servicemag start -wait -pdid 64
2022-02-06 00:30:36 +08 Updated ... servicing disks in mag: 3 0
2022-02-06 00:30:36 +08 Updated ... normal disks:
2022-02-06 00:30:36 +08 Updated ... not normal disks: WWN [5000C5007F6EFABC] Id [64] diskpos [0]
2022-02-06 00:30:36 +08 Updated ... relocating chunklets to spare space...
2022-02-06 00:30:47 +08 Updated ... bypassing mag 3 0
2022-02-06 00:31:27 +08 Updated ... bypassed mag 3 0
2022-02-06 00:31:27 +08 Updated servicemag start -wait -pdid 64 -- Succeeded
2022-02-06 00:31:27 +08 Completed scheduled task.
I noticed that the replacement disk is picked as a candidate for checking 10 consecutive times, and then the system marks it as Failed.
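To double-check that count against the raw task log, a small script along these lines could tally how often each PDID shows up as a candidate and with what IOPS. This is only a rough sketch: the function name summarize_candidates and the regular expression are my own, written against the line format pasted above, and it does not reproduce whatever criteria check_slow_disk actually uses to fail a disk.

```python
import re

# Matches task-log lines such as:
#   ... Updated Candidate:PDID: 64, adj_svct: 59.4, idle%: 7.6, iops: 109.7, kbps: 3027.2, svct: 98.3
# The pattern is an assumption based on the log format shown in this post.
CANDIDATE_RE = re.compile(r"Candidate:PDID:\s*(\d+),.*?iops:\s*([\d.]+)")


def summarize_candidates(log_text):
    """Return {pdid: [iops, ...]} for every 'Candidate' line in the log text."""
    seen = {}
    for line in log_text.splitlines():
        match = CANDIDATE_RE.search(line)
        if match:
            pdid = int(match.group(1))
            seen.setdefault(pdid, []).append(float(match.group(2)))
    return seen


# Four lines copied from the extract above; "Next:" lines are ignored on purpose.
sample = """\
2022-02-05 20:21:01 +08 Updated Candidate:PDID: 27, adj_svct: 7.0, idle%: 99.7, iops: 0.5, kbps: 15.4, svct: 7.2
2022-02-05 20:21:01 +08 Updated Candidate:PDID: 64, adj_svct: 59.4, idle%: 7.6, iops: 109.7, kbps: 3027.2, svct: 98.3
2022-02-05 20:21:01 +08 Updated Next:PDID: 11, adj_svct: 15.3, idle%: 19.6, iops: 122.1, kbps: 3355.2, svct: 58.6
2022-02-05 20:35:01 +08 Updated Candidate:PDID: 64, adj_svct: 113.1, idle%: 1.8, iops: 129.2, kbps: 3842.5, svct: 159.6
"""

if __name__ == "__main__":
    for pdid, iops_values in sorted(summarize_candidates(sample).items()):
        print(f"PDID {pdid}: candidate {len(iops_values)} time(s), iops {iops_values}")
```

Feeding it the full task output instead of the sample should make it easy to see whether the same PDID is the candidate in every iteration.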
Has anyone experienced this same issue? Is there a way to keep the disk in this specific slot from being flagged as slow?