HPE Storage Users Group

A Storage Administrator Community




 Post subject: Re: Physical Disk Failures
PostPosted: Tue Mar 25, 2014 9:32 am 
Site Admin
Joined: Tue Aug 18, 2009 10:35 pm
Posts: 1328
Location: Dallas, Texas
I was able to confirm that the "Used Fail" chunklet count is a good one to watch until it reaches 0. We just had a 1 TB NL drive fail:

Code:
ESFWT800-1 cli% showpd -c 362
                              ------- Normal Chunklets -------- ---- Spare Chunklets ----
                              - Used - -------- Unused -------- - Used - ---- Unused ----
 Id CagePos Type State  Total OK  Fail Free Uninit Unavail Fail OK  Fail Free Uninit Fail
362 0:5:2   NL   failed  3724  0  1078    0   1046       0 1586  0     0    0      0   14
-----------------------------------------------------------------------------------------
  1 total                3724  0  1078    0   1046       0 1586  0     0    0      0   14


That number of failed chunklets is slowly ticking down over time as they either move or rebuild from parity; I can't tell which.

"showpdch -sync" did not show anything.

"showpdch -mov" showed all the chunklets from the failed PD that had already been relocated, and 2 that were actively moving.

"showpdch 362" showed all the chunklets left on the drive, and the current 2 that were moving. This list is getting shorter and shorter; each chunklet only takes a short time.

"showpdch -mov 362" shows just the 2 chunklets being moved off the failed drive.

What is interesting is that "showpd -c 362" shows all the remaining chunklets as "failed" and that number is shrinking over time... however, "showpdch 362" shows all the chunklets as "normal" even though it's clearly evacuating them to other disks 2 at a time.
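
For anyone who wants to track that count without eyeballing the table, here is a minimal Python sketch that pulls the "Used Fail" value out of one data row of the `showpd -c` output above. It assumes the sixteen-field layout shown in this post; the field names and the function `parse_showpd_c_row` are hypothetical labels, not 3PAR CLI identifiers, and other OS releases may lay the columns out differently.

```python
# Hypothetical labels for the 16 fields in the `showpd -c` row pasted above.
# They follow the header groups (Normal/Spare, Used/Unused) but are not
# official CLI names.
COLUMNS = [
    "id", "cagepos", "type", "state", "total",
    "normal_used_ok", "normal_used_fail",
    "normal_unused_free", "normal_unused_uninit",
    "normal_unused_unavail", "normal_unused_fail",
    "spare_used_ok", "spare_used_fail",
    "spare_unused_free", "spare_unused_uninit", "spare_unused_fail",
]


def parse_showpd_c_row(line):
    """Split one data row of `showpd -c` into a dict keyed by COLUMNS."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    row["id"] = int(row["id"])
    # Everything from "total" onward is a chunklet count.
    for name in COLUMNS[4:]:
        row[name] = int(row[name])
    return row


# The data row for PD 362 exactly as it appears in the post.
sample = "362 0:5:2   NL   failed  3724  0  1078    0   1046       0 1586  0     0    0      0   14"
row = parse_showpd_c_row(sample)
print(row["normal_used_fail"])  # 1078
```

Feeding successive captures of the same row through this should show `normal_used_fail` counting down toward 0 as the drive is evacuated.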

_________________
Richard Siemers
The views and opinions expressed are my own and do not necessarily reflect those of my employer.


 Post subject: Re: Physical Disk Failures
PostPosted: Thu Mar 27, 2014 4:15 pm 

Joined: Tue May 07, 2013 1:45 pm
Posts: 216
Hmm, only 2 chunklets at a time? That seems like a rather slow way to restore availability. I was led to believe that recovery operations were done on a many-to-many basis, like XIV, but 2 concurrent chunklets sounds much closer to RID's on EVA.


 Post subject: Re: Physical Disk Failures
PostPosted: Thu Mar 27, 2014 11:43 pm 
Site Admin
Joined: Tue Aug 18, 2009 10:35 pm
Posts: 1328
Location: Dallas, Texas
This 2 chunklets moving at a time thing *seems* to be a new feature since we upgraded from 2.3.1 to 3.1.1. With 2.3.1, rebuilds would go fast enough to trigger our IOPS/PD alerts every 5 minutes for about 30 minutes total... then the rebuild would complete.

I suspect there is more to it than that... I *think* these chunklets on the failed drive were still online/readable, so it may have chosen a low-priority move since availability was not impacted... I hope that's the case.

Would be nice to have some documentation of how drive errors are dealt with.

_________________
Richard Siemers
The views and opinions expressed are my own and do not necessarily reflect those of my employer.


 Post subject: Re: Physical Disk Failures
PostPosted: Fri Mar 28, 2014 7:23 am 

Joined: Tue May 07, 2013 1:45 pm
Posts: 216
Ah, that makes sense. If it sees the drive as online but degraded, it's logical to do a low-priority evacuation.


 Post subject: Re: Physical Disk Failures
PostPosted: Fri Aug 14, 2015 2:03 pm 

Joined: Mon Jun 15, 2015 11:34 am
Posts: 13
What happens if someone pulls the wrong disk out and wants to put it back?

1. Does it move chunklets from the removed disk to other PDs?
2. How do you bring the PD back online after reinserting it?
3. How do you restore those chunklets to the disk that was pulled out?


Thanks for your recommendations and expert views


 Post subject: Re: Physical Disk Failures
PostPosted: Tue May 01, 2018 4:41 am 

Joined: Tue May 01, 2018 4:27 am
Posts: 6
Hello

I have a problem with an HP 3PAR 7200 with 900 GB FC HDDs.
One of the HDDs failed about 1 month ago. The disk is in cage 0, magazine 19 (PD ID 19 in the showpd output below). I replaced it using the servicemag procedure and everything was OK.
After 1 day the new HDD was normal and the failed disk was gone, but the next day the new disk failed. I replaced the failed disk again, and after 1 day everything was OK.
After 1 month the HDD in 0 19 failed again and I replaced it, but after 2 days the new HDD failed again.


cli% showpd

                           ----Size(MB)---- ----Ports----
Id CagePos Type RPM State     Total    Free A      B      Cap(GB)
 0 0:0:0   FC    10 normal   838656  146432 1:0:1* 0:0:1      900
 1 0:1:0   FC    10 normal   838656  143360 1:0:1  0:0:1*     900
 2 0:2:0   FC    10 normal   838656  585728 1:0:1* 0:0:1      900
 3 0:3:0   FC    10 normal   838656  136192 1:0:1  0:0:1*     900
 4 0:4:0   FC    10 normal   838656  147456 1:0:1* 0:0:1      900
 5 0:5:0   FC    10 normal   838656  117760 1:0:1  0:0:1*     900
 6 0:6:0   FC    10 normal   838656  148480 1:0:1* 0:0:1      900
 7 0:7:0   FC    10 normal   838656  129024 1:0:1  0:0:1*     900
 8 0:8:0   FC    10 normal   838656  148480 1:0:1* 0:0:1      900
 9 0:9:0   FC    10 normal   838656  105472 1:0:1  0:0:1*     900
10 0:10:0  FC    10 normal   838656       0 1:0:1* 0:0:1      900
11 0:11:0  FC    10 normal   838656       0 1:0:1  0:0:1*     900
12 0:12:0  FC    10 normal   838656       0 1:0:1* 0:0:1      900
13 0:13:0  FC    10 normal   838656       0 1:0:1  0:0:1*     900
14 0:14:0  FC    10 normal   838656       0 1:0:1* 0:0:1      900
15 0:15:0  FC    10 normal   838656    1024 1:0:1  0:0:1*     900
16 0:16:0  FC    10 normal   838656       0 1:0:1* 0:0:1      900
17 0:17:0  FC    10 normal   838656       0 1:0:1  0:0:1*     900
18 0:18:0  FC    10 normal   838656       0 1:0:1* 0:0:1      900
19 0:19:0  FC    10 failed   838656       0 1:0:1  0:0:1*     900
20 0:21:0  FC    10 normal   838656       0 1:0:1  0:0:1*     900
21 0:22:0  FC    10 normal   838656    5120 1:0:1* 0:0:1      900
22 0:23:0  FC    10 normal   838656    2048 1:0:1  0:0:1*     900
23 0:20:0  FC    10 normal   838656       0 1:0:1* 0:0:1      900


cli% checkhealth

Checking alert
Checking cabling
Checking cage
Checking dar
Checking date
Checking ld
Checking license
Checking network
Checking node
Checking pd
Checking port
Checking rc
Checking snmp
Checking task
Checking vlun
Checking vv
Component ---------------Description--------------- Qty
Network   Too few working admin network connections   1
PD        PDs that are failed                         1


cli% showcage

Id Name  LoopA Pos.A LoopB Pos.B Drives Temp  RevA RevB Model Side
 0 cage0 1:0:1     0 0:0:1     0     24 26-30 320e 320e DCN1  n/a



cli% showversion

Release version 3.1.2 (MU2)
Patches: P10

Component Name Version
CLI Server 3.1.2 (MU2)
CLI Client 3.1.2 (MU2)
System Manager 3.1.2 (MU2)
Kernel 3.1.2 (MU2)
TPD Kernel Code 3.1.2 (MU2)

cli% servicemag start -pdid 19 -seucceeded

Expecting integer pdid, got: -succeeded

SAN.SER cli% servicemag start -pdid 19 -succeeded

Are you sure you want to run servicemag?
select q=quit y=yes n=no: y
servicemag start -pdid 19

... servicing disks in mag: 0 19

... normal disks:

... not normal disks: WWN [XXXXXXXXXXXXXXXX] Id [19] diskpos [0]



The servicemag start operation will continue in the background.

cli% showpd -space 19

                        -----------------(MB)------------------
Id CagePos Type -State-   Size Volume Spare Free Unavail Failed
19 0:19:0  FC   failed  838656      0     0    0       0 838656
---------------------------------------------------------------
 1 total                838656      0     0    0       0 838656
SAN.SER cli% servicemag resume 0 19

Are you sure you want to run servicemag?
select q=quit y=yes n=no: y

servicemag status 0 19

The magazine is being brought online due to a servicemag resume.
The last status update was at Tue May 1 10:27:04 2018.
Chunklets relocated: 6 in 4 minutes and 45 seconds
Chunklets remaining: 2232
Chunklets marked for moving: 2232
Estimated time for relocation completion based on 47 seconds per chunklet is: 1 days, 5 hours, 8 minutes and 24 seconds
servicemag resume 0 19 -- is in Progress
cli% exit

Could the OS version be my problem?

Please help me with this problem.

Thank you.
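
The relocation estimate in the servicemag status output above can be reproduced with simple arithmetic. The sketch below is an illustration in Python, not anything the array exposes: the variable names and the helper `split_duration` are hypothetical, and it assumes the per-chunklet rate is rounded down to whole seconds, which is what matches the 47 seconds reported.

```python
def split_duration(total_seconds):
    """Break a number of seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


# Figures copied from the servicemag status output in this post.
relocated = 6                      # "Chunklets relocated: 6"
elapsed_seconds = 4 * 60 + 45      # "in 4 minutes and 45 seconds"
remaining = 2232                   # "Chunklets remaining: 2232"

# 285 s / 6 chunklets = 47.5 s; assumed to be truncated to 47.
seconds_per_chunklet = elapsed_seconds // relocated
estimate = split_duration(remaining * seconds_per_chunklet)

print(seconds_per_chunklet, estimate)  # 47 (1, 5, 8, 24)
```

The result, 1 day, 5 hours, 8 minutes and 24 seconds, is a straight multiplication of the remaining chunklets by the early rate, so it will shift as the measured rate changes.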


 Post subject: Re: Physical Disk Failures
PostPosted: Tue May 01, 2018 5:56 am 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 1570
Location: Europe
Could be OS.

Could also be the cage slot.

How are you getting your replacement drives? If they are from eBay or some third party, they may have been used and have some SMART counters just waiting to fail the drive.

_________________
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.


 Post subject: Re: Physical Disk Failures
PostPosted: Tue May 01, 2018 6:02 am 

Joined: Tue May 01, 2018 4:27 am
Posts: 6
Thank you for the reply.

I buy my HDDs from HP.
If the slot were the problem, I would expect the new HDD to fail right after I insert it into the slot.
Instead, the HDD fails only after the chunklet relocation has finished and the HDD state has been normal for anywhere from 3 days to 1 month.


 Post subject: Re: Physical Disk Failures
PostPosted: Tue May 01, 2018 6:27 am 

Joined: Wed Nov 09, 2011 12:01 pm
Posts: 392
Slot could be causing some intermittent errors that add up over time, reaching a threshold that fails the disk.

Next time check the slot for any debris or pin damage just in case.

Maybe run showpd -e occasionally to see if any error counts are climbing.
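
As a rough illustration of "climbing", the Python sketch below compares two snapshots of error counters and reports only the ones that grew. The counter names and values are made up for the example and are not the actual `showpd -e` columns; how the numbers get collected from the array is left out entirely.

```python
def climbing_counters(previous, current):
    """Return {name: increase} for counters that grew between two samples."""
    return {
        name: current[name] - previous.get(name, 0)
        for name in current
        if current[name] > previous.get(name, 0)
    }


# Hypothetical samples for one PD, taken a few days apart.
monday = {"corrected": 3, "uncorrected": 0}
friday = {"corrected": 9, "uncorrected": 0}

print(climbing_counters(monday, friday))  # {'corrected': 6}
```

A counter that keeps showing up in the result across several samples would fit the "intermittent errors that add up over time" theory for the slot.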


 Post subject: Re: Physical Disk Failures
PostPosted: Tue Sep 25, 2018 6:52 am 

Joined: Thu Oct 26, 2017 1:21 am
Posts: 96
On one of my drives I discovered 3 GiB failed. When is it time for concern? How many failed chunklets does it take before I can call support to have the drive replaced?

