How many disks can fail???

Posted: Mon Jul 22, 2013 8:36 am
by sc0tt
Hi,

Haven't been able to find an explanation...

Trying to understand how 3PAR does RAID with the CPGs. For the 7000 series, using only disk HA in a single drive shelf, how many drives can you lose with the different RAID types (1/5/6)?

For example, using all 24 FC, 15K disks to create a r6 CPG AND a r1 CPG, how many drive failures can I experience and still have data? If I use a set size of 6 (4+2) for the r6 CPG and also have a r1 on the same disks how do I know how many drives I can lose?

Trying to figure out chunklets/raidlets/LDs/wide striping, etc. and how all of this affects how many drive failures the system can tolerate with only one shelf and only disk HA.

Thanks,

Re: How many disks can fail???

Posted: Mon Jul 22, 2013 12:27 pm
by hdtvguy
I don't have a specific answer, but seeing as the data is wide striped, I am assuming that as long as the failed drives do not impact the parity of the underlying data, you can probably have numerous drives fail.

Re: How many disks can fail???

Posted: Mon Jul 22, 2013 2:32 pm
by sc0tt
Thank you for your reply.

I get that a raidlet (raid set of chunklets) has the same protection as a traditional raid set in that you can lose one chunklet in a r5 set and you can lose two chunklets in a r6 set. That's straightforward.
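To make that rule concrete, here is a minimal Python sketch. The drive numbers and raidlet membership are made up for illustration; this is not how 3PAR actually lays out chunklets, and `raidlet_survives` is a hypothetical helper, not a 3PAR command.

```python
# Chunklet losses a single raidlet can absorb, per the rule above.
TOLERANCE = {"r5": 1, "r6": 2}


def raidlet_survives(raid_type, member_drives, failed_drives):
    """Return True if the raidlet still has enough chunklets to serve data.

    member_drives: drives holding one chunklet each of this raidlet
                   (hypothetical layout, one chunklet per drive).
    failed_drives: drives that are currently down.
    """
    lost = len(set(member_drives) & set(failed_drives))
    return lost <= TOLERANCE[raid_type]


r5_set = [0, 1, 2, 3]        # a 3+1 raidlet on drives 0-3 (assumed layout)
r6_set = [4, 5, 6, 7, 8, 9]  # a 4+2 raidlet on drives 4-9 (assumed layout)

print(raidlet_survives("r5", r5_set, {1}))        # True
print(raidlet_survives("r5", r5_set, {1, 2}))     # False
print(raidlet_survives("r6", r6_set, {4, 5}))     # True
print(raidlet_survives("r6", r6_set, {4, 5, 6}))  # False
```

Failed drives only count against a raidlet if they actually hold one of its chunklets, which is why the answer depends on the layout and not just on the number of failed drives.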

What I don't get is that there is another layer of striping in the logical drives across the raidlets. This is also where the "rows" come into play.

So, while a RAID5 raidlet might lose data if it loses two chunklets, do you also lose data if there is RAID50 striping across all the raidlets in the Logical Drive?

Thanks again,
Scott

Re: How many disks can fail???

Posted: Tue Jul 23, 2013 6:23 am
by Perconte
I would say 2. If one disk dies, you still have the one spare disk. If that one dies as well, chances are there are chunklets on it from your R1 CPG. If a third one dies and it holds the other chunklet of that same virtual volume in that R1 CPG, you're screwed.

I'm not a 3PAR guru, so anybody... correct me if I'm wrong.

Re: How many disks can fail???

Posted: Tue Jul 23, 2013 11:20 am
by hdtvguy
You can lose more than 2. If your array is properly configured you can lose an entire tray; there is a cage redundancy setting that prevents chunklets of the same RAID set from sitting on 2 drives in the same tray/chassis. It is all mathematical statistics and odds, depending on how spread out your data is.

Re: How many disks can fail???

Posted: Tue Jul 23, 2013 2:56 pm
by sc0tt
This question pertains to drive HA, not cage HA. Only one drive shelf.

Thanks,

Re: How many disks can fail???

Posted: Wed Jul 24, 2013 7:16 am
by sc0tt
Hi all,

I got some answers in the hp forum if you're interested. You'll need a passport login to view the page.

http://h30499.www3.hp.com/t5/Storage-Ar ... lse#M63740

Re: How many disks can fail???

Posted: Thu Jul 25, 2013 2:02 am
by Perconte
I'm not too sure about the 7000 series, but we had to give 3PAR the number of spare disks that we wanted in our F-series. Did you have to do the same?

Re: How many disks can fail???

Posted: Thu Jul 25, 2013 3:22 pm
by afidel
Perconte wrote:I'm not too sure about the 7000 series, but we had to give 3PAR the number of spare disks that we wanted in our F-series. Did you have to do the same?

That sets the number of spare chunklets. It determines how many disks you can lose overall without replacements, provided there's enough time between failures, but it doesn't affect how many simultaneous disk failures you can survive. That's determined by the RAID type and whether you've enabled cage redundancy. Spare chunklets help a bit if the failures are spread over time, but I'd never crank the setting past the defaults, since HP should be out within a few hours with spares anyway.

Re: How many disks can fail???

Posted: Tue Jul 30, 2013 4:16 pm
by Richard Siemers
I think there are 2 variations of the question... how many drives can fail before you run out of spare space.... vs how many drives can fail at once before rebuild/sparing completes.

Properly configured, every drive will have some spare chunklets on it, and there are no idle dedicated spare spindles wasting away. How many disks can fail before you run out of spare space is a moving target. If your system is underutilized, say only 10% full, then when a drive fails only 10% of that drive's capacity needs to be moved/rebuilt into the spare space. Instead of rebuilding an entire drive, it only rebuilds the used portion. A system that is 90% full will use more spare chunklets per failed disk.
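As a rough sketch of that arithmetic in Python: the chunklet counts and the size of the spare pool below are made-up numbers, not values read from a real array, and the helper names are hypothetical. It also assumes used chunklets are spread evenly across drives and ignores the spare chunklets that disappear along with each failed drive.

```python
def chunklets_to_relocate(chunklets_per_drive, used_percent):
    """Chunklets that must be rebuilt into spare space when one drive fails.

    Only the used share of the drive needs rebuilding; integer ceiling
    division avoids floating-point rounding surprises.
    """
    if not 0 < used_percent <= 100:
        raise ValueError("used_percent must be in 1..100")
    return -((-chunklets_per_drive * used_percent) // 100)


def failures_absorbed(spare_chunklets, chunklets_per_drive, used_percent):
    """Drive failures (spread over time) the spare pool can absorb."""
    per_failure = chunklets_to_relocate(chunklets_per_drive, used_percent)
    return spare_chunklets // per_failure


# Hypothetical system: 300 chunklets per drive, 600 spare chunklets in total.
print(chunklets_to_relocate(300, 10))   # 30
print(chunklets_to_relocate(300, 90))   # 270
print(failures_absorbed(600, 300, 10))  # 20
print(failures_absorbed(600, 300, 90))  # 2
```

With the same spare pool, the nearly empty system rides out many more sequential failures than the nearly full one, which is the "moving target" described above.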

For concurrent failures, I think conventional rules apply. In a RAID 5 drive HA scenario, two simultaneous disk failures can lead to data loss. RAID 6 would take 3 failed drives in the same RAID group. RAID 1 is also vulnerable to double disk failures, but the odds are lower, since both disks of a 2-disk mirror must fail, as opposed to any 2 of 9 disks (if you are using RAID 5 8+1).
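A back-of-the-envelope Python sketch of those odds, under a deliberately simplified model: traditional fixed RAID groups where each drive belongs to exactly one group, and the second failure hits a random surviving drive. 3PAR spreads each drive's chunklets across many raidlets, so real odds on an array will differ. The 24-drive count comes from the original question; the function name is hypothetical.

```python
from fractions import Fraction


def second_failure_hits_same_set(set_size, total_drives):
    """Chance that a second random drive failure lands in the same RAID set
    as the first one (fixed-group model, one set per drive)."""
    if not 2 <= set_size <= total_drives:
        raise ValueError("need 2 <= set_size <= total_drives")
    # After the first failure, set_size - 1 of the remaining
    # total_drives - 1 drives share a set with the failed drive.
    return Fraction(set_size - 1, total_drives - 1)


raid1 = second_failure_hits_same_set(2, 24)  # mirror pair
raid5 = second_failure_hits_same_set(9, 24)  # 8+1 set

print(raid1)  # 1/23
print(raid5)  # 8/23
```

In this model a double failure is eight times more likely to be fatal for RAID 5 8+1 than for a two-way mirror, which matches the intuition above even though the absolute numbers are only illustrative.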

I had the displeasure of having two 300 GB drives in a RAID 5 config fail within 20 minutes of each other. Engineering manually recovered, hot and online, by forcing one or the other failed disk online, then manually rebuilding the failed raidlets. Some raidlets were successfully rebuilt using failed disk #1; the rest needed failed disk #2 to recover. Luckily, both disks failed due to media errors and not due to connectivity or other mechanical issues. I was surprised that we had no data loss or perceived outage.