
How many nodes can a 7400 lose at the same time?

Posted: Wed Jul 24, 2013 3:16 am
by nnnnnnine
If I have a StoreServ 7400 in a 4-node configuration,

and two nodes fail at the same time in the same node enclosure,

can the 7400 still provide I/O service to the front-end hosts?

Re: How many nodes can a 7400 lose at the same time?

Posted: Wed Jul 24, 2013 11:16 am
by hdtvguy
I'll assume the 7400 with 4 nodes is just like a P10000 V400 4-node system: you have two node pairs. If you lose both nodes in a pair, you have data issues, since half your drives become unavailable.

Re: How many nodes can a 7400 lose at the same time?

Posted: Sat Jul 27, 2013 12:52 pm
by nnnnnnine
Thanks for your reply.

So does that mean the 7400 can only survive the failure of one node within a node pair,
but if the failures are in different pairs, it can survive two failed nodes at the same time?

Re: How many nodes can a 7400 lose at the same time?

Posted: Mon Jul 29, 2013 6:04 am
by hdtvguy
In theory, one node in each node pair can fail.

Re: How many nodes can a 7400 lose at the same time?

Posted: Tue Jul 30, 2013 9:42 am
by Arkturas
I would also like to know. In the 7400, nodes are grouped in pairs (0,1) and (2,3). Logically, losing one node from each pair should not result in an array-down state.

However, having raised this with HP this was their response:

HP commits to "no single point of failure" without system interruption in 3PAR StoreServ Storage, i.e., only a single node failure. Refer to page 3 of the attached whitepaper.

"HP 3PAR StoreServ Storage designed for mission-critical high availability.pdf
http://h20195.www2.hp.com/v2/GetPDF.asp ... 316ENW.pdf"

What would happen if we lost two nodes from a pair? Would the array still function?
"No. StorServ 7400 is designed to survive single node failure."

According to the following whitepaper, An introduction to HP 3PAR StoreServ for the EVA administrator:

HP 3PAR OS implements a unique feature in the industry called HP 3PAR Persistent Cache. This resiliency feature preserves write caching in the event of the loss of a controller node on systems with four nodes or more. In case of a node failure, Persistent Cache rapidly mirrors the cache of the surviving node of the node pair to one of the other nodes in the array. That means we always have two copies of a node's cache in the system.

Written as "node", singular, this would imply that you can only lose a single node.
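The remirroring behaviour that the whitepaper describes can be modeled in a few lines. This is a rough Python sketch with made-up names (`remirror_on_failure` is not a 3PAR command); it only illustrates the "always two copies of a node's cache" invariant, under the assumption that any healthy node can take over as mirror target.

```python
# Toy model of Persistent Cache: each node's write cache is mirrored to its
# pair partner; if a node fails, the surviving partner's cache is
# re-mirrored to another healthy node so two copies always exist.
# All names are illustrative, not actual 3PAR APIs.

def remirror_on_failure(pairs, failed):
    """Return {node: mirror_target} after one node fails.

    pairs  -- list of (a, b) node pairs, e.g. [(0, 1), (2, 3)]
    failed -- the node that went down
    """
    healthy = [n for p in pairs for n in p if n != failed]
    mirrors = {}
    for a, b in pairs:
        if failed in (a, b):
            survivor = b if failed == a else a
            # Partner is gone: pick another healthy node to hold the copy.
            target = next(n for n in healthy if n != survivor)
            mirrors[survivor] = target
        else:
            # Normal operation: intra-pair mirroring.
            mirrors[a], mirrors[b] = b, a
    return mirrors

# Node 1 fails: node 0's cache gets re-mirrored to node 2.
print(remirror_on_failure([(0, 1), (2, 3)], failed=1))  # {0: 2, 2: 3, 3: 2}
```

Note this model still ends up with two copies of every surviving node's cache, which is consistent with the whitepaper wording but doesn't by itself settle whether a second failure in a different pair is survivable.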

Please could someone clarify:

In a 7400 with four nodes (0,1,2,3), where 0,1 are paired and 2,3 are paired, is it possible to lose one node from each pair without losing connectivity to the VVs?

Thanks.
Gareth

Re: How many nodes can a 7400 lose at the same time?

Posted: Tue Jul 30, 2013 10:49 am
by Richard Siemers
Arkturas wrote: Please could someone clarify:

In a 7400 with four nodes (0,1,2,3), where 0,1 are paired and 2,3 are paired, is it possible to lose one node from each pair without losing connectivity to the VVs?


Confirmed. I have two T800s with 4 nodes each. When they perform major version upgrades, they reboot half the nodes at once (odds or evens), then the other half, completing the upgrade non-disruptively. However, connectivity to the VVs is user-controlled and subject to user errors in zoning and exporting.

We do simple host-to-VV exports without specifying or locking down particular nodes or ports, so the exports will find the hosts on any port, on any node.

Zoning is important to plan, and to audit that it's being done correctly. The health checks run before and during an InServ upgrade look for "vertically connected" hosts and fail if any are found. Since the upgrades reboot half the nodes at once, this ensures that no host is connected only to the even nodes or only to the odd nodes (odds and evens are physically stacked vertically in the S400/800 and T400/800). The easiest way to avoid this is to ensure that each host WWN is zoned to both members of a node pair.

I like this method as well because it splits the host's load across two ports, helping to keep any single noisy host from single-handedly saturating an InServ port. Say your host has 4 Gb ports and your storage also has 4 Gb ports: by zoning the host port to two nodes, a host that peaks its 4 Gb port drives only 2 Gb, or 50% utilization, on each of the 3PAR ports, which adds a soft layer of performance protection.

Assuming you also have two SANs, an A side and a B side, zoning can get pretty hairy. In our case we have 4 nodes, each with 4 ports, for a total of 16 storage ports, and we have an A-side and a B-side VSAN. I chose to put all the odd ports on the A side and all the even ports on the B side, so every node has two connections to each side of the SAN. In my environment, where I zone each dual-attached host to 4 nodes, this nets out to 4 groups of 4 ports. When we add a new host, we pick the group with the fewest hosts attached, which keeps things balanced and symmetrical.
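The two checks described above, flagging "vertically connected" hosts and picking the least-loaded port group for a new host, are easy to sketch. This is an illustrative Python snippet, not an HP tool; the function names and data shapes are my own assumptions.

```python
# Sketch of the zoning sanity checks described above (illustrative only).
# A host is "vertically connected" if it is zoned only to even nodes or
# only to odd nodes, so rebooting half the nodes during an upgrade would
# cut it off entirely.

def vertically_connected(zoned_nodes):
    """True if every node in the host's zoning has the same parity."""
    parities = {n % 2 for n in zoned_nodes}
    return len(parities) == 1

def pick_port_group(group_host_counts):
    """Pick the least-loaded port group for a new host, to stay balanced."""
    return min(group_host_counts, key=group_host_counts.get)

# Host zoned to both members of a node pair: safe.
print(vertically_connected({0, 1}))   # False
# Host zoned only to nodes 0 and 2 (both even): fails the health check.
print(vertically_connected({0, 2}))   # True
# New host goes to whichever of the 4 port groups has the fewest hosts.
print(pick_port_group({"grp0": 5, "grp1": 3, "grp2": 4, "grp3": 4}))
```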

Re: How many nodes can a 7400 lose at the same time?

Posted: Tue Jul 30, 2013 11:20 am
by Arkturas
Update - having spoken to a Senior HP Storage Architect...

Q: On the 7400 (4-node), hypothetically speaking, if I lost one node from each pair (nodes 0 and 3), would the array still function (i.e., serve VVs to hosts)?

A: You can only lose a single node in a 4-node configuration and maintain availability; if the system were then to lose a second node from the remaining controller pair, it would shut down. The same rule applies to a fully populated 10800 8-node system: if one node goes down, it continues; if two nodes go down, it shuts down, even though six nodes remain up and working.

Re: How many nodes can a 7400 lose at the same time?

Posted: Tue Jul 30, 2013 2:03 pm
by Richard Siemers
I am skeptical of that answer. Perhaps someone in a proof-of-concept situation can do a hands-on test.

I know for a fact that when they do major revision upgrades, they reboot half the nodes at once, which on a fully populated T800 means 4 nodes rebooting at once. Also, I believe the system "shutting down" is an oversimplified way of explaining that the VVs impacted by the outage would go offline. So in a situation where nodes 0 and 1 failed, all the LDs that used PDs serviced by that node pair would go offline and take down whatever VVs were assigned to them. If you had special CPGs set up to use only PDs from nodes 2 and 3, then those LDs and VVs should remain online.
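That reasoning, a VV survives a pair failure only if none of its LDs sit on PDs owned by the failed pair, can be sketched as a set test. This is a rough Python illustration of the argument, not actual 3PAR behaviour or objects; the VV names and node mappings are invented.

```python
# Sketch of the argument above: a VV goes offline if any of its LDs use
# PDs serviced by the failed node pair; a VV built from a CPG restricted
# to the surviving pair's PDs should stay up. Illustrative names only.

def surviving_vvs(vv_to_owner_nodes, failed_pair):
    """Return the VVs whose LDs touch no node in the failed pair.

    vv_to_owner_nodes -- {vv_name: set of nodes whose PDs back its LDs}
    failed_pair       -- set of the two failed nodes, e.g. {0, 1}
    """
    return {vv for vv, nodes in vv_to_owner_nodes.items()
            if not nodes & failed_pair}

vvs = {
    "vv_all_nodes": {0, 1, 2, 3},   # default CPG: LDs spread across all PDs
    "vv_pair_23":   {2, 3},         # special CPG using only nodes 2 and 3
}
# Node pair (0, 1) fails: only the VV confined to nodes 2,3 survives.
print(surviving_vvs(vvs, failed_pair={0, 1}))  # {'vv_pair_23'}
```

With the default CPG layout every VV touches every node pair, which would make a whole-pair failure look like a system-wide outage, consistent with HP's "shut down" answer.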

Re: How many nodes can a 7400 lose at the same time?

Posted: Tue Jul 30, 2013 2:27 pm
by Arkturas
Thanks Richard. Hopefully someone with a spare 4-node system will be able to prove or disprove our suspicions.

Re: How many nodes can a 7400 lose at the same time?

Posted: Wed Jul 31, 2013 5:08 pm
by afidel
Richard, with port persistence, wouldn't the zoning mix-up be a non-issue, because the nodes that remain online would adopt the port personalities of the nodes being rebooted?