Windows Hosts Goes Unresponsive During 3PAR OS Upgrade

Posted: Sat May 31, 2014 2:09 am
by gkapoor
Hello All,

I am running into a weird problem, and this is the second time it has happened. We are upgrading our InForm OS version from 3.1.1 MU1 --> 3.1.2 MU3 (all required patches) on a T400.

We have approximately 30 Windows hosts, and these Windows hosts boot from SAN. All of them are zoned to two different FC switches and can see all paths as required.

During the upgrade, as soon as we reboot the first node of the T400, two of these Windows hosts become unresponsive. They remain reachable on the network but don't allow RDP or even an iDRAC session. These are a combination of Dell R620 and R610 servers with Emulex HBA cards.

MPIO is set to round robin, and the MPIO hotfixes recommended for Windows (probably 5 patches) are also installed on these boxes.

The following are the details of the HBA cards as seen from Windows.

Emulex LPE12002-M8 (Dual Port)

Driver Version: 2.74.014.001
Firmware Version: 2.01A4
Boot Version: 2.12a9

Nothing is logged in the Windows Event Viewer when a system becomes unresponsive. To me it appears that as soon as the node is rebooted, nothing is being written to the disk; it looks like connectivity to the boot-from-SAN LUN is somehow being broken.


Any assistance or leads would be really appreciated.

Thanks,
G Kapoor

Re: Windows Hosts Goes Unresponsive During 3PAR OS Upgrade

Posted: Sat May 31, 2014 1:12 pm
by Davidkn
Hmmm, I'm personally not a great fan of boot from SAN, exactly for these reasons, but I'm struggling to come up with an answer.

I'm not an expert in boot from SAN, but only 1 port on the HBA is connected to the LUN it's booting from afaik, and it's connecting to that via a WWN and a LUN number?

So is it due to the fact that the version you are on doesn't support persistent ports, and so the WWN isn't failed over during the reboot?

I'm not sure, I wish I had more experience around this, hopefully someone else will be able to answer.

Re: Windows Hosts Goes Unresponsive During 3PAR OS Upgrade

Posted: Sat May 31, 2014 10:28 pm
by Richard Siemers
We have roughly 100 Windows physical boxes SAN attached, all booting from local RAID controllers. It's not rare for us to find a Windows box that inexplicably hangs when we do switch or storage maintenance. One such host had this issue with CLARiiON/PowerPath, and still had the issue after we removed PowerPath and connected it to 3PAR with MPIO. I have no clue what causes that; my hunch is that it's patch-order induced. We have 50ish AIX physical boxes and about 15 of those boot from SAN, never an issue with any of them (knock on wood).

The good news, though, is that now that you're on 3.1.2, port persistence should be in effect for you, assuming your cabling/zoning is correct and NPIV is enabled on the switches. This should prevent the host from experiencing a path down during maintenance events like a node reboot.

Re: Windows Hosts Goes Unresponsive During 3PAR OS Upgrade

Posted: Sun Jun 01, 2014 11:08 pm
by gkapoor
We are not yet on 3.1.2, and that's the problem. We had to roll back the upgrade because the Windows hosts that went down are critical and had to be up, so we had to minimize the downtime for customers.

I am still struggling to find out what causes these hosts to go down, so that I can plan the next maintenance window for the same upgrade.

Thanks, guys. I will post here for sure if I am able to find out.

Re: Windows Hosts Goes Unresponsive During 3PAR OS Upgrade

Posted: Mon Jun 02, 2014 12:53 am
by apol
We had some issues where, after an update, some hosts had difficulties using the volumes presented from our 3PAR arrays. The problem was a Windows policy on how to treat volumes with changed SCSI characteristics. And those SCSI characteristics changed with most of the 3PAR updates we have done so far.

Open a command prompt and type "diskpart". At the diskpart prompt, type "san".

Result should be: online - all

If it's different ("offline ..."), this host will have problems with its disks being "offline" until an admin brings them online again in server management.
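If you need to check this on many hosts, the diskpart check above could be scripted roughly like this. This is only a minimal sketch in Python: the function names are made up for illustration, and it assumes diskpart prints a line of the form "SAN Policy : <value>" in response to "san", so verify the parsing against the real output on one of your own boxes first. It has to run from an elevated prompt.

```python
import os
import re
import subprocess
import tempfile


def parse_san_policy(diskpart_output):
    """Return the text after 'SAN Policy :' in diskpart output, or None.

    Assumption: diskpart reports the policy on one line such as
    'SAN Policy  : Offline Shared'. Adjust the pattern if yours differs.
    """
    match = re.search(r"SAN Policy\s*:\s*(.+)", diskpart_output, re.IGNORECASE)
    return match.group(1).strip() if match else None


def policy_is_online_all(policy):
    """True if the policy text reads as 'online all', ignoring case/punctuation."""
    if policy is None:
        return False
    normalized = re.sub(r"[^a-z]", "", policy.lower())
    return normalized == "onlineall"


def read_san_policy():
    """Run 'san' through 'diskpart /s <script>' and return the parsed policy."""
    # diskpart reads its commands from a script file when given /s.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as script:
        script.write("san\n")
        script_path = script.name
    try:
        result = subprocess.run(
            ["diskpart", "/s", script_path],
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        os.remove(script_path)
    return parse_san_policy(result.stdout)


if __name__ == "__main__":
    policy = read_san_policy()
    if policy_is_online_all(policy):
        print("SAN policy is online-all:", policy)
    else:
        print("SAN policy is NOT online-all:", policy)
```

Any host that reports something other than online-all is a candidate for disks staying offline after the array update.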

Re: Windows Hosts Goes Unresponsive During 3PAR OS Upgrade

Posted: Mon Jun 02, 2014 6:28 am
by hdtvguy
Also, what FC switches are you running? Brocade has a bug in 7.0 and 7.1 where ports can get into a condition where devices are "logged into the fabric" but are not communicating. The fix in that scenario is a hard reboot of the FC switch and an upgrade to 7.2. We got bitten by this with our Windows servers and EVAs: they stopped communicating, and it happened on all paths, so we lost all connections from Windows to the EVAs until the fabrics were rebooted. Even power cycling the EVAs and Windows servers and offlining/onlining the FC ports did not fix it. Brocade used to be a good company, but their products are so mediocre now and they are a company lost without a mission.