IO Modules dead after cageupgrade

Posted: Thu Jul 10, 2014 11:52 am
by vPatrick
Hi there,

I added an M6720 cage (HP, DCS1) to a 7400-2 node today. The online add of the cage was no big deal and went smoothly.

This is the showcage output after adding the cage.

7400-1 cli% showcage -d cage9
Id Name  LoopA Pos.A LoopB Pos.B Drives Temp  RevA RevB Model Side
 9 cage9 1:0:2     0 0:0:2     4     12 23-25 320c 320c DCS1  n/a

-----------Cage detail info for cage9 ---------

Position: ---

Interface Board Info            Card0            Card1
     Firmware_status              Old              Old
         Product_Rev             320c             320c
 State(self,partner)            OK,OK            OK,OK
  VendorId,ProductId          HP,DCS1          HP,DCS1
          Master_CPU               No              Yes
            SAS_Addr 50050CC10EE9F1FE 50050CC10EE9F57E
 Link_Speed(DP1,DP2)  6.0Gbps,Unknown  6.0Gbps,6.0Gbps

 PS PSState ACState DCState Fan State Fan0_Speed Fan1_Speed
ps0      OK      OK      OK        OK    HiSpeed    HiSpeed
ps1      OK      OK      OK        OK    HiSpeed    HiSpeed


-------------Drive Info-------------- --PortA-- --PortB--
Drive       DeviceName  State Temp(C) LoopState LoopState
  0:0 5000cca0288df353 Normal      25        OK        OK
  1:0 5000cca0286f34e3 Normal      23        OK        OK
  2:0 5000cca01b95acb7 Normal      24        OK        OK
  3:0 5000cca0287d24eb Normal      24        OK        OK
  4:0 5000cca0288df42f Normal      25        OK        OK
  5:0 5000cca01c97e62f Normal      25        OK        OK
  6:0 5000cca0288df39b Normal      25        OK        OK
  7:0 5000cca02889647f Normal      25        OK        OK
  8:0 5000cca028830d9b Normal      25        OK        OK
  9:0 5000cca0288df20f Normal      24        OK        OK
 10:0 5000cca0288df357 Normal      25        OK        OK
 11:0 5000cca0288d4e83 Normal      25        OK        OK


As you can see, showcage reported old firmware on both interface cards, so I ran upgradecage to update it.

7400-1 cli% upgradecage cage9
Upgrading cage cage9 cpuA from rev 320c to revision in file /opt/tpd/fw/cage/ebod/hp_e6ebd_local_combined_v3.2.0.15.gff.
Upgrading cage cage9 cpuB from rev 320c to revision in file /opt/tpd/fw/cage/ebod/hp_e6ebd_local_combined_v3.2.0.15.gff.
g b Beginning test after upgrade for cage9
cage9 passed test after upgrade


Now it went wrong... This is a showcage AFTER the firmware update:

7400-1 cli% showcage -d cage9
Id Name  LoopA Pos.A LoopB Pos.B Drives Temp  RevA RevB Model Side
 9 cage9 ---       0 0:0:2     4     12 23-25 -         DCS1  n/a

-----------Cage detail info for cage9 ---------

Position: ---

Interface Board Info Card0            Card1
     Firmware_status     -          Unknown
         Product_Rev     -                 
 State(self,partner)   -,-       Unknown,OK
  VendorId,ProductId   -,-          HP,DCS1
          Master_CPU     -               No
            SAS_Addr     - 50050CC10EE9F57E
 Link_Speed(DP1,DP2)   -,-  Unknown,Unknown

 PS PSState ACState DCState Fan State Fan0_Speed Fan1_Speed
ps0      OK      OK      OK        OK        Low        Low
ps1      OK      OK      OK        OK        Low        Low


-------------Drive Info-------------- --PortA-- --PortB--
Drive       DeviceName  State Temp(C) LoopState LoopState
  0:0 5000cca0288df353 Normal      25         -        OK
  1:0 5000cca0286f34e3 Normal      23         -        OK
  2:0 5000cca01b95acb7 Normal      24         -        OK
  3:0 5000cca0287d24eb Normal      24         -        OK
  4:0 5000cca0288df42f Normal      25         -        OK
  5:0 5000cca01c97e62f Normal      25         -        OK
  6:0 5000cca0288df39b Normal      24         -        OK
  7:0 5000cca02889647f Normal      25         -        OK
  8:0 5000cca028830d9b Normal      25         -        OK
  9:0 5000cca0288df20f Normal      24         -        OK
 10:0 5000cca0288df357 Normal      25         -        OK
 11:0 5000cca0288d4e83 Normal      25         -        OK


One loop is dead, and the other I/O module looks spooky. I opened a case with HP, and they told me there is a known issue with firmware updates when 2 TB LFF NL SAS drives are installed. Can anyone confirm that, or has anyone had the same issue? I have a virtual room meeting tomorrow with a 2nd level engineer to fix this. Soft and hard resets of the I/O modules didn't help. Cage 4 is also an M6720 (DCS1) with 12x 2 TB LFF NL SAS drives, running firmware 320f on both I/O modules. That cage hasn't had any problems, but it was part of the initial setup, so I don't know whether a firmware update was ever performed on it.

Thanks for advice.

Best regards,
Patrick

Re: IO Modules dead after cageupgrade

Posted: Thu Jul 10, 2014 4:26 pm
by Davidkn
I personally haven't seen this in the field; I'm trying to remember whether the last one I did had the 2 TB drives or not.

I didn't run the upgradecage command; I think I ran the admithw command, which invokes the firmware upgrade or downgrade depending on what the other shelves are running. I can't imagine that which command was run would matter.

I'm sure the HP engineer will be able to fix it; if not, it sounds like they'll be sending you some new I/O modules, as failed firmware updates normally end up in useless parts.

Re: IO Modules dead after cageupgrade

Posted: Fri Jul 11, 2014 12:58 am
by afidel
There was definitely an issue with 2 TB and 3 TB drives that was resolved in a hotfix for either 3.1.2 MU2 or MU3; it's been a few weeks since I went through all the release notes. I definitely remember it was supposed to be a hard requirement before adding any NL shelves, and it was highly recommended if you already had NL shelves, because replacement drives could trigger the issue in existing shelves.

Re: IO Modules dead after cageupgrade

Posted: Fri Jul 11, 2014 1:01 am
by vPatrick
Hello David,

thanks for your reply. That's interesting, because HP states in their Troubleshooting Guide to run upgradecage if an old firmware is on the I/O modules.

@ afidel

Regarding the hotfix: Good information! If I remember correctly, the customer is running 3.1.2 MU3, so they may be affected.

Let's see what HP can do today. :)

Best regards,
Patrick

Re: IO Modules dead after cageupgrade

Posted: Fri Jul 11, 2014 7:30 am
by vPatrick
Hello,

2nd level solved the problem without replacing the HW. Root login to the StoreServ was necessary. 2nd level used a tcli command to reboot IFC1 (card1). After the reboot, the "dead" IFC0 (card0) came back. Everything looks shiny now.

7400-1 cli% showcage -d cage9
Id Name  LoopA Pos.A LoopB Pos.B Drives Temp  RevA RevB Model Side
 9 cage9 1:0:2     0 0:0:2     4     12 32-37 320f 320f DCS1  n/a

-----------Cage detail info for cage9 ---------

Position: ---

Interface Board Info            Card0            Card1
     Firmware_status          Current          Current
         Product_Rev             320f             320f
 State(self,partner)            OK,OK            OK,OK
  VendorId,ProductId          HP,DCS1          HP,DCS1
          Master_CPU              Yes               No
            SAS_Addr 50050CC10EE9F1FE 50050CC10EE9F57E
 Link_Speed(DP1,DP2)  6.0Gbps,Unknown  6.0Gbps,6.0Gbps

 PS PSState ACState DCState Fan State Fan0_Speed Fan1_Speed
ps0      OK      OK      OK        OK        Low        Low
ps1      OK      OK      OK        OK        Low        Low


-------------Drive Info-------------- --PortA-- --PortB--
Drive       DeviceName  State Temp(C) LoopState LoopState
  0:0 5000cca0288df353 Normal      34        OK        OK
  1:0 5000cca0286f34e3 Normal      33        OK        OK
  2:0 5000cca01b95acb7 Normal      34        OK        OK
  3:0 5000cca0287d24eb Normal      35        OK        OK
  4:0 5000cca0288df42f Normal      35        OK        OK
  5:0 5000cca01c97e62f Normal      34        OK        OK
  6:0 5000cca0288df39b Normal      36        OK        OK
  7:0 5000cca02889647f Normal      37        OK        OK
  8:0 5000cca028830d9b Normal      32        OK        OK
  9:0 5000cca0288df20f Normal      32        OK        OK
 10:0 5000cca0288df357 Normal      33        OK        OK
 11:0 5000cca0288d4e83 Normal      34        OK        OK
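
In case it helps anyone comparing before/after output like the above, here is a rough Python sketch that scans pasted `showcage -d` text and lists anything that looks off. It only relies on the two things visible in this thread: the `Firmware_status` row of the interface board table and the PortA/PortB `LoopState` columns of the drive table. The names `check_showcage` and `DRIVE_ROW` are made up for illustration and are not part of any HP tooling, and the layout assumptions come from the output in this thread only, so treat it as a starting point, not a supported check.

```python
import re

# Matches one row of the "Drive Info" table, e.g.
#   0:0 5000cca0288df353 Normal      25         -        OK
# Assumption: a 16-hex-digit device name and exactly two LoopState columns.
DRIVE_ROW = re.compile(
    r"^\s*(\d+:\d+)\s+([0-9a-fA-F]{16})\s+(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s*$"
)


def check_showcage(text):
    """Return a list of findings for pasted `showcage -d` output (hypothetical helper)."""
    findings = []
    for line in text.splitlines():
        tokens = line.split()
        # Interface board firmware row: "Firmware_status <Card0> <Card1>".
        if len(tokens) == 3 and tokens[0] == "Firmware_status":
            for card, status in zip(("Card0", "Card1"), tokens[1:]):
                if status != "Current":
                    findings.append(f"{card} firmware status is {status}")
            continue
        # Drive rows: flag any port whose loop state is not OK.
        match = DRIVE_ROW.match(line)
        if match:
            drive, _wwn, _state, _temp, port_a, port_b = match.groups()
            for port, loop_state in (("A", port_a), ("B", port_b)):
                if loop_state != "OK":
                    findings.append(
                        f"drive {drive} port {port} loop state is {loop_state}"
                    )
    return findings


if __name__ == "__main__":
    # Two lines copied from the post-upgrade output earlier in this thread.
    sample = (
        "     Firmware_status     -          Unknown\n"
        "  0:0 5000cca0288df353 Normal      25         -        OK\n"
    )
    for finding in check_showcage(sample):
        print(finding)
```

Because it splits on whitespace, it should cope with both the aligned CLI output and a copy where the spacing got collapsed. An empty result only means these two checks found nothing; it says nothing about the rest of the cage state.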

Re: IO Modules dead after cageupgrade

Posted: Sat Jul 12, 2014 12:50 pm
by Davidkn
I guess this highlights the value of getting HP involved with hardware upgrades, as the upgrade plan should have highlighted this and the relevant patches would have been installed ahead of time?

I never just add hardware without first getting HP involved. It's not worth the risk on live SANs when HP will provide the upgrade plan for you.

Glad it's sorted though.