3Par Rookie - Need assistance in troubleshooting please..

Posted: Wed Jun 27, 2018 10:12 am
by walter_white
I know these types of questions are difficult to troubleshoot without knowing all the specifics of the environment, but I'm hoping someone can walk me through it. We have a site with some VMs that aren't performing very well. They're very sluggish, and it's not compute (CPU/MEM). I'm thinking it's storage and hoping you guys can help verify.

I've attached some screenshots from StoreServ..

    The SAN is a 7200 running 3.3.3.612.
    The 7200 has 36 total drives: (8) 150K SSDs and (28) FC 10Ks.
    The Virtual Volume behind the VM I've focused on is in a RAID5 CPG.
    The CPG has 41 VVs.

Can anyone assist, please?

Thanks for your time!

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Wed Jun 27, 2018 11:14 am
by BryanW
Are you using AO, AFC or both for the 8 SSDs?
Was AO running when you saw the latency?

Also, the first thing I would do when there is unexplained LD/VV latency is run the "Physical Drive Compare by Performance" report for the same period and look for a failing physical disk.

Absent a failing PD, it could be a few things - reply with what you see.

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Wed Jun 27, 2018 12:09 pm
by walter_white
Thanks so much for the reply! Greatly appreciated!!

Screenshots below. It looks like AFC is off. AO, I'm not sure: it looks like it's possibly on, but I don't see anything under schedules.

As for the "Physical Drive Compare by Performance" report: I've generated it, but after 30 minutes it's still spinning and saying "Loading.."

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Wed Jun 27, 2018 1:02 pm
by MammaGutt
I’m just throwing a few things out there.

Note:
3.3.3.612 isn’t a valid version. The latest is 3.3.1.x.
If you have 3.2.2 MU4 or later and a valid support contract, you can enable vCenter integration for InfoSight. This could be extremely valuable in this troubleshooting.
You have allocated only 250 GB of SSD capacity to the SSD tier in AO, which is very low.
You are running RAID1 on the SSD CPG. RAID5 will most likely provide both better performance and more capacity.


I would review multiple things. 28x 10k drives are not able to provide a lot of performance. They have a «safe IOPS» of about 150 each. How many IOPS are they doing when you see issues?
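To put that «safe IOPS» figure in context, here is a rough Python sketch of the arithmetic. The 150 IOPS per drive and the 28 drives come from the thread; the RAID write penalties are the usual rule-of-thumb values and the host workload numbers are made up for illustration. It ignores cache hits and the SSD tier, so treat the result as a ballpark only.

```python
# Figures from the thread
SAFE_IOPS_PER_10K_DRIVE = 150
FC_DRIVE_COUNT = 28

# Assumption: common rule-of-thumb backend write penalties, not from the thread
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}


def safe_backend_iops(drives, per_drive=SAFE_IOPS_PER_10K_DRIVE):
    """Aggregate backend IOPS the spinning drives can sustain comfortably."""
    return drives * per_drive


def backend_iops(host_reads, host_writes, raid_level):
    """Estimate backend IOPS for a host workload, ignoring cache hits."""
    return host_reads + host_writes * WRITE_PENALTY[raid_level]


def utilisation(host_reads, host_writes, raid_level, drives=FC_DRIVE_COUNT):
    """Fraction of the safe backend budget the workload would consume."""
    return backend_iops(host_reads, host_writes, raid_level) / safe_backend_iops(drives)


if __name__ == "__main__":
    print("safe budget:", safe_backend_iops(FC_DRIVE_COUNT))
    # Hypothetical workload: 1000 read IOPS and 500 write IOPS on RAID5
    print("utilisation: {:.0%}".format(utilisation(1000, 500, "RAID5")))
```

With 28 drives the safe budget works out to 4200 backend IOPS, so a write-heavy RAID5 workload eats into it quickly.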
In the CLI, «statpd -rw», «statvv -rw -ni», «statvlun -ni -rw» and «statvlun -hostsum -ni -rw» should give you some pointers. If statvlun latency is high and statvv latency is low, it is 99% a host or fabric issue. Running statvlun with and without -hostsum can help you understand whether it is volume or host related. If statvv latency is high, the latency comes from something internal on the 3PAR. If statpd latency is high, it is most likely a backend issue (not enough disks/SSD). Always remember that reads (vlun and vv) are mostly served from disk (around 5-10 msec) while writes should hit cache (1-ish msec).
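The decision rules above can be written down as a small Python sketch. It assumes you have already read the average service times (in ms) off statvlun, statvv and statpd by hand; the function name and the 15 ms cut-off for "high" are illustrative assumptions, not anything the 3PAR CLI provides.

```python
HOST_OR_FABRIC = "host or fabric"
ARRAY_INTERNAL = "internal to the 3PAR"
BACKEND = "backend (not enough disks/SSD)"
NOTHING_OBVIOUS = "no layer above threshold"


def triage(vlun_ms, vv_ms, pd_ms, high_ms=15.0):
    """Map three latency readings to the layers worth investigating.

    vlun_ms, vv_ms, pd_ms: service times read from statvlun, statvv, statpd.
    high_ms: assumed cut-off for "high" latency (hypothetical default).
    """
    findings = []
    if vlun_ms > high_ms and vv_ms <= high_ms:
        # Latency appears only between host and array front end
        findings.append(HOST_OR_FABRIC)
    if vv_ms > high_ms:
        findings.append(ARRAY_INTERNAL)
    if pd_ms > high_ms:
        findings.append(BACKEND)
    return findings or [NOTHING_OBVIOUS]


if __name__ == "__main__":
    print(triage(vlun_ms=40, vv_ms=3, pd_ms=4))
    print(triage(vlun_ms=500, vv_ms=480, pd_ms=300))
```

Comparing the plain statvlun numbers with the -hostsum view is still a manual step here; the sketch only covers the vlun/vv/pd comparison.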

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Wed Jun 27, 2018 1:05 pm
by walter_white
Sorry, typo.. OS is 3.2.2.612 (MU4)+P56,P58,P59,P73,P84,P85,P87 ..

I thought we needed a newer version to get vCenter integration into InfoSight?

We have a request in to HP to get the upgrades scheduled. The ability to see this helpful info in InfoSight is what I'm waiting for, but we need to get upgraded first.. :(

Is AO even running, since I don't see it in any schedule? Sorry, not sure how it works.

Also, isn't a service time over 500ms super high? I looked at another one of our 3PARs in the same time period and it hadn't gone over 20ms for the entire day. It had about 25 VMs, a RAID5 CPG and 10K FC, but with 128 drives.

Thanks

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Wed Jun 27, 2018 3:04 pm
by MammaGutt
With FC + SSD, a properly sized 3PAR should IMO average 5 msec or better.

InfoSight requires SP 4.4 MU7 (requires 3.2.2 MU4 or later) or SP 5.0.3 (requires 3.3.1).

AO requires a schedule, or it needs to be run manually, to do anything.

500 msec is super high! Anything above 10-15 msec should be investigated.

Just checking: have you set the SATP rule listed in the 3PAR VMware implementation guide? If you don’t have round robin, a lot of weirdness can happen.

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Thu Jun 28, 2018 7:44 am
by walter_white
I spot checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA.

I have 11 3PARs and around 100 ESXi hosts and all appear to be set like this. The only place I'm seeing this I/O issue, though, is on one of them.

That being said, it looks like best practice via the 3PAR guide is to have all of them set to Round Robin.
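With around 100 hosts, spot checks get tedious. Below is a rough Python sketch that scans the text output of `esxcli storage nmp device list` for 3PAR devices that are not on round robin. The sample text is illustrative and trimmed to the fields the sketch needs; the exact layout is an assumption, so check it against real output from your hosts before relying on it.

```python
# Illustrative sample only; real output carries more fields per device
SAMPLE = """\
naa.60002ac0000000000000000000000001
   Device Display Name: 3PARdata Fibre Channel Disk (naa.60002ac0000000000000000000000001)
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_MRU

naa.60002ac0000000000000000000000002
   Device Display Name: 3PARdata Fibre Channel Disk (naa.60002ac0000000000000000000000002)
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_RR
"""


def parse_devices(text):
    """Parse 'device id' blocks followed by indented 'Key: Value' lines."""
    devices = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new device block
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def devices_not_round_robin(text, vendor="3PARdata"):
    """Return (device, policy) pairs for vendor devices not using VMW_PSP_RR."""
    result = []
    for device, fields in parse_devices(text).items():
        policy = fields.get("Path Selection Policy")
        if vendor in fields.get("Device Display Name", "") and policy != "VMW_PSP_RR":
            result.append((device, policy))
    return result


if __name__ == "__main__":
    for device, policy in devices_not_round_robin(SAMPLE):
        print(device, policy)
```

Feeding it the captured output from each host would give a quick list of datastores still on MRU.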

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Thu Jun 28, 2018 8:51 am
by ailean
It's like everyone deciding not to change lanes once they're on the motorway. If they all happen to start in the same lane one day it's chaos; other days they might happen to pick different ones and it's fine. Round robin with (I think) the recommended setting of 1 moves to the next lane after every IO, so queues are shorter on average.

I think the recommendation to move to RR came over 6 years back, so if you've had 3PAR since the F & T models that might be where the other setting came from (we originally went round balancing the most recently used paths manually ;) ).
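The lane analogy can be illustrated with a toy Python model. It is a deliberate simplification: each "path" stands for one lane, every host sends a fixed number of IOs, and nothing about real queueing or ALUA state is modelled. Host counts and IO counts are made up.

```python
def mru_load(ios_per_host, chosen_path_per_host, path_count):
    """Each host sends all of its IOs down the single path it picked."""
    load = [0] * path_count
    for ios, path in zip(ios_per_host, chosen_path_per_host):
        load[path] += ios
    return load


def round_robin_load(ios_per_host, path_count):
    """Each host switches to the next path after every IO (iops=1)."""
    load = [0] * path_count
    for ios in ios_per_host:
        for i in range(ios):
            load[i % path_count] += 1
    return load


if __name__ == "__main__":
    hosts = [1000, 1000, 1000, 1000]  # hypothetical IOs per host
    print("MRU, bad day :", mru_load(hosts, [0, 0, 0, 0], 4))
    print("MRU, good day:", mru_load(hosts, [0, 1, 2, 3], 4))
    print("Round robin  :", round_robin_load(hosts, 4))
```

In this model MRU only matches round robin when the hosts happen to pick different paths; round robin gets the even spread every time.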

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Thu Jun 28, 2018 1:02 pm
by Proc_rqrd
Correct.
Per the current 3PAR VMware best practices, add a custom SATP rule.
On each ESXi host:
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O iops=1 -c "tpgs_on" -V "3PARdata" -M "VV" -e "HP 3PAR Custom ALUA Rule"

verify:
esxcli storage nmp satp rule list | grep -i 3PAR

A reboot is required for existing volumes to pick up the new rule. All new volumes from a 3PAR will fall under this SATP rule: RR with iops of 1.
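Since the rule has to be added on every host, here is a small Python sketch that builds the command above as an argument list and wraps it in one ssh command line per host. It only prints the commands and runs nothing. The host names and the use of ssh as root are assumptions for illustration; use whatever remote execution method you already have.

```python
import shlex


def satp_rule_argv(iops=1, description="HP 3PAR Custom ALUA Rule"):
    """Argument list for the custom SATP rule quoted in this thread."""
    return [
        "esxcli", "storage", "nmp", "satp", "rule", "add",
        "-s", "VMW_SATP_ALUA",
        "-P", "VMW_PSP_RR",
        "-O", f"iops={iops}",
        "-c", "tpgs_on",
        "-V", "3PARdata",
        "-M", "VV",
        "-e", description,
    ]


def ssh_commands(hosts, user="root"):
    """One shell-safe ssh command line per host (assumes SSH is enabled)."""
    rule = shlex.join(satp_rule_argv())
    return [shlex.join(["ssh", f"{user}@{host}", rule]) for host in hosts]


if __name__ == "__main__":
    # Hypothetical host names
    for command in ssh_commands(["esx01.example.com", "esx02.example.com"]):
        print(command)
```

Building the command as a list and quoting it with `shlex.join` keeps the description string intact when it passes through the remote shell.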


As for the other recommendations, MammaGutt has solid advice.
AFC is a good addition, and if you have no SSD VVs, then use no growth limit in AO for warm VV chunks heading to SSD.
Also, if you're not wide striped or optimized, statpd as mentioned may show one extremely warm PD running your virtual party.
statvlun vs statvv also helps if the readings are very different. I'd also check showhost -lesb in case any of your hosts are spitting errors on their target ports. Compare statcmp vs statcache as well.

Re: 3Par Rookie - Need assistance in troubleshooting please.

Posted: Thu Jun 28, 2018 3:06 pm
by MammaGutt
walter_white wrote:
I spot checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA..

I have 11 3Pars and around 100 ESXi hosts and all appear to be set like this.. The only place I'm seeing this I/O issue though is one of them..

That being said, it looks like best practice via the 3Par guide is to have all of them set to Round Robin..


Just adding to the rest of the guys here: if you use MRU and you have a host experiencing issues, I can guarantee you that the host with high latency is not the one causing the issue. That’s the huge PITA with MRU: it’s never the host causing the issue that experiences it. And all you need is one host with MRU to mess things up for every other host using the same host ports. So consistency is important.