3Par Rookie - Need assistance in troubleshooting please..

walter_white
Posts: 42
Joined: Wed Nov 08, 2017 8:57 am

Post by walter_white »

I know these kinds of questions are hard to troubleshoot without knowing all the specifics of the environment, but I'm hoping someone can walk me through it. We have a site with some VMs that aren't performing well. They're very sluggish, and it's not compute (CPU/MEM). I suspect it's storage and am hoping you guys can help verify.

I've attached some screenshots from StoreServ.

    The SAN is a 7200 running 3.3.3.612.
    The 7200 has 36 total drives: (8) 150K SSDs and (28) FC 10Ks.
    The Virtual Volume for the VM I've focused on is in a RAID5 CPG.
    The CPG has 41 VVs.

Can anyone assist, please?

Thanks for your time!
Attachments
VV Performance - 24 Hours (2018-06-27_10-57-35.jpg)
BryanW
Posts: 71
Joined: Sat May 03, 2014 2:01 pm
Location: Dallas, TX

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by BryanW »

Are you using AO, AFC or both for the 8 SSDs?
Was AO running when you saw the latency?

Also, the first thing I would do when there is unexplained LD/VV latency is run the "Physical Drive Compare by Performance" report for the same period and look for a failing physical disk.

Absent a failing PD, it could be a few things. Reply with what you see.
Bryan W
Senior Architect/Manager of System Infrastructure, Dallas TX
https://www.linkedin.com/in/bryanlwhite
walter_white
Posts: 42
Joined: Wed Nov 08, 2017 8:57 am

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by walter_white »

Thanks so much for the reply! Greatly appreciated!

Screenshots below. It looks like AFC is off. As for AO, I'm not sure: it looks like it's possibly on, but I don't see anything under schedules.

As for the "Physical Drive Compare by Performance" report: I've generated it, but after 30 minutes it's still spinning and saying "Loading..".
Attachments
AFC (AFC.jpg)
AO - 3 (AO3.jpg)
AO - 2 (AO2.jpg)
AO - 1 (AO1.jpg)
MammaGutt
Posts: 1577
Joined: Mon Sep 21, 2015 2:11 pm
Location: Europe

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by MammaGutt »

I'm just throwing a few things out there.

Note:
3.3.3.612 isn't a valid version. The latest is 3.3.1.x.
If you have 3.2.2 MU4 or later and a valid support contract, you can enable vCenter integration for InfoSight. This could be extremely valuable for this troubleshooting.
You have allocated only 250 GB of SSD capacity to the SSD tier in AO, which is very low.
You are running RAID1 on the SSD CPG; RAID5 will most likely provide both better performance and more capacity.


I would review several things. 28x 10k drives are not able to provide a lot of performance. Each has a «safe IOPS» of 150. How many IOPS are they doing when you see issues?
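To put a rough number on that, here is a minimal sizing sketch. Only the drive count (28) and the 150 «safe IOPS» per drive come from this thread; the 70% read mix and the RAID 5 write penalty of 4 back-end IOs per host write are assumptions for illustration, and cache hits are ignored.

```python
# Rough back-end IOPS budget for the 10k FC tier.
# From the thread: 28 drives, 150 "safe IOPS" each.
# Assumed for illustration: 70% reads, RAID 5 write penalty of 4, no cache hits.

def backend_iops_budget(drives: int, safe_iops_per_drive: int) -> int:
    """Total back-end IOPS the drives can sustain at the 'safe' rate."""
    return drives * safe_iops_per_drive


def frontend_iops_estimate(backend_budget: int, read_fraction: float, write_penalty: int) -> float:
    """Host-visible IOPS that fit in the back-end budget.

    Each host read costs 1 back-end IO; each host write costs
    `write_penalty` back-end IOs.
    """
    cost_per_host_io = read_fraction + (1.0 - read_fraction) * write_penalty
    return backend_budget / cost_per_host_io


budget = backend_iops_budget(drives=28, safe_iops_per_drive=150)
estimate = frontend_iops_estimate(budget, read_fraction=0.7, write_penalty=4)
print(f"back-end budget: {budget} IOPS")
print(f"front-end estimate at 70% reads on RAID 5: {estimate:.0f} IOPS")
```

If the FC drives are doing well over this kind of budget when the VMs feel sluggish, the back end is the likely bottleneck.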
In the CLI, «statpd -rw», «statvv -rw -ni», «statvlun -ni -rw» and «statvlun -hostsum -ni -rw» should give you some pointers.
If statvlun latency is high and statvv latency is low, it is 99% a host or fabric issue. Running statvlun with and without -hostsum can help you understand whether it is volume or host related.
If statvv latency is high, the latency is due to something internal on the 3PAR.
If statpd latency is high, it is most likely a backend issue (not enough disks/SSD).
Always remember that reads (vlun and vv) are mostly served from disk (around 5-10 msec), while writes should hit cache (1-ish msec).
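The decision rules above can be sketched as a small function. This is only an illustration of the reasoning, not a tool: the function name `triage` and the 15 msec default for what counts as «high» are assumptions, and the three inputs are service times you would read off statvlun, statvv and statpd yourself.

```python
def triage(vlun_ms: float, vv_ms: float, pd_ms: float, high_ms: float = 15.0) -> str:
    """Point at the layer where latency first becomes high.

    vlun_ms, vv_ms, pd_ms: service times read manually from statvlun,
    statvv and statpd. high_ms is an assumed threshold, not a 3PAR constant.
    """
    if pd_ms >= high_ms:
        # Physical disks are slow: most likely not enough disks/SSD.
        return "backend"
    if vv_ms >= high_ms:
        # Disks are fine but the volume is slow: something internal to the array.
        return "array internal"
    if vlun_ms >= high_ms:
        # Array answers quickly, yet the host-facing VLUN is slow.
        return "host or fabric"
    return "nothing above threshold"


print(triage(vlun_ms=40.0, vv_ms=3.0, pd_ms=6.0))    # VLUN high, VV low
print(triage(vlun_ms=45.0, vv_ms=40.0, pd_ms=38.0))  # PD high
```

Checking the physical disks first keeps a slow back end from being misread as a host or fabric problem.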
The views and opinions expressed are my own and do not necessarily reflect those of my current or previous employers.
walter_white
Posts: 42
Joined: Wed Nov 08, 2017 8:57 am

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by walter_white »

Sorry, typo. The OS is 3.2.2.612 (MU4)+P56,P58,P59,P73,P84,P85,P87.

I thought we needed to be another version up to get vCenter integration into InfoSight?

We have a request in with HP to get the upgrades scheduled. What I'm waiting for is the ability to see this helpful info in InfoSight, but we need to get upgraded first.. :(

Is AO even running, since I don't see it in any schedule? Sorry, I'm not sure how it works.

Also, isn't a service time over 500ms super high? I looked at another one of our 3PARs over the same time period and it hadn't gone over 20ms for the entire day. It had about 25 VMs, a RAID5 CPG and 10K FC, but with 128 drives.

Thanks
MammaGutt
Posts: 1577
Joined: Mon Sep 21, 2015 2:11 pm
Location: Europe

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by MammaGutt »

With FC + SSD, a properly sized 3PAR should IMO average 5 msec or better.

InfoSight requires SP 4.4 MU7 (requires 3.2.2 MU4 or later) or SP 5.0.3 (requires 3.3.1).

AO requires a schedule, or it needs to be run manually, to do anything.

500 msec is super high! Anything above 10-15 msec should be investigated.

Just checking: have you set the SATP rule listed in the 3PAR VMware implementation guide? If you don't have round robin, a lot of weirdness can happen.
walter_white
Posts: 42
Joined: Wed Nov 08, 2017 8:57 am

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by walter_white »

I spot-checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA.

I have 11 3PARs and around 100 ESXi hosts, and all appear to be set like this. The only place I'm seeing this I/O issue, though, is on one of them.

That being said, it looks like best practice per the 3PAR guide is to have all of them set to Round Robin.
ailean
Posts: 392
Joined: Wed Nov 09, 2011 12:01 pm

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by ailean »

It's like everyone deciding not to change lanes once they're on the motorway: if they all happen to start in the same lane one day, it's chaos; other days they might happen to pick different ones and it's fine. Round robin with (I think) the recommended setting of 1 puts every IO in a different lane, so shorter queues on average.
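The lane picture can be put into numbers with a toy model. This is a sketch under simplifying assumptions, not a simulation of real multipathing: the host names are hypothetical, a path index stands for an array port shared by all hosts, every host issues the same number of IOs, and round robin switches path after every single IO (iops=1).

```python
from collections import Counter
from itertools import cycle


def mru_load(host_paths: dict[str, int], ios_per_host: int) -> Counter:
    """MRU: each host keeps sending every IO down the one path it last used."""
    load = Counter()
    for path in host_paths.values():
        load[path] += ios_per_host
    return load


def round_robin_load(hosts: list[str], paths: int, ios_per_host: int) -> Counter:
    """Round robin with iops=1: each host moves to the next path after every IO."""
    load = Counter()
    for _host in hosts:
        next_path = cycle(range(paths))
        for _ in range(ios_per_host):
            load[next(next_path)] += 1
    return load


hosts = ["esx1", "esx2", "esx3", "esx4"]      # hypothetical host names
unlucky = {host: 0 for host in hosts}         # everyone happened to pick path 0

print("MRU, unlucky day:", dict(mru_load(unlucky, 1000)))
print("Round robin     :", dict(round_robin_load(hosts, 4, 1000)))
```

On the unlucky day MRU piles all 4000 IOs onto one path, while round robin leaves 1000 on each of the four.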

I think the recommendation to move to RR came over 6 years back, so if you've had 3PAR since the F & T models, that might be where the other setting came from (we originally went round balancing the most recently used paths manually ;) ).
Proc_rqrd
Posts: 28
Joined: Thu Feb 04, 2016 4:12 pm

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by Proc_rqrd »

Correct.
Per current 3PAR VMware best practices, add a custom SATP rule.
Per ESXi host:
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O iops=1 -c "tpgs_on" -V "3PARdata" -M "VV" -e "HP 3PAR Custom ALUA Rule"

verify:
esxcli storage nmp satp rule list | grep -i 3PAR

A reboot is required for existing volumes to take the new rule, and all new volumes from a 3PAR will fall into this SATP rule: RR with iops of 1.
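With around 100 hosts, it may help to check which devices still report a non-RR policy after the rule is in place. The sketch below parses text in the general shape printed by `esxcli storage nmp device list`. The helper name, the sample device IDs and the sample text are made up for illustration, and matching on «3PARdata» in the display name is an assumption about how the volumes are labelled on your hosts.

```python
def devices_not_round_robin(device_list_output: str, vendor_hint: str = "3PARdata") -> list[str]:
    """Return device IDs whose Path Selection Policy is not VMW_PSP_RR.

    Only devices whose display name contains `vendor_hint` are considered.
    """
    flagged = []
    device = None
    is_match = False
    for line in device_list_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device stanza.
            device = line.strip()
            is_match = False
        elif "Device Display Name:" in line:
            is_match = vendor_hint in line
        elif line.strip().startswith("Path Selection Policy:"):
            policy = line.split(":", 1)[1].strip()
            if is_match and policy != "VMW_PSP_RR":
                flagged.append(device)
    return flagged


# Made-up sample text for illustration only, not captured from a real host.
SAMPLE = """\
naa.60002ac0000000000000000000000001
   Device Display Name: 3PARdata Fibre Channel Disk (naa.60002ac0000000000000000000000001)
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba1:C0:T0:L1
naa.60002ac0000000000000000000000002
   Device Display Name: 3PARdata Fibre Channel Disk (naa.60002ac0000000000000000000000002)
   Storage Array Type: VMW_SATP_ALUA
   Path Selection Policy: VMW_PSP_RR
"""

print(devices_not_round_robin(SAMPLE))
```

Feeding it the saved output from each host gives a quick list of volumes that have not yet picked up the round robin policy.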


As for the other recommendations, MammaGutt has solid advice.
AFC is a good addition, and if you have no VVs living on SSD, there's no need for a growth limit on the AO SSD tier for the warm VV chunks heading to SSD.
Also, if you're not wide-striped or optimized, statpd as mentioned may show one extremely warm PD running your virtual party.
statvlun vs statvv also helps if the readings are very different. I'd also check showhost -lesb in case any of your hosts are spitting errors on their target ports. statcmp vs statcache as well.
MammaGutt
Posts: 1577
Joined: Mon Sep 21, 2015 2:11 pm
Location: Europe

Re: 3Par Rookie - Need assistance in troubleshooting please.

Post by MammaGutt »

walter_white wrote:
I spot-checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA.

I have 11 3PARs and around 100 ESXi hosts, and all appear to be set like this. The only place I'm seeing this I/O issue, though, is on one of them.

That being said, it looks like best practice per the 3PAR guide is to have all of them set to Round Robin.


Just adding to what the rest of the guys here said: if you use MRU and you have a host experiencing issues, I can guarantee you that the host with high latency is not the one causing the issue. That's the huge PITA with MRU... it's never the host causing the issue that experiences it. And all you need is one host with MRU to mess things up for every other host using the same host ports. So consistency is important.