3PAR Users Group

A Storage Administrator Community




 Post subject: 3Par Rookie - Need assistance in troubleshooting please..
PostPosted: Wed Jun 27, 2018 10:12 am 

Joined: Wed Nov 08, 2017 8:57 am
Posts: 24
I know these types of questions are difficult to troubleshoot without knowing all the specifics of the environment, but I'm hoping someone can walk me through it. We have a site with some VMs that aren't performing very well. They're very sluggish, and it's not compute (CPU/MEM). I'm thinking it's storage and hoping you guys can help verify.

I've attached some screenshots from StoreServ..

    The san is a 7200 running 3.3.3.612.
    The 7200 has 36 total drives.. (8) 150K SSDs and (28) FC 10Ks
    The Virtual Volume for the VM I've focused on is in a RAID5 CPG..
    The CPG has 41 VV's..

Can anyone assist, please?

Thanks for your time!


Attachments:
2018-06-27_10-57-35.jpg (VV Performance - 24 Hours)
 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Wed Jun 27, 2018 11:14 am 

Joined: Sat May 03, 2014 2:01 pm
Posts: 69
Location: Dallas, TX
Are you using AO, AFC or both for the 8 SSDs?
Was AO running when you saw the latency?

Also, the first thing I would do when there is unexplained LD/VV latency is run the "Physical Drive Compare by Performance" report for the same period and look for a failing physical disk.

Absent a failing PD, it could be a few things. Reply with what you see.

_________________
Bryan W
Senior Architect/Manager of System Infrastructure, Dallas TX
https://www.linkedin.com/in/bryanlwhite


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Wed Jun 27, 2018 12:09 pm 

Joined: Wed Nov 08, 2017 8:57 am
Posts: 24
BryanW wrote:
Are you using AO, AFC or both for the 8 SSDs?
Was AO running when you saw the latency?

Also, the first thing I would do when there is unexplained LD/VV latency is run the "Physical Drive Compare by Performance" report for the same period and look for a failing physical disk.

Absent a failing PD, it could be a few things. Reply with what you see.


Thanks so much for the reply! Greatly appreciated!!

Screenshots below.. Looks like AFC is off.. AO, I'm not sure.. It looks like it's possibly on but I don't see anything under schedules??

As far as the "Physical Drive Compare by Performance" report.. I've generated the report but after 30 minutes it's still spinning saying "Loading.."


Attachments:
AFC.jpg (AFC)
AO3.jpg (AO - 3)
AO2.jpg (AO - 2)
AO1.jpg (AO - 1)
 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Wed Jun 27, 2018 1:02 pm 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 702
Location: Europe
I’m just throwing a few things out there.

Note:
3.3.3.612 isn’t a valid version. The latest is 3.3.1.x
If you have 3.2.2 MU4 or later and a valid support contract, you can enable vCenter integration for InfoSight. This could be extremely valuable in this troubleshooting.
You have allocated only 250 GB of SSD capacity to the SSD tier in AO, which is very low.
You are running RAID1 on the SSD CPG; RAID5 will most likely provide both better performance and more capacity.


I would review multiple things. 28x 10k drives can't provide a lot of performance. They have a «safe IOPS» figure of 150 each. How many IOPS are they doing when you see issues?
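As a rough sanity check on those numbers, here is a minimal Python sketch of the back-end arithmetic. The 28 drives and the 150 safe IOPS per 10k drive come from the post; the RAID 5 write penalty of 4 is the usual textbook figure and is an assumption here, the host workload numbers are made up for illustration, and cache hits and the SSD tier are ignored.

```python
def backend_iops_budget(drive_count: int, safe_iops_per_drive: int) -> int:
    """Total IOPS the spindles can sustain at the 'safe' per-drive rate."""
    return drive_count * safe_iops_per_drive


def backend_iops_needed(host_read_iops: int, host_write_iops: int,
                        write_penalty: int) -> int:
    """Back-end IOPS generated by a host workload.

    Assumes every read misses cache and every host write costs
    `write_penalty` disk I/Os (4 is the textbook RAID 5 figure).
    """
    return host_read_iops + host_write_iops * write_penalty


# 28 drives at 150 safe IOPS each, as quoted in the post.
budget = backend_iops_budget(drive_count=28, safe_iops_per_drive=150)

# Hypothetical workload, NOT measured on the array in this thread.
needed = backend_iops_needed(host_read_iops=2000, host_write_iops=1000,
                             write_penalty=4)

print(f"budget={budget} needed={needed} overloaded={needed > budget}")
```

With these made-up workload numbers the FC tier would be asked for 6000 back-end IOPS against a budget of 4200, which is the kind of gap that shows up as high statpd service times. Plug in the real read/write IOPS from the array before drawing conclusions.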
In the CLI, «statpd -rw», «statvv -rw -ni», «statvlun -ni -rw» and «statvlun -hostsum -ni -rw» should give you some pointers. If statvlun latency is high and statvv is low, the issue is almost certainly (99%) on the host or fabric side. Comparing the output with and without -hostsum can help you tell whether it is volume or host related. If statvv is high, the latency comes from something internal to the 3PAR. If statpd latency is high, it is most likely a back-end issue (not enough disks/SSDs). Always remember that reads (vlun and vv) are mostly served from disk (around 5-10 msec), while writes should hit cache (1-ish msec).
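The decision order described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a tool: the function name `triage`, the 15 msec "high" threshold, and the sample readings are all assumptions, and the service times would have to be read off the statvlun, statvv and statpd output by hand.

```python
def triage(vlun_ms: float, vv_ms: float, pd_ms: float,
           high_ms: float = 15.0) -> str:
    """Map statvlun / statvv / statpd service times to a likely problem area.

    Checks run from the back end outwards: slow physical disks explain
    slow VVs, and slow VVs explain slow VLUNs, so the deepest layer that
    is slow is reported first.
    """
    if pd_ms >= high_ms:
        return "back end (not enough disks/SSD)"
    if vv_ms >= high_ms:
        return "internal to the array"
    if vlun_ms >= high_ms:
        return "host or fabric"
    return "no obvious latency"


# Hypothetical readings in msec, for illustration only.
print(triage(vlun_ms=40.0, vv_ms=3.0, pd_ms=4.0))    # VLUN slow, VV fast
print(triage(vlun_ms=40.0, vv_ms=38.0, pd_ms=6.0))   # VV slow, PD fast
print(triage(vlun_ms=40.0, vv_ms=38.0, pd_ms=35.0))  # PD slow
```

The threshold is deliberately a parameter: what counts as "high" depends on the drive tier and the workload.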


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Wed Jun 27, 2018 1:05 pm 

Joined: Wed Nov 08, 2017 8:57 am
Posts: 24
MammaGutt wrote:
I’m just throwing a few things out there.

Note:
3.3.3.612 isn’t a valid version. The latest is 3.3.1.x
If you have 3.2.2 MU4 or later and a valid support contract, you can enable vCenter integration for InfoSight. This could be extremely valuable in this troubleshooting.
You have allocated only 250 GB of SSD capacity to the SSD tier in AO, which is very low.
You are running RAID1 on the SSD CPG; RAID5 will most likely provide both better performance and more capacity.


I would review multiple things. 28x 10k drives can't provide a lot of performance. They have a «safe IOPS» figure of 150 each. How many IOPS are they doing when you see issues?
In the CLI, «statpd -rw», «statvv -rw -ni», «statvlun -ni -rw» and «statvlun -hostsum -ni -rw» should give you some pointers. If statvlun latency is high and statvv is low, the issue is almost certainly (99%) on the host or fabric side. Comparing the output with and without -hostsum can help you tell whether it is volume or host related. If statvv is high, the latency comes from something internal to the 3PAR. If statpd latency is high, it is most likely a back-end issue (not enough disks/SSDs). Always remember that reads (vlun and vv) are mostly served from disk (around 5-10 msec), while writes should hit cache (1-ish msec).


Sorry, typo.. OS is 3.2.2.612 (MU4)+P56,P58,P59,P73,P84,P85,P87 ..

I thought we needed another version up to get vCenter integration into Infosight?

We have a request in to HP to get the upgrades scheduled.. What I'm waiting for is the ability to see this helpful info in InfoSight, but we need to get upgraded first.. :(

Is the AO even running since I don't see it in any schedule? Sorry, not sure how it works..

Also.. Isn't service time over 500ms super high? I looked at another one of our 3pars in the same time period and it hadn't gone over 20ms for the entire day.. It had about 25VMs, RAID5 CPG, 10K FC but with 128 drives..

Thanks


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Wed Jun 27, 2018 3:04 pm 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 702
Location: Europe
With FC + SSD, a properly sized 3PAR should IMO average 5 msec or better.

InfoSight requires SP 4.4 MU7 (requires 3.2.2 MU4 or later) or SP 5.0.3 (requires 3.3.1).

AO requires a schedule or it needs to be run manually to do anything.

500 msec is super high! Anything above 10-15 msec should be investigated.

Just checking: have you set the SATP rule listed in the 3PAR VMware implementation guide? If you don’t have round robin, a lot of weirdness can happen.


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Thu Jun 28, 2018 7:44 am 

Joined: Wed Nov 08, 2017 8:57 am
Posts: 24
MammaGutt wrote:
With FC + SSD, a properly sized 3PAR should IMO average 5 msec or better.

InfoSight requires SP 4.4 MU7 (requires 3.2.2 MU4 or later) or SP 5.0.3 (requires 3.3.1).

AO requires a schedule or it needs to be run manually to do anything.

500 msec is super high! Anything above 10-15 msec should be investigated.

Just checking: have you set the SATP rule listed in the 3PAR VMware implementation guide? If you don’t have round robin, a lot of weirdness can happen.


I spot checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA..

I have 11 3Pars and around 100 ESXi hosts and all appear to be set like this.. The only place I'm seeing this I/O issue though is one of them..

That being said, it looks like best practice via the 3Par guide is to have all of them set to Round Robin..


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Thu Jun 28, 2018 8:51 am 

Joined: Wed Nov 09, 2011 12:01 pm
Posts: 272
walter_white wrote:
I spot checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA..

I have 11 3Pars and around 100 ESXi hosts and all appear to be set like this.. The only place I'm seeing this I/O issue though is one of them..

That being said, it looks like best practice via the 3Par guide is to have all of them set to Round Robin..


It's like everyone deciding not to change lanes once on the motorway: if they all happen to start in the same lane one day, it's chaos; other days they might happen to pick different ones and it's fine. Round robin with, I think, the recommended setting of 1 puts every IO in a different lane, so fewer queues on average.

I think the recommendation for the move to RR happened over 6 years back, so if you've had 3PAR since the F & T models that might be where the other setting came from (we originally went round balancing the most recently used paths manually ;) ).
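
To put numbers on the motorway picture, here is a toy Python model of per-path load. It is a deliberate simplification and not how ESXi's path selection is implemented: "fixed" stands in for MRU (a host stays on whichever path it started on), round robin switches path after every I/O (the iops=1 setting), and the host count, path count and I/O count are made up.

```python
def path_load_fixed(start_paths: list[int], ios_per_host: int,
                    path_count: int) -> list[int]:
    """Each host sends all of its I/O down the single path it started on."""
    load = [0] * path_count
    for start in start_paths:
        load[start] += ios_per_host
    return load


def path_load_round_robin(start_paths: list[int], ios_per_host: int,
                          path_count: int) -> list[int]:
    """Each host moves to the next path after every single I/O."""
    load = [0] * path_count
    for start in start_paths:
        for i in range(ios_per_host):
            load[(start + i) % path_count] += 1
    return load


# Bad day: four hypothetical hosts all happen to start on path 0.
same_lane = [0, 0, 0, 0]
print(path_load_fixed(same_lane, ios_per_host=100, path_count=4))
print(path_load_round_robin(same_lane, ios_per_host=100, path_count=4))
```

In the fixed case one path carries all 400 I/Os while the other three sit idle; with round robin each path carries 100, regardless of where the hosts started.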


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Thu Jun 28, 2018 1:02 pm 

Joined: Thu Feb 04, 2016 4:12 pm
Posts: 19
ailean wrote:
It's like everyone deciding not to change lanes once on the motorway: if they all happen to start in the same lane one day, it's chaos; other days they might happen to pick different ones and it's fine. Round robin with, I think, the recommended setting of 1 puts every IO in a different lane, so fewer queues on average.

I think the recommendation for the move to RR happened over 6 years back, so if you've had 3PAR since the F & T models that might be where the other setting came from (we originally went round balancing the most recently used paths manually ;) ).

Correct.
Per current 3PAR VMware best practices, add a custom SATP rule.
Per ESXi host:
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O iops=1 -c "tpgs_on" -V "3PARdata" -M "VV" -e "HP 3PAR Custom ALUA Rule"

Verify:
esxcli storage nmp satp rule list | grep -i 3PAR

A reboot is required for existing volumes to pick up the new rule, and all new volumes from a 3PAR will fall under this SATP rule: RR with iops of 1.


As for the other recommendations, MammaGutt has solid advice.
AFC is a good addition, and if you have no SSD VVs, then there's no need for a growth limit on AO for warm VV chunks heading to SSD.
Also, if you're not wide striped or optimized, statpd as mentioned may show one extremely warm PD running your virtual party.
statvlun vs statvv also helps if the readings are very different. I'd also check showhost -lesb in case any of your hosts are spitting errors on their target ports. statcmp vs statcache as well.


 Post subject: Re: 3Par Rookie - Need assistance in troubleshooting please.
PostPosted: Thu Jun 28, 2018 3:06 pm 

Joined: Mon Sep 21, 2015 2:11 pm
Posts: 702
Location: Europe
walter_white wrote:

I spot checked a few of my hosts/datastores and all of them have Most Recently Used (VMware) / VMW_SATP_ALUA..

I have 11 3Pars and around 100 ESXi hosts and all appear to be set like this.. The only place I'm seeing this I/O issue though is one of them..

That being said, it looks like best practice via the 3Par guide is to have all of them set to Round Robin..


Just adding to what the rest of the guys said: if you use MRU and you have a host experiencing issues, I can guarantee you that the host with high latency is not the one causing the issue. That’s the huge PITA with MRU: it’s never the host causing the issue that experiences it. And all you need is one host with MRU to mess things up for every other host using the same host ports. So consistency is important.

