HPE Storage Users Group

A Storage Administrator Community




 Post subject: High storage latency 3Par 8200 and Vmware
PostPosted: Tue Sep 04, 2018 5:31 am 

Joined: Tue Sep 04, 2018 4:53 am
Posts: 6
Hi all, I'm new to the forum and was hoping I could get some advice on a storage performance issue. I'm new to the 3PAR but do have some storage experience from working on P9500s and XP1024s in the past.

The setup is as follows: an HP Synergy frame with 6 ESX (v6.7) blades and a physical Oracle DB blade, connected via iSCSI to a 3PAR 8200 with 12 x 1TB SSDs. Synergy is updated to the latest SPP release and we are currently waiting on a slot from HP to update the 3PAR to the latest firmware revision (currently 3.2.2.709). I have set MPIO to Round Robin for all datastores.
I'm experiencing significant latency across all datastores in a rapidly growing test environment (currently running 80+ VMs). The VMs are not seeing any heavy usage at present and IOPS are low, as we are still in the environment build phase and have not let it loose on the wider user base.
I'm seeing individual VMs' storage latency peak at 250 ms and am getting regular alerts from VMware. From the 3PAR I can see service times far higher than I would expect from a flash array. Even the physical Linux host is seeing average latencies of around 15 ms (read) and 30 ms (write) on its attached LUNs.
Attachment: volperf.jpg


So far the only change I have made is to set the ESX Round Robin IOPS limit to 1 instead of the default 1000; this seems to have made a negligible difference. I have found multiple references suggesting turning off Delayed ACK in VMware on each host, but I can find no definitive recommendation from HP to do so. Would that be a recommended next step?
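
For reference, this is roughly what the per-device change involves from the ESXi shell; the naa. identifier below is a placeholder, and the claim rule at the end is the commonly documented way to have newly presented 3PAR volumes pick up Round Robin with iops=1 automatically (a sketch, not necessarily the exact commands used here):

Code:
# set Round Robin and an IOPS limit of 1 on an existing device (placeholder device ID)
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1
# optional claim rule so newly presented 3PARdata volumes default to RR with iops=1
esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -P "VMW_PSP_RR" -O "iops=1" -c "tpgs_on" -V "3PARdata" -M "VV" -e "HPE 3PAR custom rule"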


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Tue Sep 04, 2018 6:02 am 

Joined: Wed Nov 09, 2011 12:01 pm
Posts: 392
Not using iSCSI here but generally you might want to look at the array CPU, Port and Physical Disk performance graphs during those peak times to see what the array is up to.


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Tue Sep 04, 2018 7:10 am 

Joined: Wed Nov 19, 2014 5:14 am
Posts: 505
I've seen something similar on a few VMware iSCSI implementations and it does seem to be very much stack specific. statvlun measures the host round-trip time as well, so if the array doesn't receive an acknowledgement from the host in a timely manner it can report artificially high latency. It may be worth looking at what the host itself is reporting and comparing the two.
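
A minimal way to compare the two views, assuming CLI access on both sides (the commands below are illustrative):

Code:
# 3PAR CLI: per-VLUN service times as seen by the array (these include the host round trip)
statvlun -ni -rw -iter 1
# ESXi: run esxtop, press 'u' for the disk device view, and compare DAVG/KAVG/GAVG
# (device, kernel and guest latency) against the array's figures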

It seems to be much more prevalent when the system is idle, i.e. not much real traffic coming from the hosts. There are numerous VMware articles citing delayed ACK as the cause across arrays from many different vendors, and although the issue tends to get pinned on the storage array, I'm not entirely convinced.

See this article.

https://vnote42.wordpress.com/2018/06/06/3par-iscsi-resolve-high-write-latency-on-esxi-hosts/


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Fri Sep 07, 2018 5:04 am 

Joined: Tue Sep 04, 2018 4:53 am
Posts: 6
Firstly, thanks for the advice so far. I have made the Delayed ACK change on all ESX hosts. This turned out to be rather more laborious than the various articles would lead you to believe: I had to manually delete the iSCSI config files from the hosts, reconfigure port bindings and re-add the paths on the 3PAR. This is the process I followed:
https://www.virtual-allan.com/disable-delayed-ack-for-iscsi-on-esxi-not-always-working/
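
For anyone doing the same, a quick way to confirm the setting has actually taken after the reboot; this is a sketch based on the article above, and the exact key names may differ between builds:

Code:
# dump the software iSCSI database and look for the Delayed ACK entries;
# a value of 0 against the DelayedAck keys indicates the option is disabled
vmkiscsid --dump-db | grep -i delayedack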

Unfortunately, although I have seen an improvement in latency figures, the change has not resulted in the drop I would have hoped for. The question now is where to look next.

I completed the change at ~12:00 on 06/09, and from the graph below there was a drop in write service times from an average of 30-40 ms down to 15-25 ms; read times seem to have improved marginally (the graph shows host port performance). The latency drops that occur daily at ~03:00 can be attributed to the backup schedule, so we can see that latency actually improves when bandwidth usage increases.
Attachment: Capture2.PNG (host port performance graph)


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Sat Sep 08, 2018 10:23 am 

Joined: Fri Jan 20, 2017 9:39 am
Posts: 58
Hi paulv666,

I've got some of the same equipment as you, but connected in different ways. I've got an 8200 on 3.3.1 MU2 (originally 3.2.2 MU4) with dedupe enabled, doing iSCSI, and my latency is OK given the number of hosts, volumes and VMs, but it's not connected to Synergy.

I don't have a silver bullet for you but just wanted to toss out some ideas in the hopes that something will help you narrow down the problem:

1) If you have a bad cable, it could be causing high latency because of all the retransmits it might generate. I'm not sure how to check for retransmits on a 3PAR (my other SAN has a graph showing this value), but maybe use SSMC and create a real-time report for each port to see if one is obviously worse than the others?

Similarly, maybe run statport at the 3PAR command line and see if one port has a higher service time than the others? (See the sketch at the end of this list.)

2) What interconnect are you using with Synergy? Virtual Connect? A switch module? Do you have two frames connected together? Maybe traffic flowing through the interconnect is causing an issue if you have multiple frames? (You mentioned 6 hosts, so I assume it's just one frame and maybe no interconnect.) You don't have to answer with details. I know enough to be dangerous, not enough to be helpful :)

3) Try disabling paths manually in ESXi so all traffic flows down a single path, then turn them back on one by one to see if latency goes up, in case there's an issue with multipathing (see the sketch at the end of this list).

4) You're not affected by this note on the Synergy release set page, are you? "Users of HPE Synergy 40Gb F8 Switch Module are not recommended to use Synergy Custom SPP 2018.03.20180628. Contact your account team for further guidance."

5) Are you using a VSS or a VDS? If a VDS, are you sharing the same uplinks for iSCSI and general network traffic? If so, are you using Network I/O Control?

6) Do you have iSCSI port binding set correctly if using a single subnet? If you have two subnets, it's recommended not to use port binding.

7) Disable delayed ACK. I know there's nothing in the 3PAR docs that specifically says to do it, but it's almost always a recommended thing to do, and as you discovered, doing it after the fact isn't fun :)

When I set up a new host I set my claim rule (iops=x and Round Robin) before doing anything with iSCSI. Then I add the IP addresses to the port binding section (I'm running with a single subnet) but don't do a rescan. I modify the delayed ACK option on the IP in the dynamic discovery section (it could also be done at the global level) and then reboot the host. If VMware takes a long time to get past loading the iSCSI stack during boot, I know there's a config mistake. That doesn't necessarily mean there will be a performance problem, but it does mean long rescan times, because the host is trying to access storage over a path that isn't valid.

8) If you're using jumbo frames, are they enabled end to end? A simple check I use is to go onto the ESXi host and ping the iSCSI address of the 3PAR as follows:
Code:
vmkping -I vmk# -s 8972 -d <ip address of san port>
where vmk# is one of your iSCSI vmkernel ports.

9) Are you using software or hardware iSCSI? I've only ever used the software iSCSI adapter myself.

10) If you have support, you can always open a case, and 3PAR support might have you run a perf script on your SP to gather data from the 3PAR. If you have Phone Home enabled, they will get the data and can then analyse it and check for duplicate packets/retransmits.

11) Could your switch be dropping packets? Is storm control disabled?
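
For items 1 and 3, a rough sketch of the commands involved (the device and path names below are placeholders):

Code:
# 3PAR CLI: one sample of host-port statistics to see if any port's service time stands out (item 1)
statport -host -rw -iter 1
# ESXi: list the paths for a device, then take them offline one at a time (item 3)
esxcli storage core path list --device naa.xxxxxxxxxxxxxxxx
esxcli storage core path set --state off --path vmhba64:C0:T0:L1
# re-enable with --state active once you've observed latency on the remaining path(s)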


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Mon Sep 10, 2018 9:41 am 

Joined: Tue Sep 04, 2018 4:53 am
Posts: 6
Hi Kyle, thanks for your detailed response. I'll go through each point:

1) If you have a bad cable, it could be causing high latency because of all the retransmits it might generate. I'm not sure how to check for retransmits on a 3par (my other san has a graph showing this value) but maybe use SSMC and make a realtime report for each port to see if one is obviously worse than the others?

Good point here. We were shipped the wrong QSFPs initially, so we ended up using what we could find to get the system up and running (a copper connection and a fiber one), which do appear to work. We have since had the correct parts delivered, so will arrange to get them installed ASAP. From what I can see the paths appear to be balanced, and we're not seeing any difference in latency times across the ports.

2) What interconnect are you using with Synergy? Virtual connect? Switch module? Do you have 2 frames connected together? Maybe traffic flowing through the interconnect is causing an issue if you have multiple frames? (You mentioned 6 hosts so I assume it's just 1 frame so maybe no interconnect). You don't have to answer with details. I know enough to be dangerous, not enough to be helpful :)

We're using VC 40Gb F8 modules in a single frame.

3) Try disabling paths manually in ESXi so all traffic flows down a single path, then turn them back on one by one to see if latency goes up, in case there's an issue with multipathing

We recently had an issue with a failed 3PAR update and lost two paths for around a week. That is what first made the latency issue visible; we were seeing 250 ms+.

4) You're not affected by this note on the synergy release set page are you? Users of HPE Synergy 40Gb F8 Switch Module are not recommended to use Synergy Custom SPP 2018.03.20180628. Contact your account team for further guidance.

I'll look into that; we only recently updated the SPP to try to address another, unrelated VMware issue.

5) Are you using a VSS or VDS? If VDS, are you sharing the same uplinks for iscsi and network? If so, are you using network i/o control?

VSS at present.

6) Do you have iSCSI port binding set correctly if using a single subnet? If you have two subnets, it's recommended not to use port binding.

We are using two subnets; I'll have to check port binding.

7) Disable delayed ACK. I know there's nothing in the 3PAR docs that specifically says to do it, but it's almost always a recommended thing to do, and as you discovered, doing it after the fact isn't fun :)

I can confirm Delayed ACK is disabled on all hosts.

8) If you're using jumbo frames, is it enabled end to end? Simple check I use is go on the ESXi host and ping the iscsi address of the 3par as follows:
Code:
vmkping -I vmk# -s 8972 -d <ip address of san port>
where vmk# is one of your iscsi ports.

Jumbo frames have been set at the switch layer; the Synergy/3PAR side is still using the out-of-the-box 1500 MTU at the moment.
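
These are the ESXi-side checks being used to see where the MTU currently sits (the vSwitch and vmk names below are illustrative):

Code:
# current MTU on the standard vSwitches and on the vmkernel interfaces
esxcli network vswitch standard list
esxcli network ip interface list
# if we go to jumbo frames end to end, both layers would be raised to 9000, e.g.:
# esxcli network vswitch standard set -v vSwitch1 -m 9000
# esxcli network ip interface set -i vmk1 -m 9000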

9) Are you using software or hardware iscsi? I've only ever used the software iscsi adapter myself.

Hardware iSCSI


10) If you have support, you can always open a case and 3par support might have you run a perf script on your SP to gather data from the 3par. If you have phone home enabled, they will get the data and then can analyze it and check for duplicate packets/re-transmits.

Yes, I have raised a call. We're a bit stuck at the moment: due to the failed 3PAR update we're on a staging version of the SP and HP can't run any advanced diagnostics; we're due to be updated later this month. From the discussions I have had and from looking at the outputs, the array appears to be performing fine and the issue is likely in the network/VMware area.

11) Could your switch be dropping packets? Is storm control disabled?
The network team has confirmed that the switch isn't dropping any packets; utilization is tiny with no contention.


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Mon Sep 10, 2018 12:24 pm 

Joined: Mon Sep 10, 2018 12:20 pm
Posts: 1
You think that is bad, look at this :lol:


Attachment: 55trillon.png
 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Mon Sep 10, 2018 3:58 pm 

Joined: Tue Feb 04, 2014 4:28 pm
Posts: 8
Hi Paulv666

This is odd. I recently saw high latency numbers on a 3-tier 3PAR with AO. That situation was caused by multiple things, but the start of the mess pointed back to Veeam and software (VMware) snapshots that had been left behind for several months.

Anyway, assuming that's not your problem here, there are a couple things I'd like to add to the discussion:

1. Note that the VC Synergy 40Gb F8 module isn't the same as the Synergy 40Gb F8 Switch. It's just the switch that has the problem with a specific SPP. That's the one that doesn't require the ICM cluster cables between modules on QSFP ports 7 and 8.

2. A little confusion on the configuration of your 3PAR -- the 8200 doesn't have an option for a 1TB SSD, so it might be a good idea to check and report back what it does have installed. Just log in to the 3PAR with SSH and type showpd <enter> at the command line.

3. Sorry if I missed it but did you try increasing the load, with IOMeter or some other IO tool?

4. How do you like that Synergy product? I think it's amazing.

_________________
Jeff Gray
Chief Technologist
MASE MASE MASE MCSE
OneView Whisperer
Arlington Computer Products


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Wed Sep 12, 2018 3:41 am 

Joined: Tue Sep 04, 2018 4:53 am
Posts: 6
jgmke wrote:
2. A little confusion on the configuration of your 3PAR -- the 8200 doesn't have an option for a 1TB SSD, so it might be a good idea to check and report back what it does have installed. Just log in to the 3PAR with SSH and type showpd <enter> at the command line.

Yes, sorry, a typo on my part; they are actually 2TB drives.

3. Sorry if I missed it but did you try increasing the load, with IOMeter or some other IO tool?

That's in progress. I'm getting some help with troubleshooting from an HP technical consultant who made the same suggestion, so I'm just putting together some servers with IOmeter installed at the moment.

4. How do you like that Synergy product? I think it's amazing.

It's early days at the moment and there are still a few teething issues getting it stable. I previously worked extensively with the c7000, so it does look and feel familiar, and I do like using OneView. I'm sure once we get through the problems it'll be great.


Also, just for reference, I have made a few changes to VMware settings based on recommendations found online regarding these latency issues; it would be interesting to see whether other users have also made, or would recommend, these changes (the queue-full settings are sketched below the list).

Delayed ACK disabled
LRO disabled (only did this recently; it made no difference)
Round Robin enabled on all hosts/datastores, with the RR IOPS limit changed to 1 from the default of 1000
Queue-full-threshold set to 4 and queue-full-sample-size set to 32 for all datastore LUNs
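
The queue-full settings were applied per device along these lines (placeholder device ID; the global Disk.QFullSampleSize / Disk.QFullThreshold advanced options are an alternative way to do the same thing):

Code:
# per-device queue-full throttling, applied to each datastore LUN
esxcli storage core device set --device naa.xxxxxxxxxxxxxxxx --queue-full-threshold 4 --queue-full-sample-size 32
# the device listing can be used to confirm the settings afterwards
esxcli storage core device list --device naa.xxxxxxxxxxxxxxxx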


 Post subject: Re: High storage latency 3Par 8200 and Vmware
PostPosted: Wed Sep 12, 2018 6:11 am 

Joined: Tue Feb 04, 2014 4:28 pm
Posts: 8
Quote:
Yes, sorry, a typo on my part; they are actually 2TB drives.


The 2TB SFF is a 7,200 RPM nearline drive, not an SSD. With 12 of those drives I would expect latency in the 20 to 30 ms range at 1,000 "front end" 8K IOPS with a 50/50 r/w ratio on RAID 6. Once you get past the 1,000 IOPS mark, I would expect latency to go through the roof.

The tuning you've done will help mitigate problems that could result in 100s of ms of sporadic latency spikes, but the primary cause in this situation is almost certainly related to the drive type itself.
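
As a rough back-of-envelope using rule-of-thumb figures (roughly 75 IOPS per 7.2K NL spindle and a RAID 6 write penalty of 6), not anything measured on your array:

Code:
back-end capability : 12 drives x ~75 IOPS       ~=   900 back-end IOPS
1,000 front-end 8K IOPS at 50/50 r/w on RAID 6:
  reads             : 500                         =   500 back-end IOPS
  writes            : 500 x 6 (RAID 6 penalty)    = 3,000 back-end IOPS
  total                                          ~= 3,500 back-end IOPS

That is roughly four times what the spindles can sustain, so queues build and latency climbs quickly.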

Just to make sure what's what, ssh into the 3PAR and type

Code:
showpd 

This will confirm the drive type. Then run:

Code:
statpd -iter 1 -d 5 -rw


This will show a 5 second sample of the "back end" IO stats for each drive.

From that info we can pretty much confirm that the disks are the root of the problem. If those are 2TB SFF drives, then the original latency graph is not out of line with expectations.

_________________
Jeff Gray
Chief Technologist
MASE MASE MASE MCSE
OneView Whisperer
Arlington Computer Products



