I continue to struggle with these latency issues. We've evacuated all NL disks and are running only on FC disks, at HP support's direction. We'll be running tunesys and possibly tunepd as well (although PD performance looks fine with statpd).
So what's left? I watched it last night and the night before, covering the periods before, during, and after the event.
As I said, statvlun shows those critical VLUNs with terrible latency during the critical window, while statpd shows all FC PDs mostly balanced with single-digit latency.
The particular host I'm watching is an AIX host with two hdisks, both running at 100% busy with high IOPS and consistently double-digit servqfull counts. I asked my teammate if adding disks would help balance the load across them, and he thinks it would just exacerbate the problem by letting more I/O through, only to cause more problems.
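The teammate's reasoning can be framed with Little's Law (I/Os in flight = IOPS x response time). Below is a minimal Python sketch, assuming a hypothetical per-hdisk queue_depth of 20 and illustrative service times (not values measured on this system). A full hdisk queue caps how much I/O the host can keep outstanding; adding hdisks raises that cap, which only helps if the array can absorb the extra I/O without its service time climbing.

```python
def max_iops(queue_depth: int, service_time_ms: float) -> float:
    """Upper bound on IOPS for one hdisk by Little's Law: X = L / R."""
    return queue_depth / (service_time_ms / 1000.0)


def outstanding_ios(iops: float, response_time_ms: float) -> float:
    """Average number of I/Os in flight by Little's Law: L = X * R."""
    return iops * (response_time_ms / 1000.0)


# Hypothetical numbers for illustration only.
QUEUE_DEPTH = 20  # assumed hdisk queue_depth attribute, not read from the host

for service_ms in (5.0, 20.0):
    per_disk = max_iops(QUEUE_DEPTH, service_ms)
    # The ceiling scales with total queue slots, but only if the array keeps
    # the same service time under the extra outstanding I/O.
    print(f"{service_ms:>5} ms service: 2 hdisks -> {2 * per_disk:.0f} IOPS max, "
          f"4 hdisks -> {4 * per_disk:.0f} IOPS max")
```

If the array side is already saturated, the extra outstanding I/O from more hdisks would tend to show up as higher service time rather than higher IOPS, which is essentially the teammate's concern.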
bad VV performance, good PD performance
Re: bad VV performance, good PD performance
Look at the port utilization on the 3PAR, and at cache as well.
Are you seeing high service times on specific LUNs? Is it specific to read or write operations?
Thanks,
G Kapoor
Re: bad VV performance, good PD performance
During the backup window when we have issues, 7pm - 2am, I would say it's mostly reads (but on our DB server, it would be reading from one hdisk/LUN and writing to another hdisk/LUN). During the day, however, reads are much higher.
Looking at cache performance for node2 and node3, Read Hit % looks to be between 40% and 60% most of the time, with occasional jumps to 80-90%. Write Hit % is around 30% during the day, then mostly falls off during the evening.
Lock blocks is zero.
As for Page States, the Clean stat is around 630,000, while everything else is at 100,000 or less most of the time. I do see a bump in the other Page States overnight (as Clean consequently drops).
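To put the 40-60% Read Hit % in perspective, a simple weighted-average model of read latency can help. This is a minimal sketch, assuming hypothetical latencies of 0.5 ms for a cache hit and 8 ms for a read serviced from FC disk (placeholders, not 3PAR specifications or measurements from this array).

```python
def effective_read_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Weighted average read latency: hits served from cache, misses from disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Assumed, illustrative latencies (not measured on this array).
HIT_MS = 0.5
MISS_MS = 8.0

for ratio in (0.4, 0.6, 0.9):
    latency = effective_read_latency_ms(ratio, HIT_MS, MISS_MS)
    print(f"read hit {ratio:.0%}: ~{latency:.2f} ms average read latency")
```

Under this model, average read latency can never exceed the miss (disk) latency. That suggests VLUN latencies far above single-digit PD latencies are more likely coming from queueing in front of the disks than from the disks themselves.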
Port utilization is very, very high on the front end. I asked support about it, thinking 4 x 4Gb ports shouldn't be so heavily utilized, but I guess it's not a matter of "yes, you are pushing nearly 16Gb/s of data through those ports" but more of "yes, those ports are very busy".
Utilization is mostly balanced across the ports, which tells me zoning should be balanced.
But overnight, they will be at or very near 100% for extended periods of time.
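The distinction support drew between "busy" and "full of data" can be sanity-checked with arithmetic, since throughput is IOPS multiplied by average I/O size. This is a minimal sketch, assuming roughly 400 MB/s of usable payload per direction for a 4Gb FC port (a commonly quoted approximation) and hypothetical IOPS and I/O sizes; it does not reproduce how the 3PAR computes its own port utilization figure.

```python
# Approximate usable payload bandwidth of one 4Gb FC port, per direction (assumption).
PORT_MB_PER_S = 400.0


def bandwidth_utilization(iops: float, io_size_kb: float, ports: int = 4) -> float:
    """Fraction of aggregate front-end bandwidth consumed by a workload."""
    mb_per_s = iops * io_size_kb / 1000.0  # decimal kB -> MB
    return mb_per_s / (PORT_MB_PER_S * ports)


# Hypothetical workloads: many small random I/Os vs. fewer large sequential I/Os.
small = bandwidth_utilization(iops=20_000, io_size_kb=8)
large = bandwidth_utilization(iops=2_000, io_size_kb=512)
print(f"20k IOPS x 8 kB:   {small:.0%} of front-end bandwidth")
print(f"2k IOPS x 512 kB:  {large:.0%} of front-end bandwidth")
```

A high I/O rate of small transfers can keep ports busy while moving only a small share of their rated bandwidth, which is consistent with support's "very busy" description.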
Re: bad VV performance, good PD performance
I haven't read the whole thread, as it looks like it's now been split across multiple posts, which doesn't really help. But if you are running large sequential workloads overnight (both read and write), this will obviously impact service times for other I/O on the system.
Hopefully I'm not stating the obvious, but the smaller random I/Os have to queue behind the sequential traffic. 3PAR is typically better at this than most because the traffic is processed in parallel, but a single large sequential I/O still takes longer to process and so delays the other I/O in the queue. If this is the problem, or your ports are oversubscribed, you could try QoS, either on the 3PAR or in the fabric, to reduce the available bandwidth during the backup window, or perhaps look at using snapshots to avoid the need entirely.
If I'm completely off base with the above, then the first question you should always ask is "what has changed?" If you can't answer that, then it's just a matter of tracking things down methodically.
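The head-of-line effect described above can be illustrated with a toy single-queue model in which service time grows with transfer size. This is a minimal sketch, assuming a hypothetical fixed per-I/O overhead of 0.1 ms and a 400 MB/s transfer rate; it deliberately ignores the parallelism the 3PAR uses, so it overstates the effect and only shows the direction of it.

```python
from dataclasses import dataclass

OVERHEAD_MS = 0.1       # assumed fixed cost per I/O
RATE_MB_PER_S = 400.0   # assumed transfer rate of the shared path


@dataclass
class IO:
    name: str
    size_kb: float


def service_ms(io: IO) -> float:
    """Time to process one I/O: fixed overhead plus transfer time."""
    return OVERHEAD_MS + (io.size_kb / 1000.0) / RATE_MB_PER_S * 1000.0


def completion_times(queue: list[IO]) -> dict[str, float]:
    """Serve I/Os strictly in FIFO order; return each one's completion time in ms."""
    clock = 0.0
    done: dict[str, float] = {}
    for io in queue:
        clock += service_ms(io)
        done[io.name] = clock
    return done


# An 8 kB random read arriving alone vs. behind four 1 MB sequential backup reads.
alone = completion_times([IO("small", 8)])
behind = completion_times([IO(f"seq{i}", 1000) for i in range(4)] + [IO("small", 8)])
print(f"small I/O alone:                    {alone['small']:.2f} ms")
print(f"small I/O behind sequential stream: {behind['small']:.2f} ms")
```

The small read's own service time is unchanged; nearly all of its extra latency is time spent waiting behind the large transfers, which is the kind of delay that would show up in statvlun but not in statpd.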
Attachments: perf trbl.png (18.51 KiB)