PD Performance against VV Performance
Posted: Wed Aug 13, 2014 7:07 am
by glamic26
I'm trying to get my head around quite a complex performance issue that we are suffering from. It seems to be storage based and intermittent. I am trying to use 3PAR System Reporter to look at historic 3PAR performance and work out what might be causing the problem and why. However, something doesn't seem to add up.
When I run a PD performance chart I see that the physical disks average around 10k-15k IOPS throughout the day, but spike every 2 hours (between 5 and 10 minutes past the hour) to between 25k-35k IOPS. Firstly I wanted to see if I could find the VV responsible for the IOPS spikes. However, after getting a report for the 16 highest VVs in terms of IOPS, they got nowhere near the IOPS of the physical disks, and the total for those 16 VVs maxed out at 4k IOPS. I therefore ran a report showing the total IOPS across all VVs, and that maxes out at 12k IOPS.
So does anyone know why the physical disks are hitting 35k IOPS when the virtual volumes only hit 12k IOPS total between them? And can anyone point me in the right direction on how to find the culprit for the PD IOPS spikes? I see the same spikes on the VVs (but only a fraction of the size).
Re: PD Performance against VV Performance
Posted: Wed Aug 13, 2014 10:32 am
by nsnidanko
I am seeing the same thing here. I am not sure if this is true, but it was suggested to me that VV IOPS show the "real" IOPS generated by a given load, while PD IOPS show the "real" IOPS plus RAID overhead. If we look at our raw numbers that makes sense in our case with RAID 5, but once you try to apply a read IOPS penalty as well it just doesn't add up...
Re: PD Performance against VV Performance
Posted: Wed Aug 13, 2014 12:31 pm
by Cleanur
PD stats show back-end I/O, including RAID overheads etc. VV (well, sort of... see below) and VLUN stats show front-end I/O, i.e. what the host sees before any RAID overheads etc. on the back end.
Looking at the VV metrics is not the most useful approach, as they typically include cache-specific behaviour and the latency measurements are all internal. So unless you understand how to interpret these, or you're comparing them against a VLUN for troubleshooting, they're not that useful and will just lead to confusion.
Instead I would look at the VLUNs, which give you more of a host-side view, showing what's being received from the host and what the array responds with. The best way to do this is to look end to end; I'd typically start at the back end and work forward, as anything changing at the front end tends to be amplified at the back.
statport -host -ni -rw
statvlun -ni -rw
statcmp -d 5
statport -disk -ni -rw
statpd -ni -rw
You can get the same from System Reporter, but if you know when these issues occur the above will give you real-time numbers. Open a few CLI sessions from PuTTY or similar and run them concurrently; you can also export each to a log.
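If it helps, here is a minimal sketch of one way to capture those commands concurrently and log them from a workstation. It assumes SSH access to the array CLI; the array address and username are placeholders for your own environment, not anything from this thread.

# Rough sketch only: run each stat command above over SSH and write its
# output to its own log file. Replace the placeholder address/user.
import subprocess

ARRAY = "3par-mgmt"   # placeholder management address
USER = "3paradm"      # placeholder CLI user

commands = [
    "statport -host -ni -rw",
    "statvlun -ni -rw",
    "statcmp -d 5",
    "statport -disk -ni -rw",
    "statpd -ni -rw",
]

procs = []
logs = []
for cmd in commands:
    log = open(cmd.replace(" ", "_") + ".log", "w")
    logs.append(log)
    # The stat commands keep printing intervals until interrupted, so leave
    # these running across the window in which the spikes occur.
    procs.append(subprocess.Popen(["ssh", f"{USER}@{ARRAY}", cmd], stdout=log))

for p in procs:
    p.wait()          # interrupt once you have enough samples
for log in logs:
    log.close()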
Be aware that, depending on how things are configured and the workload at the host, even relatively small front-end changes in I/O, especially sequential I/O, can translate into large back-end PD I/O increases (think parity overheads, snaps, etc.), hence why you need the end-to-end view.
Re: PD Performance against VV Performance
Posted: Fri Aug 15, 2014 4:27 am
by 3ParDude_1
I spent some time trying to compare front-end and back-end IOPS recently and also got a headache. Even accounting for the RAID penalty I couldn't get them to add up. I think part of this is down to the IO size varying between the front end and back end.
Other things that can influence the difference are snapshots, Remote Copy, and AO or DO activity.
nsnidanko - don't forget that for your calculations the RAID penalties only apply to writes, not reads.
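For anyone following along, a minimal sketch of that standard write-penalty arithmetic (generic RAID rules of thumb, not a 3PAR-specific formula; the read/write split in the example is made up):

# Back-end IOs = reads passed through 1:1 + writes multiplied by the RAID
# write penalty (2 for RAID 1, 4 for RAID 5, 6 for RAID 6). Cache hits and
# coalescing, discussed later in the thread, will reduce these numbers.
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(front_reads, front_writes, raid="RAID5"):
    return front_reads + front_writes * WRITE_PENALTY[raid]

# Example: 12,000 front-end IOPS at a 70/30 read/write mix on RAID 5.
print(backend_iops(front_reads=8400, front_writes=3600))   # 8400 + 14400 = 22800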
Re: PD Performance against VV Performance
Posted: Fri Aug 15, 2014 11:41 am
by hdtvguy
I think the controller cache also impacts the numbers. I get an 80% cache hit rate on reads, so my front-end read IOPS should be much larger than my back-end read IOPS.
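As a one-line illustration of that effect (the 80% hit rate is the figure quoted above; the 10,000 IOPS is just an example number):

# Only the read misses have to be fetched from the physical disks.
def backend_read_iops(front_read_iops, cache_hit_ratio=0.8):
    return front_read_iops * (1.0 - cache_hit_ratio)

print(backend_read_iops(10000))   # 2000: only 20% of host reads reach the spindles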
Re: PD Performance against VV Performance
Posted: Wed Aug 27, 2014 10:35 am
by 3ParDude_1
I asked support about the best way to compare front-end and back-end IOPS, since I was still getting some odd results when comparing IOPS alone. They advised me that the IO size varies between the front end and the back end, so the best way to compare FE vs BE is to use the bandwidth stats, as bandwidth is IOPS * IO size and therefore accounts for the IO size difference.
I am now seeing that the BE is about 2.5 times the FE, which seems a much more reasonable figure.
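To make that concrete, here is a small sketch of the comparison; the IOPS and IO sizes are invented purely to show why an IOPS-only comparison can mislead, they are not figures from either array in this thread.

# Bandwidth = IOPS * average IO size, so a 4x difference in IOPS can become a
# much smaller difference once IO size is taken into account.
def bandwidth_mb_s(iops, io_size_kb):
    return iops * io_size_kb / 1024.0

fe = bandwidth_mb_s(iops=10000, io_size_kb=64)   # larger host IOs on the front end
be = bandwidth_mb_s(iops=40000, io_size_kb=32)   # smaller IOs on the back end

print(f"FE {fe:.0f} MB/s, BE {be:.0f} MB/s, ratio {be / fe:.1f}x")   # 625 vs 1250, 2.0x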
Re: PD Performance against VV Performance
Posted: Fri Aug 29, 2014 6:58 am
by Cleanur
There are all kinds of things going on in the back end depending on the workload (write coalescing, full-stripe writes, etc.) which, even after taking RAID overheads into account, mean you can't do a 1-for-1 comparison from an I/O perspective.
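To give a feel for how much coalescing can change the picture, here is a rough, generic illustration (the 7+1 RAID 5 layout is an assumption for the example, not necessarily how this array is configured):

# Classic RAID 5 read-modify-write costs 4 back-end IOs per host write
# (read data, read parity, write data, write parity). If cache coalesces the
# writes into full stripes, each group of 7 host writes becomes 8 back-end
# writes (7 data + 1 parity) with no reads at all.
DATA_DRIVES = 7

def backend_ios_random(host_writes):
    return host_writes * 4

def backend_ios_full_stripe(host_writes):
    return (host_writes / DATA_DRIVES) * (DATA_DRIVES + 1)

print(backend_ios_random(7000))        # 28000 back-end IOs
print(backend_ios_full_stripe(7000))   # 8000 back-end IOs for the same host work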
Re: PD Performance against VV Performance
Posted: Tue Sep 02, 2014 4:58 am
by slink
I understand you can't do like-for-like comparisons between FE and BE, but I am confused by how my 3PAR is graphing total IOPS at the back end way beyond what I would expect.
I have a 4-node 7400 with 160 10K FC drives in it, and the PD IOPS are showing as topping out at 500 IOPS when running a 4K 100% random read IOmeter test with front-end latency <4ms.
The other confusing thing is when it graphs IOPS like this:
This is a 4K 100% random 50/50 r/w test. It is a Physical Disk chart with the red line being reads, blue being writes, and the light blue line at the top being total IOPS. This is just weird, because those drives are not capable of doing >50,000 random IOPS. The host running the test is reporting average latencies of ~5ms and ~25,000 IOPS split 50/50 r/w, so it looks more as if the red OR the blue line is the total IOPS. I would expect to see the red and blue lines closer to the 10,000 IOPS line and the light blue line around where the red and blue are currently sitting. Why does the 3PAR report IOPS like this? 50,000 IOPS on a random read/write test with 160x 10K spinning disks? No way.
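As a rough sanity check of that intuition (the per-drive figures below are generic rule-of-thumb numbers for 10K RPM drives, not anything measured on this system):

# ~150-200 random IOPS per 10K RPM drive is the usual rule of thumb.
DRIVES = 160
LOW, HIGH = 150, 200

print(DRIVES * LOW, "-", DRIVES * HIGH)   # 24000 - 32000
# That range lines up with the ~25,000 IOPS the host reports and falls well
# short of the ~50,000 shown as the PD total, which supports the suspicion
# that the total line is counting more than pure random spindle work.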
Re: PD Performance against VV Performance
Posted: Wed Sep 03, 2014 4:51 am
by 3ParDude_1
Slink
This is exactly what I started looking at a little while ago. Like you, I could tie the front-end IOPS up with what the hosts were reporting, but the back end seemed very high. I think the key to this is the varying IO size between the front end and back end; you are probably best off comparing bandwidth, since that accounts for the variance in IO size, and you can then see whether the multiple between the front-end and back-end bandwidth looks reasonable.
Re: PD Performance against VV Performance
Posted: Wed Sep 03, 2014 5:02 am
by slink
OK, I will do that. To be honest the VLUN charts for front-end performance are the most useful, but I'm still wondering what the 3PAR is doing here: how and why is it reporting such high figures for spinning disk in the first place? It makes no sense.