
Suddenly poor performance on 3PAR 8200 SSD

Posted: Thu Jun 30, 2022 3:42 pm
by ARDiver86
We moved our 3PAR 8200 (all SSD) to another physical location and run some Elastic monitoring nodes on it. Prior to the move we never noticed a performance issue with Elastic, and for about a week after the move we didn't notice anything either. Then everything suddenly started getting pretty slow, though it was still working.

Digging further into it I noticed that the virtual volumes had pretty high service times:
[screenshot: virtual volume service times]

This particular volume is presented to a Hyper-V cluster which lives on a c7000 blade enclosure running BL460c Gen8 servers. The reason I don't think it has anything to do with the enclosure, Fibre Channel switches, physical cabling, etc. is that we also have a 3PAR 7200 (HDD) connected to the same infrastructure, and it isn't having any performance issues as far as I can tell.

I went through and checked the service time on each individual SSD and they appear to be fine, showing 2 ms or less on reads. The host ports appear to be about half utilized at 400,000-600,000 KBps throughput, but the read service time on the host ports shows 50 ms on two of them and 200 ms+ on the third, while the write service time is very low (around 1 ms).
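
For reference, those checks map to the 3PAR CLI roughly as follows (a sketch only; flag names can vary by 3PAR OS release, so confirm with the CLI help for each command):

    # Per-physical-disk service times, read/write split, single sample
    statpd -rw -iter 1

    # Per-host-port throughput and service times
    statport -host -rw -iter 1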

Could it be the cable to this one host port causing the issue?

Re: Suddenly poor performance on 3PAR 8200 SSD

Posted: Fri Jul 01, 2022 2:00 am
by MammaGutt
ARDiver86 wrote:We moved our 3PAR 8200 (all SSD) to another physical location and run some Elastic monitoring nodes on it. Prior to the move we never noticed a performance issue with Elastic, and for about a week after the move we didn't notice anything either. Then everything suddenly started getting pretty slow, though it was still working.

Digging further into it I noticed that the virtual volumes had pretty high service times:
[screenshot: virtual volume service times]

This particular volume is presented to a Hyper-V cluster which lives on a c7000 blade enclosure running BL460c Gen8 servers. The reason I don't think it has anything to do with the enclosure, Fibre Channel switches, physical cabling, etc. is that we also have a 3PAR 7200 (HDD) connected to the same infrastructure, and it isn't having any performance issues as far as I can tell.

I went through and checked the service time on each individual SSD and they appear to be fine, showing 2 ms or less on reads. The host ports appear to be about half utilized at 400,000-600,000 KBps throughput, but the read service time on the host ports shows 50 ms on two of them and 200 ms+ on the third, while the write service time is very low (around 1 ms).

Could it be the cable to this one host port causing the issue?


If two host ports were at 2 ms and one was at 200+ ms, I would think it was the cable. But as the others are at 50 ms, I would say it is probably something else...

8200 and SSD... any dedupe, compression or anything like that? How is the CPU utilization? 2 ms read for local SSD is a little bit high, but that isn't the problem...

What if you compare statvv and statvlun? If statvv is low and statvlun is high, the problem is most likely external to the array. If statvv is high and statvlun is high, the problem is most likely internal to the array.
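
For reference, that comparison can be run from the CLI roughly like this (a sketch; option names may differ slightly between 3PAR OS versions, and the volume name is a placeholder):

    # Service time as seen at the virtual volume layer, inside the array
    statvv -rw -iter 1 Elastic_Vol01

    # Service time as seen per exported VLUN, i.e. the host-facing side
    statvlun -rw -vvsum -iter 1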

Re: Suddenly poor performance on 3PAR 8200 SSD

Posted: Fri Jul 01, 2022 9:46 am
by ARDiver86
CPU usage is sitting around 60% right now, but performance seems to have stabilized. We are using one CPG with dedupe and compression, on 3PAR OS 3.3.1.410 (MU2) (no support contract, so I can't get firmware files to upgrade).

We actually only have three ports available to the SAN, and right now all three are under 10% throughput. When that picture was taken, all three were around 60% to 80% utilized. We only have four Elastic hot nodes on this array, and each node is pushing around 1,000 to 2,500 IO/s based on the performance graphs in SSMC. I think something was going on with my Elastic cluster, because it eventually stopped working and all IOPS just died (the cluster died).

Now that the cluster is back online and functioning, performance seems much better, but I still see that the service time on each volume is around 20 ms read and less than 1 ms write. This could be additional latency because they are virtual machines using virtual Fibre Channel passthrough in Hyper-V. The two other volumes presented to the Hyper-V nodes, which just hold the operating systems (not much IO), are showing less than 2 ms read/write at 20 IO/s.

Here is statvv:
[screenshot: statvv output]

Here is statvlun:
[screenshots: statvlun output]

It appears statvv is high but statvlun is low. I took a look at the CPG, and it appears it is configured for RAID 5 with a set size of 7 data + 1 parity.
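
For anyone checking the same thing, the CPG layout can be confirmed from the CLI roughly like this (a sketch; the CPG name is a placeholder and the exact options depend on the 3PAR OS version):

    # List CPGs with their RAID type and usage
    showcpg

    # Growth/layout parameters for one CPG (RAID type, set size, availability)
    showcpg -sdg SSD_CPG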

Re: Suddenly poor performance on 3PAR 8200 SSD

Posted: Fri Jul 01, 2022 1:10 pm
by MammaGutt
ARDiver86 wrote:CPU usage is sitting around 60% right now, but performance seems to have stabilized. We are using one CPG with dedupe and compression, on 3PAR OS 3.3.1.410 (MU2) (no support contract, so I can't get firmware files to upgrade).

We actually only have three ports available to the SAN, and right now all three are under 10% throughput. When that picture was taken, all three were around 60% to 80% utilized. We only have four Elastic hot nodes on this array, and each node is pushing around 1,000 to 2,500 IO/s based on the performance graphs in SSMC. I think something was going on with my Elastic cluster, because it eventually stopped working and all IOPS just died (the cluster died).

Now that the cluster is back online and functioning, performance seems much better, but I still see that the service time on each volume is around 20 ms read and less than 1 ms write. This could be additional latency because they are virtual machines using virtual Fibre Channel passthrough in Hyper-V. The two other volumes presented to the Hyper-V nodes, which just hold the operating systems (not much IO), are showing less than 2 ms read/write at 20 IO/s.

Here is statvv:
[screenshot: statvv output]

Here is statvlun:
[screenshots: statvlun output]

It appears statvv is high but statvlun is low. I took a look at the CPG, and it appears it is configured for RAID 5 with a set size of 7 data + 1 parity.

I think you are seeing the expected performance of the array. With 60% average CPU, there are probably a couple of cores sitting at 100%.
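
If you want to confirm that, per-core load can be sampled from the CLI with something like this (a rough sketch; sampling options may vary by 3PAR OS release):

    # Per-CPU utilization on each controller node, single sample
    statcpu -iter 1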

Also, you'll never get 100% utilization of all host ports on the system. An 8200 can have 6x 16Gb FC ports per node, so 12x 16Gb = 192 Gbit/s = ~24 GB/s...

What is interesting is that you have 1.9 GB/s of backend traffic and 443 MB/s on the frontend... What is the read/write ratio here? As for latency, the numbers are high (probably due to load)... and the large IO size isn't helping...

Just asking: what is the dedupe and compression ratio on the system? If it is low, you would gain performance by converting the volumes to thin.
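
For reference, those ratios can be read straight off the array, roughly like this (a sketch; the output columns differ somewhat between 3PAR OS releases):

    # System-wide capacity efficiency (compaction, dedupe, compression ratios)
    showsys -space

    # The same figures broken down per virtual volume
    showvv -space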

Re: Suddenly poor performance on 3PAR 8200 SSD

Posted: Fri Jul 01, 2022 6:35 pm
by ARDiver86
MammaGutt wrote:I think you are seeing the expected performance of the array. With 60% average CPU, there are probably a couple of cores sitting at 100%.

Also, you'll never get 100% utilization of all host ports on the system. An 8200 can have 6x 16Gb FC ports per node, so 12x 16Gb = 192 Gbit/s = ~24 GB/s...

What is interesting is that you have 1.9 GB/s of backend traffic and 443 MB/s on the frontend... What is the read/write ratio here? As for latency, the numbers are high (probably due to load)... and the large IO size isn't helping...

Just asking: what is the dedupe and compression ratio on the system? If it is low, you would gain performance by converting the volumes to thin.


Based on what I am seeing, it is about 60% to 70% read. The CPG shows 3.7:1 compaction, 1.2:1 compression and 1:1 deduplication. I am running these four virtual machines on Hyper-V 2019 with virtual Fibre Channel passthrough. I adjusted the queue depth from the default 32 to 64, and the Elastic nodes are running Ubuntu 22.

The strange thing is that I moved this from a 2-node Elastic cluster (as a test) to a 5-node Elastic cluster. It is all the same data, so the amount of data and the number of devices reporting in hasn't changed. The only big change I can think of is that the virtual machines previously used virtual disks on the Hyper-V cluster, and now they are using Fibre Channel passthrough with NPIV. I can try to convert the volume to thin, since deduplication isn't doing anything anyway and compression is hardly doing anything.

Is there anything with 3PAR and Ubuntu over Fibre Channel (especially FC passthrough with NPIV) that I should have done differently? I set up multipathing and all that, but I just want to make sure I didn't miss any basic setup.
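
For comparison, HPE's Linux implementation guides give a multipath device stanza along these lines for 3PAR volumes on an ALUA host persona. Treat it as a sketch to verify against the guide for your 3PAR OS and Ubuntu release, not a definitive config:

    # /etc/multipath.conf - 3PARdata VV device section (verify against HPE's guide)
    devices {
        device {
            vendor               "3PARdata"
            product              "VV"
            path_grouping_policy group_by_prio
            path_selector        "round-robin 0"
            path_checker         tur
            hardware_handler     "1 alua"
            prio                 alua
            failback             immediate
            no_path_retry        18
        }
    }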

Re: Suddenly poor performance on 3PAR 8200 SSD

Posted: Sat Jul 02, 2022 10:37 am
by MammaGutt
ARDiver86 wrote:
Based on what I am seeing, it is about 60% to 70% read. The CPG shows 3.7:1 compaction, 1.2:1 compression and 1:1 deduplication. I am running these four virtual machines on Hyper-V 2019 with virtual Fibre Channel passthrough. I adjusted the queue depth from the default 32 to 64, and the Elastic nodes are running Ubuntu 22.

The strange thing is that I moved this from a 2-node Elastic cluster (as a test) to a 5-node Elastic cluster. It is all the same data, so the amount of data and the number of devices reporting in hasn't changed. The only big change I can think of is that the virtual machines previously used virtual disks on the Hyper-V cluster, and now they are using Fibre Channel passthrough with NPIV. I can try to convert the volume to thin, since deduplication isn't doing anything anyway and compression is hardly doing anything.

Is there anything with 3PAR and Ubuntu over Fibre Channel (especially FC passthrough with NPIV) that I should have done differently? I set up multipathing and all that, but I just want to make sure I didn't miss any basic setup.

60-70% read and 3x the IO on the backend sounds like the array is doing something else as well.
1:1 deduplication is a clear indication to get away from dedupe. You are wasting CPU cycles on the array for no benefit (it might even be worse, since dedupe can have a negative capacity effect, but the ratio will never show worse than 1:1).
1.2:1 compression isn't a lot either, but it is something. I think HPE's recommendation is not to bother with anything below 1.3:1, but if 1.2:1 is what makes your data fit on the array, I would absolutely keep it. If you have a lot of free capacity, I would consider dropping compression as well to benefit from the increased performance (see the conversion sketch below).
Check "statport -host"; as long as traffic is balanced, your multipathing looks okay.
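
A conversion like that runs online with tunevv; the general shape is something like the following (a sketch from memory, so check the CLI help for tunevv on your 3PAR OS version; the CPG and volume names are placeholders):

    # Convert a deduped/compressed volume back to a plain thin (TPVV) volume,
    # keeping the user space in the same CPG
    tunevv usr_cpg SSD_CPG -tpvv Elastic_Vol01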

Keep in mind that increased queue depth = more IOPS at higher latency (because the "latency counter" starts when the IO is queued).