Latency on Primera with 256KB io and 512KB io

Post by **JinSXS** » Wed May 12, 2021 9:54 am

We have a few hosts (MSSQL servers) that, off and on, do reads/writes at 256KB and 512KB IO sizes.

Somehow that is causing write latency spikes of up to 20ms on other hosts.

Those servers are not even using the same FE host ports. Any idea how to address this?

Support's reply is to ask us not to do such large IO operations, as though we have control over that.

Any shared experience in dealing with this is highly appreciated.
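For readers sizing up the problem, a rough sketch of the arithmetic behind large-block IO may help. It only converts IOPS and IO size into bandwidth; the 2000 IOPS figure and the 8 KiB comparison size are assumptions for illustration, not numbers from this array, and it says nothing about how the Primera handles large IOs internally.

```python
def throughput_mib_s(iops, io_size_kib):
    """Bandwidth in MiB/s generated by `iops` operations of `io_size_kib` KiB each."""
    return iops * io_size_kib / 1024


def equivalent_small_ios(io_size_kib, small_io_kib=8):
    """How many small IOs carry the same payload as one large IO."""
    return io_size_kib / small_io_kib


# Hypothetical load: 2000 IOPS at three different IO sizes.
for size_kib in (8, 256, 512):
    print(f"{size_kib:>3} KiB x 2000 IOPS = {throughput_mib_s(2000, size_kib):.1f} MiB/s")

print(f"one 512 KiB IO carries as much data as {equivalent_small_ios(512):.0f} IOs of 8 KiB")
```

The point is that a host with a modest IOPS count can still move a lot of data per second once the IO size grows, so IOPS alone may understate how heavy a workload is.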

Post by **Richard Siemers** » Fri May 14, 2021 12:22 am

On the hosts that are latency 'victims', how many IOPS are they averaging when seeing the 20ms write latency? I ask because very low IO/idle hosts can report high latency when other systems are getting priority; however, when those hosts actually need to do some real IO, the latency corrects itself and they get a share of the pie too. I am not sure if this is part of the intelligent optimization, or if it's just a math anomaly from calculating an "average" with too few data points.
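To illustrate the "too few data points" possibility, here is a minimal sketch with made-up service times. The sample values and host labels are assumptions for illustration only, and this is plain averaging, not how the array computes its statistics.

```python
from statistics import mean


def average_latency_ms(samples_ms):
    """Arithmetic mean of per-IO service times in milliseconds."""
    return mean(samples_ms)


# Hypothetical busy host: 1000 IOs in the interval, 10 of them delayed to 20 ms.
busy_host = [0.5] * 990 + [20.0] * 10

# Hypothetical near-idle host: 2 IOs in the interval, 1 of them delayed to 20 ms.
idle_host = [0.5, 20.0]

print(f"busy host average: {average_latency_ms(busy_host):.3f} ms")
print(f"idle host average: {average_latency_ms(idle_host):.3f} ms")
```

A single slow IO barely moves the busy host's average but dominates the idle host's, which is why the IOPS count matters when reading a latency chart.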

Do you have an alert emailing you when this happens, or are you observing it in SSMC/Infosight? Is the reported high latency manifesting as application issues, or only observed in alerts and reports?

You may have more control over that SQL IO than you think! Have conversations with the DBAs to explain what you see, and understand what is going on when that unusually high block size is observed. I suspect they may be backing up the database from the Primera, TO the Primera! Possibly to the same CPG. It could also be batch imports/exports, or something else like refreshing a DEV box with a copy of PROD. There could be an opportunity for you to flex your storage skills and save the company time and money by leveraging snapshots instead!

Worst case scenario... if this host is an absolute 'bully' hogging resources, and after you draw your DBAs' attention to it they still cannot, or will not, help... you can look into applying a QoS policy to it to keep it from stepping on everyone else's toes.

Post by **JinSXS** » Sun May 16, 2021 11:43 pm

I'm seeing the latency in the reports, and also from the DB servers, as we have a few critical DB servers that are monitored by Dynatrace, and the bosses are getting edgy due to the high latency.

I've cross-checked the node CPU, SSD total IOPS, service time latency and the node cache performance (where I can see delayack up to 3k).

I'm trying to correlate where the bottleneck is, as our Primera runs a mixed workload, but we do have a few highly latency-sensitive DBs running.

Leveraging snapshots isn't possible; a lot of work would be required from the server and DB admins to use snapshots, as some hosts reside on Solaris LDoms.

We also have systems that do a schema backup before their batch processing.

I do agree with you on QoS, but the documentation is vague and I'm not really sure how to implement it. Let's say I have an MSSQL server that is latency sensitive: I create a QoS rule with a latency of 1ms, and IOPS and MB/s as per its average workload, so that any other noisy neighbor won't impact it, right?

Post by **Richard Siemers** » Sun May 30, 2021 12:04 pm

Any luck pinpointing the bottleneck? Have you opened a support case for a deeper dive?

How does VLUN latency compare to VV latency, as reported by the Primera/SSMC?

Regarding QoS, or Priority Optimization:

Page 185 of the SSMC User Guide covers some of it. Note in addition to the PO policy, there are also PO reports in system reporter and also PO alert settings you can manage.
https://support.hpe.com/hpesc/public/do ... cale=en_US

Page 295 of the CLI guide covers "setqos" with additional details.
https://support.hpe.com/hpesc/public/do ... 88929en_us

Post by **JinSXS** » Tue Jun 01, 2021 6:40 am

Richard Siemers wrote: Any luck pinpointing the bottleneck? Have you opened a support case for a deeper dive?

Not really. We logged a case, and backline just says our BE SSDs are overloaded, as we are doing 15k to 20k IOPS.

I've asked whether converting to thin LUNs (removing the deco) would yield a lower load on the BE SSDs, but somehow my country's HPE team is still trying to get in touch with the performance team for a deep dive.

I've also asked whether the Primera SSDs (3TB/7TB/15TB) all have the same IOPS watermark, but they didn't reply to my question. I'm not sure whether the Ninja performance sizing that was used to size our Primera from our 8440 accurately calculated the load.

Post by **bbarbaros** » Sun Jun 05, 2022 11:30 am

Richard Siemers wrote: On the hosts that are latency 'victims', how many IOPS are they averaging when seeing the 20ms write latency? I ask because very low IO/idle hosts can report high latency when other systems are getting priority; however, when those hosts actually need to do some real IO, the latency corrects itself and they get a share of the pie too. I am not sure if this is part of the intelligent optimization, or if it's just a math anomaly from calculating an "average" with too few data points.

So, if I may piggyback on this: when I look at the performance charts from SSMC, I see 13 hosts showing high latency, ranging from 25ms to 80ms, for a period of time every night around 11PM, but 12 of them are barely doing any IOPS during that period. So can the high latency on all 12 of those hosts be ignored?

The one remaining host, which is a SQL server, is doing close to 8000 IOPS and has read latency between 30ms and 40ms, which is still high for an all-flash Primera array. That lasts about an hour.

What do you suggest?

Post by **MammaGutt** » Sun Jun 05, 2022 11:28 pm

bbarbaros wrote:
Richard Siemers wrote: On the hosts that are latency 'victims', how many IOPS are they averaging when seeing the 20ms write latency? I ask because very low IO/idle hosts can report high latency when other systems are getting priority; however, when those hosts actually need to do some real IO, the latency corrects itself and they get a share of the pie too. I am not sure if this is part of the intelligent optimization, or if it's just a math anomaly from calculating an "average" with too few data points.

So, if I may piggyback on this: when I look at the performance charts from SSMC, I see 13 hosts showing high latency, ranging from 25ms to 80ms, for a period of time every night around 11PM, but 12 of them are barely doing any IOPS during that period. So can the high latency on all 12 of those hosts be ignored?

The one remaining host, which is a SQL server, is doing close to 8000 IOPS and has read latency between 30ms and 40ms, which is still high for an all-flash Primera array. That lasts about an hour.

What do you suggest?

Ignore any host with fewer than 10 IOPS. Comparing array stats with host stats usually shows that the host isn't seeing this latency.
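As a rough sketch of that filtering rule, assuming per-host averages have been exported into simple records: the `HostSample` type, the host names and the 20 ms latency threshold below are hypothetical, and only the 10 IOPS cut-off and the 8000 IOPS SQL host come from this thread.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    name: str          # hypothetical host label
    iops: float        # average IOPS over the interval
    latency_ms: float  # average service time over the interval


def hosts_worth_investigating(samples, min_iops=10.0, latency_threshold_ms=20.0):
    """Keep hosts that are both slow and doing enough IO for the average to mean something."""
    return [
        s for s in samples
        if s.iops >= min_iops and s.latency_ms >= latency_threshold_ms
    ]


samples = [
    HostSample("sql-host", 8000, 35.0),   # the busy SQL server from the post above
    HostSample("idle-host-1", 3, 60.0),   # near-idle, high reported latency
    HostSample("idle-host-2", 1, 80.0),   # near-idle, high reported latency
    HostSample("app-host", 500, 1.2),     # busy but healthy
]

for host in hosts_worth_investigating(samples):
    print(f"{host.name}: {host.iops:.0f} IOPS at {host.latency_ms:.1f} ms")
```

With these sample values only the SQL host survives the filter, which matches the advice to set the near-idle hosts aside first.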

For the remaining hosts, compare statvlun to statvv to see if the array is struggling. If VLUN latency is high and VV latency is low, the problem is outside the array. Also look at the queue. One can easily generate latency by increasing the queue depth on the host. The «timer» starts once the OS sends the IO to the HBA.
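The two checks above can be sketched roughly as follows. The 5 ms "high" threshold and the queue depths are assumptions for illustration, the inputs are meant to be averages read off statvlun and statvv by hand, and the queue estimate is just Little's law (average latency = outstanding IOs / throughput), which holds for steady-state averages.

```python
def locate_latency(vlun_ms, vv_ms, high_ms=5.0):
    """Rough triage from average VLUN and VV service times (threshold is an assumption)."""
    if vlun_ms < high_ms:
        return "no significant latency"
    if vv_ms < high_ms:
        return "outside the array"
    return "inside the array"


def expected_latency_ms(outstanding_ios, iops):
    """Little's law: average time in system for a given queue depth and throughput."""
    return 1000.0 * outstanding_ios / iops


print(locate_latency(vlun_ms=35.0, vv_ms=1.0))
print(locate_latency(vlun_ms=35.0, vv_ms=30.0))

# Same 8000 IOPS throughput, two hypothetical host queue depths.
for queue_depth in (32, 256):
    print(f"queue depth {queue_depth}: ~{expected_latency_ms(queue_depth, 8000):.0f} ms")
```

At a fixed throughput, a deeper host queue translates directly into higher measured latency, so a high VLUN figure alone does not prove the array is the slow part.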

HPE Storage Users Group
