
AO - Task isn't looking at the right timerange?

Posted: Mon Mar 03, 2014 6:29 pm
by spencer.ryan
We've got a simple AO policy (3 tiers, performance) that starts M-F at 6 PM and should look at data from 8 AM to 6 PM.

I'm seeing two problems. First, it looks like it is missing the start of the range by an hour; the command asks for 10 hours in the past, unless I'm calculating something wrong. Second, it never seems to catch the sample data from 5:30 to 6 PM. I assume this is because the AO task starts right at 6 and the system isn't done collecting data. Would moving the start time to 6:15 fix this?

Here's the task:
startao -btsecs -36000 -maxrunh 12 -compact auto AO-Policy


18:00 - 10 should be 8:00
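A quick Python check of that arithmetic. It assumes, as the post does, that a negative `-btsecs`/`-etsecs` value means "that many seconds before the task starts", with the end of the window defaulting to the start time itself. The helper name `requested_window` is hypothetical and not part of the 3PAR CLI:

```python
from datetime import datetime, timedelta


def requested_window(run_time, btsecs, etsecs=0):
    """Return (begin, end) for negative -btsecs/-etsecs offsets.

    Hypothetical helper: assumes negative values count seconds before
    run_time, and that etsecs=0 stands in for "now" (the task start).
    """
    begin = run_time + timedelta(seconds=btsecs)
    end = run_time + timedelta(seconds=etsecs)
    return begin, end


run = datetime(2014, 3, 3, 18, 0)  # Monday, 6 PM schedule
begin, end = requested_window(run, -36000)
print(begin.strftime("%H:%M"), "->", end.strftime("%H:%M"))  # 08:00 -> 18:00
```

So the requested range itself is right; any later start time comes from which samples the array actually has, not from the offset.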



Any ideas?

Re: AO - Task isn't looking at the right timerange?

Posted: Tue Mar 04, 2014 1:49 pm
by spencer.ryan
To test my theory, I changed the schedule to run at 6:15, set -btsecs to -40500 (the extra hour plus the 15 minutes), and added -etsecs -900 to stop at 6 PM. -etsecs probably isn't needed, since I don't think it would grab a new data set before 6:30 anyway.
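The same arithmetic run in reverse gives the offsets for a later start. This is a sketch with a hypothetical helper, `offsets_for_window`, again assuming the offsets count seconds before the task starts. Asking for 08:00 to 18:00 from an 18:15 start gives -36900/-900; moving the begin time an hour earlier to compensate for the missing hour gives the -40500 used above:

```python
from datetime import datetime


def offsets_for_window(run_time, begin, end):
    """Return (btsecs, etsecs) as negative offsets from run_time.

    Hypothetical helper: assumes -btsecs/-etsecs are seconds before
    the moment the AO task starts.
    """
    btsecs = -int((run_time - begin).total_seconds())
    etsecs = -int((run_time - end).total_seconds())
    return btsecs, etsecs


run = datetime(2014, 3, 4, 18, 15)  # schedule moved to 6:15 PM
day = (2014, 3, 4)
print(offsets_for_window(run, datetime(*day, 8, 0), datetime(*day, 18, 0)))
# (-36900, -900)
print(offsets_for_window(run, datetime(*day, 7, 0), datetime(*day, 18, 0)))
# (-40500, -900)
```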


I'll report back on how it goes.

Re: AO - Task isn't looking at the right timerange?

Posted: Tue Mar 04, 2014 2:21 pm
by kwalters
I have seen the same behavior. I called HP, and this was the response I got; hope it helps:

We have a limited amount of space for the on-node region data and we need to make sure that we can handle the largest supported capacity with this limited space.
So we cannot simply store half-hour samples for 24 hours or more.
In fact, for the largest capacity we can only store about 25 samples.
To cover a long interval of time, we keep the most recent data at the highest resolution and use progressively coarser granularity (longer intervals between samples) for older data.

Here is an extract describing how we store LD region data.

Since there can be a very large number of regions per sample, we create a separate database file per time (currently every 30 minutes). The intervals between these samples are not uniform, so that we can cover a longer period of time with fewer data files. We keep half-hour samples for 2 hours, 1-hour samples for 8 hours, 3-hour samples for 24 hours, 12-hour samples for 3 days and 24-hour samples for 7 days.
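To see how that schedule adds up to "about 25 samples", here is an idealized Python model of the retention tiers. It assumes each duration is a maximum sample age (half-hour samples until 2 hours old, hourly until 8 hours old, and so on) and that sample boundaries line up neatly; the actual on-node layout isn't described in this thread:

```python
# Retention tiers from the support reply as (sample interval, keep until age),
# both in hours. Reading each duration as a maximum age is an assumption.
TIERS = [(0.5, 2), (1, 8), (3, 24), (12, 72), (24, 168)]


def retained_sample_ages(tiers=TIERS):
    """Return the age (hours before now) of each retained sample,
    assuming samples within a tier are evenly spaced."""
    ages = []
    age = 0.0
    for interval, horizon in tiers:
        while age < horizon:
            ages.append(age)
            age += interval
    return ages


ages = retained_sample_ages()
print(len(ages))     # 24
print(ages[:10])     # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(ages[10:16])   # [8.0, 11.0, 14.0, 17.0, 20.0, 23.0]
```

In this model, anything older than 8 hours exists only at 3-hour granularity, which fits the warning that a window reaching back more than 8 hours can't be covered hour by hour.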

So if you ask for a sample interval that is several hours ago, it will not necessarily have samples that cover the entire interval exactly. If your interval is over 8 hours ago, it is not guaranteed to have a sample every hour. It will try to find samples that cover as much of the interval as possible and it will print out which samples it used. If the range that the available samples can cover is too small a fraction of the requested range then it will not do the requested AO because it does not want to do moves based on too small a time interval.
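A rough sketch of the "cover as much of the interval as possible" check, for illustration only. The selection rule (use samples lying entirely inside the window) and the `MIN_FRACTION` threshold are invented; HP doesn't say what rule or cut-off the array uses. The sample boundaries follow the idealized tiers for an 18:00 run and are not real array data:

```python
# Hypothetical selection rule and threshold; not documented 3PAR behavior.
MIN_FRACTION = 0.5


def select_samples(samples, begin, end):
    """Keep samples (start, stop) lying entirely inside [begin, end]
    and return them with the fraction of the window they cover."""
    chosen = [(s, e) for s, e in samples if begin <= s and e <= end]
    covered = sum(e - s for s, e in chosen)
    return chosen, covered / (end - begin)


def should_run_ao(samples, begin, end):
    """Refuse to run when the chosen samples cover too little of the window."""
    chosen, fraction = select_samples(samples, begin, end)
    return fraction >= MIN_FRACTION, chosen, fraction


# Illustrative boundaries in hours of the day: one 3-hour sample,
# hourly samples, then half-hour samples up to 18:00.
samples = [(7, 10), (10, 11), (11, 12), (12, 13), (13, 14), (14, 15),
           (15, 16), (16, 16.5), (16.5, 17), (17, 17.5), (17.5, 18)]
ok, chosen, fraction = should_run_ao(samples, 8, 18)
print(ok, chosen[0][0], fraction)  # True 10 0.8
```

The thread reports a 9 AM start, so the array's real selection clearly differs in detail; this only illustrates why a coarse sample straddling the window's start can be dropped and how a coverage threshold would gate the run.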

Re: AO - Task isn't looking at the right timerange?

Posted: Tue Mar 04, 2014 6:38 pm
by spencer.ryan
Thanks for that info, it's helpful. It just started its run, and it's looking at the same range (9 AM to 5:30 PM).

It's too bad we can't configure how much space the system uses. From the vvols I can see, it only reserves 90GB for itself, and I've got over 200TB usable. Please, take more of that and keep high-res data for X days. Let me pick X.

Re: AO - Task isn't looking at the right timerange?

Posted: Wed Mar 05, 2014 6:42 am
by hdtvguy
I am seeing the same thing and am going to open a case. AO has sucked since they moved it onto the controllers in 3.1.2: no more System Reporter reports on what AO did, and now these limitations. I miss 3.1.1 more and more every day. It looks like they store this in .srdata, which is 80GB; I don't recall needing that much space in System Reporter when it was collected there. Also, 80GB on my 440TB array is nothing, so let me configure that.

BTW, thanks for pointing this out. I have been going under the assumption AO was as robust as before and capturing my entire window.

Re: AO - Task isn't looking at the right timerange?

Posted: Wed Mar 05, 2014 7:25 am
by spencer.ryan
I'll be interested to hear what they say.


Another method they could use is to collect high-res data only for the time range that is set up in any active schedule. Why does it need to collect perf data from 6pm to 8am when I don't care about those time ranges?

This would prevent you from running AO *RIGHT NOW*, but really, who does that?

Re: AO - Task isn't looking at the right timerange?

Posted: Wed Mar 05, 2014 8:14 am
by kwalters
I am told customers used to complain about having to buy System Reporter to run AO, so they moved it into the array, but they only gave it a small amount of space to keep its data (why?). I would like to have 24 hours of detailed stats, or better yet, let me decide how much space I want to dedicate to this sort of stuff. I take it that when it was in SR, it kept more granular data for AO and this did not happen?

I started with 3PAR at 3.1.2, so I am not familiar with having to use SR for AO. I do have SR, but I find it a little clunky; I expected something more sophisticated. There is the built-in 3PAR report for AO Space Moved, which tells me most of what I want to know. What exactly were the SR AO reports that are now missing?

Be that as it may, AO seems to work fine for me. It is moving and re-balancing things every day. It generally runs in about 2-3 hours out of a max of 6.

Re: AO - Task isn't looking at the right timerange?

Posted: Wed Mar 05, 2014 11:43 am
by hdtvguy
I think the main reason AO went onto the controllers is the huge overhead SR had. Our SR was a dog, consuming space, and we were always tuning and fighting with it in the early months, but we got it to a stable working state. Also, SR with AO only really worked well with MySQL; MS SQL crashed and burned on us, so we migrated to MySQL. So I think the main reason for the move was to simplify AO by not needing external components, and to take the pressure off SR. What pissed me off is that moving it to the nodes lost all the AO reports in SR. I'm not sure why SR can't just extract the data from the nodes, do its thing, and then generate reports. It seems to me that SR is fracturing, with some info moving into the IMC and other data staying in SR. I disagree with that, because SR allows me to publish reports that anyone can look at without needing the IMC.

Re: AO - Task isn't looking at the right timerange?

Posted: Wed Mar 05, 2014 11:51 am
by hdtvguy
OK, support gave me virtually the same canned answer, with a note that the issue is resolved in 3.1.3, due this month.

Re: AO - Task isn't looking at the right timerange?

Posted: Thu Mar 06, 2014 7:34 am
by spencer.ryan
Well good to know that it will "get better".


Just to update: having AO start at 18:15 with "-etsecs -900" resulted in it still grabbing the last sample window from 17:30.


Having it start at 18:15 and dropping -etsecs from the task caused it to get a sample right up to 18:00.

It still won't grab data before 09:00, though. In reality that isn't the end of the world, but I still don't like it.