V400 and SQL 2012 (not in VM environment)

hdtvguy
Posts: 576
Joined: Sun Jul 29, 2012 9:30 am

Re: V400 and SQL 2012 (not in VM environment)

Post by hdtvguy

Richard Siemers wrote:
hdtvguy wrote:
Consistency across replication within a Remote Copy Group does NOT exist in 3PAR; any references to consistency are not about RC Groups. Test it and you will see. Start an RC group with 2 volumes in it. Let one volume finish replicating, and while the other is still replicating, break the link. The volume that finished replicating will stay replicated, while the volume in flight will roll back to the snapshot taken at the beginning of replication. Find me many systems that can deal with that if they have multiple volumes as part of the system. Trust me, we have been fighting this battle for a year now. I replicate over 500 volumes every day and have some legacy systems with 10-15 volumes as part of their system.
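To make the two-volume scenario concrete, here is a toy model in Python. It is not 3PAR code: `ReplicatedVolume`, the integer "sync cycle" counter and the volume names are hypothetical stand-ins. It only encodes the behaviour described above, where a finished transfer sticks and an unfinished one rolls back to the snapshot taken when the cycle started.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReplicatedVolume:
    """Toy model of one volume in an RC group (hypothetical, not 3PAR code)."""
    name: str
    committed_cycle: int                   # last cycle fully applied on the target
    in_flight_cycle: Optional[int] = None  # cycle currently being transferred

    def start_sync(self, cycle: int) -> None:
        self.in_flight_cycle = cycle

    def finish_sync(self) -> None:
        # Transfer completed: the target now reflects the new cycle.
        assert self.in_flight_cycle is not None
        self.committed_cycle = self.in_flight_cycle
        self.in_flight_cycle = None

    def break_link(self) -> None:
        # An unfinished transfer is discarded and the target falls back to the
        # snapshot taken when the cycle started, i.e. its last committed cycle.
        self.in_flight_cycle = None


def simulate_link_break() -> Dict[str, int]:
    vol_a = ReplicatedVolume("vol_a", committed_cycle=1)
    vol_b = ReplicatedVolume("vol_b", committed_cycle=1)
    for vol in (vol_a, vol_b):
        vol.start_sync(2)   # both volumes start replicating cycle 2
    vol_a.finish_sync()     # vol_a finishes first
    for vol in (vol_a, vol_b):
        vol.break_link()    # link drops while vol_b is still in flight
    return {vol.name: vol.committed_cycle for vol in (vol_a, vol_b)}


print(simulate_link_break())  # {'vol_a': 2, 'vol_b': 1}
```

The two replicas end up at different points in time, which is exactly the mixed state an application spanning both volumes would choke on.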


Hold on, hdtvguy, take a moment to re-read what Cleanur posted back in August of last year. Also check out the Remote Copy User's Guide, page 237. I suspect you may have been overlooking some details and shamelessly spreading doom and gloom where it wasn't deserved.

Your scenario: start an RC group with 2 volumes, let 1 volume sync, and then break the link before the 2nd finishes (I assume you really mean after the initial syncs have been established). What you see in the admin tool is 1 up-to-date replica and 1 rolled back. I'm with you thus far, but what you're not telling us about are the snapshots of the replica base volumes that would be promoted if a real DR event required this group in its current state. If the RC group were put into failover mode, it is supposed to promote those snaps and revert to a state of "exactly what you want": the previous consistent set of snaps will be promoted, reverting all VVs in the group to the last consistent sync. Can you witness/confirm this by listing snapshots while a resync is in progress?
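The failover behaviour described here can be sketched as "pick the newest point in time every volume in the group shares, then promote each volume's snapshot from that point". This is a minimal illustration under the assumption that the array keeps one snapshot per volume per completed sync cycle; the function names, the cycle numbers and the `.rcsnap.N` snapshot names are all hypothetical.

```python
from typing import Dict


def last_consistent_cycle(committed: Dict[str, int]) -> int:
    """Newest sync cycle that every volume in the group has fully applied."""
    return min(committed.values())


def failover_snapshots(committed: Dict[str, int],
                       snapshots: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Choose, per volume, the snapshot to promote so the group is consistent."""
    cycle = last_consistent_cycle(committed)
    # Every volume is mapped to its snapshot from the shared cycle, so a volume
    # that already applied a newer cycle gives that newer data up.
    return {vol: snapshots[vol][cycle] for vol in committed}


committed = {"vol_a": 2, "vol_b": 1}
snapshots = {
    "vol_a": {1: "vol_a.rcsnap.1", 2: "vol_a.rcsnap.2"},
    "vol_b": {1: "vol_b.rcsnap.1"},
}
print(failover_snapshots(committed, snapshots))
# {'vol_a': 'vol_a.rcsnap.1', 'vol_b': 'vol_b.rcsnap.1'}
```

Note the trade-off this implies: consistency is bought by discarding vol_a's newer cycle.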




So I do not think I am spreading doom and gloom. Sure, if you fail over the array it will try to promote the volumes. But let's examine this. First, I'm not sure how many people have actually experimented with promoting the RC snaps, but it is painfully slow.

Let's look at a scenario like mine: I have 250+ RC Groups and over 500 volumes that replicate. Due to the haphazard way volumes replicate in the 20 concurrent slots available for replication, I can have dozens of RC groups in an inconsistent state at any given time. Let's say we have a DR event. OK, first, whatever was in flight rolls back. Then I have to put the array in a failover state, and then any RC groups that are not consistent are supposed to roll forward.

Let's assume a dozen reasonably sized volumes across just as many RC groups are inconsistent. Now I have to identify those RC groups and wait before attempting recovery of those systems. I will tell you I have seen large volumes take an hour or more to promote back after a failed RC. Now do that for a dozen volumes on an array that is being brought up after an event. You have no chance of meeting any critical RTOs.
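The RTO argument is mostly arithmetic, so here is a rough back-of-the-envelope sketch. Two assumptions are mine, not from the post: that "inconsistent" means the volumes in a group sit at different committed cycles, and that promotions run in fixed-size batches (the post does not say how many the array can promote in parallel, so `parallel_promotions` is a free parameter). The group and volume names are hypothetical.

```python
import math
from typing import Dict, List


def inconsistent_groups(groups: Dict[str, Dict[str, int]]) -> List[str]:
    """RC groups whose volumes are not all at the same committed cycle."""
    return sorted(name for name, vols in groups.items()
                  if len(set(vols.values())) > 1)


def promotion_hours(volumes: int, hours_per_volume: float,
                    parallel_promotions: int = 1) -> float:
    """Wall-clock estimate if promotions run in batches of `parallel_promotions`."""
    batches = math.ceil(volumes / parallel_promotions)
    return batches * hours_per_volume


groups = {
    "rcg_erp": {"erp_data": 42, "erp_logs": 41},  # caught mid-cycle
    "rcg_web": {"web_01": 42},
}
print(inconsistent_groups(groups))   # ['rcg_erp']
print(promotion_hours(12, 1.0))      # 12.0 (one at a time)
print(promotion_hours(12, 1.0, 4))   # 3.0
```

With a dozen volumes at roughly an hour each, even optimistic parallelism leaves hours of promotion time before those systems can be recovered.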

Let's assume I am truly OK with all this. Then, as a company, should we put our array and/or RC Groups in failover mode during DR drills to make sure the array does what it is supposed to?

I do believe consistency groups for RC need to be added over time. In the meantime, we work around this by working off snapshots of the replicated volumes: I have a script that watches the state of the volumes in each RC group, and when they are all up to date it refreshes the snaps, which are what is presented to our DR systems. The advantage is that for DR tests I can let replication continue without our SLAs lagging, and when the tests are done the snaps are refreshed, so replication never has to be stopped or started. I wish they would build this feature into the array.
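The watcher workaround could look roughly like the sketch below. It is not the actual script from this post: `get_sync_state` and `refresh_snapshots` are hypothetical callables you would wire up to whatever status and snapshot-refresh commands your own tooling uses, and the polling interval is an arbitrary assumption. The key idea is simply "refresh a group's DR snapshots only when every volume in that group is current".

```python
import time
from typing import Callable, Dict, Iterable, List

# volume name -> True if its replica is currently up to date
SyncState = Dict[str, bool]


def refresh_consistent_groups(
    groups: Iterable[str],
    get_sync_state: Callable[[str], SyncState],  # hypothetical status query
    refresh_snapshots: Callable[[str], None],    # hypothetical snapshot refresh
) -> List[str]:
    """One polling pass: refresh DR snapshots only for fully synced groups."""
    refreshed = []
    for group in groups:
        state = get_sync_state(group)
        # Refresh only when every volume in the group is current; otherwise the
        # DR hosts keep seeing the previous, consistent snapshot set.
        if state and all(state.values()):
            refresh_snapshots(group)
            refreshed.append(group)
    return refreshed


def watch(groups: List[str],
          get_sync_state: Callable[[str], SyncState],
          refresh_snapshots: Callable[[str], None],
          interval_seconds: float = 300.0,
          passes: int = 1) -> None:
    """Run several polling passes, sleeping between them."""
    for i in range(passes):
        refresh_consistent_groups(groups, get_sync_state, refresh_snapshots)
        if i < passes - 1:
            time.sleep(interval_seconds)


# Example with stubbed array calls:
states = {"rcg_erp": {"erp_data": True, "erp_logs": True},
          "rcg_crm": {"crm_data": True, "crm_logs": False}}
refresh_log: List[str] = []
print(refresh_consistent_groups(list(states), states.__getitem__, refresh_log.append))
# ['rcg_erp']
```

A check-then-act loop like this can race with the next replication cycle starting between the status check and the refresh, so a production version would probably want to re-verify the group's state after refreshing.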
Cleanur
Posts: 254
Joined: Wed Aug 07, 2013 3:22 pm

Re: V400 and SQL 2012 (not in VM environment)

Post by Cleanur

I appreciate what you're saying here, but your use case of periodically mounting and checking DR copies without issuing a failover isn't really what the Periodic Async solution was designed for. Periodic Async is really targeted at low-bandwidth, latency-challenged environments, since less data needs to be moved: only the most current copy of each block changed in the window between replication events is sent.
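Why Periodic Async moves less data can be shown with a tiny coalescing example. This is a simplified model, not a claim about how the array tracks changes internally (it likely works from snapshot differences rather than a write log); the list of writes is used here only to make the effect visible, and the block numbers and payloads are made up.

```python
from typing import Dict, List, Tuple

Write = Tuple[int, bytes]  # (block number, data written)


def sync_transfer(writes: List[Write]) -> List[Write]:
    """Synchronous replication ships every write as it happens."""
    return list(writes)


def periodic_async_transfer(writes: List[Write]) -> Dict[int, bytes]:
    """Ship only the latest contents of each block changed during the window."""
    latest: Dict[int, bytes] = {}
    for block, data in writes:
        latest[block] = data  # a later write to a block replaces the earlier one
    return latest


window = [(7, b"v1"), (7, b"v2"), (9, b"x"), (7, b"v3")]
print(len(sync_transfer(window)))        # 4
print(periodic_async_transfer(window))   # {7: b'v3', 9: b'x'}
```

Three writes to block 7 collapse into one transfer, which is the bandwidth saving; the price is that the target only ever sees the end-of-window state.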

As explained, the copies become consistent when you issue a failover, but for that to happen, replication must be stopped on the group to maintain data integrity. There's likely some good news on the horizon regarding your options here, but whether it will fit your particular use case, we'll have to wait and see.