How do you set up the Remote Copy configuration? For example:
- each LUN in a different remote copy group?
- different LUNs (that belong together) in one remote copy group?
- how many LUNs in a group?
Regards,
G.
Remote Copy
Re: Remote Copy
Typically I have a group per host or cluster for general LUNs, and then a group for specific applications like a DB or home directories.
Most things in 3PAR I tend to keep to around 32 items per group. It's an old habit from back when there were performance recommendations around that number for a few things, but I've not seen that number mentioned in recent docs, so I suspect it's not an issue with 3.1.x. My biggest at the moment is 35.
Re: Remote Copy
We will have about 1,000 volumes and 200+ RC groups when done; right now I am at 700 volumes and 160 RC groups. Some background: we use VMware with SRM and also have a large AIX environment running on VIO (IBM's take on virtualized AIX, similar in functionality to VMware).
Some things to consider with RC groups, depending on what the volumes are for:
First, the 3PAR can only replicate up to 20 volumes concurrently, so you want to keep your RC groups small to have a better chance of all volumes in a group replicating at the same time. It's not a precise way to do this, but it helps.
My approach is to logically group volumes based on application/server. Many of our VMware datastores are a single datastore/volume per RC group, unless the application has so many servers that they span multiple datastores; then we put the related datastores in the same RC group.
Our SQL Servers are virtualized and have a datastore and several RDM volumes for each server. We group each of those as a single RC group so we can control replication at a per-server or per-application interval. So SQL Server 1 has an RC group called DB_VM_servername, and it contains the datastore that the OS and app VMDKs are in, as well as the 3 related RDM volumes. We are modifying this to split it up, since we don't need the tempdb and pagefile drives replicating frequently.
In that scenario we have 2 RC groups (a rough command sketch follows below):
DB_VM_servername_da contains the OS/app datastore and the database and log RDMs (replicates every 30 minutes)
DB_VM_servername_td contains the datastore with the tempdb and pagefile VMDKs (replicates once a day)
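For what it's worth, here's roughly how I'd lay that out from the CLI, wrapped in a little Python over SSH. The group names, volume names, target name and the exact creatercopygroup / admitrcopyvv / setrcopygroup syntax are illustrative and from memory, so check them against the CLI reference for your InForm OS version before trusting any of it.

#!/usr/bin/env python
# Illustrative only: builds the CLI calls I'd expect to use for the two-group
# layout described above. Command syntax and all names are assumptions, not
# verified against a specific InForm OS release.
import subprocess

ARRAY = "primary-3par.example.com"   # hypothetical management address
TARGET = "DR_3PAR"                   # hypothetical remote copy target name

def cli(cmd):
    """Run one InForm CLI command on the array over SSH."""
    subprocess.check_call(["ssh", "3paradm@" + ARRAY, cmd])

# Frequently replicated group: OS/app datastore plus the DB and log RDMs.
cli("creatercopygroup DB_VM_servername_da %s:periodic" % TARGET)
for vv in ["srv1_os_ds", "srv1_db_rdm", "srv1_log_rdm"]:           # hypothetical VV names
    cli("admitrcopyvv %s DB_VM_servername_da %s:%s.r" % (vv, TARGET, vv))
cli("setrcopygroup period 30m %s DB_VM_servername_da" % TARGET)    # every 30 minutes
cli("startrcopygroup DB_VM_servername_da")

# Low-priority group: tempdb/pagefile datastore, once a day is enough.
cli("creatercopygroup DB_VM_servername_td %s:periodic" % TARGET)
cli("admitrcopyvv srv1_tempdb_ds DB_VM_servername_td %s:srv1_tempdb_ds.r" % TARGET)
cli("setrcopygroup period 1d %s DB_VM_servername_td" % TARGET)
cli("startrcopygroup DB_VM_servername_td")

The point is simply that the replication interval is set per group, which is why splitting the tempdb/pagefile datastore into its own group lets us drop it to once a day.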
The most important thing to understand is that if the volumes you place in an RC group are for a single system, there is no way to guarantee consistency on recovery at the destination side. For example:
You have an AIX system (or even Windows) that has 4 volumes, say bootlun, dblun, loglun and applun. You place these in a single RC group because you want to replicate the system as an entire entity. Let's say it is time for that RC group to replicate, and because not much else is going on (nowhere near the 20-volume concurrent limit) all four start at once. Because each volume has a different amount of changed data to replicate, they may finish at different times. Say the dblun and bootlun finish while the other 2 are still replicating, and now you have a DR event. This is where 3PAR replication sucks: you have 2 volumes that finished replication and 2 that did not. The 3PAR will roll back the 2 that were in flight to the point before replication started, while the 2 that finished are left in the completed state. As you can imagine, a database server would choke when trying to come up in this scenario and would likely be unusable. I have been pushing 3PAR for over a year about this huge issue in their replication, and it will be partly fixed in 3.1.3 (due soon), but only if you use a certain replication mode.

We get around this by taking snapshots on the destination side and working off the snapshots. I have a script that runs periodically and compares the snapshots' date/time against the base volumes; if all the base volumes are newer than the snapshots and the RC group is not replicating, it refreshes the snapshots. That way we have scripted consistency into the RC groups via snapshots (a rough sketch of the idea is below). This approach has some concerns, as you are constantly polling the array with scripts, and with almost 1,000 volumes and over 150 RC groups this gets very script- and array-access-intensive.
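To give an idea of the shape of that script, here is a minimal sketch of the logic, not our production code. The snapshot naming convention, the helper names and especially the showvv / showrcopy / updatevv parsing are assumptions for illustration and would need checking against the CLI output on your InForm OS version.

#!/usr/bin/env python
# Minimal sketch of the "refresh the DR snapshots only when the whole RC group
# is consistent" idea. The snapshot naming convention, SSH/CLI plumbing and
# the output parsing are assumptions; the column layout and timestamp format
# of showvv/showrcopy must be checked against your InForm OS version.
import subprocess
from datetime import datetime

DR_ARRAY = "dr-3par.example.com"   # hypothetical DR array address
SNAP_SUFFIX = ".dr_snap"           # hypothetical snapshot naming convention
TIME_FMT = "%Y-%m-%d %H:%M:%S"     # assumed timestamp format in CLI output

def cli(cmd):
    """Run one InForm CLI command on the DR array over SSH and return stdout."""
    return subprocess.check_output(["ssh", "3paradm@" + DR_ARRAY, cmd]).decode()

def snap_created(snap_name):
    """Creation time of a snapshot, parsed from showvv -d (assumed label/format)."""
    for line in cli("showvv -d %s" % snap_name).splitlines():
        if line.strip().startswith("Creation Time"):          # assumed label
            return datetime.strptime(line.split(":", 1)[1].strip(), TIME_FMT)
    raise RuntimeError("could not find creation time for %s" % snap_name)

def group_state(group):
    """(is_syncing, {volume: last_sync_time}) from 'showrcopy groups <group>'.
    Column positions are assumptions; adjust for your actual output."""
    syncing, last_sync = False, {}
    for line in cli("showrcopy groups %s" % group).splitlines():
        cols = line.split()
        if len(cols) < 5:
            continue
        vol, state = cols[0], cols[2]
        try:
            when = datetime.strptime(" ".join(cols[3:5]), TIME_FMT)
        except ValueError:
            continue                     # header or otherwise unparseable line
        syncing = syncing or state.lower() == "syncing"
        last_sync[vol] = when
    return syncing, last_sync

def refresh_if_consistent(group):
    syncing, last_sync = group_state(group)
    if syncing:
        return                           # mid-sync: leave the snapshots alone
    for vol, synced_at in last_sync.items():
        if synced_at <= snap_created(vol + SNAP_SUFFIX):
            return                       # this volume hasn't synced since the snap
    # Every volume finished a sync after its snapshot was taken and the group
    # is idle, so the destination copies are mutually consistent: refresh.
    for vol in last_sync:
        cli("updatevv -f %s" % (vol + SNAP_SUFFIX))

The key point is that the snapshots only get refreshed when the group is idle and every volume has completed a sync since the last refresh, so the snapshot set stays mutually consistent even if a later sync is interrupted.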
IMO 3PAR replication is horrible and I argue with HP all the time. The good news is that some of these issues are being fixed in the next 2 releases of the software, but I still feel they are not going far enough to really make RC a reliable and dependable service. HP's biggest issue is the legacy 3PAR folks: they are smart people and engineered a great product, but they are not seeing the proper use cases for large enterprises, nor do they know how to think like large enterprises. HP is trying to infuse the HP mentality into the 3PAR group, but it is taking too long, and the product is still not developing as well as some of their more mature competitors.
Sorry for the long-winded answer, but you could spend hours discussing RC on 3PAR. With RC, as with everything in designing your 3PAR environment, the critical component is good, solid design at the logical level, with consistency in naming and such, to make it easier to manage large arrays with numerous volumes.
Re: Remote Copy
Sounds like you're having a lot of fun with RC.
RC has always been a bit of a bolt-on for 3PAR, but it is the one feature that has been enhanced the most in the last few years (there was a lot of room for improvement, admittedly). We've only used sync mode so far, so we've not had issues with timing, but after the next upgrade I'll be able to consider running both sync and async on the same array pair, so I might look at which servers/apps can be synced less often.
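If it helps anyone planning the same split: as far as I understand it, the mode is chosen per RC group against the target when the group is created, so mixing sync and periodic on one array pair is just a per-group decision. A tiny illustrative sketch; the group and target names are made up and the creatercopygroup syntax is from memory, so verify it against the CLI reference.

# Illustrative only: one target, two groups, two modes. Names and exact
# creatercopygroup syntax are assumptions to be checked against the CLI docs.
import subprocess

TARGET = "DR_3PAR"   # hypothetical remote copy target

groups = {
    "PROD_DB_sync":  "sync",       # latency-sensitive app kept synchronous
    "FILESRV_async": "periodic",   # bulk data synced on a schedule instead
}

for name, mode in groups.items():
    cmd = "creatercopygroup %s %s:%s" % (name, TARGET, mode)
    subprocess.check_call(["ssh", "3paradm@primary-3par.example.com", cmd])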
Re: Remote Copy
Thanks all for your response.
We have 2 new 3PARs in production (with RC), installed and configured with HP, and I am curious how others have configured their storage systems.
So far I have experience with the EVA 5000 / 8000 / 8100 and EVA 8400, and I will soon take the basic and advanced HP 3PAR training.
Re: Remote Copy
We have hit an interesting bug with RC. We use RC in async mode and have been happily using it for a couple of years.
We started noticing that some of our syncs were running over time (2-hour sync period).
The LUNs affected were larger than 4TB. We raised a call, and after many months of investigation it was found that a bug introduced in 3.1.2 caused this.
It seems that async RC kicks off a full scan of the metadata of the VLUN, including the unused portion of the drives. As the drives in question are 12TB, this sometimes takes a while.
This will be fixed in the Feb 14 release.
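If anyone wants a rough way to see whether their own large-volume groups are affected, one option is to time a manual sync of a quiet group and compare it against the amount of changed data. A throwaway sketch; the group name is made up and the syncrcopy -w usage is from memory, so check the CLI reference first.

# Throwaway check: time how long a manual sync of one RC group takes.
# Group name and the syncrcopy invocation are assumptions for illustration.
import subprocess, time

ARRAY = "primary-3par.example.com"
GROUP = "BIGLUN_group"   # hypothetical group containing the >4TB volumes

start = time.time()
# -w should make syncrcopy wait until the group has finished synchronizing.
subprocess.check_call(["ssh", "3paradm@" + ARRAY, "syncrcopy -w %s" % GROUP])
print("sync of %s took %.0f minutes" % (GROUP, (time.time() - start) / 60))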
Re: Remote Copy
Do you have a bug number? We have been complaining about this for months and get all sorts of excuses that it has nothing to do with them, yadda yadda...
Re: Remote Copy
No bug number at the moment. Only confirmation from the 3PAR L3 techs that they may have changed an algorithm in the last version upgrade that changes the behavior of large LUNs via RC.
Will update the forum when we find more
Re: Remote Copy
Would you mind sharing a case number so I can get my escalation people to look at your case and see if they are similar? You can PM me the case number if you like, thanks!
Re: Remote Copy
The fix for our issue has been recognised and will be part of the 3.1.3 release in February. We have been advised that the scan of the metadata on a 12TB VLUN is holding up the works. Any of our VLUNs in an RC group that are larger than around 4TB are taking up to 4 hours to sync, even if the changes are minimal.