I have VMware vSphere 5.1 running on datastores presented from two 3PAR 7400 arrays that are running synchronous Remote Copy between two sites over FC. I had to perform maintenance on a blade chassis at the primary site, so I ran the switchover command on all of my remote copy groups containing datastores.
All of the datastores switched correctly. However, when maintenance was over and I performed another switchover, all of the datastores switched back except one. I confirmed with a showrcopy command that all of the remote copy groups had indeed returned to their original state, but for some reason the ALUA pathing on some of the hosts failed to update, and their "Active (I/O)" paths still pointed at the secondary storage array.
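For reference, the switchover and the verification I'm describing were done with the standard Remote Copy CLI commands, roughly like this (the group name is just a placeholder for my actual groups):

setrcopygroup switchover <rcopy_group_name>
showrcopy groups

The showrcopy output is where I checked the role (Primary/Secondary) and sync state of each group after each switchover.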
To correct this, I ran another switchover command on the remote copy group containing the datastore with the incorrect active paths. This time the switchover did not complete successfully: I got a 3PAR alert and the remote copy group performed a full resync. During the resync, the datastore was completely offline, even though some of the hosts had the correct active paths to the secondary array. The datastore was visible within VMware, but when I attempted to browse it with the vSphere client, it showed no files.
At this point I called and opened a case, but so far HP's suggestion is to perform another switchover and to unmount and remount the datastore. As it stands, I'm left with a datastore visible on only two out of 12 servers. The 10 servers that cannot see the datastore have all of their Active (I/O) paths pointed at the wrong (secondary) site, and the paths they should be using are all in Standby. I have a production VM running a database on this datastore, so I cannot afford to lose all connectivity to it, and I cannot Storage vMotion it.
My long-term solution is to move the VMs off of this datastore and delete it, but I'm hoping to get a better understanding of what is happening. Luckily this happened on my least populated datastore. Is there a way to force the Active (I/O) paths back to the correct storage array?
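For what it's worth, the way I've been checking what each ESXi host considers its Active (I/O) path is with esxcli, along these lines (the naa ID is just a placeholder for the datastore's device):

esxcli storage nmp device list -d naa.<device_id>
esxcli storage nmp path list -d naa.<device_id>

The first command shows the path selection policy (VMW_PSP_RR in my case) and the working paths; the second lists each path with its ALUA group state, which is where the stale Active/Standby assignments show up.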
FYI, my host sets are using persona 11, the 3PAR OS firmware is 3.1.3 MU1, and I'm using round-robin multipathing. Here is the alert I got when attempting the switchover:
Severity: Degraded
Type: Component state change
Message: Remote Copy Volume 15795 (vv_name_goes_here) Degraded (Volume Unsynced - promote of snapshot failed {0x8})
ID: 2474
Message Code: 0x03700de
Re: Issue with peer persistence in VMware involving pathing
I figured out the problem. The policy "path_management" was not set on this remote copy group. I have to wait until the weekend to actually apply the change and test it, but all of my other remote copy groups are working correctly and they have the policy applied. Here is a paragraph from the HP remote copy user manual explaining more about this setting:
http://h20565.www2.hp.com/hpsc/doc/publ ... -c03618143
"
If path_management is not enabled, the ALUA state of all exported volumes is set to ACTIVE. Both source and target replication volumes are accessible to the host cluster, and host access must be controlled at the host level. Should an ESX host become disconnected from the primary volumes and be able to see only secondary volumes (that is, have non-uniform volume access), it may be necessary to disable the path_management policy for those volume groups until the uniform cluster is re-established.
"
My experience so far is that when the switchover occurs, the hosts' paths don't update; however, not all of the paths go ACTIVE the way the manual describes.
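If it helps anyone else hitting this, checking and enabling the policy looks roughly like this in the 3PAR CLI (the group name is a placeholder; double-check the exact syntax against the CLI reference for your InForm OS version before running it):

showrcopy groups <rcopy_group_name>
setrcopygroup pol path_management <rcopy_group_name>

showrcopy shows each group's policies, which is how I spotted that this group was missing path_management while my other groups had it.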
Re: Issue with peer persistence in VMware involving pathing
The fact that HP changed the path_management default policy in 3.1.3 is pretty bad.
That they practically hide it in the release notes is unforgivable.
I had HP consulting do an early 3.1.3 upgrade with HP support involved, and both blamed VMware and were unaware of the change in behavior...