Use VMware Site Recovery Manager for Disaster Recovery of NFS datastores
The utilization of ONTAP tools for VMware vSphere 10 and the Site Replication Adapter (SRA) in conjunction with VMware Site Recovery Manager (SRM) brings significant value to disaster recovery efforts. ONTAP tools 10 provide robust storage capabilities, including native high availability and scalability for the VASA Provider, supporting iSCSI and NFS vVols. This ensures data availability and simplifies the management of multiple VMware vCenter servers and ONTAP clusters. By using the SRA with VMware Site Recovery Manager, organizations can achieve seamless replication and failover of virtual machines and data between sites, enabling efficient disaster recovery processes. The combination of ONTAP tools and the SRA empowers businesses to protect critical workloads, minimize downtime, and maintain business continuity in the face of unforeseen events or disasters.
ONTAP tools 10 simplifies storage management and efficiency features, enhances availability, and reduces storage costs and operational overhead, whether you are using SAN or NAS. It uses best practices for provisioning datastores and optimizes ESXi host settings for NFS and block storage environments. For all these benefits, NetApp recommends this plug-in when using vSphere with systems running ONTAP software.
The SRA is used together with SRM to manage the replication of VM data between production and disaster recovery sites for traditional VMFS and NFS datastores and also for the nondisruptive testing of DR replicas. It helps automate the tasks of discovery, recovery, and reprotection.
In this scenario we will demonstrate how to deploy and use VMWare Site Recovery manager to protect datastores and run both a test and final failover to a secondary site. Reprotection and failback are also discussed.
Scenario Overview
This scenario covers the following high level steps:
-
Configure SRM with vCenter servers at primary and secondary sites.
-
Install the SRA adapter for ONTAP tools for VMware vSphere 10 and register with vCenters.
-
Create SnapMirror relationships between source and destination ONTAP storage systems
-
Configure Site Recovery for SRM.
-
Conduct test and final failover.
-
Discuss reprotection and failback.
Architecture
The following diagram shows a typical VMware Site Recovery architecture with ONTAP tools for VMware vSphere 10 configured in a 3-node high availability configuration.
Prerequisites
This scenario requires the following components and configurations:
-
vSphere 8 clusters installed at both the primary and secondary locations with suitable networking for communications between environments.
-
ONTAP storage systems at both the primary and secondary locations, with physical data ports on ethernet switches dedicated to NFS storage traffic.
-
ONTAP tools for VMware vSphere 10 is installed and has both vCenter servers registered.
-
VMware Site Replication Manager appliances have been installed for the primary and secondary sites.
-
Inventory mappings (network, folder, resource, storage policy) have been configured for SRM.
-
NetApp recommends a redundant network designs for NFS, providing fault tolerance for storage systems, switches, networks adapters and host systems. It is common to deploy NFS with a single subnet or multiple subnets depending on the architectural requirements.
Refer to Best Practices For Running NFS with VMware vSphere for detailed information specific to VMware vSphere.
For network guidance on using ONTAP with VMware vSphere refer to the Network configuration - NFS section of the NetApp enterprise applications documentation.
For NetApp documentation on using ONTAP storage with VMware SRM refer to VMware Site Recovery Manager with ONTAP
Deployment Steps
The following sections outline the deployment steps to implement and test a VMware Site Recovery Manager configuration with ONTAP storage system.
Create SnapMirror relationship between ONTAP storage systems
A SnapMirror relationship must be established between the source and destination ONTAP storage systems, for the datastore volumes to be protected.
Refer to ONTAP documentation starting HERE for complete information on creating SnapMirror relationships for ONTAP volumes.
Step-by-step instructions are outline in the following document, located HERE. These steps outline how to create cluster peer and SVM peer relationships and then SnapMirror relationships for each volume. These steps can be performed in ONTAP System Manager or using the ONTAP CLI.
Configure the SRM appliance
Complete the following steps to configure the SRM appliance and SRA adapter.
Connect the SRM appliance for primary and secondary sites
The following steps must be completed for both the primary and secondary sites.
-
In a web browser, navigate to
https://<SRM_appliance_IP>:5480
and log in. Click on Configure Appliance to get started. -
On the Platform Services Controller page of the Configure Site Recovery Manager wizard, fill in the credentials of the vCenter server to which SRM will be registered. Click on Next to continue.
-
On the vCenter Server page, view the connected vServer and click on Next to continue.
-
On the Name and extension page, fill in a name for the SRM site, an administrators email address, and the local host to be used by SRM. Click on Next to continue.
-
On the Ready to complete page review the summary of changes
Configure SRA on the SRM appliance
Complete the following steps to configure the SRA on the SRM appliance:
-
Download the SRA for ONTAP tools 10 at the NetApp support site and save the tar.gz file to a local folder.
-
From the SRM management appliance click on Storage Replication Adapters in the left hand menu and then on New Adapter.
-
Follow the steps outlined on the ONTAP tools 10 documentation site at Configure SRA on the SRM appliance. Once complete, the SRA can communicate with SRA using the provided IP address and credentials of the vCenter server.
Configure Site Recovery for SRM
Complete the following steps to configure Site Pairing, create Protection Groups,
Configure Site Pairing for SRM
The following step is completed in the vCenter client of the primary site.
-
In the vSphere client click on Site Recovery in the left hand menu. A new browser windows opens to the SRM management UI on the primary site.
-
On the Site Recovery page, click on NEW SITE PAIR.
-
On the Pair type page of the New Pair wizard, verify that the local vCenter server is selected and select the Pair type. Click on Next to continue.
-
On the Peer vCenter page fill out the credentials of the vCenter at the secondary site and click on Find vCenter Instances. Verify the the vCenter instance has been discovered and click on Next to continue.
-
On the Services page, check the box next the proposed site pairing. Click on Next to continue.
-
On the Ready to complete page, review the proposed configuration and then click on the Finish button to create the Site Pairing
-
The new Site Pair and its summary can be viewed on the Summary page.
Add an Array Pair for SRM
The following step is completed in the Site Recovery interface of the primary site.
-
In the Site Recovery interface navigate to Configure > Array Based Replication > Array Pairs in the left hand menu. Click on ADD to get started.
-
On the Storage replication adapter page of the Add Array Pair wizard, verify the SRA adapter is present for the primary site and click on Next to continue.
-
On the Local array manager page, enter a name for the array at the primary site, the FQDN of the storage system, the SVM IP addresses serving NFS, and optionally, the names of specific volumes to be discovered. Click on Next to continue.
-
On the Remote array manager fill out the same information as the last step for the ONTAP storage system at the secondary site.
-
On the Array pairs page, select the array pairs to enable and click on Next to continue.
-
Review the information on the Ready to complete page and click on Finish to create the array pair.
Configure Protection Groups for SRM
The following step is completed in the Site Recovery interface of the primary site.
-
In the Site Recovery interface click on the Protection Groups tab and then on New Protection Group to get started.
-
On the Name and direction page of the New Protection Group wizard, provide a name for the group and choose the site direction for protection of the data.
-
On the Type page select the protection group type (datastore, VM, or vVol) and select the array pair. Click on Next to continue.
-
On the Datastore groups page, select the datastores to include in the protection group. VMs currently residing on the datastore are displayed for each datastore selected. Click on Next to continue.
-
On the Recovery plan page, optionally choose to add the protection group to a recovery plan. In this case, the recovery plan is not yet created so Do not add to recovery plan is selected. Click on Next to continue.
-
On the Ready to complete page, review the new protection group parameters and click on Finish to create the group.
Configure Recovery Plan for SRM
The following step is completed in the Site Recovery interface of the primary site.
-
In the Site Recovery interface click on the Recovery plan tab and then on New Recovery Plan to get started.
-
On the Name and direction page of the Create Recovery Plan wizard, provide a name for the recovery plan and choose the direction between source and destination sites. Click on Next to continue.
-
On the Protection groups page, select the previously created protection groups to include in the recovery plan. Click on Next to continue.
-
On the Test Networks configure specific networks that will be used during the test of the plan. If no mapping exists or if no network is selected, an isolated test network will be created. Click on Next to continue.
-
On the Ready to complete page, review the chosen parameters and then click on Finish to create the recovery plan.
Disaster recovery operations with SRM
In this section various functions of using disaster recovery with SRM will be covered including, testing failover, performing failover, performing reprotection and failback.
Refer to Operational best practices for more information on using ONTAP storage with SRM disaster recovery operations.
Testing failover with SRM
The following step is completed in the Site Recovery interface.
-
In the Site Recovery interface click on the Recovery plan tab and then select a recovery plan. Click on the Test button to begin testing failover to the secondary site.
-
You can view the progress of the test from the Site Recovery task pane as well the vCenter task pane.
-
SRM sends commands via the SRA to the secondary ONTAP storage system. A FlexClone of the most recent snapshot is created and mounted at the secondary vSphere cluster. The newly mounted datastore can be viewed in the storage inventory.
-
Once the test has completed, click on Cleanup to unmount the datastore and revert back to the original environment.
Run Recovery Plan with SRM
Perform a full recovery and failover to the secondary site.
-
In the Site Recovery interface click on the Recovery plan tab and then select a recovery plan. Click on the Run button to begin failover to the secondary site.
-
Once the failover is complete you can see the datastore mounted and the VMs registered at the secondary site.
Additional functions are possible in SRM once a failover has completed.
Reprotection: Once the recovery process is complete, the previously designated recovery site assumes the role of the new production site. However, it's important to note that the SnapMirror replication is disrupted during the recovery operation, leaving the new production site vulnerable to future disasters. To ensure continued protection, it is recommended to establish new protection for the new production site by replicating it to another site. In cases where the original production site remains functional, the VMware administrator can repurpose it as a new recovery site, effectively reversing the direction of protection. It's crucial to highlight that re-protection is only feasible in non-catastrophic failures, necessitating the eventual recoverability of the original vCenter Servers, ESXi servers, SRM servers, and their respective databases. If these components are unavailable, the creation of a new protection group and a new recovery plan becomes necessary.
Failback: A failback operation is a reverse failover, returning operations to the original site. It's crucial to ensure that the original site has regained functionality before initiating the failback process. To ensure a smooth failback, it's recommended to conduct a test failover after completing the reprotection process and before executing the final failback. This practice serves as a verification step, confirming that the systems at the original site are fully capable of handling the operation. By following this approach, you can minimize risks and ensure a more reliable transition back to the original production environment.
Additional information
For NetApp documentation on using ONTAP storage with VMware SRM refer to VMware Site Recovery Manager with ONTAP
For information on configuring ONTAP storage systems refer to the ONTAP 9 Documentation center.
For information on configuring VCF refer to VMware Cloud Foundation Documentation.