Using Veeam Replication and Azure NetApp Files datastore for disaster recovery to Azure VMware Solution
Azure NetApp Files (ANF) datastores decouple storage from compute and unlock the flexibility needed for any organisation to take their workloads to the cloud. They provide customers with flexible, high-performance storage infrastructure that scales independently of compute resources. Azure NetApp Files datastores simplify and optimize the deployment alongside Azure VMware Solution (AVS) as a disaster recovery site for on-premises VMware environments.
Author: Niyaz Mohamed - NetApp Solutions Engineering
Overview
Azure NetApp Files (ANF) volume-based NFS datastores can be used to replicate data from on-premises using any validated third-party solution that provides VM replication capability. Adding Azure NetApp Files datastores enables a cost-optimised deployment compared with building an Azure VMware Solution SDDC with a large number of ESXi hosts purely to accommodate the storage. This approach is called a “Pilot Light Cluster”. A pilot light cluster is a minimal AVS host configuration (3 x AVS nodes) along with Azure NetApp Files datastore capacity.
The objective is to maintain a low-cost infrastructure with all the core components needed to handle a failover. A pilot light cluster can scale out and provision more AVS hosts if a failover does occur. Once the failover is complete and normal operations are restored, the pilot light cluster can scale back down to a low-cost mode of operation.
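The scale-out decision described above comes down to simple arithmetic. The following is a minimal sketch, assuming hypothetical per-host resources and an assumed CPU overcommit ratio (neither comes from this article or from a real AVS SKU). Storage is left out of the calculation because the replicas sit on Azure NetApp Files datastores.

```python
import math
from dataclasses import dataclass

MIN_PILOT_LIGHT_HOSTS = 3  # minimal AVS cluster size stated in this article


@dataclass
class HostSpec:
    """Usable resources of one AVS host (hypothetical values, not a real SKU)."""
    cpu_cores: int
    memory_gib: int


def hosts_needed(total_vcpus, total_memory_gib, host, vcpu_per_core=4.0):
    """Return the host count required to run the failed-over VMs.

    vcpu_per_core is an assumed CPU overcommit ratio, not a recommendation.
    """
    by_cpu = math.ceil(total_vcpus / (host.cpu_cores * vcpu_per_core))
    by_memory = math.ceil(total_memory_gib / host.memory_gib)
    # Never go below the pilot light minimum, even for small workloads.
    return max(MIN_PILOT_LIGHT_HOSTS, by_cpu, by_memory)


host = HostSpec(cpu_cores=36, memory_gib=512)
needed = hosts_needed(total_vcpus=800, total_memory_gib=3000, host=host)
extra = needed - MIN_PILOT_LIGHT_HOSTS
print(f"Hosts needed during failover: {needed} (add {extra} to the pilot light cluster)")
```

After failback, the same calculation with the reduced workload indicates how many hosts can be removed again.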
Purpose of this document
This article describes how to use Azure NetApp Files datastores with Veeam Backup & Replication to set up disaster recovery for on-premises VMware VMs to Azure VMware Solution (AVS) using the Veeam VM replication functionality.
Veeam Backup & Replication is a backup and replication application for virtual environments. When virtual machines are replicated to AVS, the software creates an exact copy of the VMs in the native VMware vSphere format on the target AVS SDDC cluster and keeps the copy synchronized with the original VM. Replication provides the best recovery time objective (RTO) because a mounted copy of the VM is available at the DR site in a ready-to-start state.
This replication mechanism ensures that workloads can start quickly in an AVS SDDC in the case of a disaster event. Veeam Backup & Replication also optimizes traffic transmission for replication over WAN and slow connections: it filters out duplicate data blocks, zero data blocks, swap files, and “excluded VM guest OS files”, and it compresses the replica traffic. To prevent replication jobs from consuming the entire network bandwidth, WAN accelerators and network throttling rules can be used.
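To make the traffic reduction idea concrete, here is a small conceptual sketch of skipping zero blocks, skipping duplicate blocks, and compressing what remains. It is an illustration only and not how Veeam Backup & Replication is implemented; the block size, the function name `reduce_blocks`, and the use of SHA-256 and zlib are all assumptions made for this example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def reduce_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (compressed_payload, stats) after skipping zero and duplicate blocks.

    A real transport would also send a block map so the target can rebuild
    the disk; that part is omitted here.
    """
    seen = set()
    kept = []
    stats = {"total": 0, "zero": 0, "duplicate": 0, "sent": 0}
    zero_block = bytes(block_size)
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        stats["total"] += 1
        if block == zero_block[:len(block)]:
            stats["zero"] += 1  # nothing to transfer for an all-zero block
            continue
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            stats["duplicate"] += 1  # identical block already queued
            continue
        seen.add(digest)
        kept.append(block)
        stats["sent"] += 1
    payload = zlib.compress(b"".join(kept))
    return payload, stats


sample = bytes(BLOCK_SIZE) * 3 + b"A" * BLOCK_SIZE * 2 + b"B" * BLOCK_SIZE
payload, stats = reduce_blocks(sample)
print(stats, len(sample), "->", len(payload), "bytes")
```

In this sample, six source blocks shrink to two unique non-zero blocks before compression, which is the kind of saving that matters on a constrained WAN link.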
The replication process in Veeam Backup & Replication is job-driven: replication is performed by configuring replication jobs. In a disaster event, failover can be triggered to recover the VMs by failing over to their replicas. When failover is performed, a replicated VM takes over the role of the original VM. Failover can be performed to the latest state of a replica or to any of its known good restore points, which enables ransomware recovery or isolated testing as needed. Veeam Backup & Replication offers multiple options to handle different disaster recovery scenarios.
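The choice between the latest state and an earlier known good restore point can be sketched as follows. This is a conceptual example, not Veeam's API: the `RestorePoint` type, the `known_good` flag (for instance, set after a malware scan), and `pick_restore_point` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RestorePoint:
    created: datetime
    known_good: bool  # hypothetical flag, e.g. set after a malware scan


def pick_restore_point(points: List[RestorePoint],
                       before: Optional[datetime] = None) -> Optional[RestorePoint]:
    """Pick the latest point, or the latest known good point created before `before`."""
    if before is None:
        candidates = points
    else:
        candidates = [p for p in points if p.known_good and p.created < before]
    return max(candidates, key=lambda p: p.created, default=None)


points = [
    RestorePoint(datetime(2024, 1, 1, 0, 0), known_good=True),
    RestorePoint(datetime(2024, 1, 1, 6, 0), known_good=True),
    RestorePoint(datetime(2024, 1, 1, 12, 0), known_good=False),  # after an incident
]
latest = pick_restore_point(points)
clean = pick_restore_point(points, before=datetime(2024, 1, 1, 10, 0))
print("latest:", latest.created, "| clean:", clean.created)
```

For a ransomware scenario, `before` would be set to the estimated time of compromise so that only earlier, verified points are considered.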
Solution Deployment
High level steps
-
Ensure that Veeam Backup & Replication software is running in the on-premises environment with appropriate network connectivity.
-
Deploy Azure VMware Solution (AVS) private cloud and attach Azure NetApp Files datastores to Azure VMware Solution hosts.
A pilot light environment set up with a minimal configuration can be used for DR purposes. VMs will fail over to this cluster in the event of an incident, and additional nodes can be added.
-
Set up a replication job to create VM replicas using Veeam Backup & Replication.
-
Create a failover plan and perform failover.
-
Switch back to the production VMs once the disaster event is over and the primary site is up.
Prerequisites for Veeam VM Replication to AVS and ANF datastores
-
Ensure the Veeam Backup & Replication backup VM is connected to the source as well as the target AVS SDDC clusters.
-
The backup server must be able to resolve short names and connect to source and target vCenters.
-
The target Azure NetApp Files datastore must have enough free space to store VMDKs of replicated VMs.
For additional information, refer to “Considerations and Limitations” covered here.
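The free-space prerequisite can be verified with a quick calculation before the first replication run. The sketch below is a minimal example; the VMDK sizes, the free-space figure, and the 10% headroom are illustrative assumptions rather than values from this article.

```python
def datastore_has_room(vmdk_sizes_gib, free_gib, headroom=0.10):
    """Return True if the replicas fit while leaving `headroom` of the free space unused.

    headroom is an assumed safety margin for snapshots and growth.
    """
    required = sum(vmdk_sizes_gib)
    usable = free_gib * (1.0 - headroom)
    return required <= usable


vmdks = [500, 250, 1000]  # provisioned VMDK sizes in GiB (example values)
print(datastore_has_room(vmdks, free_gib=2048))  # 1750 GiB needed, about 1843 GiB usable
print(datastore_has_room(vmdks, free_gib=1800))  # 1750 GiB needed, 1620 GiB usable
```

If the check fails, the Azure NetApp Files volume backing the datastore would need to be resized before the replication job is created.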
Deployment Details
Step 1: Replicate VMs
Veeam Backup & Replication leverages VMware vSphere snapshot capabilities. During replication, Veeam Backup & Replication requests VMware vSphere to create a VM snapshot. The VM snapshot is a point-in-time copy of a VM that includes virtual disks, system state, configuration and metadata. Veeam Backup & Replication uses the snapshot as a source of data for replication.
To replicate VMs, follow the steps below:
-
Open the Veeam Backup & Replication Console.
-
In the Home view, right-click the Jobs node and select Replication Job > Virtual machine.
-
Specify a job name and select the appropriate advanced control checkbox. Click Next.
-
Select the Replica seeding check box if connectivity between on-premises and Azure has restricted bandwidth.
-
Select the Network remapping (for AVS SDDC sites with different networks) check box if segments on the Azure VMware Solution SDDC do not match those of the on-premises site networks.
-
If the IP addressing scheme in the on-premises production site differs from the scheme in the target AVS site, select the Replica re-IP (for DR sites with different IP addressing scheme) check box.
-
In the Virtual Machines step, select the VMs to be replicated to the Azure NetApp Files datastore attached to an Azure VMware Solution SDDC. Virtual machines can be placed on vSAN to fill the available vSAN datastore capacity. In a pilot light cluster, the usable capacity of a 3-node cluster is limited. The rest of the data can be placed on Azure NetApp Files datastores so that the VMs can be recovered, and the cluster can be expanded to meet the CPU and memory requirements. Click Add, then in the Add Object window select the necessary VMs or VM containers and click Add. Click Next.
-
Next, select the Azure VMware Solution SDDC cluster / host as the destination, along with the appropriate resource pool, VM folder and Azure NetApp Files datastore for the VM replicas. Then click Next.
-
In the next step, create the mapping between source and destination virtual networks as needed.
-
In the Job Settings step, specify the backup repository that will store metadata for VM replicas, retention policy and so on.
-
In the Data Transfer step, set the source and target proxy servers (or leave the default Automatic selection), keep the Direct option selected, and click Next.
-
At the Guest Processing step, select the Enable application-aware processing option as needed. Click Next.
-
Choose the schedule on which the replication job will run regularly.
-
At the Summary step of the wizard, review the details of the replication job. To start the job right after the wizard is closed, select the Run the job when I click Finish check box; otherwise, leave the check box unselected. Then click Finish to close the wizard.
Once the replication job starts, the VMs with the suffix specified will be populated on the destination AVS SDDC cluster / host.
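The Replica re-IP option selected in the wizard translates addresses from the production scheme to the DR scheme. The idea can be illustrated with Python's standard `ipaddress` module; this is a conceptual sketch of such a mapping rule, not Veeam's implementation, and the subnets and the `re_ip` helper are hypothetical.

```python
import ipaddress


def re_ip(address: str, source_net: str, target_net: str) -> str:
    """Map an address from source_net to target_net, preserving the host portion."""
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    ip = ipaddress.ip_address(address)
    if ip not in src:
        raise ValueError(f"{address} is not in {source_net}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target networks must have the same prefix length")
    # Keep the host offset within the subnet and apply it to the target network.
    host_offset = int(ip) - int(src.network_address)
    return str(dst.network_address + host_offset)


# Example subnets are placeholders for an on-premises and an AVS segment.
print(re_ip("192.168.10.25", "192.168.10.0/24", "10.20.30.0/24"))  # 10.20.30.25
```

Keeping the host portion stable makes it easier to recognise a replica by its address and to maintain matching DNS and firewall entries on both sides.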
For additional information about Veeam replication, refer to How Replication Works.
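The placement guidance in the Virtual Machines step (fill the limited vSAN capacity, then place the remainder on Azure NetApp Files) can be expressed as a simple greedy rule. The sketch below is one possible approach under stated assumptions: the VM names and sizes are made up, the datastore labels "vSAN" and "ANF" are placeholders, and smallest-first ordering is a design choice for this example rather than a documented recommendation.

```python
def place_vms(vm_sizes_gib, vsan_free_gib):
    """Greedy placement: fill vSAN with the smallest VMs first, send the rest to ANF."""
    placement = {}
    remaining = vsan_free_gib
    for name, size in sorted(vm_sizes_gib.items(), key=lambda kv: kv[1]):
        if size <= remaining:
            placement[name] = "vSAN"
            remaining -= size
        else:
            placement[name] = "ANF"
    return placement


vms = {"web01": 100, "app01": 300, "db01": 2000}  # replica sizes in GiB (examples)
print(place_vms(vms, vsan_free_gib=500))
```

Placing the largest replicas on Azure NetApp Files keeps the 3-node cluster's vSAN from becoming the constraint that forces additional hosts.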
Step 2: Create a failover plan
When the initial replication or seeding is complete, create the failover plan. A failover plan helps to perform failover for dependent VMs automatically, either one by one or as a group. It is the blueprint for the order in which the VMs are processed, including the boot delays. The failover plan also helps to ensure that the critical VMs that others depend on are already running.
To create the plan, navigate to the new sub section called Replicas and select Failover Plan. Choose the appropriate VMs. Veeam Backup & Replication will look for the closest restore points to this point in time and use them to start VM replicas.
The failover plan can only be added once the initial replication is complete and the VM replicas are in Ready state.
The maximum number of VMs that can be started simultaneously when running a failover plan is 10.
During the failover process, the source VMs will not be powered off.
To create the Failover Plan, do the following:
-
In the Home view, right-click the Replicas node and select Failover Plans > Failover Plan > VMware vSphere.
-
Next, provide a name and a description for the plan. Pre-failover and post-failover scripts can be added as required. For instance, run a script to shut down VMs before starting the replicated VMs.
-
Add the VMs to the plan and modify the VM boot order and boot delays to meet the application dependencies.
For additional information about creating replication jobs, refer to Creating Replication Jobs.
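The effect of boot order and boot delays can be previewed before the plan is saved. The sketch below uses a simplified model and assumes that each delay is the wait between starting one VM and starting the next VM in the list; the VM names, delay values, and the `PlanEntry` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlanEntry:
    vm_name: str
    delay_seconds: int  # assumed meaning: wait after this VM before starting the next


def start_offsets(plan):
    """Return [(vm_name, seconds_after_plan_start)] in boot order."""
    offsets = []
    elapsed = 0
    for entry in plan:
        offsets.append((entry.vm_name, elapsed))
        elapsed += entry.delay_seconds
    return offsets


plan = [
    PlanEntry("dc01", 120),   # directory services first
    PlanEntry("db01", 60),    # database before the application tier
    PlanEntry("app01", 30),
    PlanEntry("web01", 0),
]
for name, offset in start_offsets(plan):
    print(f"{name}: starts at +{offset}s")
```

Listing the offsets this way makes it easy to check that each dependency has enough time to come up before the VMs that rely on it are started.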
Step 3: Run the failover plan
During failover, the source VM in the production site is switched over to its replica at the disaster recovery site. As part of the failover process, Veeam Backup & Replication restores the VM replica to the required restore point and moves all I/O activities from the source VM to its replica. Replicas can be used not only in case of a disaster, but also to simulate DR drills. During failover simulation, the source VM remains running. Once all the necessary tests have been conducted, you can undo the failover and return to normal operations.
Make sure network segmentation is in place to avoid IP conflicts during failover.
To start the failover plan, open the Failover Plans tab, right-click the failover plan and select Start. This fails over using the latest restore points of the VM replicas. To fail over to specific restore points of the VM replicas, select Start to.
The state of the VM replica changes from Ready to Failover and VMs will start on the destination Azure VMware Solution (AVS) SDDC cluster / host.
Once the failover is complete, the status of the VMs will change to “Failover”.
Veeam Backup & Replication stops all replication activities for the source VM until its replica is returned to the Ready state.
For detailed information about failover plans, refer to Failover Plans.
Step 4: Failback to the Production site
A running failover plan is an intermediate state that needs to be finalized based on the requirement. The options include the following:
-
Failback to production - switch back to the original VM and transfer all changes that took place while the VM replica was running to the original VM.
When you perform failback, changes are only transferred but not published. Choose Commit failback (once the original VM is confirmed to work as expected) or Undo failback to get back to the VM replica if the original VM is not working as expected.
-
Undo failover - switch back to the original VM and discard all changes made to the VM replica while it was running.
-
Permanent Failover - permanently switch from the original VM to a VM replica and use this replica as the original VM.
In this demo, Failback to production was chosen. Failback to the original VM was selected during the Destination step of the wizard and the “Power on VM after restoring” check box was enabled.
Committing failback is one of the ways to finalize the failback operation. Committing confirms that the changes sent to the VM that was failed back (the production VM) are working as expected. After the commit operation, Veeam Backup & Replication resumes replication activities for the production VM.
For detailed information about the failback process, refer to the Veeam documentation for Failover and Failback for replication.
After failback to production is successful, all the VMs are restored to the original production site.
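The finalization options described in this step can be summarised as a small state machine. The sketch below is a reading aid, not Veeam's API: "Ready" and "Failover" are the replica states named in this article, while the "Failback" and "Permanent failover" state labels and the action names are assumptions chosen for this example.

```python
# (current state, action) -> next state
TRANSITIONS = {
    ("Ready", "failover"): "Failover",
    ("Failover", "undo_failover"): "Ready",
    ("Failover", "permanent_failover"): "Permanent failover",
    ("Failover", "failback"): "Failback",
    ("Failback", "commit_failback"): "Ready",
    ("Failback", "undo_failback"): "Failover",
}


def next_state(state: str, action: str) -> str:
    """Return the replica state after `action`, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} is not valid in state {state!r}") from None


# The path taken in this demo: fail over, fail back to production, then commit.
state = "Ready"
for action in ["failover", "failback", "commit_failback"]:
    state = next_state(state, action)
    print(f"{action} -> {state}")
```

Writing the transitions down this way also makes clear that failback is a two-phase operation: the replica only returns to Ready after the commit.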
Conclusion
Azure NetApp Files datastores enable Veeam or any validated third-party tool to provide a low-cost DR solution by using pilot light clusters instead of standing up a large cluster only to accommodate VM replicas. This is an effective way to implement a tailored disaster recovery plan and to reuse existing in-house backup products for DR, enabling cloud-based disaster recovery and the retirement of on-premises DR datacenters. Failover can be started with the click of a button when a disaster occurs, or triggered automatically.
To learn more about this process, feel free to follow the detailed walkthrough video.