Using Veeam Replication and Google Cloud NetApp Volumes datastore for disaster recovery to Google Cloud VMware Engine
A comprehensive disaster recovery plan is critical for businesses in times of crisis. Many organizations leverage cloud computing for daily operations and disaster recovery. This proactive approach can reduce or eliminate expensive business disruptions.
This article describes how to use Veeam Backup & Replication to set up disaster recovery for on-premises VMware VMs to Google Cloud VMware Engine (GCVE) with Google Cloud NetApp Volumes (NetApp Volumes).
Overview
Google Cloud NetApp Volumes is a storage service from Google and NetApp that is available for Google Cloud. The NetApp Volumes service provides high-performance NFS/SMB storage. VMware-certified NetApp Volumes NFS storage can be used as an external datastore for ESXi hosts in GCVE. Users are required to make a peering connection between their GCVE private cloud and NetApp Volumes project. There are no network charges resulting from storage access within a region. Users can create NetApp Volumes volumes in the Google Cloud console and enable deletion protection before mounting the volumes as datastores on their ESXi hosts.
NetApp Volumes based NFS datastores can be used to replicate data from on-premises using any validated third-party solution that provides VM replication capability. Adding NetApp Volumes datastores enables a cost-optimized deployment, because there is no need to build a Google Cloud VMware Engine (GCVE) based SDDC with a large number of ESXi hosts just to accommodate the storage. This approach is called a “Pilot Light Cluster”. A pilot light cluster is a minimal GCVE host configuration (3 x GCVE ESXi hosts) combined with NetApp Volumes datastore capacity, which allows storage to scale independently to meet capacity requirements.
The objective is to sustain a cost-effective infrastructure with just the core components to manage a failover. A pilot light cluster can expand and add more GCVE hosts in the event of a failover. Once the failover is resolved and normal operations resume, the pilot light cluster can reduce its scale, returning to a low-cost operational mode.
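The sizing idea behind a pilot light cluster can be sketched with simple arithmetic. The following Python snippet is a rough illustration only: the function name, the 20% headroom, and the capacity figures are assumptions made for the example, not values taken from Google Cloud or NetApp documentation.

```python
def netapp_datastore_gib(replica_gib, vsan_usable_gib, headroom_pct=20):
    """Return whole GiB to provision on NetApp Volumes datastores.

    replica_gib: total size of the VM replicas to protect (assumed input).
    vsan_usable_gib: usable vSAN capacity of the 3-host pilot light cluster.
    headroom_pct: extra percentage reserved for growth (an assumption).
    """
    # Whatever does not fit on vSAN has to land on the external datastore.
    overflow = max(0, replica_gib - vsan_usable_gib)
    # Integer ceiling division keeps the result in whole GiB.
    return -(-overflow * (100 + headroom_pct) // 100)


# Hypothetical example: 40 TiB of replicas, 15 TiB of usable vSAN capacity.
print(netapp_datastore_gib(40 * 1024, 15 * 1024))  # 30720
```

Because the datastore capacity is sized separately from the host count, the cluster can stay at three hosts until compute is actually needed during a failover.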
Purposes of this document
This article describes how to use a Google Cloud NetApp Volumes datastore with Veeam Backup & Replication to set up disaster recovery for on-premises VMware VMs to GCVE using the Veeam VM replication software functionality.
Veeam Backup & Replication is a backup and replication application for virtual environments. When virtual machines are replicated, Veeam Backup & Replication will create an exact copy of the VMs in the native VMware vSphere format on the target GCVE SDDC cluster. Veeam Backup & Replication will keep the copy synchronized with the original VM. Replication provides the best recovery time objective (RTO) as there is a mounted copy of a VM at the DR site in a ready-to-start state.
This replication mechanism ensures that the workloads can quickly start in GCVE in the case of a disaster event. The Veeam Backup & Replication software also optimizes traffic transmission for replication over WAN and slow connections. It filters out duplicate data blocks, zero data blocks, swap files, and “excluded VM guest OS files”, and it compresses the replica traffic. To prevent replication jobs from consuming the entire network bandwidth, WAN accelerators and network throttling rules can be used.
The replication process in Veeam Backup & Replication is job-driven, which means that replication is performed by configuring replication jobs. In the case of a disaster event, failover can be triggered to recover the VMs by failing over to their replicas. When failover is performed, a replicated VM takes over the role of the original VM. Failover can be performed to the latest state of a replica or to any of its known good restore points. This enables ransomware recovery or isolated testing as needed. Veeam Backup & Replication offers multiple options to handle different disaster recovery scenarios.
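To make the traffic reduction ideas above more concrete, here is a minimal Python sketch of skipping zero blocks, skipping blocks that were already sent, and compressing the remainder. It is a conceptual illustration only and does not represent how Veeam Backup & Replication is implemented; the block size, function name, and use of SHA-256 and zlib are all assumptions.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not a Veeam setting


def reduce_traffic(data, already_sent):
    """Return (offset, compressed_block) pairs that still need to be sent.

    Skips all-zero blocks and blocks whose hash is already in `already_sent`,
    then compresses whatever remains.
    """
    payloads = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        if not any(block):            # zero data block: nothing to send
            continue
        digest = hashlib.sha256(block).digest()
        if digest in already_sent:    # duplicate data block: already sent
            continue
        already_sent.add(digest)
        payloads.append((offset, zlib.compress(block)))
    return payloads
```

For a disk image made of many repeated or empty blocks, only the first copy of each distinct non-zero block would be transmitted, and in compressed form.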
Solution Overview
This solution covers the following high level steps:
-
Create an NFS volume using Google Cloud NetApp Volumes
-
Follow the GCP process to create a GCVE datastore from the NetApp Volumes NFS volume.
-
Set up a replication job to create VM replicas using Veeam Backup & Replication.
-
Create a failover plan and perform failover.
-
Switch back to production VMs once the disaster event is complete and the primary site is up.
When creating a volume in NetApp Volumes, for use as a GCVE datastore, only NFS v3 is supported.
For more information on using NetApp Volumes NFS volumes as datastores for GCVE, check out Using NFS volume as vSphere datastore hosted by Google Cloud NetApp Volumes.
Architecture
The following diagram shows the architecture of the solution presented in this documentation. A recommended best practice is to have a Veeam Backup & Replication server located at both the on-premises site and in the GCVE SDDC. Backup and recovery is performed and managed by the Veeam server on-premises, and replication is managed by the Veeam server in the GCVE SDDC. This architecture provides the highest availability when a failure occurs in the primary datacenter.
Pre-requisites for Veeam Replication to GCVE and NetApp Volumes datastores
This solution requires the following components and configurations:
-
NetApp Volumes has a Storage Pool available with enough free capacity to accommodate the NFS volume to be created.
-
Veeam Backup and Replication software is running in an on-premises environment with appropriate network connectivity.
-
Ensure the Veeam Backup & Replication backup VM is connected to the source as well as the target GCVE SDDC clusters.
-
Ensure the Veeam Backup & Replication backup VM is connected to the Veeam Proxy server VMs at both the source and target GCVE clusters.
-
The backup server must be able to resolve short names and connect to source and target vCenters.
Users are required to make a peering connection between their GCVE private cloud and NetApp Volumes project using the VPC Network peering or Private connections pages within the VMware Engine Cloud console UI.
Veeam requires a GCVE solution user account with elevated privileges when adding the GCVE vCenter server to the Veeam Backup and Replication inventory. For more information refer to the Google Cloud Platform (GCP) documentation, Elevating VMware Engine privileges.
For additional information refer to Considerations and Limitations in the Veeam Backup & Replication documentation.
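The name resolution and connectivity prerequisites can be checked from the backup server before any Veeam configuration is attempted. The following Python sketch is one possible preflight check; the function name is hypothetical, and the default of TCP port 443 is an assumption based on vCenter serving its API over HTTPS.

```python
import socket


def check_endpoint(host, port=443, timeout=3.0):
    """Return (resolved_ip, reachable) for a host name and TCP port."""
    try:
        ip = socket.gethostbyname(host)  # does the (short) name resolve?
    except socket.gaierror:
        return None, False
    try:
        # Open and immediately close a TCP connection to test reachability.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

Calling `check_endpoint` with the short name of the source and the target vCenter (for example a hypothetical `vcsa-gcve`) returns `(None, False)` when the name does not resolve, and the resolved address with `False` when the name resolves but the port cannot be reached.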
Deployment steps
The following sections outline the deployment steps to create and mount an NFS datastore using Google Cloud NetApp Volumes, and use Veeam Backup and Replication to implement a full disaster recovery solution between an on-premises datacenter and Google Cloud VMware Engine.
Create NetApp Volumes NFS volume and datastore for GCVE
Refer to Using NFS volume as vSphere datastore hosted by Google Cloud NetApp Volumes for an overview of how to use Google Cloud NetApp Volumes as a datastore for GCVE.
Complete the following steps to create and use an NFS datastore for GCVE using NetApp Volumes:
Create NetApp Volumes NFS volume
Google Cloud NetApp Volumes is accessed from the Google Cloud Platform (GCP) console.
Refer to Create a volume in the Google Cloud NetApp Volumes documentation for detailed information on this step.
-
In a web browser, navigate to https://console.cloud.google.com/ and log into your GCP console. Search for NetApp Volumes to get started.
-
In the NetApp Volumes management interface, click on Create to get started creating an NFS volume.
-
In the Create a volume wizard, fill out all required information:
-
A name for the volume.
-
The Storage Pool on which to create the volume.
-
A share name used when mounting the NFS volume.
-
The capacity of the volume in GiB.
-
The storage protocol to be used.
-
Check the box to Block volume from deletion when clients are connected (required by GCVE when mounting as a datastore).
-
The export rules for accessing the volume. This is the IP addresses of the ESXi adapters on the NFS network.
-
A snapshot schedule used to protect the volume using local snapshots.
-
Optionally, choose to backup the volume and/or create labels for the volume.
When creating a volume in NetApp Volumes, for use as a GCVE datastore, only NFS v3 is supported.
Click on Create to finish creating the volume.
-
Once the volume is created, the NFS export path required to mount the volume can be viewed from the volume's properties page.
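The wizard inputs above, together with the two constraints GCVE places on datastore volumes (NFS v3 only, deletion blocking enabled), can be captured in a small checklist. The Python sketch below is illustrative: the class and field names mirror the wizard labels and are not part of any Google Cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeRequest:
    # Field names are hypothetical; they mirror the wizard, not an API.
    name: str
    storage_pool: str
    share_name: str
    capacity_gib: int
    protocol: str
    block_deletion_when_clients_connected: bool
    export_rule_ips: list = field(default_factory=list)


def datastore_issues(req):
    """Return the reasons a volume would not qualify as a GCVE datastore."""
    issues = []
    if req.protocol.upper() != "NFSV3":
        issues.append("only NFSv3 is supported for GCVE datastores")
    if not req.block_deletion_when_clients_connected:
        issues.append("deletion blocking must be enabled")
    if not req.export_rule_ips:
        issues.append("export rules must list the ESXi NFS adapter IPs")
    return issues
```

An empty list from `datastore_issues` means the planned settings are consistent with the requirements described in this section; it does not replace the validation performed by the Google Cloud console.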
Mount the NFS datastore in GCVE
At the time of this writing the process to mount a datastore in GCVE requires opening a GCP support ticket to have the volume mounted as an NFS datastore.
Refer to Using NFS volume as vSphere datastore hosted by Google Cloud NetApp Volumes for more information.
Replicate VMs to GCVE and execute failover plan, and failback
Replicate VMs to NFS datastore in GCVE
Veeam Backup & Replication leverages VMware vSphere snapshot capabilities during replication. Veeam Backup & Replication requests VMware vSphere to create a VM snapshot, which is a point-in-time copy of a VM that includes virtual disks, system state, configuration and metadata. Veeam Backup & Replication then uses the snapshot as the source of data for replication.
To replicate VMs, complete the following steps:
-
Open the Veeam Backup & Replication Console.
-
On the Home tab, click on Replication Job > Virtual machine…
-
On the Name page of the New Replication Job wizard, specify a job name and select the appropriate advanced control checkboxes.
-
Select the Replica seeding check box if connectivity between on-premises and GCP has restricted bandwidth.
-
Select the Network remapping (for GCVE SDDC sites with different networks) check box if segments on the GCVE SDDC do not match that of the on-premises site networks.
-
Select the Replica re-IP (for DR sites with different IP addressing scheme) check box if the IP addressing scheme in the on-premises production site differs from the scheme in the target GCVE site.
-
On the Virtual Machines page, select the VMs to be replicated to the NetApp Volumes datastore attached to a GCVE SDDC. Click Add, then in the Add Object window select the necessary VMs or VM containers and click Add. Click Next.
The virtual machines can be placed on vSAN to fill the available vSAN datastore capacity. In a pilot light cluster, the usable capacity of a 3-node vSAN cluster will be limited. The rest of the data can be placed on Google Cloud NetApp Volumes datastores so that the VMs can be recovered, and the cluster can later be expanded to meet the CPU and memory requirements.
-
On the Destination page, select the destination as the GCVE SDDC cluster / hosts and the appropriate resource pool, VM folder and NetApp Volumes datastore for the VM replicas. Click Next to continue.
-
On the Network page, create the mapping between source and target virtual networks as needed. Click Next to continue.
-
On the Re-IP page, click on the Add… button to add a new re-IP rule. Fill out the source and target VM IP ranges to specify the networking that will be applied to the source VMs in the case of a failover. Use an asterisk in an octet to indicate that a range of addresses applies to that octet. Click Next to continue.
-
On the Job Settings page, specify the backup repository that will store metadata for VM replicas and the retention policy, and select the Advanced… button at the bottom for additional job settings. Click Next to continue.
-
On the Data Transfer page, select the proxy servers that reside at the source and target sites, and keep the Direct option selected. WAN accelerators can also be selected here, if configured. Click Next to continue.
-
On the Guest Processing page, check the box for Enable application-aware processing as needed and select the Guest OS credentials. Click Next to continue.
-
On the Schedule page, define the times and frequency at which the replication job will run. Click Next to continue.
-
Finally, review the job setting on the Summary page. Check the box to Run the job when I click Finish, and click on Finish to complete creating the replication job.
-
Once run, the replication job can be viewed in the job status window.
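The effect of a re-IP rule that uses asterisks can be illustrated with a few lines of Python. This is a sketch of the mapping logic under the assumption that an asterisk in the source rule matches any octet value and an asterisk in the target rule carries that value over unchanged; the function name and the example addresses are hypothetical.

```python
def apply_reip(ip, source_rule, target_rule):
    """Map `ip` through a re-IP rule; return None if the rule does not match.

    An asterisk in a source octet matches any value; an asterisk in the
    target octet keeps the value from the original address.
    """
    ip_octets = ip.split(".")
    src = source_rule.split(".")
    dst = target_rule.split(".")
    if not (len(ip_octets) == len(src) == len(dst) == 4):
        raise ValueError("expected dotted IPv4 addresses and rules")
    result = []
    for octet, s, d in zip(ip_octets, src, dst):
        if s != "*" and s != octet:
            return None  # address is outside the source range
        result.append(octet if d == "*" else d)
    return ".".join(result)


# Hypothetical rule: 192.168.10.* on-premises becomes 10.20.30.* in GCVE.
print(apply_reip("192.168.10.25", "192.168.10.*", "10.20.30.*"))  # 10.20.30.25
```

A VM whose address falls outside the source range is left untouched by the rule, which is why the function returns `None` in that case.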
For additional information on Veeam replication, refer to How Replication Works.
Create a failover plan
After the initial replication or seeding is complete, create a failover plan. A failover plan is the blueprint for failing over dependent VMs automatically, either one by one or as a group. It defines the order in which the VMs are processed, including any boot delays, and ensures that the critical VMs that others depend on are already running before the dependent VMs start. A well-structured failover plan streamlines the disaster recovery process, minimizes downtime, and maintains the integrity of interdependent systems during a failover event.
When creating the plan, Veeam Backup & Replication automatically identifies and uses the most recent restore points to initiate the VM replicas.
The failover plan can only be created once the initial replication is complete and the VM replicas are in Ready state.
The maximum number of VMs that can be started simultaneously when running a failover plan is 10.
During the failover process, the source VMs will not be powered off.
To create the Failover Plan, complete the following steps:
-
On the Home view, click on the Failover Plan button in the Restore section. In the drop-down, select VMware vSphere…
-
On the General page of the New Failover Plan wizard, provide a name and a description for the plan. Pre-failover and post-failover scripts can be added as required. For instance, run a script to shut down VMs before starting the replicated VMs.
-
On the Virtual Machines page, click the button to Add VM and select From replicas…. Choose the VMs to be part of the failover plan, and then modify the VM boot order and any required boot delays to meet application dependencies.
Click on Apply to continue.
-
Finally review all of the failover plan settings and click on Finish to create the failover plan.
For additional information on creating replication jobs, refer to Creating Replication Jobs.
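The boot order and boot delays of a failover plan can be reasoned about as a simple schedule. The Python sketch below assumes that the delay set on a VM is the wait before the next VM in the list starts, and it flags start times at which more than 10 VMs would begin at once. The function names and VM names are hypothetical, and this is a planning aid rather than a model of how Veeam Backup & Replication executes the plan.

```python
from collections import Counter

MAX_SIMULTANEOUS = 10  # limit stated for running a failover plan


def start_offsets(plan):
    """plan: ordered list of (vm_name, boot_delay_seconds).

    Assumes the delay of a VM is the wait before the next VM starts.
    Returns a list of (vm_name, seconds_after_plan_start).
    """
    offsets, elapsed = [], 0
    for vm_name, delay in plan:
        offsets.append((vm_name, elapsed))
        elapsed += delay
    return offsets


def oversubscribed(offsets):
    """Return start times at which more than MAX_SIMULTANEOUS VMs begin."""
    counts = Counter(t for _, t in offsets)
    return sorted(t for t, n in counts.items() if n > MAX_SIMULTANEOUS)


# Hypothetical plan: domain controller first, then database, then app tier.
plan = [("dc01", 120), ("sql01", 60), ("app01", 0), ("web01", 0)]
print(start_offsets(plan))
```

In this example the application and web VMs start together 180 seconds into the plan, after the domain controller and database have had time to boot.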
Run the failover plan
During failover, the source VM in the production site switches over to its replica at the disaster recovery site. As part of the process, Veeam Backup & Replication restores the VM replica to the required restore point and transfers all I/O activities from the source VM to its replica. Replicas serve not only for actual disasters but also for simulating DR drills. In failover simulation, the source VM continues running. Upon completion of necessary tests, the failover can be undone, returning operations to normal.
Make sure network segmentation is in place to avoid IP conflicts during failover.
Complete the following steps to start the failover plan:
-
To get started, in the Home view, click on Replicas > Failover Plans in the left-hand menu and then on the Start button. Alternatively, the Start to… button can be used to fail over to a prior restore point.
-
Monitor the progress of the failover in the Executing failover plan window.
Veeam Backup & Replication stops all replication activities for the source VM until its replica is returned to the Ready state.
For detailed information about failover plans, refer to Failover Plans.
Failback to the production site
Conducting a failover is considered an intermediate step and needs to be finalized based on the requirement. The options include the following:
-
Failback to production - Revert to the original VM and synchronize all modifications made during the replica's active period back to the source VM.
During failback, changes are transferred but not immediately applied. Select Commit failback once the original VM's functionality is verified. Alternatively, choose Undo failback to revert to the VM replica if the original VM exhibits unexpected behavior.
-
Undo failover - Revert to the original VM, discarding all changes made to the VM replica during its operational period.
-
Permanent Failover - Permanently switch from the original VM to its replica, establishing the replica as the new primary VM for ongoing operations.
In this scenario, the "Failback to production" option was selected.
Complete the following steps to perform a failback to the production site:
-
From the Home view, click on Replicas > Active in the left-hand menu. Select the VMs to be included and click on the Failback to Production button in the top menu.
-
On the Replica page of the Failback wizard, select the replicas to include in the failback job.
-
On the Destination page, select Failback to the original VM and click on Next to continue.
-
On the Failback Mode page, select Auto to start the failback as soon as possible.
-
On the Summary page, choose whether to Power on target VM after restoring and then click on Finish to start the failback job.
Failback commit finalizes the failback operation, confirming the successful integration of changes to the production VM. Upon commit, Veeam Backup & Replication resumes regular replication activities for the restored production VM. This changes the status of the restored replica from Failback to Ready.
-
To commit failback, navigate to Replicas > Active, select the VMs to be committed, right click and select Commit failback.
After failback to production is successful, the VMs are all restored back to the original production site.
For detailed information about the failback process, refer to the Veeam documentation for Failover and Failback for replication.
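The finalization options described in this section can be summarized as a small state machine for a replica. The Python sketch below uses the Ready, Failover, and Failback states named in this article; the action names and the "Permanent" label are hypothetical and chosen only for illustration.

```python
# (current_state, action) -> next_state
TRANSITIONS = {
    ("Ready", "failover"): "Failover",
    ("Failover", "undo_failover"): "Ready",
    ("Failover", "permanent_failover"): "Permanent",  # hypothetical label
    ("Failover", "failback"): "Failback",
    ("Failback", "commit_failback"): "Ready",
    ("Failback", "undo_failback"): "Failover",
}


def next_state(state, action):
    """Return the replica state after `action`, or raise if it is invalid."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid in state {state!r}") from None


# The path followed in this article: failover, failback, then commit.
state = "Ready"
for action in ("failover", "failback", "commit_failback"):
    state = next_state(state, action)
print(state)  # Ready
```

The table makes it explicit that failover is an intermediate state: every path either returns the replica to Ready or ends with the replica permanently taking over.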
Conclusion
Google Cloud NetApp Volumes datastore functionality enables Veeam and other validated third-party tools to deliver cost-effective disaster recovery (DR) solutions. By using pilot light clusters instead of large, dedicated clusters for VM replicas, organizations can significantly reduce expenses. This approach enables tailored DR strategies that leverage existing in-house backup solutions for cloud-based disaster recovery, eliminating the need for additional on-premises datacenters. In the event of a disaster, failover can be initiated with a single click or configured to occur automatically, ensuring business continuity with minimal downtime.
To learn more about this process, feel free to follow the detailed walkthrough video.