The NetApp solution
You can use the NetApp Snapshot technology in Azure NetApp Files to create database backups in seconds. The time needed to create a Snapshot copy is independent of the size of the database because a Snapshot copy does not move any physical data blocks on the storage platform.
In addition, the use of Snapshot technology has no performance effect on the live SAP system. Therefore, you can schedule the creation of Snapshot copies without considering peak dialog or batch activity periods. SAP on Azure NetApp Files customers typically schedule multiple online Snapshot backups during the day; for example, every four hours is common. These Snapshot backups are typically kept for three to five days on the primary storage system before being removed.
Snapshot copies also provide key advantages for restore and recovery operations. A volume revert operation enables the restoration of an entire database to any point in time, based on the available Snapshot copies. Such restore processes are finished in a few seconds, independent of the size of the database. Because several online Snapshot backups are created during the day, the time needed for the recovery process is significantly reduced relative to a traditional backup approach. Because a restore operation can be performed with a Snapshot copy that is only a few hours old (rather than up to 24 hours), fewer transaction logs must be applied. Therefore, the RTO is reduced to several minutes rather than the several hours required for conventional single-cycle backups.
Snapshot copy backups are stored on the same disk system as the active online data. Therefore, NetApp recommends using Snapshot copy backups as a supplement to, rather than a replacement for, backups to a secondary location. Most restore and recovery actions are handled by a volume revert operation on the primary storage system. A restore from a secondary location is necessary only if the primary storage system containing the Snapshot copies is damaged, or if you need a backup that is no longer available as a Snapshot copy.
You can back up to a secondary location by using additional HANA file-based backups. These file-based backups can be scheduled with a much lower frequency, and therefore with less impact on the production system performance.
When Azure NetApp Files Cross Region Replication is used for disaster recovery, all Snapshot backups are also replicated to the disaster recovery site and would then also be available as an offloaded secondary backup. See also TR-4891: SAP HANA disaster recovery with Azure NetApp Files.
You can also offload Snapshot copies for longer term retention to Azure Blob storage using the Azure AzCopy tool. This requires additional scripting outside of SnapCenter Service.
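As a rough illustration of the additional scripting mentioned above, the following minimal sketch builds an AzCopy command line for copying one Snapshot directory to a Blob container. The mount path, Snapshot name, storage account, container, and SAS token are placeholders and not values from this document, and the helper name build_azcopy_command is hypothetical. The sketch only prints the command; it does not run it.

```python
import shlex
from typing import List


def build_azcopy_command(snapshot_dir: str, container_url_with_sas: str) -> List[str]:
    """Return the argument list for a recursive AzCopy upload of one directory."""
    # "azcopy copy <source> <destination> --recursive" copies a directory tree.
    return ["azcopy", "copy", snapshot_dir, container_url_with_sas, "--recursive"]


# Placeholder values (assumptions): replace with your own mount path and Blob URL.
cmd = build_azcopy_command(
    "/hana/data/SID/mnt00001/.snapshot/<snapshot-name>",
    "https://<account>.blob.core.windows.net/<container>?<SAS-token>",
)

# Print a shell-quoted version for review before running it.
print(shlex.join(cmd))
```

To execute the copy from a script, the argument list could be passed to subprocess.run(cmd, check=True) once the placeholders have been replaced.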
The following figure shows an example of a data protection configuration using Snapshot backups and file-based backups.
Runtime of Snapshot backups
The following figure shows a screenshot of a customer’s HANA Studio running SAP HANA on NetApp storage. The customer is using NetApp Snapshot copies to back up the HANA database. As the figure shows, the HANA database (approximately 2.3TB) is backed up in 2 minutes and 11 seconds by using Snapshot backup technology.
The largest part of the overall backup workflow runtime is the time needed to execute the HANA backup savepoint operation, which depends on the load on the HANA database. The storage Snapshot backup itself always finishes in a couple of seconds.
Recovery Time Objective comparison
This section provides an RTO comparison of file-based and storage-based Snapshot backups. The RTO is defined by the sum of the time needed to restore the database and the time needed to start and recover the database.
Time needed to restore database
With a file-based backup, the restore time depends on the size of the database and backup infrastructure, which defines the restore speed in megabytes per second (MBps). For example, if the infrastructure supports a restore operation at a speed of 250MBps, it takes approximately 1 hour and 10 minutes to restore a database 1TB in size.
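The restore-time arithmetic can be checked with a short calculation. This is a minimal sketch that assumes binary units (1TB = 1,048,576MB) and a constant restore throughput; the function name is illustrative only.

```python
MB_PER_TB = 1024 * 1024  # assumption: binary units, 1 TB = 1,048,576 MB


def file_restore_minutes(db_size_tb: float, restore_mbps: float = 250.0) -> float:
    """Minutes needed to copy db_size_tb back at a constant restore_mbps."""
    return db_size_tb * MB_PER_TB / restore_mbps / 60


# The restore time grows linearly with the database size.
for size_tb in (1, 2, 4):
    minutes = file_restore_minutes(size_tb)
    print(f"{size_tb} TB: {int(minutes // 60)} h {round(minutes % 60)} min")
```

For a 1TB database at 250MBps, this gives just under 70 minutes, which matches the 1 hour and 10 minutes quoted above.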
With storage Snapshot copy backups, the restore time is independent of the size of the database and is in the range of a couple of seconds when you can perform the restore from primary storage. A restore from a file-based backup is only required in the case of a disaster when the primary storage is no longer available.
Time needed to start database
The database start time depends on the size of the row and column store. For the column store, the start time also depends on how much data is preloaded during the database start. In the following examples, we assume that the start time is 30 minutes. The start time is the same for a file-based restore and recovery and a restore and recovery based on Snapshot copies.
Time needed to recover database
The recovery time depends on the number of logs that must be applied after the restore. This number is determined by the frequency at which data backups are taken.
With file-based data backups, the backup schedule is typically once per day. A higher backup frequency is normally not possible because the backup degrades production performance. Therefore, in the worst case, all the logs that were written during the day must be applied during forward recovery.
Storage Snapshot copy data backups are typically scheduled with a higher frequency because they do not influence the performance of the SAP HANA database. For example, if Snapshot copy backups are scheduled every six hours, the recovery time is, in the worst case, one-fourth of the recovery time for a file-based backup (6 hours / 24 hours = 1/4).
The following figure shows an RTO example for a 1TB database when file-based data backups are used. In this example, a backup is taken once per day. The RTO differs depending on when the restore and recovery are performed. If the restore and recovery are performed immediately after a backup is taken, the RTO is primarily based on the restore time, which is 1 hour and 10 minutes in the example. The recovery time increases to 2 hours and 50 minutes when the restore and recovery are performed immediately before the next backup is taken, giving a maximum RTO of 4 hours and 30 minutes (1 hour and 10 minutes for the restore, 30 minutes for the start, and 2 hours and 50 minutes for the recovery).
The following figure shows an RTO example for a 1TB database when Snapshot backups are used. With storage-based Snapshot backups, the RTO only depends on the database start time and the forward recovery time because the restore is completed in a few seconds, independent of the size of the database. The forward recovery time also increases depending on when the restore and recovery are done, but due to the higher frequency of backups (every 6 hours in this example), the forward recovery time is 43 minutes at most. In this example, the maximum RTO is 1 hour and 13 minutes.
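The maximum RTO values in the two examples are simply the sum of restore, start, and worst-case forward recovery time. The following sketch reproduces them from the times stated in the text. It treats the Snapshot restore as zero minutes (a simplification, because the restore takes a few seconds) and rounds the recovery time up to whole minutes.

```python
import math

RESTORE_FILE_MIN = 70      # 1 h 10 min file-based restore of 1 TB (from the text)
RESTORE_SNAPSHOT_MIN = 0   # simplification: a volume revert takes only seconds
START_MIN = 30             # assumed database start time (from the text)
RECOVERY_24H_MIN = 170     # 2 h 50 min to apply a full day of logs (from the text)


def max_rto_minutes(restore_min: int, backups_per_day: int) -> int:
    """Worst case: failure occurs immediately before the next data backup."""
    # Logs since the last backup cover at most 24 / backups_per_day hours.
    worst_recovery = math.ceil(RECOVERY_24H_MIN / backups_per_day)
    return restore_min + START_MIN + worst_recovery


def fmt(minutes: int) -> str:
    return f"{minutes // 60} h {minutes % 60:02d} min"


print("file-based, 1 backup/day:", fmt(max_rto_minutes(RESTORE_FILE_MIN, 1)))
print("Snapshot, 4 backups/day: ", fmt(max_rto_minutes(RESTORE_SNAPSHOT_MIN, 4)))
```

With these inputs, the file-based case yields 4 hours and 30 minutes, and the Snapshot case with a backup every six hours yields 1 hour and 13 minutes.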
The following figure shows an RTO comparison of file-based and storage-based Snapshot backups for different database sizes and different frequencies of Snapshot backups. The green bar shows the file-based backup. The other bars show Snapshot copy backups with different backup frequencies.
With a single Snapshot copy data backup per day, the RTO is already reduced by 40% when compared to a file-based data backup. The reduction increases to 70% when four Snapshot backups are taken per day. The figure also shows that the curve goes flat if you increase the Snapshot backup frequency to more than four to six Snapshot backups per day. Our customers, therefore, typically configure four to six Snapshot backups per day.
This graph shows the HANA server RAM size. The database size in memory is calculated to be half of the server RAM size.
The restore and recovery time is calculated based on the following assumptions: the database can be restored at 250MBps; the number of log files per day is 50% of the database size (for example, a 1TB database creates 500GB of log files per day); and a recovery can be performed at 100MBps.
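These assumptions can be combined into a simple worst-case RTO model. The sketch below is an approximation, not the exact calculation behind the figure. It assumes binary units (1TB = 1,048,576MB), a fixed 30-minute database start time, a daily log volume of half the database size, and a Snapshot restore time of zero. All function and parameter names are illustrative.

```python
MB_PER_TB = 1024 * 1024  # assumption: binary units


def rto_minutes(db_tb: float, backups_per_day: int, snapshot: bool, *,
                restore_mbps: float = 250.0, recover_mbps: float = 100.0,
                log_fraction: float = 0.5, start_min: float = 30.0) -> float:
    """Worst-case RTO in minutes: restore + database start + forward recovery."""
    db_mb = db_tb * MB_PER_TB
    # A Snapshot restore (volume revert) is treated as instantaneous.
    restore = 0.0 if snapshot else db_mb / restore_mbps / 60
    # Worst case: all logs written since the previous data backup are applied.
    logs_mb = db_mb * log_fraction / backups_per_day
    recovery = logs_mb / recover_mbps / 60
    return restore + start_min + recovery


baseline = rto_minutes(1, 1, snapshot=False)  # file-based, one backup per day
for n in (1, 2, 4, 6, 12):
    rto = rto_minutes(1, n, snapshot=True)
    print(f"{n:>2} Snapshot backups/day: {rto:5.1f} min, "
          f"{1 - rto / baseline:.0%} below file-based")
```

For a 1TB database, this model gives a reduction of roughly 37% with one Snapshot backup per day and roughly 72% with four, which is broadly in line with the 40% and 70% quoted above. The gain from each additional backup shrinks because the fixed start time begins to dominate.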