Snapshot-based backups

03/31/2026

The foundation of Oracle database data protection on ONTAP is NetApp Snapshot technology.

The key values are as follows:

Simplicity. A snapshot is a read-only copy of the contents of a container of data at a specific point in time.
Efficiency. Snapshots require no space at the moment of creation. Space is only consumed when data is changed.
Manageability. A backup strategy based on snapshots is easy to configure and manage because snapshots are a native part of the storage OS. If the storage system is powered on, it is ready to create backups.
Scalability. Up to 1024 backups of a single container of files and LUNs can be preserved. For complex datasets, multiple containers of data can be protected by a single, consistent set of snapshots.
Performance is unaffected, whether data is protected by 1024 snapshots or none.
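The efficiency claim above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `ToyVolume` class, its block map, and the snapshot name are hypothetical and do not represent how WAFL is implemented. It shows only the general idea that a snapshot which shares block references with the active data consumes no space until blocks are overwritten.

```python
import itertools


class ToyVolume:
    """Hypothetical model: logical blocks point at numbered physical blocks."""

    def __init__(self):
        self._next_physical = itertools.count()
        self.active = {}      # logical block -> physical block id
        self.snapshots = {}   # snapshot name -> frozen copy of the block map

    def write(self, logical_block):
        # Every write lands in a new physical block; old blocks are never modified.
        self.active[logical_block] = next(self._next_physical)

    def create_snapshot(self, name):
        # Only the block map is copied, not the data it points to.
        self.snapshots[name] = dict(self.active)

    def snapshot_space(self, name):
        # Blocks referenced by the snapshot but no longer by the active data.
        live = set(self.active.values())
        return sum(1 for phys in self.snapshots[name].values() if phys not in live)


vol = ToyVolume()
for block in range(100):
    vol.write(block)

vol.create_snapshot("hourly.0")
space_at_creation = vol.snapshot_space("hourly.0")

for block in range(10):          # overwrite 10 of the 100 blocks
    vol.write(block)
space_after_changes = vol.snapshot_space("hourly.0")

print(space_at_creation, space_after_changes)
```

In this model the snapshot holds zero blocks of its own at creation and exactly as many blocks as were later overwritten.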

Although many storage vendors offer snapshot technology, the Snapshot technology within ONTAP is unique and offers significant benefits to enterprise application and database environments:

Snapshot copies are part of the underlying Write-Anywhere File Layout (WAFL). They are not an add-on or external technology. This simplifies management because the storage system is the backup system.
Snapshot copies do not affect performance, except for some edge cases such as when so much data is stored in snapshots that the underlying storage system fills up.
The term "consistency group" is often used to refer to a grouping of storage objects that are managed as a consistent collection of data. A snapshot of a particular AFF volume constitutes a consistency group backup.
Furthermore, multiple AFF volumes or ASA LUNs/namespaces can be easily bound together as a consistency group, with data protection policies applied as a single unit.

ONTAP snapshots also scale better than competing technology. Customers can store 5, 50, or 500 snapshots without affecting performance. The maximum number of snapshots currently allowed in an AFF volume or ASA LUN/namespace is 1024. If additional snapshot retention is required, there are options to cascade snapshots.

As a result, protecting a dataset hosted on ONTAP is simple and highly scalable. Because backups do not require movement of data, a backup strategy can be tailored to the needs of the business rather than to the limitations of network transfer rates, large numbers of tape drives, or disk staging areas.

Is a snapshot a backup?

A commonly raised concern about the use of snapshots as a data protection strategy is that the "real" data and the snapshot data are located on the same drives. Loss of those drives would result in the loss of both the primary data and the backup.

This is a valid concern. Local snapshots are used for day-to-day backup and recovery needs, and in that respect the snapshot is a backup. Close to 99% of all recovery scenarios in NetApp environments rely on snapshots to meet even the most aggressive RTO requirements.

Local snapshots should, however, never be the only backup strategy, which is why NetApp offers technology such as SnapMirror and SnapVault replication to quickly and efficiently replicate snapshots to an independent set of drives. In a properly architected solution with snapshots plus snapshot replication, the use of tape can be minimized to perhaps a quarterly archive or eliminated entirely.

Snapshot-based backups

There are many options for using ONTAP Snapshot copies to protect your data, and snapshots are the basis for many other ONTAP features, including replication, disaster recovery, and cloning. A complete description of snapshot technology is beyond the scope of this document, but the following sections provide a general overview.

There are two primary approaches to creating a snapshot of a dataset:

Crash-consistent backups
Application-consistent backups

A crash-consistent backup of a dataset refers to the capture of the entire dataset structure at a single point in time. If the dataset is stored in a single volume, then the process is simple; a snapshot can be created at any time. If a dataset spans volumes, a consistency group (CG) snapshot must be created. Several options exist for creating CG snapshots, including NetApp SnapCenter software, native ONTAP consistency group features, and user-maintained scripts.

Crash-consistent backups are primarily used when point-of-the-backup recovery is sufficient. When more granular recovery is required, application-consistent backups are usually required.

The word "consistent" in "application-consistent" is often a misnomer. For example, placing an Oracle database in backup mode is referred to as an application-consistent backup, but the data is not made consistent or quiesced in any way. The data continues to change throughout the backup. In contrast, most MySQL and Microsoft SQL Server backups do indeed quiesce the data before executing the backup. VMware may or may not make certain files consistent.

Consistency groups

The term "consistency group" refers to the ability of a storage array to manage multiple storage resources as a single image. For example, a database might consist of 10 LUNs. The array must be able to back up, restore, and replicate those 10 LUNs in a consistent manner. Restoration is not possible if the images of the LUNs were not consistent at the point of backup. Replicating those 10 LUNs requires that all the replicas are perfectly synchronized with each other.

ONTAP has always been able to capture consistent local and replicated images of data. Although the various volumes on an ONTAP AFF/FAS system are not usually formally described as consistency groups, that is what they are. A snapshot of a volume is a consistency group image, restoration of that snapshot is a consistency group restoration, and both SnapMirror and SnapVault offer consistency group replication.

AFF systems also include a broader type of consistency group. Multiple volumes can be defined as a CG within ONTAP. Snapshots, clones, and replication can then be configured at the CG level. This simplifies data protection strategies because it allows policies to be set on datasets, not just individual volumes or LUNs. CGs also exist in ASA systems. Multiple LUNs or namespaces can be bound together as a CG and that CG can then be protected with snapshots, replicated, restored, or cloned.

Consistency group snapshots

Consistency group (CG) snapshots are an extension of the basic ONTAP Snapshot technology. A standard snapshot operation creates a consistent image of all data within a single AFF/FAS volume or ASA LUN/namespace, but sometimes it is necessary to create a consistent set of snapshots across multiple storage resources and even across multiple storage systems. The result is a set of snapshots that can be used in the same way as a snapshot of just one individual volume. They can be used for local data recovery, replicated for disaster recovery purposes, or cloned as a single consistent unit.

CG snapshots scale extremely well. The largest known use of a CG snapshot is for a database environment of approximately 1PB in size spanning 12 controllers. The CG snapshots created on this system have been used for backup, recovery and cloning.

Most of the time, when a dataset spans AFF volumes or ASA LUNs/namespaces, and write order must be preserved, an ONTAP consistency group can simply be defined and the group of volumes, LUNs, or namespaces can be managed natively to create snapshots. If management software is used, it should detect the need for a CG snapshot and call the required APIs.

There is no need to understand the technical details of CG snapshots in such cases. However, there are situations in which complicated data protection requirements require detailed control over the data protection and replication process. Automation workflows or the use of custom scripts to call the CG snapshot APIs are some of the options. Understanding the best option and the role of CG snapshots requires a more detailed explanation of the technology.

Creation of a set of consistency group snapshots is a two-step process:

Establish write fencing on all target AFF volumes or ASA LUNs/namespaces.
Create snapshots of those volumes, LUNs, or namespaces while in the fenced state.

Write fencing is established serially. This means that as the fencing process is set up across multiple storage targets, write I/O is frozen on the first object in the sequence while it continues to be committed to targets that appear later in the list. This might initially appear to violate the requirement for write order to be preserved, but that only applies to I/O that is issued asynchronously on the host and does not depend on any other writes.

For example, a database might issue a lot of asynchronous datafile updates and allow the OS to reorder the I/O and complete them according to its own scheduler configuration. The order of this type of I/O cannot be guaranteed because the application and operating system have already released the requirement to preserve write order.

As a counter example, most database logging activity is synchronous. The database does not proceed with further log writes until the I/O is acknowledged, and the order of those writes must be preserved. If a log I/O arrives on a fenced LUN, it is not acknowledged and the application blocks on further writes. Likewise, file system metadata I/O is usually synchronous. For example, a file deletion operation must not be lost. If an operating system with an xfs file system deleted a file and the I/O that updated the xfs file system metadata to remove the reference to that file landed on a fenced LUN, then the file system activity would pause. This guarantees the integrity of the file system during CG operations.

After write fencing is set up across the targets, they are ready for snapshot creation. The snapshots need not be created at precisely the same time because the state of the targets is frozen from a dependent write point of view. To guard against a flaw in the application creating the CG snapshots, the initial write fencing includes a configurable timeout after which ONTAP automatically releases the fencing and resumes write processing. If all the snapshots are created before the timeout period lapses, then the resulting set of snapshots is a valid consistency group.
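The two-step process and its timeout can be sketched as a small orchestration routine. This is a minimal illustration, not the ONTAP API. The `fence`, `snapshot`, and `release` callbacks, the target names, and the 5-second default are all hypothetical placeholders, and the sketch assumes the timeout is measured from the start of fencing.

```python
import time


def create_cg_snapshot(targets, fence, snapshot, release,
                       timeout_s=5.0, clock=time.monotonic):
    """Return True only if every snapshot was created within the fence timeout."""
    started = clock()
    fenced = []
    try:
        # Step 1: establish write fencing serially on every target.
        for target in targets:
            fence(target)
            fenced.append(target)
        # Step 2: snapshot each target while it is still fenced.
        for target in targets:
            if clock() - started > timeout_s:
                return False  # fencing is assumed to have been auto-released
            snapshot(target)
        return clock() - started <= timeout_s
    finally:
        # Resume write processing on everything that was fenced.
        for target in fenced:
            release(target)


# Usage with in-memory stand-ins for the storage calls (hypothetical names).
events = []
ok = create_cg_snapshot(
    ["vol_data", "vol_log"],
    fence=lambda t: events.append(("fence", t)),
    snapshot=lambda t: events.append(("snapshot", t)),
    release=lambda t: events.append(("release", t)),
)
print(ok, events)
```

A result of `False` means the set of snapshots should not be treated as a valid consistency group, because at least one snapshot might have been created after writes resumed.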

Dependent write order

From a technical point of view, the key to a consistency group is preserving write order and, specifically, dependent write order. For example, a database writing to 10 LUNs writes simultaneously to all of them. Many writes are issued asynchronously, meaning that the order in which they are completed is unimportant and the actual order they are completed varies based on operating system and network behavior.

Some write operations must be present on disk before the database can proceed with additional writes. These critical write operations are called dependent writes. Subsequent write I/O depends on the presence of these writes on disk. Any snapshot, recovery, or replication of these 10 LUNs must make sure that dependent write order is guaranteed. File system updates are another example of write-order dependent writes. The order in which file system changes are made must be preserved or the entire file system could become corrupt.
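The effect of fencing on dependent writes can be shown with a small simulation. It is a deliberately simplified sketch: every write is assumed to depend on the one before it, and the function names, the `freeze_at` schedule, and the target names are hypothetical. A target is "frozen" from a given write onward, either because it was fenced or, in the unfenced case, because its snapshot had already been taken.

```python
def captured_writes(writes, freeze_at, fencing):
    """Return the sequence numbers of the writes present in the snapshot set.

    writes    -- target name for each dependent write, in issue order
    freeze_at -- per target, the sequence number from which its image is frozen
    fencing   -- True if a write to a frozen target is held unacknowledged
    """
    captured = []
    for seq, target in enumerate(writes):
        frozen = seq >= freeze_at[target]
        if not frozen:
            captured.append(seq)   # acknowledged before the image was frozen
        elif fencing:
            break                  # not acknowledged: no later dependent writes are issued
        # Without fencing the host keeps writing; this write misses the snapshot.
    return captured


def preserves_dependent_order(captured):
    # Valid only if no write is present without all of its predecessors.
    return captured == list(range(len(captured)))


writes = ["log", "data", "log", "data", "log", "data"]
freeze_at = {"log": 2, "data": 4}   # targets are frozen one after another

with_fencing = captured_writes(writes, freeze_at, fencing=True)
without_fencing = captured_writes(writes, freeze_at, fencing=False)
print(with_fencing, preserves_dependent_order(with_fencing))
print(without_fencing, preserves_dependent_order(without_fencing))
```

With fencing, the captured writes form an unbroken prefix of the sequence. Without it, a later write to the second target is captured while the earlier write it depends on is missing.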

Strategies

There are two primary approaches to snapshot-based backups:

Crash-consistent backups
Snapshot-protected online backups

A crash-consistent backup of a database refers to the capture of the entire database structure, including datafiles, redo logs, and control files, at a single point in time. If the database is stored in a single volume, LUN, or namespace, then the process is simple; a snapshot can be created at any time. If a database spans AFF volumes or ASA LUNs/namespaces, a consistency group (CG) snapshot must be created. Several options exist for creating CG snapshots, including NetApp SnapCenter software, native ONTAP consistency group features, and user-maintained scripts.

Crash-consistent snapshot backups are primarily used when point-of-the-backup recovery is sufficient. Archive logs can be applied under some circumstances, but when more granular point-in-time recovery is required, an online backup is preferable.

The basic procedure for a snapshot-based online backup is as follows:

Place the database in backup mode.
Create a snapshot of all storage resources (NFS exports, LUNs, or NVMe namespaces) hosting datafiles.
Exit backup mode.
Run the command "alter system archive log current" to force log archiving.
Create snapshots of all storage resources hosting the archive logs.

This procedure yields a set of snapshots containing datafiles in backup mode and the critical archive logs generated while in backup mode. These are the two requirements for recovering a database. Files such as control files should also be protected for convenience, but the only absolute requirement is protection for datafiles and archive logs.
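The ordering of the online backup procedure above can be sketched as follows. This is an illustration under assumptions, not a supported tool: `run_sql` and `create_snapshot` are hypothetical callbacks standing in for a database session and a storage management call, and the target names and label are placeholders. The SQL statements are the standard Oracle commands for entering and leaving backup mode and forcing log archiving.

```python
def online_backup(run_sql, create_snapshot, datafile_targets,
                  archive_log_targets, label):
    """Run the five-step snapshot-based online backup in order."""
    run_sql("ALTER DATABASE BEGIN BACKUP")            # 1. enter backup mode
    try:
        for target in datafile_targets:               # 2. snapshot datafile storage
            create_snapshot(target, label)
    finally:
        run_sql("ALTER DATABASE END BACKUP")          # 3. always exit backup mode
    run_sql("ALTER SYSTEM ARCHIVE LOG CURRENT")       # 4. force log archiving
    for target in archive_log_targets:                # 5. snapshot archive log storage
        create_snapshot(target, label)


# Usage with recording stubs instead of a real database and storage system.
steps = []
online_backup(
    run_sql=steps.append,
    create_snapshot=lambda target, label: steps.append(f"snapshot {target} as {label}"),
    datafile_targets=["oradata"],
    archive_log_targets=["oraarch"],
    label="nightly",
)
print("\n".join(steps))
```

The `try`/`finally` is a design choice in this sketch so that a failed snapshot does not leave the database in backup mode.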

Although different customers might have very different strategies, almost all of these strategies are ultimately based on the same principles outlined below.

Snapshot-based recovery

When designing storage layouts for Oracle databases, the first decision is whether to use volume-based NetApp SnapRestore (VBSR) technology, which is the underlying technology used for restoring AFF volumes and ASA LUNs/namespaces.

VBSR allows data to be almost instantly reverted to an earlier point in time. Because all of the data on the restored volume, LUN, or namespace is reverted, VBSR might not be appropriate for all use cases. For example, if an entire database, including datafiles, redo logs, and archive logs, is stored on a single AFF volume and this volume is restored with VBSR, then data is lost because the newer archive log and redo data are discarded. The same applies to ASA data. If the entire database is stored in a single ASA consistency group, and that CG is restored to an earlier state, some of the later archive log and redo data is lost.

VBSR is not required for restore. Many databases can be restored by using file-based single-file SnapRestore (SFSR) or by simply cloning files from the snapshot back into the active file system.

VBSR is preferred when a database is very large or when it must be recovered as quickly as possible, and the use of VBSR requires isolation of the datafiles. In an NFS environment, the datafiles of a given database must be stored in dedicated volumes that are uncontaminated by any other type of file. In a SAN environment, datafiles must be stored in dedicated LUNs or namespaces. If a volume manager is used (including Oracle Automatic Storage Management [ASM]), the diskgroup must also be dedicated to datafiles.

Isolating datafiles in this manner allows them to be reverted to an earlier state without damaging other filesystems.
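The reason for isolating datafiles can be illustrated with a toy model of a volume-level revert. This is a sketch only: the volume names, file names, and functions are hypothetical, and a "volume" here is just a dictionary of files. It compares a shared layout with an isolated layout when only the datafile volume is reverted.

```python
import copy


def revert_volume(volumes, snapshot, name):
    # Toy volume-level revert: the entire volume returns to its snapshot state.
    volumes[name] = dict(snapshot[name])


def surviving_files(layout):
    """layout maps a file type ('datafile', 'archivelog') to a volume name."""
    volumes = {vol: {} for vol in set(layout.values())}
    volumes[layout["datafile"]]["users01.dbf"] = "old"
    volumes[layout["archivelog"]]["arch_100.arc"] = "log"

    snapshot = copy.deepcopy(volumes)                       # snapshot of every volume

    volumes[layout["datafile"]]["users01.dbf"] = "new"
    volumes[layout["archivelog"]]["arch_101.arc"] = "log"   # created after the snapshot

    revert_volume(volumes, snapshot, layout["datafile"])    # revert the datafile volume only
    return sorted(name for files in volumes.values() for name in files)


shared = surviving_files({"datafile": "vol_db", "archivelog": "vol_db"})
isolated = surviving_files({"datafile": "vol_data", "archivelog": "vol_arch"})
print(shared)
print(isolated)
```

In the shared layout the archive log created after the snapshot is discarded by the revert. In the isolated layout it survives and remains available for recovery.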

AFF snapshot reserve

For each volume with Oracle data in an AFF SAN environment, the percent-snapshot-space should be set to zero because reserving space for a snapshot in a LUN environment is not useful. If the fractional reserve is set to 100, a snapshot of a volume with LUNs requires enough free space in the volume, excluding the snapshot reserve, to absorb 100% turnover of all of the data. If the fractional reserve is set to a lower value, then a correspondingly smaller amount of free space is required, but it always excludes the snapshot reserve. This means that the snapshot reserve space in a LUN environment is wasted.

Snapshot reserve does not apply to ASA storage.

In an NFS environment, there are two options:

Set the percent-snapshot-space based on expected snapshot space consumption.
Set the percent-snapshot-space to zero and manage active and snapshot space consumption collectively.

With the first option, percent-snapshot-space is set to a nonzero value, typically around 20%. This space is then hidden from the user. This value does not, however, create a limit on utilization. If a database with a 20% reservation experiences 30% turnover, the snapshot space can grow beyond the bounds of the 20% reserve and occupy unreserved space.

The main benefit of setting a reserve to a value such as 20% is to verify that some space is always available for snapshots. For example, a 1TB volume with a 20% reserve would only permit a database administrator (DBA) to store 800GB of data. This configuration guarantees at least 200GB of space for snapshot consumption.
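The reserve arithmetic can be checked with a short calculation. The function names are hypothetical, 1TB is treated as 1000GB to match the figures in the text, and the 300GB of snapshot consumption is an assumed value chosen to show snapshots growing past the reserve.

```python
def snapshot_reserve_split(volume_gb, reserve_pct):
    """Return (space visible to the user, space reserved for snapshots) in GB."""
    reserve_gb = volume_gb * reserve_pct / 100
    return volume_gb - reserve_gb, reserve_gb


def reserve_overflow_gb(volume_gb, reserve_pct, snapshot_used_gb):
    """Return how much snapshot data spills out of the reserve into unreserved space."""
    _, reserve_gb = snapshot_reserve_split(volume_gb, reserve_pct)
    return max(0.0, snapshot_used_gb - reserve_gb)


usable_gb, reserve_gb = snapshot_reserve_split(1000, 20)
overflow_gb = reserve_overflow_gb(1000, 20, snapshot_used_gb=300)
print(usable_gb, reserve_gb, overflow_gb)
```

The reserve sets a guaranteed minimum for snapshots, not a ceiling. In the assumed case, 100GB of snapshot data occupies space outside the reserve.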

When percent-snapshot-space is set to zero, all space in the volume is available to the end user, which delivers better visibility. A DBA must understand that, if he or she sees a 1TB volume that leverages snapshots, this 1TB of space is shared between active data and snapshot turnover.

There is no clear preference between option one and option two among end users.

ONTAP and third-party snapshots

Oracle Doc ID 604683.1 explains the requirements for third-party snapshot support and the multiple options available for backup and restore operations.

The third-party vendor must guarantee that the company's snapshots conform to the following requirements:

Snapshots must integrate with Oracle's recommended restore and recovery operations.
Snapshots must be database crash consistent at the point of the snapshot.
Write ordering is preserved for each file within a snapshot.

ONTAP and NetApp Oracle management products comply with these requirements.
