Online backups
Two sets of data are required to protect and recover an Oracle database in backup mode. Note that this is not the only Oracle backup option, but it is the most common.
-
A snapshot of the datafiles in backup mode
-
The archive logs created while the datafiles were in backup mode
If complete recovery including all committed transactions is required, a third item is required:
-
A set of current redo logs
There are a number of ways to drive recovery of an online backup. Many customers restore snapshots by using the ONTAP CLI and then using Oracle RMAN or sqlplus to complete the recovery. This is especially common with large production environments in which the probability and frequency of database restores is extremely low and any restore procedure is handled by a skilled DBA. For complete automation, solutions such as NetApp SnapCenter include an Oracle plug-in with both command- line and graphical interfaces.
Some large-scale customers have taken a simpler approach by configuring basic scripting on the hosts to place the databases in backup mode at a specific time in preparation for a scheduled snapshot. For example, schedule the command alter database begin backup
at 23:58, alter database end backup
at 00:02, and then schedule snapshots directly on the storage system at midnight. The result is a simple, highly scalable backup strategy that requires no external software or licenses.
Data layout
The simplest layout is to isolate datafiles into one or more dedicated volumes. They must be uncontaminated by any other file type. This is to make sure that the datafile volumes can be rapidly restored through a SnapRestore operation without destroying an important redo log, controlfile, or archive log.
SAN has similar requirements for datafile isolation within dedicated volumes. With an operating system such as Microsoft Windows, a single volume might contain multiple datafile LUNs, each with an NTFS file system. With other operating systems, there is generally a logical volume manager. For example, with Oracle ASM, the simplest option would be to confine the LUNs of an ASM disk group to a single volume that can be backed up and restored as a unit. If additional volumes are required for performance or capacity management reasons, creating an additional disk group on the new volume results in simpler management.
If these guidelines are followed, snapshots can be scheduled directly on the storage system with no requirement for performing a consistency group snapshot. The reason is that Oracle backups do not require datafiles to be backed up at the same time. The online backup procedure was designed to allow datafiles to continue to be updated as they are slowly streamed to tape over the course of hours.
A complication arises in situations such as the use of an ASM disk group that is distributed across volumes. In these cases, a cg-snapshot must be performed to make sure that the ASM metadata is consistent across all constituent volumes.
Caution: Verify that the ASM spfile
and passwd
files are not in the disk group hosting the datafiles. This interferes with the ability to selectively restore datafiles and only datafiles.
Local recovery procedure—NFS
This procedure can be driven manually or through an application such as SnapCenter. The basic procedure is as follows:
-
Shut down the database.
-
Recover the datafile volume(s) to the snapshot immediately prior to the desired restore point.
-
Replay archive logs to the desired point.
-
Replay current redo logs if complete recovery is desired.
This procedure assumes that the desired archive logs are still present in the active file system. If they are not, the archive logs must be restored or rman/sqlplus can be directed to the data in the snapshot directory.
In addition, for smaller databases, datafiles can be recovered by an end user directly from the .snapshot
directory without assistance from automation tools or storage administrators to execute a snaprestore
command.
Local recovery procedure—SAN
This procedure can be driven manually or through an application such as SnapCenter. The basic procedure is as follows:
-
Shut down the database.
-
Quiesce the disk group(s) hosting the datafiles. The procedure varies depending on the logical volume manager chosen. With ASM, the process requires dismounting the disk group. With Linux, the file systems must be dismounted, and the logical volumes and volume groups must be deactivated. The objective is to stop all updates on the target volume group to be restored.
-
Restore the datafile disk groups to the snapshot immediately prior to the desired restore point.
-
Reactivate the newly restored disk groups.
-
Replay archive logs to the desired point.
-
Replay all redo logs if complete recovery is desired.
This procedure assumes that the desired archive logs are still present in the active file system. If they are not, the archive logs must be restored by taking the archive log LUNs offline and performing a restore. This is also an example in which dividing up archive logs into dedicated volumes is useful. If the archive logs share a volume group with redo logs, then the redo logs must be copied elsewhere before restoration of the overall set of LUNs. This step prevents the loss of those final recorded transactions.