EC2 and FSx Oracle database management

09/12/2024 Contributors

In addition to the AWS EC2 and FSx management console, the Ansible control node and the SnapCenter UI tool are deployed for database management in this Oracle environment.

An Ansible control node can be used to manage Oracle environment configuration, with parallel updates that keep primary and standby instances in sync for kernel or patch updates. Failover, resync, and failback can be automated with the NetApp Automation Toolkit to archive fast application recovery and availability with Ansible. Some repeatable database management tasks can be executed using a playbook to reduce human errors.

The SnapCenter UI tool can perform database snapshot backup, point-in-time recovery, database cloning, and so on with the SnapCenter plugin for Oracle databases. For more information about Oracle plugin features, see the SnapCenter Plug-in for Oracle Database overview.

The following sections provide details on how key functions of Oracle database management are fulfilled with the SnapCenter UI:

Database snapshot backups
Database point-in-time restore
Database clone creation

Database cloning creates a replica of a primary database on a separate EC2 host for data recovery in the event of logical data error or corruption, and clones can also be used for application testing, debugging, patch validation, and so on.

Taking a snapshot

An EC2/FSx Oracle database is regularly backed up at intervals configured by the user. A user can also take a one-off snapshot backup at any time. This applies to both full-database snapshot backups as well as archive-log-only snapshot backups.

Taking a full database snapshot

A full database snapshot includes all Oracle files, including data files, control files, and archive log files.

Log into the SnapCenter UI and click Resources in the left-side menu. From the View dropdown, change to the Resource Group view.
Click the full backup resource name, and then click the Backup Now icon to initiate an add-hoc backup.
Click Backup and then confirm the backup to start a full database backup.

From the Resource view for the database, open the database Managed Backup Copies page to verify that the one-off backup completed successfully. A full database backup creates two snapshots: one for the data volume and one for the log volume.

Taking an archive log snapshot

An archive log snapshot is only taken for the Oracle archive log volume.

Log into the SnapCenter UI and click the Resources tab in the left-side menu bar. From the View dropdown, change to the Resource Group view.
Click the log backup resource name, and then click the Backup Now icon to initiate an add-hoc backup for archive logs.
Click Backup and then confirm the backup to start an archive log backup.

From the Resource view for the database, open the database Managed Backup Copies page to verify that the one-off archive log backup completed successfully. An archive log backup creates one snapshot for the log volume.

Restoring to a point in time

SnapCenter-based restore to a point in time is executed on the same EC2 instance host. Complete the following steps to perform the restore:

From the SnapCenter Resources tab > Database view, click the database name to open the database backup.
Select the database backup copy and the desired point in time to be restored. Also mark down the corresponding SCN number for the point in time. The point-in-time restore can be performed using either the time or the SCN.
Highlight the log volume snapshot and click the Mount button to mount the volume.
Choose the primary EC2 instance to mount the log volume.
Verify that the mount job completes successfully. Also check on the EC2 instance host to see the that log volume mounted and also the mount point path.

Copy the archive logs from the mounted log volume to the current archive log directory.

[ec2-user@ip-10-0-0-151 ~]$ cp /var/opt/snapcenter/sco/backup_mount/ip-10-0-0-151_03-25-2022_11.15.01.1503_1/ORCL/1/db/ORCL_A/arch/*.arc /ora_nfs_log/db/ORCL_A/arch/

Return to the SnapCenter Resource tab > database backup page, highlight the data snapshot copy, and click the Restore button to start the database restore workflow.
Check "All Datafiles" and "Change database state if needed for restore and recovery", and click Next.
Choose a desired recovery scope using either SCN or time. Rather than copying the mounted archive logs to the current log directory as demonstrated in step 6, the mounted archive log path can be listed in "Specify external archive log files locations" for recovery.
Specify an optional prescript to run if necessary.
Specify an optional afterscript to run if necessary. Check the open database after recovery.
Provide an SMTP server and email address if a job notification is needed.
Restore the job summary. Click finish to launch the restore job.
Validate the restore from SnapCenter.
Validate the restore from the EC2 instance host.
To unmount the restore log volume, reverse the steps in step 4.

Creating a database clone

The following section demonstrates how to use the SnapCenter clone workflow to create a database clone from a primary database to a standby EC2 instance.

Take a full snapshot backup of the primary database from SnapCenter using the full backup resource group.
From the SnapCenter Resource tab > Database view, open the Database Backup Management page for the primary database that the replica is to be created from.
Mount the log volume snapshot taken in step 4 to the standby EC2 instance host.
Highlight the snapshot copy to be cloned for the replica, and click the Clone button to start the clone procedure.
Change the replica copy name so that it is different from the primary database name. Click Next.
Change the clone host to the standby EC2 host, accept the default naming, and click Next.
Change your Oracle home settings to match those configured for the target Oracle server host, and click Next.
Specify a recovery point using either time or the SCN and mounted archive log path.
Send the SMTP email settings if needed.
Clone the job summary, and click Finish to launch the clone job.
Validate the replica clone by reviewing the clone job log.

The cloned database is registered in SnapCenter immediately.
Turn off Oracle archive log mode. Log into the EC2 instance as oracle user and execute following command:
```
sqlplus / as sysdba
```
Cli
Copy
```
shutdown immediate;
```
Cli
Copy
```
startup mount;
```
Cli
Copy
```
alter database noarchivelog;
```
Cli
Copy
```
alter database open;
```
Cli
Copy

Instead primary Oracle backup copies, a clone can also be created from replicated secondary backup copies on target FSx cluster with same procedures.

HA failover to standby and resync

The standby Oracle HA cluster provides high availability in the event of failure in the primary site, either in the compute layer or in the storage layer. One significant benefit of the solution is that a user can test and validate the infrastructure at any time or with any frequency. Failover can be user simulated or triggered by real failure. The failover processes are identical and can be automated for fast application recovery.

See the following list of failover procedures:

For a simulated failover, run a log snapshot backup to flush the latest transactions to the standby site, as demonstrated in the section Taking an archive log snapshot. For a failover triggered by an actual failure, the last recoverable data is replicated to the standby site with the last successful scheduled log volume backup.
Break the SnapMirror between primary and standby FSx cluster.
Mount the replicated standby database volumes at the standby EC2 instance host.
Relink the Oracle binary if the replicated Oracle binary is used for Oracle recovery.
Recover the standby Oracle database to the last available archive log.
Open the standby Oracle database for application and user access.
For an actual primary site failure, the standby Oracle database now takes the role of the new primary site and database volumes can be used to rebuild the failed primary site as a new standby site with the reverse SnapMirror method.
For a simulated primary site failure for testing or validation, shut down the standby Oracle database after the completion of testing exercises. Then unmount the standby database volumes from the standby EC2 instance host and resync replication from the primary site to the standby site.

These procedures can be performed with the NetApp Automation Toolkit available for download at the public NetApp GitHub site.

git clone https://github.com/NetApp-Automation/na_ora_hadr_failover_resync.git

Read the README instruction carefully before attempting setup and failover testing.