Restore object data using Grid Manager

02/01/2024 Contributors

You can restore object data for a failed storage volume or Storage Node using Grid Manager. You can also use Grid Manager to monitor restoration processes in progress and display a restoration history.

Before you begin

You have completed either of these procedures to format failed volumes:
- Remount and reformat appliance storage volumes (manual steps)
- Remount and reformat storage volumes (manual steps)
You have confirmed that the Storage Node where you are restoring objects has a Connection State of Connected on the NODES > Overview tab in the Grid Manager.
You have confirmed the following:
- A grid expansion to add a Storage Node is not in process.
- A Storage Node decommission is not in process or failed.
- A recovery of a failed storage volume is not in process.
- A recovery of a Storage Node with a failed system drive is not in process.
- An EC rebalance job is not in process.
- Appliance node cloning is not in process.

About this task

After you have replaced the drives and performed the manual steps to format the volumes, Grid Manager displays the volumes as candidates for restoration on the MAINTENANCE > Volume restoration > Nodes to restore tab.

Whenever possible, restore object data using the Volume restoration page in the Grid Manager. You can either enable automatic restore mode to automatically start volume restoration when the volumes are ready to be restored or manually perform volume restoration. Follow these guidelines:

If the volumes are listed at MAINTENANCE > Volume restoration > Nodes to restore, restore object data as described in the steps below. The volumes will be listed if:
- Some, but not all, storage volumes in a node have failed
- All storage volumes in a node have failed and are being replaced with the same number of volumes or more volumes
The Volume restoration page in the Grid Manager also allows you to monitor the volume restoration process and view restoration history.
If the volumes aren't listed in the Grid Manager as candidates for restoration, follow the appropriate steps for using the repair-data script to restore object data:
- Restoring object data to storage volume (system drive failure)
- Restore object data to storage volume where system drive is intact
- Restore object data to storage volume for appliance
  
  The repair-data script is deprecated and will be removed in a future release.
If the recovered Storage Node contains fewer volumes than the node it is replacing, you must use the repair-data script.

You can restore two types of object data:

Replicated data objects are restored from other locations, assuming that the grid's ILM rules were configured to make object copies available.
- If an ILM rule was configured to store only one replicated copy and that copy existed on a storage volume that failed, you will not be able to recover the object.
- If the only remaining copy of an object is in a Cloud Storage Pool, StorageGRID must issue multiple requests to the Cloud Storage Pool endpoint to restore object data.
- If the only remaining copy of an object is on an Archive Node, object data is retrieved from the Archive Node. Restoring object data to a Storage Node from an Archive Node takes longer than restoring object copies from other Storage Nodes.
Erasure-coded (EC) data objects are restored by reassembling the stored fragments. Corrupt or lost fragments are recreated by the erasure-coding algorithm from the remaining data and parity fragments.

Repairs of erasure-coded data can begin while some Storage Nodes are offline. However, if all erasure-coded data cannot be accounted for, the repair can't be completed. Repair will complete after all nodes are available.

Volume restoration is dependent on the availability of resources where object copies are stored. Progress of volume restoration is nonlinear and might take days or weeks to complete.

Enable automatic restore mode

When you enable Automatic restore mode, volume restoration automatically starts when the volumes are ready to be restored.

Steps

In Grid Manager go to MAINTENANCE > Volume restoration.
Select the Nodes to restore tab, then slide the toggle for Automatic restore mode to the enabled position.

When the confirmation dialog box appears, review the details.

You will not be able to start volume restoration jobs manually on any nodes.
Volume restorations will begin automatically only when no other maintenance procedures are in progress.
You can monitor the status of the job from the progress monitoring page.
StorageGRID automatically retries volume restorations that fail to start.

When you understand the results of enabling Automatic restore mode, select Yes in the confirmation dialog box.

You can disable Automatic restore mode at any time.

Manually restore failed volume or node

Follow these steps to restore a failed volume or node.

Steps

In Grid Manager go to MAINTENANCE > Volume restoration.
Select the Nodes to restore tab, then slide the toggle for Automatic restore mode to the disabled position.

The number on the tab indicates the number of nodes with volumes requiring restoration.
Expand each node to see the volumes in it that need restoration and their status.
Correct any issues preventing restoration of each volume. Issues will be indicated when you select Waiting for manual steps, if it displays as the volume status.
Select a node to restore where all the volumes indicate a Ready to restore status.

You can only restore the volumes for one node at a time.

Each volume in the node must indicate that it is ready to restore.
Select Start restore.
Address any warnings that might appear or select Start anyway to ignore the warnings and start the restoration.

Nodes are moved from the Nodes to restore tab to the Restoration progress tab when the restoration starts.

If a volume restoration can't be started, the node returns to the Nodes to restore tab.

View restoration progress

The Restoration progress tab shows the status of the volume restoration process and information about the volumes for a node being restored.

Data repair rates for replicated and erasure-coded objects in all volumes are averages summarizing all restorations in process, including those restorations initiated using the repair-data script. The percentage of objects in those volumes that are intact and don't require restoration is also indicated.

Replicated data restoration is dependent on the availability of resources where the replicated copies are stored. Progress of replicated data restoration is nonlinear and might take days or weeks to complete.

The Restoration jobs section displays information about volume restorations started from Grid Manager.

The number in the Restoration jobs section heading indicates the number of volumes that are either being restored or queued for restoration.
The table displays information about each volume in a node being restored and its progress.
- The progress for each node displays the percentage for each job.
- Expand the Details column to display the restoration start time and job ID.
If a volume restoration fails:
- The Status column indicates failed (attempting retry), and will be retried automatically.
- If multiple restoration jobs have failed, the most recent job will be retried automatically first.
- The EC repair failure alert is triggered if the retries continue to fail. Follow the steps in the alert to resolve the issue.

View restoration history

The Restoration history tab shows information about all volume restorations that have successfully completed.

Sizes aren't applicable for replicated objects and appear only for restorations that contain erasure-coded (EC) data objects.

Restore object data using Grid Manager

Creating your file...

Enable automatic restore mode

Manually restore failed volume or node

View restoration progress

View restoration history