Restore object data using Grid Manager
You can restore object data for a failed storage volume or Storage Node using Grid Manager. You can also use Grid Manager to monitor restoration processes in progress and display a restoration history.
-
You have completed either of these procedures to format failed volumes:
-
You have confirmed that the Storage Node where you are restoring objects has a Connection State of Connected on the NODES > Overview tab in the Grid Manager.
-
You have confirmed the following:
-
A grid expansion to add a Storage Node is not in process.
-
A Storage Node decommission is not in process and has not failed.
-
A recovery of a failed storage volume is not in process.
-
A recovery of a Storage Node with a failed system drive is not in process.
-
An EC rebalance job is not in process.
-
Appliance node cloning is not in process.
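The checklist above can be tracked as a simple pre-flight check before you open the Volume restoration page. The following is a minimal sketch, not a StorageGRID API: the `GridState` class, its field names, and the `blocking_conditions` function are hypothetical, and the values are assumed to be entered by hand from what you see in the Grid Manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GridState:
    """Hypothetical, hand-filled snapshot of the conditions listed above."""
    node_connection_state: str              # as shown on NODES > Overview
    expansion_in_process: bool
    decommission_in_process_or_failed: bool
    volume_recovery_in_process: bool
    system_drive_recovery_in_process: bool
    ec_rebalance_in_process: bool
    appliance_cloning_in_process: bool


def blocking_conditions(state: GridState) -> List[str]:
    """Return the prerequisites that are not met; an empty list means none block."""
    blockers = []
    if state.node_connection_state != "Connected":
        blockers.append("Storage Node Connection State is not Connected")
    checks = [
        (state.expansion_in_process,
         "A grid expansion to add a Storage Node is in process"),
        (state.decommission_in_process_or_failed,
         "A Storage Node decommission is in process or failed"),
        (state.volume_recovery_in_process,
         "A recovery of a failed storage volume is in process"),
        (state.system_drive_recovery_in_process,
         "A recovery of a Storage Node with a failed system drive is in process"),
        (state.ec_rebalance_in_process,
         "An EC rebalance job is in process"),
        (state.appliance_cloning_in_process,
         "Appliance node cloning is in process"),
    ]
    blockers.extend(message for active, message in checks if active)
    return blockers
```

An empty result only means that none of the listed conditions was recorded as active. It does not replace checking the Grid Manager itself.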
-
After you have replaced the drives and performed the manual steps to format the volumes, Grid Manager displays the volumes as candidates for restoration on the MAINTENANCE > Volume restoration > Nodes to restore tab.
Whenever possible, restore object data using the Volume restoration page in the Grid Manager. You can either enable automatic restore mode to automatically start volume restoration when the volumes are ready to be restored or manually perform volume restoration. Follow these guidelines:
-
If the volumes are listed at MAINTENANCE > Volume restoration > Nodes to restore, restore object data as described in the steps below. The volumes will be listed if:
-
Some, but not all, storage volumes in a node have failed
-
All storage volumes in a node have failed and are being replaced with the same number of volumes or more volumes
The Volume restoration page in the Grid Manager also allows you to monitor the volume restoration process and view restoration history.
-
If the volumes aren't listed in the Grid Manager as candidates for restoration, follow the appropriate steps for using the repair-data script to restore object data:
-
Restoring object data to storage volume (system drive failure)
-
Restore object data to storage volume where system drive is intact
-
Restore object data to storage volume for appliance
The repair-data script is deprecated and will be removed in a future release.
If the recovered Storage Node contains fewer volumes than the node it is replacing, you must use the repair-data script.
-
You can restore two types of object data:
-
Replicated data objects are restored from other locations, assuming that the grid's ILM rules were configured to make object copies available.
-
If an ILM rule was configured to store only one replicated copy and that copy existed on a storage volume that failed, you will not be able to recover the object.
-
If the only remaining copy of an object is in a Cloud Storage Pool, StorageGRID must issue multiple requests to the Cloud Storage Pool endpoint to restore object data.
-
Erasure-coded (EC) data objects are restored by reassembling the stored fragments. Corrupt or lost fragments are recreated by the erasure-coding algorithm from the remaining data and parity fragments.
Repairs of erasure-coded data can begin while some Storage Nodes are offline. However, if not all erasure-coded data can be accounted for, the repair can't be completed until all nodes are available.
Volume restoration is dependent on the availability of resources where object copies are stored. Progress of volume restoration is nonlinear and might take days or weeks to complete.
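The guidelines in this section for choosing between the Volume restoration page and the repair-data script can be summarized as a small decision function. This is only a sketch of one reading of those guidelines: the function name, its parameters, and the returned labels are hypothetical, and the authoritative check remains whether the volumes actually appear on the Nodes to restore tab.

```python
GRID_MANAGER = "Volume restoration page"
REPAIR_DATA = "repair-data script"


def restoration_method(original_volumes: int,
                       failed_volumes: int,
                       recovered_volumes: int) -> str:
    """Suggest a restoration path for one Storage Node.

    original_volumes:  volumes on the node before the failure
    failed_volumes:    how many of those volumes failed
    recovered_volumes: volumes on the node after recovery
    """
    if not 0 < failed_volumes <= original_volumes:
        raise ValueError("failed_volumes must be between 1 and original_volumes")

    # A recovered node with fewer volumes than the node it replaces
    # must be restored with the repair-data script.
    if recovered_volumes < original_volumes:
        return REPAIR_DATA

    # Remaining cases: some but not all volumes failed, or all volumes
    # failed and were replaced with the same number of volumes or more.
    return GRID_MANAGER
```

For example, `restoration_method(4, 4, 3)` suggests the repair-data script because the recovered node has fewer volumes than the original, while `restoration_method(4, 2, 4)` suggests the Volume restoration page.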
Enable automatic restore mode
When you enable Automatic restore mode, volume restoration automatically starts when the volumes are ready to be restored.
-
In Grid Manager, go to MAINTENANCE > Volume restoration.
-
Select the Nodes to restore tab, then slide the toggle for Automatic restore mode to the enabled position.
-
When the confirmation dialog box appears, review the details.
-
You will not be able to start volume restoration jobs manually on any nodes.
-
Volume restorations will begin automatically only when no other maintenance procedures are in progress.
-
You can monitor the status of the job from the progress monitoring page.
-
StorageGRID automatically retries volume restorations that fail to start.
-
When you understand the results of enabling Automatic restore mode, select Yes in the confirmation dialog box.
You can disable Automatic restore mode at any time.
Manually restore failed volume or node
Follow these steps to restore a failed volume or node.
-
In Grid Manager, go to MAINTENANCE > Volume restoration.
-
Select the Nodes to restore tab, then slide the toggle for Automatic restore mode to the disabled position.
The number on the tab indicates the number of nodes with volumes requiring restoration.
-
Expand each node to see the volumes in it that need restoration and their status.
-
Correct any issues preventing restoration of each volume. If a volume's status is Waiting for manual steps, select that status to display the issues.
-
Select a node to restore where all the volumes indicate a Ready to restore status.
You can only restore the volumes for one node at a time.
Each volume in the node must indicate that it is ready to restore.
-
Select Start restore.
-
Address any warnings that might appear or select Start anyway to ignore the warnings and start the restoration.
Nodes are moved from the Nodes to restore tab to the Restoration progress tab when the restoration starts.
If a volume restoration can't be started, the node returns to the Nodes to restore tab.
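The rule in the steps above, that a node can be restored only when every one of its volumes shows Ready to restore, can be expressed as a short check. This is an illustrative sketch: the dictionary layout, the volume IDs, and the function names are hypothetical, and the status strings are taken from the labels mentioned in the steps.

```python
from typing import Dict, List

READY = "Ready to restore"
WAITING = "Waiting for manual steps"


def node_ready(volume_statuses: Dict[str, str]) -> bool:
    """True only if the node has volumes listed and all of them are ready."""
    return bool(volume_statuses) and all(
        status == READY for status in volume_statuses.values()
    )


def volumes_needing_attention(volume_statuses: Dict[str, str]) -> List[str]:
    """Volume IDs whose status is anything other than Ready to restore."""
    return sorted(
        volume_id
        for volume_id, status in volume_statuses.items()
        if status != READY
    )


# Hypothetical node with one volume still waiting on manual steps.
node = {"0000": READY, "0001": WAITING}
if not node_ready(node):
    print("Resolve issues first on:", volumes_needing_attention(node))
```

Because only one node can be restored at a time, a check like this would be applied to the single node you intend to select, not to the whole grid.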
View restoration progress
The Restoration progress tab shows the status of the volume restoration process and information about the volumes for a node being restored.
Data repair rates for replicated and erasure-coded objects in all volumes are averages summarizing all restorations in process, including those restorations initiated using the repair-data script. The percentage of objects in those volumes that are intact and don't require restoration is also indicated.
Replicated data restoration is dependent on the availability of resources where the replicated copies are stored. Progress of replicated data restoration is nonlinear and might take days or weeks to complete.
The Restoration jobs section displays information about volume restorations started from Grid Manager.
-
The number in the Restoration jobs section heading indicates the number of volumes that are either being restored or queued for restoration.
-
The table displays information about each volume in a node being restored and its progress.
-
The progress for each node shows the percentage complete for each job.
-
Expand the Details column to display the restoration start time and job ID.
-
If a volume restoration fails:
-
The Status column indicates failed (attempting retry), and the restoration will be retried automatically.
-
If multiple restoration jobs have failed, the most recent job will be retried automatically first.
-
The EC repair failure alert is triggered if the retries continue to fail. Follow the steps in the alert to resolve the issue.
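The retry ordering described above, in which the most recent failed job is retried first, amounts to sorting failed jobs from newest to oldest. The sketch below only illustrates that ordering. The `FailedJob` class is hypothetical, and using the restoration start time to decide which job is "most recent" is an assumption, since the text does not say which timestamp StorageGRID uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FailedJob:
    """Hypothetical record of a failed volume restoration job."""
    job_id: str
    start_time: datetime  # assumption: recency is judged by start time


def retry_order(failed_jobs: List[FailedJob]) -> List[FailedJob]:
    """Return failed jobs with the most recent one first."""
    return sorted(failed_jobs, key=lambda job: job.start_time, reverse=True)


jobs = [
    FailedJob("job-a", datetime(2024, 1, 1)),
    FailedJob("job-b", datetime(2024, 1, 3)),
    FailedJob("job-c", datetime(2024, 1, 2)),
]
print([job.job_id for job in retry_order(jobs)])  # ['job-b', 'job-c', 'job-a']
```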