Recover failed storage volumes and rebuild Cassandra database
You must run a script that reformats and remounts storage on failed storage volumes, and rebuilds the Cassandra database on the Storage Node if the system determines that it is necessary.
-
You have the
Passwords.txt
file. -
The system drives on the server are intact.
-
The cause of the failure has been identified and, if necessary, replacement storage hardware has already been acquired.
-
The total size of the replacement storage is the same as the original.
-
You have checked that a Storage Node decommissioning is not in progress, or you have paused the node decommission procedure. (In the Grid Manager, select MAINTENANCE > Tasks > Decommission.)
-
You have checked that an expansion is not in progress. (In the Grid Manager, select MAINTENANCE > Tasks > Expansion.)
-
You have reviewed the warnings about storage volume recovery.
-
As needed, replace failed physical or virtual storage associated with the failed storage volumes that you identified and unmounted earlier.
Don't remount the volumes in this step. The storage is remounted and added to
/etc/fstab
in a later step. -
In the Grid Manager, go to NODES >
appliance Storage Node
> Hardware. In the StorageGRID Appliance section of the page, verify that the Storage RAID mode is healthy. -
Log in to the failed Storage Node:
-
Enter the following command:
ssh admin@grid_node_IP
-
Enter the password listed in the
Passwords.txt
file. -
Enter the following command to switch to root:
su -
-
Enter the password listed in the
Passwords.txt
file.When you are logged in as root, the prompt changes from
$
to#
.
-
-
Use a text editor (vi or vim) to delete failed volumes from the
/etc/fstab
file and then save the file.Commenting out a failed volume in the /etc/fstab
file is insufficient. The volume must be deleted fromfstab
as the recovery process verifies that all lines in thefstab
file match the mounted file systems. -
Reformat any failed storage volumes and rebuild the Cassandra database if it is necessary. Enter:
reformat_storage_block_devices.rb
-
When storage volume 0 is unmounted, prompts and messages will indicate that the Cassandra service is being stopped.
-
You will be prompted to rebuild the Cassandra database if it is necessary.
-
Review the warnings. If none of them apply, rebuild the Cassandra database. Enter: y
-
If more than one Storage Node is offline or if another Storage Node has been rebuilt in the last 15 days. Enter: n
The script will exit without rebuilding Cassandra. Contact technical support.
-
-
For each rangedb drive on the Storage Node, when you are asked:
Reformat the rangedb drive <name> (device <major number>:<minor number>)? [y/n]?
, enter one of the following responses:-
y to reformat a drive that had errors. This reformats the storage volume and adds the reformatted storage volume to the
/etc/fstab
file. -
n if the drive contains no errors, and you don't want to reformat it.
Selecting n exits the script. Either mount the drive (if you think the data on the drive should be retained and the drive was unmounted in error) or remove the drive. Then, run the reformat_storage_block_devices.rb
command again.Some StorageGRID recovery procedures use Reaper to handle Cassandra repairs. Repairs occur automatically as soon as the related or required services have started. You might notice script output that mentions "reaper" or "Cassandra repair." If you see an error message indicating the repair has failed, run the command indicated in the error message. In the following example output, the drive
/dev/sdf
must be reformatted, and Cassandra did not need to be rebuilt:root@DC1-S1:~ # reformat_storage_block_devices.rb Formatting devices that are not in use... Skipping in use device /dev/sdc Skipping in use device /dev/sdd Skipping in use device /dev/sde Reformat the rangedb drive /dev/sdf (device 8:64)? [Y/n]? y Successfully formatted /dev/sdf with UUID b951bfcb-4804-41ad-b490-805dfd8df16c All devices processed Running: /usr/local/ldr/setup_rangedb.sh 12368435 Cassandra does not need rebuilding. Starting services. Informing storage services of new volume Reformatting done. Now do manual steps to restore copies of data.
-
-
After the storage volumes are reformatted and remounted and necessary Cassandra operations are complete, you can restore object data using Grid Manager.