Recovering failed storage volumes and rebuilding the Cassandra database

You must run a script that reformats and remounts storage on the failed storage volumes. The script also rebuilds the Cassandra database on the Storage Node if the system determines that a rebuild is necessary.

Before you begin

Steps

  1. As needed, replace failed physical or virtual storage associated with the failed storage volumes that you identified and unmounted earlier.
    After you replace the storage, rescan or reboot so that the operating system recognizes it, but do not remount the volumes. The storage is remounted and added to /etc/fstab in a later step.
  2. From the service laptop, log in to the failed Storage Node:
    1. Enter the following command: ssh admin@grid_node_IP
    2. Enter the password listed in the Passwords.txt file.
    3. Enter the following command to switch to root: su -
    4. Enter the password listed in the Passwords.txt file.
    When you are logged in as root, the prompt changes from $ to #.
  3. Use a text editor (vi or vim) to delete failed volumes from the /etc/fstab file and then save the file.
    Note: Commenting out a failed volume in the /etc/fstab file is insufficient. The volume must be deleted from fstab because the recovery process verifies that all lines in the fstab file match the mounted file systems.
  4. Reformat any failed storage volumes and rebuild the Cassandra database if it is necessary. Enter: reformat_storage_block_devices.rb
    • If storage services are running, you will be prompted to stop them. Enter: y
    • You will be prompted to rebuild the Cassandra database if it is necessary.
      • Review the warnings. If none of them apply, rebuild the Cassandra database. Enter: y
      • If more than one Storage Node is offline or if another Storage Node has been rebuilt in the last 15 days, do not rebuild the Cassandra database. Enter: n

        The script will exit without rebuilding Cassandra. Contact technical support.

    • For each rangedb drive on the Storage Node, when you see the prompt "Reformat the rangedb drive <name> (device <major number>:<minor number>)? [Y/n]?", enter one of the following responses:
      • y to reformat a drive that had errors. This reformats the storage volume and adds the reformatted storage volume to the /etc/fstab file.
      • n if the drive contains no errors, and you do not want to reformat it.
      Note: Selecting n exits the script. Either mount the drive (if you think the data on the drive should be retained and the drive was unmounted in error) or remove the drive. Then, run the reformat_storage_block_devices.rb command again.

    In the following sample output, the drive /dev/sdf needed to be reformatted, and Cassandra did not need to be rebuilt:

    root@DC1-S1:~ # reformat_storage_block_devices.rb
    Storage services must be stopped before running this script.
    Stop storage services [y/N]? y
    Shutting down storage services.
    Storage services stopped.
    Formatting devices that are not in use...
    Skipping in use device /dev/sdc
    Skipping in use device /dev/sdd
    Skipping in use device /dev/sde
    Reformat the rangedb drive /dev/sdf (device 8:64)? [Y/n]? y
    Successfully formatted /dev/sdf with UUID c817f87f-f989-4a21-8f03-b6f42180063f
    Skipping in use device /dev/sdg
    All devices processed
    Running: /usr/local/ldr/setup_rangedb.sh 12075630
    Cassandra does not need rebuilding.
    Starting services.
     
    Reformatting done.  Now do manual steps to
    restore copies of data.
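
Step 3 requires that every line in /etc/fstab correspond to a mounted file system, and that failed volumes be deleted rather than commented out. Before running reformat_storage_block_devices.rb, a check along the following lines might help spot leftover entries. This is only a minimal sketch and is not the verification that the recovery script itself performs. The function name unmounted_fstab_entries is hypothetical, the use of /proc/mounts as the mount table is an assumption, and the rule for recognizing a commented-out entry (at least three fields and an absolute mount point) is a heuristic.

```shell
#!/bin/sh
# Hypothetical helper, not part of the product: list fstab entries whose
# mount point is missing from the mount table.
#   $1  fstab-style file                  (default: /etc/fstab)
#   $2  mount table in /proc/mounts form  (default: /proc/mounts, an assumption)
unmounted_fstab_entries() {
    fstab_file=${1:-/etc/fstab}
    mounts_file=${2:-/proc/mounts}
    awk '
        # First file (mount table): remember every mounted mount point.
        FILENAME == ARGV[1] { mounted[$2] = 1; next }
        {
            line = $0
            # Treat a commented-out entry like a live one, because the
            # procedure says that commenting out a volume is insufficient.
            commented = sub(/^[ \t]*#+[ \t]*/, "", line)
            n = split(line, f, " ")
            # Heuristic: keep only lines that look like entries with an
            # absolute mount point. This also skips blank lines, ordinary
            # comments, and swap entries that use "none".
            if (n < 3 || f[2] !~ /^\//) next
            if (!(f[2] in mounted)) {
                status = commented ? "commented-out" : "active"
                print status, f[1], f[2]
            }
        }
    ' "$mounts_file" "$fstab_file"
}
```

Called with no arguments as root, the function reads /etc/fstab and /proc/mounts. Each line it prints names an entry (active or commented out) whose mount point is not currently mounted, which makes it a candidate for deletion in step 3. No output suggests that the remaining entries all match mounted file systems.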