Identify and unmount failed storage volumes

Contributors netapp-pcarriga netapp-lhalbert

When recovering a Storage Node with failed storage volumes, you must identify and unmount the failed volumes. You must verify that only the failed storage volumes are reformatted as part of the recovery procedure.

Before you begin

You are signed in to the Grid Manager using a supported web browser.

About this task

You should recover failed storage volumes as soon as possible.

The first step of the recovery process is to detect volumes that have become detached, need to be unmounted, or have I/O errors. If failed volumes are still attached but have a randomly corrupted file system, the system might not detect any corruption in unused or unallocated parts of the disk.

Note You must finish this procedure before performing manual steps to recover the volumes, such as adding or re-attaching the disks, stopping the node, starting the node, or rebooting. Otherwise, when you run the reformat_storage_block_devices.rb script, you might encounter a file system error that causes the script to hang or fail.
Note Repair the hardware and properly attach the disks before running the reboot command.
Caution Identify failed storage volumes carefully. You will use this information to verify which volumes must be reformatted. After a volume has been reformatted, data on the volume can't be recovered.

To recover failed storage volumes, you need to know both the device names of the failed storage volumes and their volume IDs.

At installation, each storage device is assigned a file system universal unique identifier (UUID) and is mounted to a rangedb directory on the Storage Node using that assigned file system UUID. The file system UUID and the rangedb directory are listed in the /etc/fstab file. The Mount point, Device name, and size of the volume are displayed in the Grid Manager.
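As a rough illustration of the mapping described above, the following shell sketch prints the file system identifier and mount point for each rangedb entry in an fstab-style file. The function name list_rangedb_entries is hypothetical. The sketch assumes the standard fstab column layout (file system identifier in field 1, mount point in field 2) and that rangedb mount points contain /rangedb/. It only reads the file and changes nothing.

```shell
# Hypothetical helper: print "<file system identifier> <mount point>" for each
# rangedb entry. Assumes standard fstab columns; reads /etc/fstab by default.
list_rangedb_entries() {
    fstab_file="${1:-/etc/fstab}"
    # Skip comment lines and short lines; keep entries whose mount point
    # (field 2) contains /rangedb/.
    awk '!/^[[:space:]]*#/ && NF >= 2 && $2 ~ /\/rangedb\// { print $1, $2 }' "$fstab_file"
}
```

Running list_rangedb_entries with no argument on the node would list the entries from /etc/fstab, which can then be compared with the Mount point values recorded from the Grid Manager.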

Steps
  1. Complete the following steps to record the failed storage volumes and their device names:

    1. Select Nodes > site > failed Storage Node > Storage.

    2. Scroll down to locate the Volumes table and Object stores table and record the following information for each volume with a status of Unknown or Offline.

      • From the Volumes table, record the Mount point, Device, and Size.

      • From the Object stores table, record the object_store_ID.

        The object_store_ID is the ID of the failed storage volume. For example, specify 0 in the sn-unmount-volume command for an object store with ID 0000.

  2. Log in to the failed Storage Node:

    1. Enter the following command: ssh admin@grid_node_IP

    2. Enter the password listed in the Passwords.txt file.

    3. Enter the following command to switch to root: su -

    4. Enter the password listed in the Passwords.txt file.

      When you are logged in as root, the prompt changes from $ to #.

  3. Run the following script to unmount a failed storage volume:

    sn-unmount-volume object_store_ID

  4. If prompted, press y to stop the Cassandra service, which depends on storage volume 0.

    Note If the Cassandra service is already stopped, you aren't prompted. The Cassandra service is stopped only for volume 0.

    root@Storage-180:~/var/local/tmp/storage~ # sn-unmount-volume 0
    Services depending on storage volume 0 (cassandra) aren't down.
    Services depending on storage volume 0 must be stopped before running this script.
    Stop services that require storage volume 0 [y/N]? y
    Shutting down services that require storage volume 0.
    Services requiring storage volume 0 stopped.
    Unmounting /var/local/rangedb/0
    /var/local/rangedb/0 is unmounted.

    The volume is unmounted within a few seconds. Messages report each step of the process, and the final message confirms that the volume is unmounted.

  5. If the unmount fails because the volume is busy, you can force an unmount using the --use-umountof option:

    Note Forcing an unmount using the --use-umountof option might cause processes or services using the volume to behave unexpectedly or crash.

    root@Storage-180:~ # sn-unmount-volume --use-umountof /var/local/rangedb/2
    Unmounting /var/local/rangedb/2 using umountof
    /var/local/rangedb/2 is unmounted.
    Informing LDR service of changes to storage volumes
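
The unmount steps above can be followed by a read-only check. The sketch below is a minimal illustration and not part of the documented procedure. The helper names volume_arg and is_mounted are hypothetical. It assumes that the object store ID is shown as zero-padded decimal digits (as in the 0000 example), that volumes are mounted under /var/local/rangedb/ as in the transcripts above, and that the mounts table uses the /proc/mounts format. IDs containing other characters are rejected rather than guessed at.

```shell
# Hypothetical helper: turn a zero-padded object store ID (for example 0000)
# into the value used on the command line (0). Only decimal digits are
# handled; any other input is rejected.
volume_arg() {
    case "$1" in
        ''|*[!0-9]*) echo "unsupported object store ID: $1" >&2; return 1 ;;
    esac
    stripped=$(printf '%s' "$1" | sed 's/^0*//')
    # An ID consisting only of zeros strips to an empty string, so fall back to 0.
    printf '%s\n' "${stripped:-0}"
}

# Hypothetical helper: succeed if the mount point is listed in the mounts table.
# Field 2 of each line is the mount point; the table defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example use (commented out so that sourcing this file has no side effects):
#   id=$(volume_arg 0000) || exit 1
#   if is_mounted "/var/local/rangedb/$id"; then
#       echo "/var/local/rangedb/$id is still mounted"
#   else
#       echo "/var/local/rangedb/$id is not mounted"
#   fi
```

Both helpers only read information, so they don't interfere with the script output shown in the steps above.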