Troubleshooting SVST (Services: Status - Cassandra) alarm

The SVST alarm indicates that you might need to rebuild the Cassandra database for a Storage Node. Cassandra is used as the metadata store for StorageGRID Webscale.

About this task

If Cassandra is stopped for more than 15 days (for example, because the Storage Node is powered off), Cassandra will not start when the node is brought back online. In that case, you must rebuild the Cassandra database for the affected DDS service.
Attention: If two or more of the Cassandra database services are down for more than 15 days, contact technical support, and do not proceed with the steps below.

Steps

  1. Select Support > Grid Topology.
  2. Select site > Storage Node > SSM > Services > Alarms > Main to display alarms.
    This example shows that the SVST alarm was triggered.


    Alarms: SSM: Services page

    The SSM Services Main page also indicates that Cassandra is not running.


    Overview: SSM: Services page
  3. Try restarting Cassandra:
    1. At the Storage Node, log in as admin and su to root using the password listed in the Passwords.txt file.
    2. Enter: /etc/init.d/cassandra status
    3. If Cassandra is not running, restart it: /etc/init.d/cassandra restart
  4. If Cassandra does not restart, determine how long Cassandra has been down. If Cassandra has been down for longer than 15 days, you must rebuild the Cassandra database.
    Attention: If two or more of the Cassandra database services are down, contact technical support, and do not proceed with the steps below.

    You can determine how long Cassandra has been down by charting it or by reviewing the servermanager log file.

    To chart Cassandra:
    1. Select Support > Grid Topology. Then select site > Storage Node > SSM > Services > Reports > Charts.
    2. Select Attribute > Service: Status - Cassandra.
    3. For Start Date, enter a date that is at least 16 days before today’s date. For End Date, enter today’s date.
    4. Click Update.

      If the chart shows Cassandra as being down for more than 15 days, rebuild the Cassandra database. The following chart example shows that Cassandra has been down for at least 17 days.


      Overview: SSM: Services page

      To review the servermanager log file:
      1. At the Storage Node, log in as admin and su to root using the password listed in the Passwords.txt file.
      2. Enter: cat /var/local/log/servermanager.log

        The contents of the servermanager log file are displayed.

      3. If Cassandra has been down for longer than 15 days, the servermanager log file contains the following message:
        "2014-08-14 21:01:35 +0000 | cassandra | cassandra not
        started because it has been offline for longer than
        its 15 day grace period - rebuild cassandra"

        Make sure that the timestamp of this message matches the time when you attempted to restart Cassandra as instructed in step 3.

        The log file can contain more than one entry for Cassandra, so you must locate the most recent entry.

        If Cassandra has been down for longer than 15 days, you must rebuild the Cassandra database. For instructions, see "Recovering from a single Storage Node down more than 15 days" in the recovery and maintenance instructions.

        After Cassandra is rebuilt, alarms should clear. If alarms do not clear, contact technical support.
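
The two checks in step 4 can be sketched as small shell helpers. This is an illustration under stated assumptions, not part of the product: the function names are hypothetical, the date helper assumes GNU date and a YYYY-MM-DD format (the chart page might expect a different format), and the log helper assumes that each log entry occupies a single line in the file. The log path and the message text are taken from the steps above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. The log path and message text
# come from the procedure above.

# Print the date 16 days before today, as a candidate chart Start Date.
# Assumes GNU date (the -d option) and a YYYY-MM-DD output format.
chart_start_date() {
    date -d '16 days ago' +%Y-%m-%d
}

# Print the most recent grace-period entry for Cassandra, if there is one.
# Assumes each log entry is written on a single line.
latest_grace_period_entry() {
    log_file="${1:-/var/local/log/servermanager.log}"
    grep 'cassandra not started because it has been offline' "$log_file" | tail -n 1
}
```

Run as root on the Storage Node, latest_grace_period_entry with no argument reads the default log path. If it prints nothing, no matching entry was found. If it prints an entry, compare the timestamp with the time of your restart attempt in step 3.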