Troubleshooting the SVST (Services: Status - Cassandra) alarm

An SVST alarm might indicate that you need to rebuild the Cassandra database used by the DDS service on a Storage Node.

Before you begin

About this task

If Cassandra is stopped for more than 15 days (for example, because the Storage Node is powered off), Cassandra will not start when the Storage Node is brought back online. You must rebuild the Cassandra database for the affected DDS service.
Attention: If two or more of the Cassandra database services are down for more than 15 days, contact technical support, and do not proceed with the steps below.

Steps

  1. Select Grid.
  2. Select site > Storage Node > SSM > Services > Alarms > Main to display alarms and confirm that the SVST alarm has been triggered.

    Alarms: SSM: Services page

    Cassandra is not running, which is indicated on the Grid > Site > Storage Node > SSM > Services > Alarms > Main page.


    Overview: SSM: Services page

  3. Try restarting Cassandra:
    1. At the Storage Node, log in as admin and su to root using the password listed in the password.txt file.
    2. Enter: /etc/init.d/cassandra status
    3. If Cassandra is not running, restart it: /etc/init.d/cassandra restart
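The status check and conditional restart in step 3 can be sketched as a small shell function. This is only an illustration, not part of the product: the function name restart_cassandra_if_down and the CASSANDRA_INIT variable are hypothetical, and the sketch assumes that the init script's status command exits with 0 only when Cassandra is running (the usual init script convention, which the source does not confirm).

```shell
#!/bin/sh
# Hypothetical helper: check Cassandra and restart it only if it is down.
# Assumption: "status" exits 0 when the service is running, non-zero otherwise.
CASSANDRA_INIT="${CASSANDRA_INIT:-/etc/init.d/cassandra}"

restart_cassandra_if_down() {
    if "$CASSANDRA_INIT" status >/dev/null 2>&1; then
        echo "running"
        return 0
    fi

    # Cassandra is not running, so attempt a restart.
    "$CASSANDRA_INIT" restart >/dev/null 2>&1

    # Check again: the restart does not succeed if Cassandra has been
    # offline for longer than its 15 day grace period.
    if "$CASSANDRA_INIT" status >/dev/null 2>&1; then
        echo "restarted"
    else
        echo "restart failed"
        return 1
    fi
}
```

Run as root on the Storage Node, calling restart_cassandra_if_down prints running, restarted, or restart failed. A result of restart failed corresponds to continuing with the next step.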
  4. If Cassandra does not restart, determine how long Cassandra has been down. If Cassandra has been down for longer than 15 days, you must rebuild the Cassandra database.
    Attention: If two or more of the Cassandra database services are down, contact technical support, and do not proceed with the steps below.

    You can determine how long Cassandra has been down by charting it or by reviewing the servermanager log file.

    To chart Cassandra:
    1. Select Grid > site > Storage Node > SSM > Services > Reports > Charts.
    2. Select Attribute > Service: Status - Cassandra.
    3. Enter a Start Date that is at least 16 days before today’s date and, for End Date, enter today’s date.
    4. Click Update.

      If the chart shows Cassandra as being down for more than 15 days, rebuild the Cassandra database. The following chart example shows that Cassandra has been down for at least 17 days.


      Overview: SSM: Services page

      To review the servermanager log file:
      1. At the Storage Node, log in as admin and su to root using the password listed in the password.txt file.
      2. Enter: cat /var/local/log/servermanager.log

        The contents of the servermanager log file are displayed.

      3. In the servermanager log file, if Cassandra has been down for longer than 15 days, the following message is displayed:
        "2014-08-14 21:01:35 +0000 | cassandra | cassandra not started because it has been offline for longer than its 15 day grace period - rebuild cassandra"

        Make sure that the timestamp of this message matches the time when you attempted to restart Cassandra in step 3.

        The log file can contain more than one entry for Cassandra, so locate the most recent entry.

        If Cassandra has been down for longer than 15 days, you must rebuild the Cassandra database. For instructions, see "Recovering from a single Storage Node down more than 15 days" in the Recovery and Maintenance Guide.

        After Cassandra is rebuilt, alarms should clear. If alarms do not clear, contact technical support.
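Instead of reading the whole servermanager log file with cat, the most recent grace-period message can be located with grep and tail. The following is a minimal sketch: the function name latest_grace_period_entry is hypothetical, and it assumes that each log entry occupies a single line and that the message text matches the example shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent "offline for longer than its
# 15 day grace period" entry from the servermanager log file.
# Assumption: each log entry is written on a single line.
latest_grace_period_entry() {
    log="${1:-/var/local/log/servermanager.log}"
    # Keep only the grace-period messages; the last match is the newest entry.
    entry=$(grep 'cassandra not started because it has been offline' "$log" | tail -n 1)
    if [ -n "$entry" ]; then
        printf '%s\n' "$entry"
    else
        # No such message was found in the log file.
        return 1
    fi
}
```

Calling latest_grace_period_entry with no argument reads /var/local/log/servermanager.log. Compare the timestamp at the start of the printed line with the time of your restart attempt. If the function prints nothing, the log file contains no grace-period message.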