Troubleshooting the Services: Status - Cassandra (SVST) alarm

The Services: Status - Cassandra (SVST) alarm indicates that you might need to rebuild the Cassandra database for a Storage Node. Cassandra is used as the metadata store for StorageGRID.

Before you begin

You must have the Passwords.txt file, which the login steps in this procedure require.

About this task

If Cassandra is stopped for more than 15 days (for example, because the Storage Node is powered off), Cassandra will not start when the node is brought back online. In that case, you must rebuild the Cassandra database for the affected DDS service.
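
The 15-day rule above can be expressed as a simple elapsed-time check. The following is a minimal sketch, not part of the StorageGRID tooling: the function name exceeds_cassandra_grace and the variable GRACE_SECONDS are hypothetical, and the sketch assumes GNU date is available to parse timestamps.

```shell
#!/bin/sh
# Hypothetical helper (not a StorageGRID command): succeed (exit 0) when the
# time between two timestamps exceeds the 15-day Cassandra grace period.
# Assumes GNU date, which accepts free-form timestamps through -d.
GRACE_SECONDS=$((15 * 24 * 60 * 60))

exceeds_cassandra_grace() {
    stopped_at="$1"            # when Cassandra stopped
    checked_at="${2:-now}"     # when to evaluate; defaults to the current time
    stopped_epoch=$(date -d "$stopped_at" +%s) || return 2
    checked_epoch=$(date -d "$checked_at" +%s) || return 2
    # Compare in seconds so that 15 days plus one hour still counts as "more than 15 days".
    [ $((checked_epoch - stopped_epoch)) -gt "$GRACE_SECONDS" ]
}
```

For example, exceeds_cassandra_grace "2014-07-28 00:00:00 +0000" "2014-08-14 21:01:35 +0000" succeeds, because the interval is longer than 15 days. The timestamps are illustrative only.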

You can use the Diagnostics page to obtain additional information on the current state of your grid. See Running diagnostics.

Attention: If two or more of the Cassandra database services are down for more than 15 days, contact technical support, and do not proceed with the steps below.

Procedure

  1. Select Support > Tools > Grid Topology.
  2. Select site > Storage Node > SSM > Services > Alarms > Main to display alarms.
    This example shows that the SVST alarm was triggered.

    [Figure: Alarms: SSM: Services page]

    The SSM Services Main page also indicates that Cassandra is not running.

    [Figure: Overview: SSM: Services page]
  3. Try restarting Cassandra from the Storage Node:
    1. Log in to the grid node:
      1. Enter the following command: ssh admin@grid_node_IP
      2. Enter the password listed in the Passwords.txt file.
      3. Enter the following command to switch to root: su -
      4. Enter the password listed in the Passwords.txt file.
      When you are logged in as root, the prompt changes from $ to #.
    2. Enter: /etc/init.d/cassandra status
    3. If Cassandra is not running, restart it: /etc/init.d/cassandra restart
  4. If Cassandra does not restart, determine how long Cassandra has been down. If Cassandra has been down for longer than 15 days, you must rebuild the Cassandra database.
    Attention: If two or more of the Cassandra database services are down, contact technical support, and do not proceed with the steps below.

    You can determine how long Cassandra has been down by charting it or by reviewing the servermanager.log file.

  5. To chart Cassandra:
    1. Select Support > Tools > Grid Topology. Then select site > Storage Node > SSM > Services > Reports > Charts.
    2. Select Attribute > Service: Status - Cassandra.
    3. For Start Date, enter a date that is at least 16 days before the current date. For End Date, enter the current date.
    4. Click Update.
    5. If the chart shows Cassandra as being down for more than 15 days, rebuild the Cassandra database.
    The following chart example shows that Cassandra has been down for at least 17 days.

    [Figure: Overview: SSM: Services page]
  6. To review the servermanager.log file on the Storage Node:
    1. Log in to the grid node:
      1. Enter the following command: ssh admin@grid_node_IP
      2. Enter the password listed in the Passwords.txt file.
      3. Enter the following command to switch to root: su -
      4. Enter the password listed in the Passwords.txt file.
      When you are logged in as root, the prompt changes from $ to #.
    2. Enter: cat /var/local/log/servermanager.log
      The contents of the servermanager.log file are displayed.
      If Cassandra has been down for longer than 15 days, the following message is displayed in the servermanager.log file:
      2014-08-14 21:01:35 +0000 | cassandra | cassandra not started because it has been offline for longer than its 15 day grace period - rebuild cassandra
    3. Make sure the timestamp of this message matches the time when you tried to restart Cassandra in step 3.
      The log can contain more than one entry for Cassandra; locate the most recent one.
    4. If Cassandra has been down for longer than 15 days, you must rebuild the Cassandra database.
      For instructions, see Recovering from a single Storage Node down more than 15 days in the recovery and maintenance instructions.
    5. Contact technical support if alarms do not clear after Cassandra is rebuilt.
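
The log review in step 6 can be sketched as a small shell helper that prints only the most recent grace-period entry, so that its timestamp can be compared with the restart attempt. This is a minimal sketch, not part of the StorageGRID tooling: the function name last_cassandra_grace_entry is hypothetical, and the search string is taken from the sample message in this procedure, so it assumes the message wording is the same in your release.

```shell
#!/bin/sh
# Hypothetical helper (not a StorageGRID command): print the most recent
# servermanager.log entry reporting that Cassandra exceeded its grace period.
# Assumes the message text matches the sample shown in this procedure.
last_cassandra_grace_entry() {
    # Default to the log path used in step 6; a different path can be passed in.
    log_file="${1:-/var/local/log/servermanager.log}"
    # grep -F matches the fixed string; tail -n 1 keeps only the newest entry,
    # because the log can contain several entries for Cassandra.
    grep -F 'rebuild cassandra' "$log_file" | tail -n 1
}
```

Run as root on the Storage Node, last_cassandra_grace_entry with no arguments reads the default log path. Empty output means that no matching entry was found.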