Alarms reference (legacy system)

04/13/2023

The following table lists all of the legacy Default alarms. If an alarm is triggered, you can look up the alarm code in this table to find the recommended actions.

While the legacy alarm system continues to be supported, the alert system offers significant benefits and is easier to use.

Code Name Service Recommended action

ABRL

Available Attribute Relays

BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BSSM, BDDS

Restore connectivity to a service (an ADC service) running an Attribute Relay Service as soon as possible. If there are no connected attribute relays, the grid node can't report attribute values to the NMS service. Thus, the NMS service can no longer monitor the status of the service, or update attributes for the service.

If the problem persists, contact technical support.

ACMS

Available Metadata Services

BARC, BLDR, BCMN

An alarm is triggered when an LDR or ARC service loses connection to a DDS service. If this occurs, ingest or retrieve transactions can't be processed. If the unavailability of DDS services is only a brief transient issue, transactions can be delayed.

Check and restore connections to a DDS service to clear this alarm and return the service to full functionality.

ACTS

Cloud Tiering Service Status

ARC

Only available for Archive Nodes with a Target Type of Cloud Tiering - Simple Storage Service (S3).

If the ACTS attribute for the Archive Node is set to Read-Only Enabled or Read-Write Disabled, you must set the attribute to Read-Write Enabled.

If a major alarm is triggered due to an authentication failure, verify the credentials associated with the destination bucket and update the values, if necessary.

If a major alarm is triggered due to any other reason, contact technical support.

ADCA

ADC Status

ADC

If an alarm is triggered, select SUPPORT > Tools > Grid topology. Then select site > grid node > ADC > Overview > Main and ADC > Alarms > Main to determine the cause of the alarm.

If the problem persists, contact technical support.

ADCE

ADC State

ADC

If the value of ADC State is Standby, continue monitoring the service and if the problem persists, contact technical support.

If the value of ADC State is Offline, restart the service. If the problem persists, contact technical support.

AITE

Retrieve State

BARC

Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM).

If the value of Retrieve State is Waiting for Target, check the TSM middleware server and ensure that it is operating correctly. If the Archive Node has just been added to the StorageGRID system, ensure that the Archive Node's connection to the targeted external archival storage system is configured correctly.

If the value of Archive Retrieve State is Offline, attempt to update the state to Online. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Archive Retrieve State > Online, and click Apply Changes.

If the problem persists, contact technical support.

AITU

Retrieve Status

BARC

If the value of Retrieve Status is Target Error, check the targeted external archival storage system for errors.

If the value of Archive Retrieve Status is Session Lost, check the targeted external archival storage system to ensure it is online and operating correctly. Check the network connection with the target.

If the value of Archive Retrieve Status is Unknown Error, contact technical support.

ALIS

Inbound Attribute Sessions

ADC

If the number of inbound attribute sessions on an attribute relay grows too large, it can be an indication that the StorageGRID system has become unbalanced. Under normal conditions, attribute sessions should be evenly distributed amongst ADC services. An imbalance can lead to performance issues.
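One way to quantify such an imbalance is to compare the busiest ADC service with the average across all ADC services. The following Python sketch is illustrative only: the function name, the sample service names, and the use of a max-to-mean ratio are assumptions and not a StorageGRID metric.

```python
from statistics import mean
from typing import Mapping


def session_imbalance(sessions_per_adc: Mapping[str, int]) -> float:
    """Return the ratio of the highest session count to the mean count.

    A value of 1.0 means the sessions are evenly distributed; larger
    values mean one ADC service carries more than its share.
    """
    counts = list(sessions_per_adc.values())
    if not counts:
        raise ValueError("at least one ADC service is required")
    average = mean(counts)
    if average == 0:
        return 1.0  # no sessions anywhere, so nothing is unbalanced
    return max(counts) / average


# Hypothetical inbound session counts for three ADC services.
print(session_imbalance({"adc-1": 300, "adc-2": 100, "adc-3": 200}))  # prints 1.5
```

What ratio counts as "too large" is site specific; the alarm thresholds configured for ALIS remain the authoritative signal.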

If the problem persists, contact technical support.

ALOS

Outbound Attribute Sessions

ADC

The ADC service has a high number of attribute sessions, and is becoming overloaded. If this alarm is triggered, contact technical support.

ALUR

Unreachable Attribute Repositories

ADC

Check network connectivity with the NMS service to ensure that the service can contact the attribute repository.

If this alarm is triggered and network connectivity is good, contact technical support.

AMQS

Audit Messages Queued

BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BDDS

If audit messages can't be immediately forwarded to an audit relay or repository, the messages are stored in a disk queue. If the disk queue becomes full, outages can occur.

To allow you to respond in time to prevent an outage, AMQS alarms are triggered when the number of messages in the disk queue reaches the following thresholds:

Notice: More than 100,000 messages
Minor: At least 500,000 messages
Major: At least 2,000,000 messages
Critical: At least 5,000,000 messages

If an AMQS alarm is triggered, check the load on the system. If there have been a significant number of transactions, the alarm should resolve itself over time. In this case, you can ignore the alarm.

If the alarm persists and increases in severity, view a chart of the queue size. If the number is steadily increasing over hours or days, the audit load has likely exceeded the audit capacity of the system. Reduce the client operation rate or decrease the number of audit messages logged by changing the audit level to Error or Off. See Configure audit messages and log destinations.
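The AMQS thresholds listed above can be expressed as a simple lookup. The following Python sketch is illustrative only: the function name `amqs_severity` and the "Normal" label for queues of 100,000 messages or fewer are assumptions, and the numbers are the default thresholds from this table.

```python
def amqs_severity(queued_messages: int) -> str:
    """Map the audit message queue size to the default AMQS alarm severity."""
    if queued_messages >= 5_000_000:
        return "Critical"
    if queued_messages >= 2_000_000:
        return "Major"
    if queued_messages >= 500_000:
        return "Minor"
    if queued_messages > 100_000:  # Notice is "more than", not "at least"
        return "Notice"
    return "Normal"  # assumed label for "no alarm"


print(amqs_severity(750_000))  # prints Minor
```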

AOTE

Store State

BARC

Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM).

If the value of Store State is Waiting for Target, check the external archival storage system and ensure that it is operating correctly. If the Archive Node has just been added to the StorageGRID system, ensure that the Archive Node's connection to the targeted external archival storage system is configured correctly.

If the value of Store State is Offline, check the value of Store Status. Correct any problems before moving the Store State back to Online.

AOTU

Store Status

BARC

If the value of Store Status is Session Lost, check that the external archival storage system is connected and online.

If the value of Store Status is Target Error, check the external archival storage system for errors.

If the value of Store Status is Unknown Error, contact technical support.

APMS

Storage Multipath Connectivity

SSM

If the multipath state alarm appears as “Degraded” (select SUPPORT > Tools > Grid topology, then select site > grid node > SSM > Events), do the following:

Plug in or replace the cable that does not display any indicator lights.
Wait one to five minutes.

Don't unplug the other cable until at least five minutes after you plug in the first one. Unplugging too early can cause the root volume to become read-only, which requires that the hardware be restarted.
Return to the SSM > Resources page, and verify that the “Degraded” Multipath status has changed to “Nominal” in the Storage Hardware section.

ARCE

ARC State

ARC

The ARC service has a state of Standby until all ARC components (Replication, Store, Retrieve, Target) have started. It then transitions to Online.
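The Standby-to-Online rule described above can be modeled as a check over the four ARC components. The following Python sketch is a simplified model for illustration, not how the ARC service is implemented; the function name and the use of a set of started component names are assumptions.

```python
ARC_COMPONENTS = ("Replication", "Store", "Retrieve", "Target")


def expected_arc_state(started_components):
    """Return the expected ARC State and the components still not started.

    The state is "Online" only when every ARC component has started;
    otherwise it stays "Standby".
    """
    missing = [name for name in ARC_COMPONENTS if name not in started_components]
    state = "Online" if not missing else "Standby"
    return state, missing


print(expected_arc_state({"Replication", "Store"}))
# prints ('Standby', ['Retrieve', 'Target'])
```

The list of missing components indicates where to look first when the state does not transition to Online.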

If the value of ARC State does not transition from Standby to Online, check the status of the ARC components.

If the value of ARC State is Offline, restart the service. If the problem persists, contact technical support.

AROQ

Objects Queued

ARC

This alarm can be triggered if the removable storage device is running slowly due to problems with the targeted external archival storage system, or if it encounters multiple read errors. Check the external archival storage system for errors, and ensure that it is operating correctly.

In some cases, this error can occur as a result of a high rate of data requests. Monitor the number of objects queued as system activity declines.

ARRF

Request Failures

ARC

If a retrieval from the targeted external archival storage system fails, the Archive Node retries the retrieval as the failure can be due to a transient issue. However, if the object data is corrupt or has been marked as being permanently unavailable, the retrieval does not fail. Instead, the Archive Node continuously retries the retrieval and the value for Request Failures continues to increase.

This alarm can indicate that the storage media holding the requested data is corrupt. Check the external archival storage system to further diagnose the problem.

If you determine that the object data is no longer in the archive, the object will have to be removed from the StorageGRID system. For more information, contact technical support.

Once the problem that triggered this alarm is addressed, reset the failures count. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Reset Request Failure Count and click Apply Changes.

ARRV

Verification Failures

ARC

To diagnose and correct this problem, contact technical support.

Once the problem that triggered this alarm is addressed, reset the failures count. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Reset Verification Failure Count and click Apply Changes.

ARVF

Store Failures

ARC

This alarm can occur as a result of errors with the targeted external archival storage system. Check the external archival storage system for errors, and ensure that it is operating correctly.

Once the problem that triggered this alarm is addressed, reset the failures count. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Reset Store Failure Count, and click Apply Changes.

ASXP

Audit Shares

AMS

An alarm is triggered if the value of Audit Shares is Unknown. This alarm can indicate a problem with the installation or configuration of the Admin Node.

If the problem persists, contact technical support.

AUMA

AMS Status

AMS

If the value of AMS Status is DB Connectivity Error, restart the grid node.

If the problem persists, contact technical support.

AUME

AMS State

AMS

If the value of AMS State is Standby, continue monitoring the StorageGRID system. If the problem persists, contact technical support.

If the value of AMS State is Offline, restart the service. If the problem persists, contact technical support.

AUXS

Audit Export Status

AMS

If an alarm is triggered, correct the underlying problem, and then restart the AMS service.

If the problem persists, contact technical support.

BADD

Storage Controller Failed Drive Count

SSM

This alarm is triggered when one or more drives in a StorageGRID appliance have failed or are not optimal. Replace the drives as required.

BASF

Available Object Identifiers

CMN

When a StorageGRID system is provisioned, the CMN service is allocated a fixed number of object identifiers. This alarm is triggered when the StorageGRID system begins to exhaust its supply of object identifiers.

To allocate more identifiers, contact technical support.

BASS

Identifier Block Allocation Status

CMN

By default, an alarm is triggered when object identifiers can't be allocated because ADC quorum can't be reached.

Identifier block allocation on the CMN service requires a quorum (50% + 1) of the ADC services to be online and connected. If quorum is unavailable, the CMN service is unable to allocate new identifier blocks until ADC quorum is reestablished. If ADC quorum is lost, there is generally no immediate impact on the StorageGRID system (clients can still ingest and retrieve content), as approximately one month's supply of identifiers is cached elsewhere in the grid; however, if the condition continues, the StorageGRID system will lose the ability to ingest new content.
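As a worked example of the quorum rule, the following Python sketch computes how many ADC services must be online. It assumes that "50% + 1" means a strict majority, that is, half of the ADC services rounded down, plus one; the function names are hypothetical.

```python
def adc_quorum_size(total_adc_services: int) -> int:
    """Return the smallest count that is more than half of the ADC services."""
    if total_adc_services < 1:
        raise ValueError("the grid must have at least one ADC service")
    return total_adc_services // 2 + 1


def has_adc_quorum(online_adc_services: int, total_adc_services: int) -> bool:
    """Return True when enough ADC services are online and connected."""
    return online_adc_services >= adc_quorum_size(total_adc_services)


print(adc_quorum_size(3))    # prints 2
print(has_adc_quorum(2, 4))  # prints False
```

Under this assumption, a grid with three ADC services tolerates the loss of one, while a grid with four still tolerates only one.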

If an alarm is triggered, investigate the reason for the loss of ADC quorum (for example, it can be a network or Storage Node failure) and take corrective action.

If the problem persists, contact technical support.

BRDT

Compute Controller Chassis Temperature

SSM

An alarm is triggered if the temperature of the compute controller in a StorageGRID appliance exceeds a nominal threshold.

Check hardware components and environmental issues for an overheated condition. If necessary, replace the component.

BTOF

Offset

BADC, BLDR, BNMS, BAMS, BCLB, BCMN, BARC

An alarm is triggered if the service time (seconds) differs significantly from the operating system time. Under normal conditions, the service should resynchronize itself. If the service time drifts too far from the operating system time, system operations can be affected. Confirm that the StorageGRID system's time source is correct.

If the problem persists, contact technical support.

BTSE

Clock State

BADC, BLDR, BNMS, BAMS, BCLB, BCMN, BARC

An alarm is triggered if the service's time is not synchronized with the time tracked by the operating system. Under normal conditions, the service should resynchronize itself. If the time drifts too far from operating system time, system operations can be affected. Confirm that the StorageGRID system's time source is correct.

If the problem persists, contact technical support.

CAHP

Java Heap Usage Percent

DDS

An alarm is triggered if Java is unable to perform garbage collection at a rate that allows enough heap space for the system to properly function. An alarm might indicate a user workload that exceeds the resources available across the system for the DDS metadata store. Check the ILM Activity in the dashboard, or select SUPPORT > Tools > Grid topology, then select site > grid node > DDS > Resources > Overview > Main.

If the problem persists, contact technical support.

CASA

Data Store Status

DDS

An alarm is raised if the Cassandra metadata store becomes unavailable.

Check the status of Cassandra:

At the Storage Node, log in as admin and su to root using the password listed in the Passwords.txt file.
Enter: service cassandra status
If Cassandra is not running, restart it: service cassandra restart
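The status check above can also be scripted. The following Python sketch is a minimal example that must run as root on the Storage Node, just like the manual commands. It assumes that `service cassandra status` exits with code 0 when Cassandra is running, which is a common init-script convention but is not confirmed here for StorageGRID. The function name and the injectable `run` parameter are hypothetical.

```python
import subprocess


def cassandra_is_running(run=subprocess.run) -> bool:
    """Return True when `service cassandra status` exits with code 0.

    The `run` parameter exists only so the check can be exercised
    without a real Cassandra service; it defaults to subprocess.run.
    """
    result = run(
        ["service", "cassandra", "status"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

If the function returns False, restart Cassandra manually as described above rather than automating the restart, so that the cause of the outage can be reviewed first.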

This alarm might also indicate that the metadata store (Cassandra database) for a Storage Node requires rebuilding.

See information about troubleshooting the Services: Status - Cassandra (SVST) alarm in Troubleshoot metadata issues.

If the problem persists, contact technical support.

CASE

Data Store State

DDS

This alarm is triggered during installation or expansion to indicate a new data store is joining the grid.

CCNA

Compute Hardware

SSM

This alarm is triggered if the status of the compute controller hardware in a StorageGRID appliance is Needs Attention.

CDLP

Metadata Used Space (Percent)

DDS

This alarm is triggered when the Metadata Effective Space (CEMS) reaches 70% full (minor alarm), 90% full (major alarm), and 100% full (critical alarm).
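The CDLP thresholds above map to severities as follows. This Python sketch is illustrative only; the function name and the "Normal" label for values below 70% are assumptions.

```python
def cdlp_severity(percent_used: float) -> str:
    """Map Metadata Used Space (Percent) to the CDLP alarm severity."""
    if percent_used >= 100:
        return "Critical"
    if percent_used >= 90:
        return "Major"
    if percent_used >= 70:
        return "Minor"
    return "Normal"  # assumed label for "no alarm"


print(cdlp_severity(92.5))  # prints Major
```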

If this alarm reaches the 90% threshold, a warning appears on the dashboard in the Grid Manager. You must perform an expansion procedure to add new Storage Nodes as soon as possible. See Expand your grid.

If this alarm reaches the 100% threshold, you must stop ingesting objects and add Storage Nodes immediately. Cassandra requires a certain amount of space to perform essential operations such as compaction and repair. These operations will be impacted if object metadata uses more than 100% of the allowed space. Undesirable results can occur.

Note: Contact technical support if you are unable to add Storage Nodes.

After new Storage Nodes are added, the system automatically rebalances object metadata across all Storage Nodes, and the alarm clears.

Also see information about troubleshooting the Low metadata storage alert in Troubleshoot metadata issues.

If the problem persists, contact technical support.

CMNA

CMN Status

CMN

If the value of CMN Status is Error, select SUPPORT > Tools > Grid topology, then select site > grid node > CMN > Overview > Main and CMN > Alarms > Main to determine the cause of the error and to troubleshoot the problem.

An alarm is triggered and the value of CMN Status is No Online CMN during a hardware refresh of the primary Admin Node when the CMNs are switched (the value of the old CMN State is Standby and the new is Online).

If the problem persists, contact technical support.

CPRC

Remaining Capacity

NMS

An alarm is triggered if the remaining capacity (number of available connections that can be opened to the NMS database) falls below the configured alarm severity.

If an alarm is triggered, contact technical support.

CPSA

Compute Controller Power Supply A

SSM

An alarm is triggered if there is an issue with power supply A in the compute controller for a StorageGRID appliance.

If necessary, replace the component.

CPSB

Compute Controller Power Supply B

SSM

An alarm is triggered if there is an issue with power supply B in the compute controller for a StorageGRID appliance.

If necessary, replace the component.

CPUT

Compute Controller CPU Temperature

SSM

An alarm is triggered if the temperature of the CPU in the compute controller in a StorageGRID appliance exceeds a nominal threshold.

If the Storage Node is a StorageGRID appliance, the StorageGRID system indicates that the controller needs attention.

Check hardware components and environmental issues for an overheated condition. If necessary, replace the component.

DNST

DNS Status

SSM

After installation completes, a DNST alarm is triggered in the SSM service. After the DNS is configured and the new server information reaches all grid nodes, the alarm is canceled.

ECCD

Corrupt Fragments Detected

LDR

An alarm is triggered when the background verification process detects a corrupt erasure coded fragment. If a corrupt fragment is detected, an attempt is made to rebuild the fragment. Reset the Corrupt Fragments Detected and Copies Lost attributes to zero and monitor them to see if counts go up again. If counts do go up, there might be a problem with the Storage Node's underlying storage. A copy of erasure coded object data is not considered missing until the number of lost or corrupt fragments breaches the erasure code's fault tolerance; therefore, it is possible to have a corrupt fragment and still be able to retrieve the object.
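The fault-tolerance rule can be illustrated with a short Python sketch. It assumes a k+m erasure-coding scheme in which the fault tolerance equals the number of parity fragments (m); the 4+2 scheme in the example and the function name are hypothetical.

```python
def copy_is_retrievable(parity_fragments: int, bad_fragments: int) -> bool:
    """Return True while lost or corrupt fragments stay within fault tolerance.

    Assumes the erasure code tolerates as many lost or corrupt fragments
    as it has parity fragments.
    """
    if parity_fragments < 0 or bad_fragments < 0:
        raise ValueError("fragment counts can't be negative")
    return bad_fragments <= parity_fragments


# Hypothetical 4+2 scheme: two parity fragments.
print(copy_is_retrievable(parity_fragments=2, bad_fragments=1))  # prints True
print(copy_is_retrievable(parity_fragments=2, bad_fragments=3))  # prints False
```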

If the problem persists, contact technical support.

ECST

Verification Status

LDR

This alarm indicates the current status of the background verification process for erasure coded object data on this Storage Node.

A major alarm is triggered if there is an error in the background verification process.

FOPN

Open File Descriptors

BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BSSM, BDDS

FOPN can become large during peak activity. If it does not diminish during periods of slow activity, contact technical support.

HSTE

HTTP State

BLDR

See recommended actions for HSTU.

HSTU

HTTP Status

BLDR

HSTE and HSTU are related to HTTP for all LDR traffic, including S3, Swift, and other internal StorageGRID traffic. An alarm indicates that one of the following situations has occurred:

HTTP has been taken offline manually.
The Auto-Start HTTP attribute has been disabled.
The LDR service is shutting down.

The Auto-Start HTTP attribute is enabled by default. If this setting is changed, HTTP could remain offline after a restart.

If necessary, wait for the LDR service to restart.

Select SUPPORT > Tools > Grid topology. Then select Storage Node > LDR > Configuration. If HTTP is offline, place it online. Verify that the Auto-Start HTTP attribute is enabled.

If HTTP remains offline, contact technical support.

HTAS

Auto-Start HTTP

LDR

Specifies whether to start HTTP services automatically on start-up. This is a user-specified configuration option.

IRSU

Inbound Replication Status

BLDR, BARC

An alarm indicates that inbound replication has been disabled. Confirm configuration settings: Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Replication > Configuration > Main.

LATA

Average Latency

NMS

Check for connectivity issues.

Check system activity to confirm whether it has increased. An increase in system activity increases attribute data activity, which in turn delays the processing of attribute data. This can be normal system activity and will subside.

Check for multiple alarms. An increase in average latency times can be indicated by an excessive number of triggered alarms.

If the problem persists, contact technical support.

LDRE

LDR State

LDR

If the value of LDR State is Standby, continue monitoring the situation and if the problem persists, contact technical support.

If the value of LDR State is Offline, restart the service. If the problem persists, contact technical support.

LOST

Lost Objects

DDS, LDR

Triggered when the StorageGRID system fails to retrieve a copy of the requested object from anywhere in the system. Before a LOST (Lost Objects) alarm is triggered, the system attempts to retrieve and replace a missing object from elsewhere in the system.

Lost objects represent a loss of data. The Lost Objects attribute is incremented whenever the number of locations for an object drops to zero without the DDS service purposely purging the content to satisfy the ILM policy.

Investigate LOST (Lost Objects) alarms immediately. If the problem persists, contact technical support.

Troubleshoot lost and missing object data

MCEP

Management Interface Certificate Expiry

CMN

Triggered when the certificate used for accessing the management interface is about to expire.

From the Grid Manager, select CONFIGURATION > Security > Certificates.
On the Global tab, select Management interface certificate.
Upload a new management interface certificate.

MINQ

E-mail Notifications Queued

NMS

Check the network connections of the servers hosting the NMS service and the external mail server. Also confirm that the email server configuration is correct.

Configure email server settings for alarms (legacy system)

MINS

E-mail Notifications Status

BNMS

A minor alarm is triggered if the NMS service is unable to connect to the mail server. Check the network connections of the servers hosting the NMS service and the external mail server. Also confirm that the email server configuration is correct.

Configure email server settings for alarms (legacy system)

MISS

NMS Interface Engine Status

BNMS

An alarm is triggered if the NMS interface engine on the Admin Node that gathers and generates interface content is disconnected from the system. Check Server Manager to determine whether the individual application on the server is down.

NANG

Network Auto Negotiate Setting

SSM

Check the network adapter configuration. The setting must match preferences of your network routers and switches.

An incorrect setting can have a severe impact on system performance.

NDUP

Network Duplex Setting

SSM

Check the network adapter configuration. The setting must match preferences of your network routers and switches.

An incorrect setting can have a severe impact on system performance.

NLNK

Network Link Detect

SSM

Check the network cable connections on the port and at the switch.

Check the network router, switch, and adapter configurations.

Restart the server.

If the problem persists, contact technical support.

NRER

Receive Errors

SSM

The following can be causes of NRER alarms:

Forward error correction (FEC) mismatch
Switch port and NIC MTU mismatch
High link error rates
NIC ring buffer overrun

See information about troubleshooting the Network Receive Error (NRER) alarm in Troubleshoot network, hardware, and platform issues.

NRLY

Available Audit Relays

BADC, BARC, BCLB, BCMN, BLDR, BNMS, BDDS

If audit relays aren't connected to ADC services, audit events can't be reported. They are queued and unavailable to users until the connection is restored.

Restore connectivity to an ADC service as soon as possible.

If the problem persists, contact technical support.

NSCA

NMS Status

NMS

If the value of NMS Status is DB Connectivity Error, restart the service. If the problem persists, contact technical support.

NSCE

NMS State

NMS

If the value of NMS State is Standby, continue monitoring and if the problem persists, contact technical support.

If the value of NMS State is Offline, restart the service. If the problem persists, contact technical support.

NSPD

Speed

SSM

This can be caused by network connectivity or driver compatibility issues. If the problem persists, contact technical support.

NTBR

Free Tablespace

NMS

If an alarm is triggered, check how fast database usage has been changing. A sudden drop (as opposed to a gradual change over time) indicates an error condition. If the problem persists, contact technical support.
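To distinguish a sudden drop from a gradual change, you can compare consecutive samples of free tablespace. The following Python sketch is illustrative only: the function name is hypothetical, and the 20% step limit is an arbitrary assumption, not a StorageGRID default.

```python
def has_sudden_drop(free_space_samples, max_step_fraction=0.2):
    """Return True if any sample falls sharply relative to the previous one.

    free_space_samples: free tablespace readings in chronological order.
    max_step_fraction: largest fractional drop between two consecutive
        samples that is still treated as gradual (assumed value).
    """
    for previous, current in zip(free_space_samples, free_space_samples[1:]):
        if previous > 0 and (previous - current) / previous > max_step_fraction:
            return True
    return False


print(has_sudden_drop([100, 98, 96, 94]))  # prints False
print(has_sudden_drop([100, 98, 60]))      # prints True
```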

Adjusting the alarm threshold allows you to proactively manage when additional storage needs to be allocated.

If the available space reaches a low threshold (see alarm threshold), contact technical support to change the database allocation.

NTER

Transmit Errors

SSM

These errors can clear without being manually reset. If they don't clear, check network hardware. Check that the adapter hardware and driver are correctly installed and configured to work with your network routers and switches.

When the underlying problem is resolved, reset the counter. Select SUPPORT > Tools > Grid topology. Then select site > grid node > SSM > Resources > Configuration > Main, select Reset Transmit Error Count, and click Apply Changes.

NTFQ

NTP Frequency Offset

SSM

If the frequency offset exceeds the configured threshold, there is likely a hardware problem with the local clock. If the problem persists, contact technical support to arrange a replacement.

NTLK

NTP Lock

SSM

If the NTP daemon is not locked to an external time source, check network connectivity to the designated external time sources, their availability, and their stability.

NTOF

NTP Time Offset

SSM

If the time offset exceeds the configured threshold, there is likely a hardware problem with the oscillator of the local clock. If the problem persists, contact technical support to arrange a replacement.

NTSJ

Chosen Time Source Jitter

SSM

This value indicates the reliability and stability of the time source that NTP on the local server is using as its reference.

If an alarm is triggered, it can be an indication that the time source's oscillator is defective, or that there is a problem with the WAN link to the time source.

NTSU

NTP Status

SSM

If the value of NTP Status is Not Running, contact technical support.

OPST

Overall Power Status

SSM

An alarm is triggered if the power of a StorageGRID appliance deviates from the recommended operating voltage.

Check the status of Power Supply A or B to determine which power supply is operating abnormally.

If necessary, replace the power supply.

OQRT

Objects Quarantined

LDR

After the objects are automatically restored by the StorageGRID system, the quarantined objects can be removed from the quarantine directory.

Select SUPPORT > Tools > Grid topology.
Select site > Storage Node > LDR > Verification > Configuration > Main.
Select Delete Quarantined Objects.
Click Apply Changes.

The quarantined objects are removed, and the count is reset to zero.

ORSU

Outbound Replication Status

BLDR, BARC

An alarm indicates that outbound replication is not possible: storage is in a state where objects can't be retrieved. An alarm is triggered if outbound replication is disabled manually. Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Replication > Configuration.

An alarm is triggered if the LDR service is unavailable for replication. Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Storage.

OSLF

Shelf Status

SSM

An alarm is triggered if the status of one of the components in the storage shelf for a storage appliance is degraded. Storage shelf components include the IOMs, fans, power supplies, and drive drawers. If this alarm is triggered, see the maintenance instructions for your appliance.

PMEM

Service Memory Usage (Percent)

BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BSSM, BDDS

Can have a value of Over Y% RAM, where Y represents the percentage of memory being used by the server.

Figures under 80% are normal. Over 90% is considered a problem.
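These guidelines can be expressed as a small classifier. The following Python sketch is illustrative only: the function name is hypothetical, and the "Elevated" label for the range from 80% through 90%, which the guidance above does not name, is an assumption.

```python
def pmem_assessment(percent_used: float) -> str:
    """Classify Service Memory Usage (Percent) using the guidance above."""
    if percent_used > 90:
        return "Problem"
    if percent_used < 80:
        return "Normal"
    return "Elevated"  # assumed label for the unnamed 80% to 90% range


print(pmem_assessment(85))  # prints Elevated
```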

If memory usage is high for a single service, monitor the situation and investigate.

If the problem persists, contact technical support.

PSAS

Power Supply A Status

SSM

An alarm is triggered if power supply A in a StorageGRID appliance deviates from the recommended operating voltage.

If necessary, replace power supply A.

PSBS

Power Supply B Status

SSM

An alarm is triggered if power supply B in a StorageGRID appliance deviates from the recommended operating voltage.

If necessary, replace power supply B.

RDTE

Tivoli Storage Manager State

BARC

Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM).

If the value of Tivoli Storage Manager State is Offline, check Tivoli Storage Manager Status and resolve any problems.

Bring the component back online. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Target > Configuration > Main, select Tivoli Storage Manager State > Online, and click Apply Changes.

RDTU

Tivoli Storage Manager Status

BARC

Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM).

If the value of Tivoli Storage Manager Status is Configuration Error and the Archive Node has just been added to the StorageGRID system, ensure that the TSM middleware server is correctly configured.

If the value of Tivoli Storage Manager Status is Connection Failure, or Connection Failure, Retrying, check the network configuration on the TSM middleware server, and the network connection between the TSM middleware server and the StorageGRID system.

If the value of Tivoli Storage Manager Status is Authentication Failure, or Authentication Failure, Reconnecting, the StorageGRID system can connect to the TSM middleware server, but can't authenticate the connection. Check that the TSM middleware server is configured with the correct user, password, and permissions, and restart the service.

If the value of Tivoli Storage Manager Status is Session Failure, an established session has been lost unexpectedly. Check the network connection between the TSM middleware server and the StorageGRID system. Check the middleware server for errors.

If the value of Tivoli Storage Manager Status is Unknown Error, contact technical support.

RIRF

Inbound Replications — Failed

BLDR, BARC

An Inbound Replications — Failed alarm can occur during periods of high load or temporary network disruptions. After system activity reduces, this alarm should clear. If the count of failed replications continues to increase, look for network problems and verify that the source and destination LDR and ARC services are online and available.

To reset the count, select SUPPORT > Tools > Grid topology, then select site > grid node > LDR > Replication > Configuration > Main. Select Reset Inbound Replication Failure Count, and click Apply Changes.

RIRQ

Inbound Replications — Queued

BLDR, BARC

Alarms can occur during periods of high load or temporary network disruption. After system activity reduces, this alarm should clear. If the count for queued replications continues to increase, look for network problems and verify that the source and destination LDR and ARC services are online and available.

RORQ

Outbound Replications — Queued

BLDR, BARC

The outbound replication queue contains object data being copied to satisfy ILM rules and objects requested by clients.

An alarm can occur as a result of a system overload. Wait to see if the alarm clears when system activity declines. If the alarm recurs, add capacity by adding Storage Nodes.

SAVP

Total Usable Space (Percent)

LDR

If usable space reaches a low threshold, options include expanding the StorageGRID system or moving object data to archive through an Archive Node.

SCAS

Status

CMN

If the value of Status for the active grid task is Error, look up the grid task message. Select SUPPORT > Tools > Grid topology. Then select site > grid node > CMN > Grid Tasks > Overview > Main. The grid task message displays information about the error (for example, “check failed on node 12130011”).

After you have investigated and corrected the problem, restart the grid task. Select SUPPORT > Tools > Grid topology. Then select site > grid node > CMN > Grid Tasks > Configuration > Main, and select Actions > Run.

If the value of Status for a grid task being stopped is Error, retry ending the grid task.

If the problem persists, contact technical support.

SCEP

Storage API Service Endpoints Certificate Expiry

CMN

Triggered when the certificate used for accessing storage API endpoints is about to expire.

Select CONFIGURATION > Security > Certificates.
On the Global tab, select S3 and Swift API certificate.
Upload a new S3 and Swift API certificate.
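
As a rough illustration of what "about to expire" means, the sketch below computes the days remaining before a certificate's expiry timestamp. It is not part of StorageGRID. The function name days_until_expiry is hypothetical, and the input is assumed to be a notAfter string in the format that Python's ssl module reports for a peer certificate.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's expiry.

    `not_after` is assumed to use the format found in the dict returned by
    ssl.SSLSocket.getpeercert(), for example "Jan  5 09:34:43 2030 GMT".
    `now` must be a timezone-aware datetime.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


# Example: a certificate expiring on Jan 5, 2030, checked on Jan 1, 2030.
remaining = days_until_expiry(
    "Jan  5 00:00:00 2030 GMT",
    datetime(2030, 1, 1, tzinfo=timezone.utc),
)
```

A negative result would mean the certificate has already expired.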

SCHR

Status

CMN

If the value of Status for the historical grid task is Aborted, investigate the reason and run the task again if required.

If the problem persists, contact technical support.

SCSA

Storage Controller A

SSM

An alarm is triggered if there is an issue with storage controller A in a StorageGRID appliance.

If necessary, replace the component.

SCSB

Storage Controller B

SSM

An alarm is triggered if there is an issue with storage controller B in a StorageGRID appliance.

If necessary, replace the component.

Some appliance models don't have a storage controller B.

SHLH

Health

LDR

If the value of Health for an object store is Error, check and correct:

problems with the volume being mounted
file system errors

SLSA

CPU Load Average

SSM

The higher the value, the busier the system.

If the CPU Load Average persists at a high value, investigate the number of transactions in the system to determine whether heavy load at the time is the cause. View a chart of the CPU load average: Select SUPPORT > Tools > Grid topology. Then select site > grid node > SSM > Resources > Reports > Charts.

If the load on the system is not heavy and the problem persists, contact technical support.
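
To judge whether a load average is persistently high, one common approach is to normalize it by the number of CPUs and look at several samples rather than a single reading. The following is a minimal sketch of that idea, not the calculation StorageGRID uses. The function names and the threshold of 1.0 per CPU are assumptions chosen for illustration.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs (at least 1)."""
    return load_avg / max(cpu_count, 1)


def is_sustained_high(samples, cpu_count, threshold=1.0):
    """Return True only if every sample exceeds the per-CPU threshold.

    The threshold of 1.0 is a hypothetical value, not a StorageGRID default.
    An empty sample list is never considered high.
    """
    return bool(samples) and all(
        load_per_cpu(s, cpu_count) > threshold for s in samples
    )


def current_load_per_cpu() -> float:
    """Read the 1-minute load average on a Unix host and normalize it."""
    one_minute, _five, _fifteen = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)
```

For example, on a 4-CPU node, samples of 9.0, 8.5, and 8.2 would count as sustained high load, while 9.0, 3.0, and 2.0 would not.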

SMST

Log Monitor State

SSM

If the value of Log Monitor State is not Connected for an extended period of time, contact technical support.

SMTT

Total Events

SSM

If the value of Total Events is greater than zero, check if there are known events (such as network failures) that can be the cause. Unless these errors have been cleared (that is, the count has been reset to 0), Total Events alarms can be triggered.

When an issue is resolved, reset the counter to clear the alarm. Select NODES > site > grid node > Events > Reset event counts.

To reset event counts, you must have the Grid topology page configuration permission.

If the value of Total Events is zero, or the number increases and the problem persists, contact technical support.

SNST

Status

CMN

An alarm indicates that there is a problem storing the grid task bundles. If the value of Status is Checkpoint Error or Quorum Not Reached, confirm that a majority of ADC services are connected to the StorageGRID system (50 percent plus one) and then wait a few minutes.

If the problem persists, contact technical support.
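
The majority rule can be sketched as simple arithmetic. The example below assumes that "50 percent plus one" means a strict majority, that is, more than half of the ADC services, rounded down to a whole number before adding one. The function names are hypothetical.

```python
def quorum_size(total_adc: int) -> int:
    """Smallest number of ADC services that forms a strict majority."""
    return total_adc // 2 + 1


def has_quorum(connected_adc: int, total_adc: int) -> bool:
    """Return True if the connected ADC services form a majority."""
    return connected_adc >= quorum_size(total_adc)


# With 3 ADC services, 2 must be connected; with 4, 3 must be connected.
needed_for_three = quorum_size(3)
needed_for_four = quorum_size(4)
```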

SOSS

Storage Operating System Status

SSM

An alarm is triggered if SANtricity OS indicates that there is a “Needs attention” issue with a component in a StorageGRID appliance.

Select NODES. Then select appliance Storage Node > Hardware. Scroll down to view the status of each component. In SANtricity OS, check other appliance components to isolate the issue.

SSMA

SSM Status

SSM

If the value of SSM Status is Error, select SUPPORT > Tools > Grid topology, then select site > grid node > SSM > Overview > Main and SSM > Overview > Alarms to determine the cause of the alarm.

If the problem persists, contact technical support.

SSME

SSM State

SSM

If the value of SSM State is Standby, continue monitoring, and if the problem persists, contact technical support.

If the value of SSM State is Offline, restart the service. If the problem persists, contact technical support.

SSTS

Storage Status

BLDR

If the value of Storage Status is Insufficient Usable Space, there is no more available storage on the Storage Node and data ingests are redirected to other available Storage Nodes. Retrieval requests can continue to be delivered from this grid node.

Additional storage should be added. End-user functionality is not affected, but the alarm persists until additional storage is added.

If the value of Storage Status is Volume(s) Unavailable, a part of the storage is unavailable. Storage and retrieval from these volumes is not possible. Check the volume's Health for more information: Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Storage > Overview > Main. The volume's Health is listed under Object Stores.

If the value of Storage Status is Error, contact technical support.

Troubleshoot the Storage Status (SSTS) alarm

SVST

Status

SSM

This alarm clears when other alarms related to a non-running service are resolved. Track the source service alarms to restore operation.

Select SUPPORT > Tools > Grid topology. Then select site > grid node > SSM > Services > Overview > Main. When the status of a service is shown as Not Running, its state is Administratively Down. The service's status can be listed as Not Running for the following reasons:

The service has been manually stopped (/etc/init.d/<service> stop).
There is an issue with the MySQL database and Server Manager shuts down the MI service.
A grid node has been added, but not started.
During installation, a grid node has not yet connected to the Admin Node.

If a service is listed as Not Running, restart the service (/etc/init.d/<service> restart).

This alarm might also indicate that the metadata store (Cassandra database) for a Storage Node requires rebuilding.

If the problem persists, contact technical support.

Troubleshoot the Services: Status - Cassandra (SVST) alarm

TMEM

Installed Memory

SSM

Running nodes with less than 24 GiB of installed memory can lead to performance problems and system instability. The amount of memory installed on the system should be increased to at least 24 GiB.

TPOP

Pending Operations

ADC

A queue of messages can indicate that the ADC service is overloaded. Too few ADC services might be connected to the StorageGRID system. In a large deployment, the ADC service might require additional computational resources, or the system might require additional ADC services.

UMEM

Available Memory

SSM

If the available RAM gets low, determine whether this is a hardware or software issue. If it is not a hardware issue, or if available memory falls below 50 MB (the default alarm threshold), contact technical support.
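
For a quick check on a Linux host, the threshold comparison can be sketched as follows. This assumes that the MemAvailable field of /proc/meminfo is a reasonable stand-in for the attribute behind this alarm, which the source does not confirm, and that 50 MB means 50,000,000 bytes. The function names are hypothetical.

```python
# Assumption: 50 MB is interpreted as 50 * 10**6 bytes.
THRESHOLD_BYTES = 50 * 1000 * 1000


def mem_available_bytes(meminfo_text: str) -> int:
    """Extract MemAvailable from /proc/meminfo-style text, in bytes.

    The kernel labels the unit "kB", but each unit is 1024 bytes.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def is_below_threshold(meminfo_text: str, threshold: int = THRESHOLD_BYTES) -> bool:
    """Return True if available memory is under the threshold."""
    return mem_available_bytes(meminfo_text) < threshold


# Example with sample text; on a real host, read the file /proc/meminfo.
sample = "MemTotal:       16000000 kB\nMemAvailable:      40000 kB\n"
low = is_below_threshold(sample)
```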

VMFI

Entries Available

SSM

This is an indication that additional storage is required. Contact technical support.

VMFR

Space Available

SSM

If the value of Space Available gets too low (see alarm thresholds), investigate whether log files are growing out of proportion or whether objects are taking up too much disk space and need to be reduced or deleted.

If the problem persists, contact technical support.
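
When investigating what is consuming space on a volume, listing the largest files under a directory is a common first step. The sketch below shows one generic way to do that. It is not a StorageGRID tool, and the function name largest_files and the choice of directory are assumptions.

```python
import os


def largest_files(root: str, top_n: int = 10):
    """Return up to `top_n` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that disappear or cannot be read mid-scan.
                continue
    return sorted(sizes, reverse=True)[:top_n]
```

For example, largest_files("/var/log", 5) would list the five largest files under /var/log, assuming that directory is where the logs of interest are stored.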

VMST

Status

SSM

An alarm is triggered if the value of Status for the mounted volume is Unknown. A value of Unknown or Offline can indicate that the volume can't be mounted or accessed due to a problem with the underlying storage device.

VPRI

Verification Priority

BLDR, BARC

By default, the value of Verification Priority is Adaptive. If Verification Priority is set to High, an alarm is triggered because storage verification can slow normal operations of the service.

VSTU

Object Verification Status

BLDR

Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Storage > Overview > Main.

Check the operating system for any signs of block-device or file system errors.

If the value of Object Verification Status is Unknown Error, it usually indicates a low-level file system or hardware problem (I/O error) that prevents the Storage Verification task from accessing stored content. Contact technical support.
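
One way to look for signs of block-device or file system errors is to scan kernel log output for typical error strings. The following is a minimal sketch that filters lines of text, such as saved dmesg output. The two patterns shown are common Linux kernel messages but are only an illustrative, non-exhaustive selection, and the function name is hypothetical.

```python
import re

# Illustrative patterns only; real systems can log many other messages.
ERROR_PATTERNS = re.compile(r"I/O error|EXT4-fs error")


def find_storage_errors(lines):
    """Return the log lines that match a known storage error pattern."""
    return [line.strip() for line in lines if ERROR_PATTERNS.search(line)]


# Example with sample kernel log lines.
sample_log = [
    "[   12.3] EXT4-fs (sda1): mounted filesystem with ordered data mode",
    "[   99.1] blk_update_request: I/O error, dev sdb, sector 2048",
    "[  100.2] EXT4-fs error (device sdb1): ext4_find_entry:1455: inode #2",
]
matches = find_storage_errors(sample_log)
```

Here only the second and third sample lines are returned, because the first records a normal mount.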

XAMS

Unreachable Audit Repositories

BADC, BARC, BCLB, BCMN, BLDR, BNMS

Check network connectivity to the server hosting the Admin Node.

If the problem persists, contact technical support.
