Alarms reference (legacy system)
The following table lists all of the legacy Default alarms. If an alarm is triggered, you can look up the alarm code in this table to find the recommended actions.
While the legacy alarm system continues to be supported, the alert system offers significant benefits and is easier to use.
Code | Name | Service | Recommended action | ||
---|---|---|---|---|---|
ABRL |
Available Attribute Relays |
BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BSSM, BDDS |
Restore connectivity to a service (an ADC service) running an Attribute Relay Service as soon as possible. If there are no connected attribute relays, the grid node can't report attribute values to the NMS service. Thus, the NMS service can no longer monitor the status of the service, or update attributes for the service. If the problem persists, contact technical support. |
||
ACMS |
Available Metadata Services |
BARC, BLDR, BCMN |
An alarm is triggered when an LDR or ARC service loses connection to a DDS service. If this occurs, ingest or retrieve transactions can't be processed. If the unavailability of DDS services is only a brief transient issue, transactions can be delayed. Check and restore connections to a DDS service to clear this alarm and return the service to full functionality. |
||
ACTS |
Cloud Tiering Service Status |
ARC |
Only available for Archive Nodes with a Target Type of Cloud Tiering - Simple Storage Service (S3). If the ACTS attribute for the Archive Node is set to Read-Only Enabled or Read-Write Disabled, you must set the attribute to Read-Write Enabled. If a major alarm is triggered due to an authentication failure, verify the credentials associated with destination bucket and update values, if necessary. If a major alarm is triggered due to any other reason, contact technical support. |
||
ADCA |
ADC Status |
ADC |
If an alarm is triggered, select SUPPORT > Tools > Grid topology. Then select site > grid node > ADC > Overview > Main and ADC > Alarms > Main to determine the cause of the alarm. If the problem persists, contact technical support. |
||
ADCE |
ADC State |
ADC |
If the value of ADC State is Standby, continue monitoring the service and if the problem persists, contact technical support. If the value of ADC State is Offline, restart the service. If the problem persists, contact technical support. |
||
AITE |
Retrieve State |
BARC |
Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM). If the value of Retrieve State is Waiting for Target, check the TSM middleware server and ensure that it is operating correctly. If the Archive Node has just been added to the StorageGRID system, ensure that the Archive Node's connection to the targeted external archival storage system is configured correctly. If the value of Archive Retrieve State is Offline, attempt to update the state to Online. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Archive Retrieve State > Online, and click Apply Changes. If the problem persists, contact technical support. |
||
AITU |
Retrieve Status |
BARC |
If the value of Retrieve Status is Target Error, check the targeted external archival storage system for errors. If the value of Archive Retrieve Status is Session Lost, check the targeted external archival storage system to ensure it is online and operating correctly. Check the network connection with the target. If the value of Archive Retrieve Status is Unknown Error, contact technical support. |
||
ALIS |
Inbound Attribute Sessions |
ADC |
If the number of inbound attribute sessions on an attribute relay grows too large, it can be an indication that the StorageGRID system has become unbalanced. Under normal conditions, attribute sessions should be evenly distributed amongst ADC services. An imbalance can lead to performance issues. If the problem persists, contact technical support. |
||
ALOS |
Outbound Attribute Sessions |
ADC |
The ADC service has a high number of attribute sessions, and is becoming overloaded. If this alarm is triggered, contact technical support. |
||
ALUR |
Unreachable Attribute Repositories |
ADC |
Check network connectivity with the NMS service to ensure that the service can contact the attribute repository. If this alarm is triggered and network connectivity is good, contact technical support. |
||
AMQS |
Audit Messages Queued |
BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BDDS |
If audit messages can't be immediately forwarded to an audit relay or repository, the messages are stored in a disk queue. If the disk queue becomes full, outages can occur. To allow you to respond in time to prevent an outage, AMQS alarms are triggered when the number of messages in the disk queue reaches the following thresholds:
If an AMQS alarm is triggered, check the load on the system—if there have been a significant number of transactions, the alarm should resolve itself over time. In this case, you can ignore the alarm. If the alarm persists and increases in severity, view a chart of the queue size. If the number is steadily increasing over hours or days, the audit load has likely exceeded the audit capacity of the system. Reduce the client operation rate or decrease the number of audit messages logged by changing the audit level to Error or Off. See Configure audit messages and log destinations. |
||
AOTE |
Store State |
BARC |
Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM). If the value of Store State is Waiting for Target, check the external archival storage system and ensure that it is operating correctly. If the Archive Node has just been added to the StorageGRID system, ensure that the Archive Node's connection to the targeted external archival storage system is configured correctly. If the value of Store State is Offline, check the value of Store Status. Correct any problems before moving the Store State back to Online. |
||
AOTU |
Store Status |
BARC |
If the value of Store Status is Session Lost, check that the external archival storage system is connected and online. If the value of Store Status is Target Error, check the external archival storage system for errors. If the value of Store Status is Unknown Error, contact technical support. |
||
APMS |
Storage Multipath Connectivity |
SSM |
If the multipath state alarm appears as "Degraded" (select SUPPORT > Tools > Grid topology, then select site > grid node > SSM > Events), do the following:
|
||
ARCE |
ARC State |
ARC |
The ARC service has a state of Standby until all ARC components (Replication, Store, Retrieve, Target) have started. It then transitions to Online. If the value of ARC State does not transition from Standby to Online, check the status of the ARC components. If the value of ARC State is Offline, restart the service. If the problem persists, contact technical support. |
||
AROQ |
Objects Queued |
ARC |
This alarm can be triggered if the removable storage device is running slowly due to problems with the targeted external archival storage system, or if it encounters multiple read errors. Check the external archival storage system for errors, and ensure that it is operating correctly. In some cases, this error can occur as a result of a high rate of data requests. Monitor the number of objects queued as system activity declines. |
||
ARRF |
Request Failures |
ARC |
If a retrieval from the targeted external archival storage system fails, the Archive Node retries the retrieval as the failure can be due to a transient issue. However, if the object data is corrupt or has been marked as being permanently unavailable, the retrieval does not fail. Instead, the Archive Node continuously retries the retrieval and the value for Request Failures continues to increase. This alarm can indicate that the storage media holding the requested data is corrupt. Check the external archival storage system to further diagnose the problem. If you determine that the object data is no longer in the archive, the object will have to be removed from the StorageGRID system. For more information, contact technical support. Once the problem that triggered this alarm is addressed, reset the failures count. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Reset Request Failure Count and click Apply Changes. |
||
ARRV |
Verification Failures |
ARC |
To diagnose and correct this problem, contact technical support. After the problem that triggered this alarm is addressed, reset the failures count. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Reset Verification Failure Count and click Apply Changes. |
||
ARVF |
Store Failures |
ARC |
This alarm can occur as a result of errors with the targeted external archival storage system. Check the external archival storage system for errors, and ensure that it is operating correctly. Once the problem that triggered this alarm is addressed, reset the failures count. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Retrieve > Configuration > Main, select Reset Store Failure Count, and click Apply Changes. |
||
ASXP |
Audit Shares |
AMS |
An alarm is triggered if the value of Audit Shares is Unknown. This alarm can indicate a problem with the installation or configuration of the Admin Node. If the problem persists, contact technical support. |
||
AUMA |
AMS Status |
AMS |
If the value of AMS Status is DB Connectivity Error, restart the grid node. If the problem persists, contact technical support. |
||
AUME |
AMS State |
AMS |
If the value of AMS State is Standby, continue monitoring the StorageGRID system. If the problem persists, contact technical support. If the value of AMS State is Offline, restart the service. If the problem persists, contact technical support. |
||
AUXS |
Audit Export Status |
AMS |
If an alarm is triggered, correct the underlying problem, and then restart the AMS service. If the problem persists, contact technical support. |
||
BADD |
Storage Controller Failed Drive Count |
SSM |
This alarm is triggered when one or more drives in a StorageGRID appliance have failed or are not optimal. Replace the drives as required. |
||
BASF |
Available Object Identifiers |
CMN |
When a StorageGRID system is provisioned, the CMN service is allocated a fixed number of object identifiers. This alarm is triggered when the StorageGRID system begins to exhaust its supply of object identifiers. To allocate more identifiers, contact technical support. |
||
BASS |
Identifier Block Allocation Status |
CMN |
By default, an alarm is triggered when object identifiers can't be allocated because ADC quorum can't be reached. Identifier block allocation on the CMN service requires a quorum (50% + 1) of the ADC services to be online and connected. If quorum is unavailable, the CMN service is unable to allocate new identifier blocks until ADC quorum is reestablished. If ADC quorum is lost, there is generally no immediate impact on the StorageGRID system (clients can still ingest and retrieve content), because approximately one month's supply of identifiers is cached elsewhere in the grid; however, if the condition continues, the StorageGRID system will lose the ability to ingest new content. If an alarm is triggered, investigate the reason for the loss of ADC quorum (for example, it can be a network or Storage Node failure) and take corrective action. If the problem persists, contact technical support. |
||
BRDT |
Compute Controller Chassis Temperature |
SSM |
An alarm is triggered if the temperature of the compute controller in a StorageGRID appliance exceeds a nominal threshold. Check hardware components and environmental issues for an overheated condition. If necessary, replace the component. |
||
BTOF |
Offset |
BADC, BLDR, BNMS, BAMS, BCLB, BCMN, BARC |
An alarm is triggered if the service time (seconds) differs significantly from the operating system time. Under normal conditions, the service should resynchronize itself. If the service time drifts too far from the operating system time, system operations can be affected. Confirm that the StorageGRID system's time source is correct. If the problem persists, contact technical support. |
||
BTSE |
Clock State |
BADC, BLDR, BNMS, BAMS, BCLB, BCMN, BARC |
An alarm is triggered if the service's time is not synchronized with the time tracked by the operating system. Under normal conditions, the service should resynchronize itself. If the time drifts too far from operating system time, system operations can be affected. Confirm that the StorageGRID system's time source is correct. If the problem persists, contact technical support. |
||
CAHP |
Java Heap Usage Percent |
DDS |
An alarm is triggered if Java is unable to perform garbage collection at a rate that allows enough heap space for the system to properly function. An alarm might indicate a user workload that exceeds the resources available across the system for the DDS metadata store. Check the ILM Activity in the dashboard, or select SUPPORT > Tools > Grid topology, then select site > grid node > DDS > Resources > Overview > Main. If the problem persists, contact technical support. |
||
CASA |
Data Store Status |
DDS |
An alarm is raised if the Cassandra metadata store becomes unavailable. Check the status of Cassandra:
This alarm might also indicate that the metadata store (Cassandra database) for a Storage Node requires rebuilding. See information about troubleshooting the Services: Status - Cassandra (SVST) alarm in Troubleshoot metadata issues. If the problem persists, contact technical support. |
||
CASE |
Data Store State |
DDS |
This alarm is triggered during installation or expansion to indicate a new data store is joining the grid. |
||
CCNA |
Compute Hardware |
SSM |
This alarm is triggered if the status of the compute controller hardware in a StorageGRID appliance is Needs Attention. |
||
CDLP |
Metadata Used Space (Percent) |
DDS |
This alarm is triggered when the Metadata Effective Space (CEMS) reaches 70% full (minor alarm), 90% full (major alarm), and 100% full (critical alarm). If this alarm reaches the 90% threshold, a warning appears on the dashboard in the Grid Manager. You must perform an expansion procedure to add new Storage Nodes as soon as possible. See Expand a grid. If this alarm reaches the 100% threshold, you must stop ingesting objects and add Storage Nodes immediately. Cassandra requires a certain amount of space to perform essential operations such as compaction and repair. These operations will be impacted if object metadata uses more than 100% of the allowed space. Undesirable results can occur. Note: Contact technical support if you are unable to add Storage Nodes. After new Storage Nodes are added, the system automatically rebalances object metadata across all Storage Nodes, and the alarm clears. Also see information about troubleshooting the Low metadata storage alert in Troubleshoot metadata issues. If the problem persists, contact technical support. |
||
CMNA |
CMN Status |
CMN |
If the value of CMN Status is Error, select SUPPORT > Tools > Grid topology, then select site > grid node > CMN > Overview > Main and CMN > Alarms > Main to determine the cause of the error and to troubleshoot the problem. An alarm is triggered and the value of CMN Status is No Online CMN during a hardware refresh of the primary Admin Node when the CMNs are switched (the value of the old CMN State is Standby and the new is Online). If the problem persists, contact technical support. |
||
CPRC |
Remaining Capacity |
NMS |
An alarm is triggered if the remaining capacity (number of available connections that can be opened to the NMS database) falls below the configured alarm severity. If an alarm is triggered, contact technical support. |
||
CPSA |
Compute Controller Power Supply A |
SSM |
An alarm is triggered if there is an issue with power supply A in the compute controller for a StorageGRID appliance. If necessary, replace the component. |
||
CPSB |
Compute Controller Power Supply B |
SSM |
An alarm is triggered if there is an issue with power supply B in the compute controller for a StorageGRID appliance. If necessary, replace the component. |
||
CPUT |
Compute Controller CPU Temperature |
SSM |
An alarm is triggered if the temperature of the CPU in the compute controller in a StorageGRID appliance exceeds a nominal threshold. If the Storage Node is a StorageGRID appliance, the StorageGRID system indicates that the controller needs attention. Check hardware components and environmental issues for an overheated condition. If necessary, replace the component. |
||
DNST |
DNS Status |
SSM |
After installation completes, a DNST alarm is triggered in the SSM service. After the DNS is configured and the new server information reaches all grid nodes, the alarm is canceled. |
||
ECCD |
Corrupt Fragments Detected |
LDR |
An alarm is triggered when the background verification process detects a corrupt erasure-coded fragment. If a corrupt fragment is detected, an attempt is made to rebuild the fragment. Reset the Corrupt Fragments Detected and Copies Lost attributes to zero and monitor them to see if counts go up again. If counts do go up, there might be a problem with the Storage Node's underlying storage. A copy of erasure-coded object data is not considered missing until the number of lost or corrupt fragments breaches the erasure code's fault tolerance; therefore, it is possible to have a corrupt fragment and still be able to retrieve the object. If the problem persists, contact technical support. |
||
ECST |
Verification Status |
LDR |
This alarm indicates the current status of the background verification process for erasure-coded object data on this Storage Node. A major alarm is triggered if there is an error in the background verification process. |
||
FOPN |
Open File Descriptors |
BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BSSM, BDDS |
FOPN can become large during peak activity. If it does not diminish during periods of slow activity, contact technical support. |
||
HSTE |
HTTP State |
BLDR |
See recommended actions for HSTU. |
||
HSTU |
HTTP Status |
BLDR |
HSTE and HSTU are related to HTTP for all LDR traffic, including S3, Swift, and other internal StorageGRID traffic. An alarm indicates that one of the following situations has occurred:
The Auto-Start HTTP attribute is enabled by default. If this setting is changed, HTTP could remain offline after a restart. If necessary, wait for the LDR service to restart. Select SUPPORT > Tools > Grid topology. Then select Storage Node > LDR > Configuration. If HTTP is offline, place it online. Verify that the Auto-Start HTTP attribute is enabled. If HTTP remains offline, contact technical support. |
||
HTAS |
Auto-Start HTTP |
LDR |
Specifies whether to start HTTP services automatically on start-up. This is a user-specified configuration option. |
||
IRSU |
Inbound Replication Status |
BLDR, BARC |
An alarm indicates that inbound replication has been disabled. Confirm configuration settings: Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Replication > Configuration > Main. |
||
LATA |
Average Latency |
NMS |
Check for connectivity issues. Check system activity to confirm that there is an increase in system activity. An increase in system activity will result in an increase to attribute data activity. This increased activity will result in a delay to the processing of attribute data. This can be normal system activity and will subside. Check for multiple alarms. An increase in average latency times can be indicated by an excessive number of triggered alarms. If the problem persists, contact technical support. |
||
LDRE |
LDR State |
LDR |
If the value of LDR State is Standby, continue monitoring the situation and if the problem persists, contact technical support. If the value of LDR State is Offline, restart the service. If the problem persists, contact technical support. |
||
LOST |
Lost Objects |
DDS, LDR |
Triggered when the StorageGRID system fails to retrieve a copy of the requested object from anywhere in the system. Before a LOST (Lost Objects) alarm is triggered, the system attempts to retrieve and replace a missing object from elsewhere in the system. Lost objects represent a loss of data. The Lost Objects attribute is incremented whenever the number of locations for an object drops to zero without the DDS service purposely purging the content to satisfy the ILM policy. Investigate LOST (Lost Objects) alarms immediately. If the problem persists, contact technical support. |
||
MCEP |
Management Interface Certificate Expiry |
CMN |
Triggered when the certificate used for accessing the management interface is about to expire.
|
||
MINQ |
E-mail Notifications Queued |
NMS |
Check the network connections of the servers hosting the NMS service and the external mail server. Also confirm that the email server configuration is correct. |
||
MINS |
E-mail Notifications Status |
BNMS |
A minor alarm is triggered if the NMS service is unable to connect to the mail server. Check the network connections of the servers hosting the NMS service and the external mail server. Also confirm that the email server configuration is correct. |
||
MISS |
NMS Interface Engine Status |
BNMS |
An alarm is triggered if the NMS interface engine on the Admin Node that gathers and generates interface content is disconnected from the system. Check Server Manager to determine whether the individual application on the server is down. |
||
NANG |
Network Auto Negotiate Setting |
SSM |
Check the network adapter configuration. The setting must match preferences of your network routers and switches. An incorrect setting can have a severe impact on system performance. |
||
NDUP |
Network Duplex Setting |
SSM |
Check the network adapter configuration. The setting must match preferences of your network routers and switches. An incorrect setting can have a severe impact on system performance. |
||
NLNK |
Network Link Detect |
SSM |
Check the network cable connections on the port and at the switch. Check the network router, switch, and adapter configurations. Restart the server. If the problem persists, contact technical support. |
||
NRER |
Receive Errors |
SSM |
The following can be causes of NRER alarms:
See information about troubleshooting the Network Receive Error (NRER) alarm in Troubleshoot network, hardware, and platform issues. |
||
NRLY |
Available Audit Relays |
BADC, BARC, BCLB, BCMN, BLDR, BNMS, BDDS |
If audit relays aren't connected to ADC services, audit events can't be reported. They are queued and unavailable to users until the connection is restored. Restore connectivity to an ADC service as soon as possible. If the problem persists, contact technical support. |
||
NSCA |
NMS Status |
NMS |
If the value of NMS Status is DB Connectivity Error, restart the service. If the problem persists, contact technical support. |
||
NSCE |
NMS State |
NMS |
If the value of NMS State is Standby, continue monitoring and if the problem persists, contact technical support. If the value of NMS State is Offline, restart the service. If the problem persists, contact technical support. |
||
NSPD |
Speed |
SSM |
This can be caused by network connectivity or driver compatibility issues. If the problem persists, contact technical support. |
||
NTBR |
Free Tablespace |
NMS |
If an alarm is triggered, check how fast database usage has been changing. A sudden drop (as opposed to a gradual change over time) indicates an error condition. If the problem persists, contact technical support. Adjusting the alarm threshold allows you to proactively manage when additional storage needs to be allocated. If the available space reaches a low threshold (see alarm threshold), contact technical support to change the database allocation. |
||
NTER |
Transmit Errors |
SSM |
These errors can clear without being manually reset. If they don't clear, check network hardware. Check that the adapter hardware and driver are correctly installed and configured to work with your network routers and switches. When the underlying problem is resolved, reset the counter. Select SUPPORT > Tools > Grid topology. Then select site > grid node > SSM > Resources > Configuration > Main, select Reset Transmit Error Count, and click Apply Changes. |
||
NTFQ |
NTP Frequency Offset |
SSM |
If the frequency offset exceeds the configured threshold, there is likely a hardware problem with the local clock. If the problem persists, contact technical support to arrange a replacement. |
||
NTLK |
NTP Lock |
SSM |
If the NTP daemon is not locked to an external time source, check network connectivity to the designated external time sources, their availability, and their stability. |
||
NTOF |
NTP Time Offset |
SSM |
If the time offset exceeds the configured threshold, there is likely a hardware problem with the oscillator of the local clock. If the problem persists, contact technical support to arrange a replacement. |
||
NTSJ |
Chosen Time Source Jitter |
SSM |
This value indicates the reliability and stability of the time source that NTP on the local server is using as its reference. If an alarm is triggered, it can be an indication that the time source's oscillator is defective, or that there is a problem with the WAN link to the time source. |
||
NTSU |
NTP Status |
SSM |
If the value of NTP Status is Not Running, contact technical support. |
||
OPST |
Overall Power Status |
SSM |
An alarm is triggered if the power of a StorageGRID appliance deviates from the recommended operating voltage. Check the status of Power Supply A or B to determine which power supply is operating abnormally. If necessary, replace the power supply. |
||
OQRT |
Objects Quarantined |
LDR |
After the objects are automatically restored by the StorageGRID system, the quarantined objects can be removed from the quarantine directory.
The quarantined objects are removed, and the count is reset to zero. |
||
ORSU |
Outbound Replication Status |
BLDR, BARC |
An alarm indicates that outbound replication is not possible: storage is in a state where objects can't be retrieved. An alarm is triggered if outbound replication is disabled manually. Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Replication > Configuration. An alarm is triggered if the LDR service is unavailable for replication. Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Storage. |
||
OSLF |
Shelf Status |
SSM |
An alarm is triggered if the status of one of the components in the storage shelf for a storage appliance is degraded. Storage shelf components include the IOMs, fans, power supplies, and drive drawers. If this alarm is triggered, see the maintenance instructions for your appliance. |
||
PMEM |
Service Memory Usage (Percent) |
BADC, BAMS, BARC, BCLB, BCMN, BLDR, BNMS, BSSM, BDDS |
Can have a value of Over Y% RAM, where Y represents the percentage of memory being used by the server. Figures under 80% are normal. Over 90% is considered a problem. If memory usage is high for a single service, monitor the situation and investigate. If the problem persists, contact technical support. |
||
PSAS |
Power Supply A Status |
SSM |
An alarm is triggered if power supply A in a StorageGRID appliance deviates from the recommended operating voltage. If necessary, replace power supply A. |
||
PSBS |
Power Supply B Status |
SSM |
An alarm is triggered if power supply B in a StorageGRID appliance deviates from the recommended operating voltage. If necessary, replace power supply B. |
||
RDTE |
Tivoli Storage Manager State |
BARC |
Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM). If the value of Tivoli Storage Manager State is Offline, check Tivoli Storage Manager Status and resolve any problems. Bring the component back online. Select SUPPORT > Tools > Grid topology. Then select site > grid node > ARC > Target > Configuration > Main, select Tivoli Storage Manager State > Online, and click Apply Changes. |
||
RDTU |
Tivoli Storage Manager Status |
BARC |
Only available for Archive Nodes with a Target Type of Tivoli Storage Manager (TSM). If the value of Tivoli Storage Manager Status is Configuration Error and the Archive Node has just been added to the StorageGRID system, ensure that the TSM middleware server is correctly configured. If the value of Tivoli Storage Manager Status is Connection Failure, or Connection Failure, Retrying, check the network configuration on the TSM middleware server, and the network connection between the TSM middleware server and the StorageGRID system. If the value of Tivoli Storage Manager Status is Authentication Failure, or Authentication Failure, Reconnecting, the StorageGRID system can connect to the TSM middleware server, but can't authenticate the connection. Check that the TSM middleware server is configured with the correct user, password, and permissions, and restart the service. If the value of Tivoli Storage Manager Status is Session Failure, an established session has been lost unexpectedly. Check the network connection between the TSM middleware server and the StorageGRID system. Check the middleware server for errors. If the value of Tivoli Storage Manager Status is Unknown Error, contact technical support. |
||
RIRF |
Inbound Replications — Failed |
BLDR, BARC |
An Inbound Replications — Failed alarm can occur during periods of high load or temporary network disruptions. After system activity reduces, this alarm should clear. If the count of failed replications continues to increase, look for network problems and verify that the source and destination LDR and ARC services are online and available. To reset the count, select SUPPORT > Tools > Grid topology, then select site > grid node > LDR > Replication > Configuration > Main. Select Reset Inbound Replication Failure Count, and click Apply Changes. |
||
RIRQ |
Inbound Replications — Queued |
BLDR, BARC |
Alarms can occur during periods of high load or temporary network disruption. After system activity reduces, this alarm should clear. If the count for queued replications continues to increase, look for network problems and verify that the source and destination LDR and ARC services are online and available. |
||
RORQ |
Outbound Replications — Queued |
BLDR, BARC |
The outbound replication queue contains object data being copied to satisfy ILM rules and objects requested by clients. An alarm can occur as a result of a system overload. Wait to see if the alarm clears when system activity declines. If the alarm recurs, add capacity by adding Storage Nodes. |
||
SAVP |
Total Usable Space (Percent) |
LDR |
If usable space reaches a low threshold, options include expanding the StorageGRID system or moving object data to archive through an Archive Node. |
||
SCAS |
Status |
CMN |
If the value of Status for the active grid task is Error, look up the grid task message. Select SUPPORT > Tools > Grid topology. Then select site > grid node > CMN > Grid Tasks > Overview > Main. The grid task message displays information about the error (for example, "check failed on node 12130011"). After you have investigated and corrected the problem, restart the grid task. Select SUPPORT > Tools > Grid topology. Then select site > grid node > CMN > Grid Tasks > Configuration > Main, and select Actions > Run. If the value of Status for a grid task being stopped is Error, retry ending the grid task. If the problem persists, contact technical support. |
||
SCEP |
Storage API Service Endpoints Certificate Expiry |
CMN |
Triggered when the certificate used for accessing storage API endpoints is about to expire.
|
||
SCHR |
Status |
CMN |
If the value of Status for the historical grid task is Aborted, investigate the reason and run the task again if required. If the problem persists, contact technical support. |
||
SCSA |
Storage Controller A |
SSM |
An alarm is triggered if there is an issue with storage controller A in a StorageGRID appliance. If necessary, replace the component. |
||
SCSB |
Storage Controller B |
SSM |
An alarm is triggered if there is an issue with storage controller B in a StorageGRID appliance. If necessary, replace the component. Some appliance models don't have a storage controller B. |
||
SHLH |
Health |
LDR |
If the value of Health for an object store is Error, check and correct:
|
||
SLSA |
CPU Load Average |
SSM |
The higher the value, the busier the system. If the CPU Load Average persists at a high value, investigate the number of transactions in the system to determine whether heavy load at the time is the cause. View a chart of the CPU load average: Select SUPPORT > Tools > Grid topology. Then select site > grid node > SSM > Resources > Reports > Charts. If the load on the system is not heavy and the problem persists, contact technical support. |
||
SMST |
Log Monitor State |
SSM |
If the value of Log Monitor State is not Connected for a persistent period of time, contact technical support. |
||
SMTT |
Total Events |
SSM |
If the value of Total Events is greater than zero, check if there are known events (such as network failures) that can be the cause. Unless these errors have been cleared (that is, the count has been reset to 0), Total Events alarms can be triggered. When an issue is resolved, reset the counter to clear the alarm. Select NODES > site > grid node > Events > Reset event counts.
If the value of Total Events is zero, or the number increases and the problem persists, contact technical support. |
||
SNST |
Status |
CMN |
An alarm indicates that there is a problem storing the grid task bundles. If the value of Status is Checkpoint Error or Quorum Not Reached, confirm that a majority of ADC services are connected to the StorageGRID system (50 percent plus one) and then wait a few minutes. If the problem persists, contact technical support. |
||
SOSS |
Storage Operating System Status |
SSM |
An alarm is triggered if SANtricity OS indicates that there is a "Needs attention" issue with a component in a StorageGRID appliance. Select NODES. Then select appliance Storage Node > Hardware. Scroll down to view the status of each component. In SANtricity OS, check other appliance components to isolate the issue. |
||
SSMA |
SSM Status |
SSM |
If the value of SSM Status is Error, select SUPPORT > Tools > Grid topology, then select site > grid node > SSM > Overview > Main and SSM > Overview > Alarms to determine the cause of the alarm. If the problem persists, contact technical support. |
||
SSME |
SSM State |
SSM |
If the value of SSM State is Standby, continue monitoring, and if the problem persists, contact technical support. If the value of SSM State is Offline, restart the service. If the problem persists, contact technical support. |
||
SSTS |
Storage Status |
BLDR |
If the value of Storage Status is Insufficient Usable Space, there is no more available storage on the Storage Node and data ingests are redirected to other available Storage Nodes. Retrieval requests can continue to be delivered from this grid node. Additional storage should be added. End-user functionality is not affected, but the alarm persists until additional storage is added. If the value of Storage Status is Volume(s) Unavailable, a part of the storage is unavailable. Storage and retrieval from these volumes is not possible. Check the volume's Health for more information: Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Storage > Overview > Main. The volume's Health is listed under Object Stores. If the value of Storage Status is Error, contact technical support. |
||
SVST |
Status |
SSM |
This alarm clears when other alarms related to a non-running service are resolved. Track the source service alarms to restore operation. Select SUPPORT > Tools > Grid topology. Then select site > grid node > SSM > Services > Overview > Main. When the status of a service is shown as Not Running, its state is Administratively Down. The service's status can be listed as Not Running for the following reasons:
If a service is listed as Not Running, restart the service. This alarm might also indicate that the metadata store (Cassandra database) for a Storage Node requires rebuilding. If the problem persists, contact technical support. |
||
TMEM |
Installed Memory |
SSM |
Nodes with less than 24 GiB of installed memory can experience performance problems and system instability. Increase the amount of memory installed on the system to at least 24 GiB. |
||
TPOP |
Pending Operations |
ADC |
A queue of messages can indicate that the ADC service is overloaded. Too few ADC services might be connected to the StorageGRID system. In a large deployment, the ADC service might require additional computational resources, or the system might require additional ADC services. |
||
UMEM |
Available Memory |
SSM |
If the available RAM gets low, determine whether this is a hardware or software issue. If it is not a hardware issue, or if available memory falls below 50 MB (the default alarm threshold), contact technical support. |
||
VMFI |
Entries Available |
SSM |
This is an indication that additional storage is required. Contact technical support. |
||
VMFR |
Space Available |
SSM |
If the value of Space Available gets too low (see alarm thresholds), investigate whether log files are growing out of proportion or whether objects are taking up too much disk space and need to be reduced or deleted. If the problem persists, contact technical support. |
||
VMST |
Status |
SSM |
An alarm is triggered if the value of Status for the mounted volume is Unknown. A value of Unknown or Offline can indicate that the volume can't be mounted or accessed due to a problem with the underlying storage device. |
||
VPRI |
Verification Priority |
BLDR, BARC |
By default, the value of Verification Priority is Adaptive. If Verification Priority is set to High, an alarm is triggered because storage verification can slow normal operations of the service. |
||
VSTU |
Object Verification Status |
BLDR |
Select SUPPORT > Tools > Grid topology. Then select site > grid node > LDR > Storage > Overview > Main. Check the operating system for any signs of block-device or file system errors. If the value of Object Verification Status is Unknown Error, it usually indicates a low-level file system or hardware problem (I/O error) that prevents the Storage Verification task from accessing stored content. Contact technical support. |
||
XAMS |
Unreachable Audit Repositories |
BADC, BARC, BCLB, BCMN, BLDR, BNMS |
Check network connectivity to the server hosting the Admin Node. If the problem persists, contact technical support. |
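
Two of the entries above state numeric conditions: SNST asks for a majority of ADC services to be connected ("50 percent plus one"), and TMEM asks for at least 24 GiB of installed memory. The following Python sketch shows one way to express those checks in a script. It is illustrative only: the function names are hypothetical, it does not call any StorageGRID API, and it assumes "50 percent plus one" means a strict majority (half the services, rounded down, plus one).

```python
# Illustrative helpers only; these names are hypothetical and not part of StorageGRID.

GIB = 1024 ** 3
MIN_INSTALLED_MEMORY_BYTES = 24 * GIB  # TMEM guidance: at least 24 GiB installed


def adc_quorum_size(total_adc_services: int) -> int:
    """Return the number of connected ADC services needed for a majority.

    Assumption: "50 percent plus one" is read as a strict majority,
    that is, floor(total / 2) + 1.
    """
    if total_adc_services < 1:
        raise ValueError("total_adc_services must be at least 1")
    return total_adc_services // 2 + 1


def has_adc_quorum(connected: int, total: int) -> bool:
    """Return True if enough ADC services are connected to form a majority."""
    return connected >= adc_quorum_size(total)


def meets_installed_memory(installed_bytes: int) -> bool:
    """Return True if installed memory is at least 24 GiB."""
    return installed_bytes >= MIN_INSTALLED_MEMORY_BYTES


if __name__ == "__main__":
    # Example: a grid with 3 ADC services, 2 of them currently connected.
    print(adc_quorum_size(3))
    print(has_adc_quorum(2, 3))
    print(meets_installed_memory(16 * GIB))
```

The connected and total counts, and the installed memory figure, would come from whatever monitoring source you already use; the sketch only encodes the arithmetic.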