Audit message flow and retention

Understanding when audit messages are generated and how they are processed helps you interpret the information in the audit.log file.

Audit messages are generated when StorageGRID Webscale services perform their various activities and process events. Audit messages provide a record of these activities. Audit messages are processed by the Audit Management System (AMS) service, which is hosted by the Admin Node, and they are stored in the form of text log files.

Audit message flow

All StorageGRID Webscale services generate audit messages during normal system operation. These messages are sent to all connected AMS services for processing and storage, so that each AMS service can maintain a complete record of system activity.

Certain services are designated as audit message relay services. Relay services act as collection points, so that individual services do not need to send their audit messages to all connected AMS services. As shown in the audit message flow diagram, each service sends its messages to just one relay service; however, each relay service sends its messages to all AMS service destinations.

Diagram that summarizes audit message flow through relays
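
As a rough illustration of why relays reduce fan-out, the following sketch counts message paths with and without relays. The function names and the example grid size are hypothetical, and the model is simplified: it treats relays purely as forwarders and ignores the audit messages the relay services generate themselves.

```python
def connections_direct(num_services: int, num_ams: int) -> int:
    """Every service sends its audit messages to every AMS service."""
    return num_services * num_ams


def connections_with_relays(num_services: int, num_relays: int, num_ams: int) -> int:
    """Each service sends to one relay; each relay sends to every AMS service."""
    return num_services + num_relays * num_ams


if __name__ == "__main__":
    # Hypothetical grid: 40 services, 3 relays, 2 AMS services.
    print(connections_direct(40, 2))          # 80
    print(connections_with_relays(40, 3, 2))  # 46
```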

When you install StorageGRID Webscale and approve grid nodes, the Administrative Domain Controller (ADC) service is automatically enabled for the first three Storage Nodes at each site. These three ADC services act as the audit message relays at each site.

Message retention

After an audit message is generated, it is stored on the grid node of the originating service until it has been committed to all connected AMS services or to a designated audit relay service. The relays in turn store the message until it is committed at all AMS services. This process includes a confirmation (positive acknowledgment) to ensure that no messages are lost.

Diagram that summarizes audit message receipt at the AMS

Messages arrive at the AMS service and are stored in a queue pending a confirmed write to the audit log file (audit.log). Confirmation of the arrival of messages is sent to the originating service (or audit relay) to permit the originator to delete its copy of the message.
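
The hold-until-confirmed behavior described above can be sketched as a small model. This is an illustration only, not the StorageGRID Webscale implementation: the class names AmsService and Sender, their methods, and the in-memory lists that stand in for the queue and the audit.log file are all assumptions made for the example.

```python
class AmsService:
    """Toy AMS service: queues messages, then commits them to a log."""

    def __init__(self, name):
        self.name = name
        self.queue = []  # messages received but not yet written
        self.log = []    # stands in for the audit.log file

    def receive(self, message):
        self.queue.append(message)

    def commit_all(self):
        """Write queued messages to the log and return them for confirmation."""
        committed = list(self.queue)
        self.log.extend(committed)
        self.queue.clear()
        return committed


class Sender:
    """Toy originating service or relay: keeps each message until every
    destination has confirmed it."""

    def __init__(self, destinations):
        self.destinations = destinations
        self.pending = {}  # message -> names of destinations not yet confirmed

    def send(self, message):
        self.pending[message] = {d.name for d in self.destinations}
        for destination in self.destinations:
            destination.receive(message)

    def acknowledge(self, destination_name, message):
        waiting = self.pending.get(message)
        if waiting is None:
            return  # already fully confirmed and deleted
        waiting.discard(destination_name)
        if not waiting:
            del self.pending[message]  # safe to delete the local copy


if __name__ == "__main__":
    ams1, ams2 = AmsService("ams1"), AmsService("ams2")
    sender = Sender([ams1, ams2])
    sender.send("msg-1")
    for m in ams1.commit_all():
        sender.acknowledge("ams1", m)
    print("msg-1" in sender.pending)  # True: ams2 has not confirmed yet
    for m in ams2.commit_all():
        sender.acknowledge("ams2", m)
    print("msg-1" in sender.pending)  # False: all destinations confirmed
```

In this model the sender deletes its copy only after the last confirmation arrives, which is why an interruption before confirmation leaves the message available to be sent again.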

A message can be removed from the queue only after it has been committed to storage at the AMS service. During peak activity, audit messages can arrive faster than they can be relayed to the audit repository on the AMS service or committed to storage in the audit log file. This causes a temporary backlog that clears itself when system activity declines. The local message buffer at the audit relay service (ADC) and the one at the AMS service each have an associated alarm (AMQS), which is triggered if the backlog becomes unusually large.

Audit log files are saved to the Admin Node’s /var/local/audit/export directory. The active audit log file is named audit.log.

Once a day, the active audit.log file is saved, and a new audit.log file is started. The name of the saved file indicates when it was saved, in the format yyyy-mm-dd.txt. If more than one audit log is created in a single day, the file names use the date the file was saved, followed by a number, in the format yyyy-mm-dd.txt.n. For example, 2018-04-15.txt.1 and 2018-04-15.txt.2 are the first and second log files created and saved on 15 April 2018.

After a day, the saved file is compressed and renamed in the format yyyy-mm-dd.txt.gz, which preserves the original date. Over time, the saved logs consume the storage allocated for audit logs on the Admin Node. A script monitors the audit log space consumption and deletes log files as necessary to free space in the /var/local/audit/export directory. Audit logs are deleted based on the date they were created, with the oldest being deleted first. You can monitor the script's actions in the manage-audit.log file.
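
If you need to process saved audit logs in order, the naming formats above can be parsed as in the following sketch. It is not the script that manages the export directory; the function names are hypothetical, the date in the file name is used as a stand-in for the creation date, and only the three name formats described above are recognized. Any other name, including the active audit.log file, is skipped.

```python
import re
from datetime import date

# Matches yyyy-mm-dd.txt, yyyy-mm-dd.txt.n, and yyyy-mm-dd.txt.gz.
_SAVED_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\.txt(?:\.(\d+)|\.gz)?$")


def parse_saved_name(name):
    """Return (saved_date, sequence, compressed) for a saved audit log name,
    or None if the name does not match one of the described formats."""
    match = _SAVED_NAME.match(name)
    if match is None:
        return None
    year, month, day, sequence = match.groups()
    try:
        saved = date(int(year), int(month), int(day))
    except ValueError:
        return None  # for example, month 13
    return saved, int(sequence) if sequence else 0, name.endswith(".gz")


def oldest_first(names):
    """Sort saved audit log names by date, then by sequence number."""
    keyed = []
    for name in names:
        parsed = parse_saved_name(name)
        if parsed is not None:
            keyed.append((parsed[0], parsed[1], name))
    return [name for _, _, name in sorted(keyed)]


if __name__ == "__main__":
    listing = ["audit.log", "2018-04-15.txt.2", "2018-04-14.txt.gz", "2018-04-15.txt.1"]
    print(oldest_first(listing))
```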

This example shows the active audit.log file, the previous day's file (2018-04-15.txt), and the compressed file for the prior day (2018-04-14.txt.gz):

audit.log
2018-04-15.txt
2018-04-14.txt.gz

Duplicate messages

Audit messages are queued for storage by the AMS service. If system communications are interrupted (for example, because of service failures or network interruptions), the write status of some audit messages might be in doubt. The StorageGRID Webscale system takes a conservative approach in this case: all queued audit messages are resubmitted to the AMS service. This can result in duplicate messages in the audit log.

If duplicate messages are a cause for concern (for example, if the audit log is used for billing applications), you must detect and discard duplicate audit messages manually. To detect duplicate audit messages, compare the audit sequence count number (ASQN) of each message. Duplicate messages have the same ASQN.
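
A minimal sketch of ASQN-based duplicate filtering follows. It assumes that each audit log line carries the sequence count as a bracketed field of the form [ASQN(UI64):1234]; check that pattern against your own audit.log lines before relying on it. The function name drop_duplicates is hypothetical.

```python
import re

# Assumed field layout, for example "[ASQN(UI64):1234]".
# Adjust the pattern if your audit log lines differ.
_ASQN = re.compile(r"\[ASQN\(UI64\):(\d+)\]")


def drop_duplicates(lines):
    """Yield each line once per ASQN, keeping the first occurrence.
    Lines without an ASQN field pass through unchanged."""
    seen = set()
    for line in lines:
        match = _ASQN.search(line)
        if match is None:
            yield line
            continue
        asqn = int(match.group(1))
        if asqn in seen:
            continue  # same ASQN seen before: treat as a duplicate
        seen.add(asqn)
        yield line


if __name__ == "__main__":
    sample = [
        "[AUDT:[ASQN(UI64):7][ATYP(FC32):SYSU]]",
        "[AUDT:[ASQN(UI64):8][ATYP(FC32):SYSU]]",
        "[AUDT:[ASQN(UI64):7][ATYP(FC32):SYSU]]",
    ]
    for line in drop_duplicates(sample):
        print(line)
```

The sketch keys on the ASQN alone, as described above. If sequence counts in your grid are kept separately for each node or restart from zero, the key would need additional fields from the message so that unrelated messages are not discarded.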