
Using the audit-sum tool


You can use the audit-sum tool to count the write, read, head, and delete audit messages and to see the minimum, maximum, and average time (or size) for each operation type.

What you'll need
  • You must have specific access permissions.

  • You must have the Passwords.txt file.

  • You must know the IP address of the primary Admin Node.

About this task

The audit-sum tool, available on the primary Admin Node, summarizes how many write, read, head, and delete operations were logged and how long these operations took.

Important The audit-sum tool is primarily intended for use by technical support during troubleshooting operations. Processing audit-sum queries can consume a large amount of CPU power, which might impact StorageGRID operations.

This example shows typical output from the audit-sum tool, indicating how long protocol operations took.

  message group           count     min(sec)        max(sec)    average(sec)
  =============           =====     ========        ========    ============
  IDEL                      274
  SDEL                   213371        0.004          20.934           0.352
  SGET                   201906        0.010        1740.290           1.132
  SHEA                    22716        0.005           2.349           0.272
  SPUT                  1771398        0.011        1770.563           0.487

The audit-sum tool provides counts and times for the following S3, Swift, and ILM audit messages in an audit log:

Code    Description
====    ===========
ARCT    Archive Retrieve from Cloud-Tier
ASCT    Archive Store Cloud-Tier
IDEL    ILM Initiated Delete: Logs when ILM starts the process of deleting an object.
SDEL    S3 DELETE: Logs a successful transaction to delete an object or bucket.
SGET    S3 GET: Logs a successful transaction to retrieve an object or list the objects in a bucket.
SHEA    S3 HEAD: Logs a successful transaction to check for the existence of an object or bucket.
SPUT    S3 PUT: Logs a successful transaction to create a new object or bucket.
WDEL    Swift DELETE: Logs a successful transaction to delete an object or container.
WGET    Swift GET: Logs a successful transaction to retrieve an object or list the objects in a container.
WHEA    Swift HEAD: Logs a successful transaction to check for the existence of an object or container.
WPUT    Swift PUT: Logs a successful transaction to create a new object or container.

The audit-sum tool can process plain or compressed audit logs. For example:

audit-sum audit.log
audit-sum 2019-08-12.txt.gz

The audit-sum tool can also process multiple files at once. For example:

audit-sum audit.log 2019-08-12.txt.gz 2019-08-13.txt.gz
audit-sum /var/local/audit/export/*

Finally, the audit-sum tool can also accept input from a pipe, which allows you to filter and preprocess the input using the grep command or other means. For example:

grep WGET audit.log | audit-sum
grep bucket1 audit.log | audit-sum
grep SPUT audit.log | grep bucket1 | audit-sum
Note This tool does not accept compressed files as piped input. To process compressed files, provide their file names as command-line arguments, or use the zcat tool to decompress the files first. For example:
audit-sum audit.log.gz
zcat audit.log.gz | audit-sum
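
If you need to filter a compressed log before summarizing it, you can combine these two approaches; for example, the following pipeline (the file name and message code are illustrative) decompresses the file with zcat, filters it with grep, and then summarizes the result:

zcat audit.log.gz | grep SGET | audit-sum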

You can use command-line options to summarize operations on buckets separately from operations on objects or to group message summaries by bucket name, by time period, or by target type. By default, the summaries show the minimum, maximum, and average operation time, but you can use the size (-s) option to look at object size instead.
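
For example, the following commands sketch how these options can be applied directly to an audit log file; the steps below show each option in context, typically with filtered input piped to the tool:

audit-sum -go audit.log     # summarize bucket and object operations separately
audit-sum -gb audit.log     # group summaries by bucket name
audit-sum -gt 1H audit.log  # group summaries by one-hour time period
audit-sum -s audit.log      # report object size instead of operation time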

Use the help (-h) option to see the available options. For example:

 $ audit-sum -h
Steps
  1. Log in to the primary Admin Node:

    1. Enter the following command: ssh admin@primary_Admin_Node_IP

    2. Enter the password listed in the Passwords.txt file.

  2. If you want to analyze all messages related to write, read, head, and delete operations, follow these steps:

    1. Enter the following command, where /var/local/audit/export/audit.log represents the name and the location of the file or files you want to analyze:

      $ audit-sum /var/local/audit/export/audit.log

      This example shows typical output from the audit-sum tool, indicating how long protocol operations took.

        message group           count     min(sec)        max(sec)    average(sec)
        =============           =====     ========        ========    ============
        IDEL                      274
        SDEL                   213371        0.004          20.934           0.352
        SGET                   201906        0.010        1740.290           1.132
        SHEA                    22716        0.005           2.349           0.272
        SPUT                  1771398        0.011        1770.563           0.487

      In this example, SGET (S3 GET) operations are the slowest on average at 1.13 seconds, but SGET and SPUT (S3 PUT) operations both show long worst-case times of about 1,770 seconds.

    2. To show the slowest 10 retrieval operations, use the grep command to select only SGET messages and add the long output option (-l) to include object paths:

      grep SGET audit.log | audit-sum -l

      The results include the type (object or bucket) and path, which allows you to grep the audit log for other messages relating to these particular objects, as shown in the example after the sample output.

      Total:          201906 operations
          Slowest:      1740.290 sec
          Average:         1.132 sec
          Fastest:         0.010 sec
          Slowest operations:
              time(usec)       source ip         type      size(B) path
              ========== =============== ============ ============ ====
              1740289662   10.96.101.125       object   5663711385 backup/r9O1OaQ8JB-1566861764-4519.iso
              1624414429   10.96.101.125       object   5375001556 backup/r9O1OaQ8JB-1566861764-6618.iso
              1533143793   10.96.101.125       object   5183661466 backup/r9O1OaQ8JB-1566861764-4518.iso
                   70839   10.96.101.125       object        28338 bucket3/dat.1566861764-6619
                   68487   10.96.101.125       object        27890 bucket3/dat.1566861764-6615
                   67798   10.96.101.125       object        27671 bucket5/dat.1566861764-6617
                   67027   10.96.101.125       object        27230 bucket5/dat.1566861764-4517
                   60922   10.96.101.125       object        26118 bucket3/dat.1566861764-4520
                   35588   10.96.101.125       object        11311 bucket3/dat.1566861764-6616
                   23897   10.96.101.125       object        10692 bucket3/dat.1566861764-4516

      From this example output, you can see that the three slowest S3 GET requests were for objects about 5 GB in size, which is much larger than the other objects. The large size accounts for the slow worst-case retrieval times.
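
      For example, to review every audit message for the slowest object identified above, you could grep for its path and pipe the matching lines to the audit-explain tool (the path shown is taken from the sample output; substitute a path from your own results):

      grep "backup/r9O1OaQ8JB-1566861764-4519.iso" audit.log | audit-explain | less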

  3. If you want to determine what sizes of objects are being ingested into and retrieved from your grid, use the size option (-s):

    audit-sum -s audit.log
      message group           count       min(MB)          max(MB)      average(MB)
      =============           =====     ========        ========    ============
      IDEL                      274        0.004        5000.000        1654.502
      SDEL                   213371        0.000          10.504           1.695
      SGET                   201906        0.000        5000.000          14.920
      SHEA                    22716        0.001          10.504           2.967
      SPUT                  1771398        0.000        5000.000           2.495

    In this example, the average object size for SPUT is under 2.5 MB, but the average size for SGET is much larger. The number of SPUT messages is much higher than the number of SGET messages, indicating that most objects are never retrieved.

  4. If you want to determine if retrievals were slow yesterday:

    1. Issue the command on the appropriate audit log and use the group-by-time option (-gt), followed by the time period (for example, 15M, 1H, 10S):

       grep SGET audit.log | audit-sum -gt 1H
        message group           count    min(sec)       max(sec)   average(sec)
        =============           =====     ========        ========    ============
        2019-09-05T00            7591        0.010        1481.867           1.254
        2019-09-05T01            4173        0.011        1740.290           1.115
        2019-09-05T02           20142        0.011        1274.961           1.562
        2019-09-05T03           57591        0.010        1383.867           1.254
        2019-09-05T04          124171        0.013        1740.290           1.405
        2019-09-05T05          420182        0.021        1274.511           1.562
        2019-09-05T06         1220371        0.015        6274.961           5.562
        2019-09-05T07          527142        0.011        1974.228           2.002
        2019-09-05T08          384173        0.012        1740.290           1.105
        2019-09-05T09           27591        0.010        1481.867           1.354

      These results show that S3 GET traffic spiked between 06:00 and 07:00. The max and average times are both considerably higher at these times as well, and they did not ramp up gradually as the count increased. This suggests that capacity was exceeded somewhere, perhaps in the network or in the grid's ability to process requests.
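
      To examine the spike in finer detail, you could narrow the grouping to 15-minute intervals for just that hour; for example, the following sketch combines the filtering and grouping options shown elsewhere in this procedure:

      grep 2019-09-05T06 audit.log | grep SGET | audit-sum -gt 15M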

    2. To determine what size objects were being retrieved each hour yesterday, add the size option (-s) to the command:

      grep SGET audit.log | audit-sum -gt 1H -s
        message group           count       min(B)          max(B)      average(B)
        =============           =====     ========        ========    ============
        2019-09-05T00            7591        0.040        1481.867           1.976
        2019-09-05T01            4173        0.043        1740.290           2.062
        2019-09-05T02           20142        0.083        1274.961           2.303
        2019-09-05T03           57591        0.912        1383.867           1.182
        2019-09-05T04          124171        0.730        1740.290           1.528
        2019-09-05T05          420182        0.875        4274.511           2.398
        2019-09-05T06         1220371        0.691  5663711385.961          51.328
        2019-09-05T07          527142        0.130        1974.228           2.147
        2019-09-05T08          384173        0.625        1740.290           1.878
        2019-09-05T09           27591        0.689        1481.867           1.354

      These results indicate that some very large retrievals occurred when the overall retrieval traffic was at its maximum.

    3. To see more detail, use the audit-explain tool to review all the SGET operations during that hour:

      grep 2019-09-05T06 audit.log | grep SGET | audit-explain | less

      Because the output of the grep command might include many lines, the less command displays the contents of the audit log file one page (one screen) at a time.

  5. If you want to determine if SPUT operations on buckets are slower than SPUT operations for objects:

    1. Start by using the -go option, which groups messages for object and bucket operations separately:

      grep SPUT sample.log | audit-sum -go
        message group           count     min(sec)        max(sec)    average(sec)
        =============           =====     ========        ========    ============
        SPUT.bucket                 1        0.125           0.125           0.125
        SPUT.object                12        0.025           1.019           0.236

      The results show that SPUT operations for buckets have different performance characteristics than SPUT operations for objects.

    2. To determine which buckets have the slowest SPUT operations, use the -gb option, which groups messages by bucket:

      grep SPUT audit.log | audit-sum -gb
        message group                  count     min(sec)        max(sec)    average(sec)
        =============                  =====     ========        ========    ============
        SPUT.cho-non-versioning        71943        0.046        1770.563           1.571
        SPUT.cho-versioning            54277        0.047        1736.633           1.415
        SPUT.cho-west-region           80615        0.040          55.557           1.329
        SPUT.ldt002                  1564563        0.011          51.569           0.361

    3. To determine which buckets have the largest SPUT object size, use both the -gb and the -s options:

      grep SPUT audit.log | audit-sum -gb -s
        message group                  count       min(B)          max(B)      average(B)
        =============                  =====     ========        ========    ============
        SPUT.cho-non-versioning        71943        2.097        5000.000          21.672
        SPUT.cho-versioning            54277        2.097        5000.000          21.120
        SPUT.cho-west-region           80615        2.097         800.000          14.433
        SPUT.ldt002                  1564563        0.000         999.972           0.352
Related information

Using the audit-explain tool