Using the audit-sum tool

You can use the audit-sum tool to count the write, read, head, and delete audit messages and to see the minimum, maximum, and average time (or size) for each operation type.

Before you begin

About this task

The audit-sum tool, available on the primary Admin Node, summarizes how many write, read, and delete operations were logged and how long these operations took.

Attention: The audit-sum tool is primarily intended for use by technical support during troubleshooting operations. Processing audit-sum queries can consume a large amount of CPU power, which might impact StorageGRID operations.

This example shows typical output from the audit-sum tool. This example shows how long protocol operations took.

  message group           count     min(sec)        max(sec)    average(sec)
  =============           =====     ========        ========    ============
  IDEL                      274 
  SDEL                   213371        0.004          20.934           0.352
  SGET                   201906        0.010        1740.290           1.132
  SHEA                    22716        0.005           2.349           0.272
  SPUT                  1771398        0.011        1770.563           0.487
The audit-sum tool provides counts and times for the following S3, Swift, and ILM audit messages in an audit log:
Code Description Refer to
ARCT Archive Retrieve from Cloud-Tier ARCT: Archive Retrieve from Cloud-Tier
ASCT Archive Store Cloud-Tier ASCT: Archive Store Cloud-Tier
IDEL ILM Initiated Delete: Logs when ILM starts the process of deleting an object. IDEL: ILM Initiated Delete
SDEL S3 DELETE: Logs a successful transaction to delete an object or bucket. SDEL: S3 DELETE
SGET S3 GET: Logs a successful transaction to retrieve an object or list the objects in a bucket. SGET: S3 GET
SHEA S3 HEAD: Logs a successful transaction to check for the existence of an object or bucket. SHEA: S3 HEAD
SPUT S3 PUT: Logs a successful transaction to create a new object or bucket. SPUT: S3 PUT
WDEL Swift DELETE: Logs a successful transaction to delete an object or container. WDEL: Swift DELETE
WGET Swift GET: Logs a successful transaction to retrieve an object or list the objects in a container. WGET: Swift GET
WHEA Swift HEAD: Logs a successful transaction to check for the existence of an object or container. WHEA: Swift HEAD
WPUT Swift PUT: Logs a successful transaction to create a new object or container. WPUT: Swift PUT

The audit-sum tool can process plain or compressed audit logs. For example:

audit-sum audit.log
audit-sum 2019-08-12.txt.gz
The audit-sum tool can also process multiple files at once. For example:
audit-sum audit.log 2019-08-12.txt.gz 2019-08-13.txt.gz
audit-sum /var/local/audit/export/*
Finally, the audit-sum tool can also accept input from a pipe, which allows you to filter and preprocess the input using the grep command or other means. For example:
grep WGET audit.log | audit-sum
grep bucket1 audit.log | audit-sum
grep SPUT audit.log | grep bucket1 | audit-sum
Note: This tool does not accept compressed files as piped input. To process compressed files, provide their file names as command-line arguments, or use the zcat tool to decompress the files first. For example:
audit-sum audit.log.gz
zcat audit.log.gz | audit-sum

You can use command-line options to summarize operations on buckets separately from operations on objects or to group message summaries by bucket name, by time period, or by target type. By default, the summaries show the minimum, maximum, and average operation time, but you can use the size (-s) option to look at object size instead.

Use the help (-h) option to see the available options. For example:
$ audit-sum -h

Procedure

  1. Log in to the primary Admin Node:
    1. Enter the following command: ssh admin@primary_Admin_Node_IP
    2. Enter the password listed in the Passwords.txt file.
  2. If you want to analyze all messages related to write, read, head, and delete operations, follow these steps:
    1. Enter the following command, where /var/local/audit/export/audit.log represents the name and the location of the file or files you want to analyze: $ audit-sum /var/local/audit/export/audit.log
      This example shows typical output from the audit-sum tool. This example shows how long protocol operations took.
        message group           count     min(sec)        max(sec)    average(sec)
        =============           =====     ========        ========    ============
        IDEL                      274 
        SDEL                   213371        0.004          20.934           0.352
        SGET                   201906        0.010        1740.290           1.132
        SHEA                    22716        0.005           2.349           0.272
        SPUT                  1771398        0.011        1770.563           0.487

      In this example, SGET (S3 GET) operations are the slowest on average at 1.13 seconds, but SGET and SPUT (S3 PUT) operations both show long worst-case times of about 1,770 seconds.

    2. To show the slowest 10 retrieval operations, use the grep command to select only SGET messages and add the long output option (-l) to include object paths: grep SGET audit.log | audit-sum -l
      The results include the type (object or bucket) and path, which allows you to grep the audit log for other messages relating to these particular objects.
      Total:          201906 operations
          Slowest:      1740.290 sec
          Average:         1.132 sec
          Fastest:         0.010 sec
          Slowest operations:
              time(usec)       source ip         type      size(B) path
              ========== =============== ============ ============ ====
              1740289662   10.96.101.125       object   5663711385 backup/r9O1OaQ8JB-1566861764-4519.iso
              1624414429   10.96.101.125       object   5375001556 backup/r9O1OaQ8JB-1566861764-6618.iso
              1533143793   10.96.101.125       object   5183661466 backup/r9O1OaQ8JB-1566861764-4518.iso
                   70839   10.96.101.125       object        28338 bucket3/dat.1566861764-6619
                   68487   10.96.101.125       object        27890 bucket3/dat.1566861764-6615
                   67798   10.96.101.125       object        27671 bucket5/dat.1566861764-6617
                   67027   10.96.101.125       object        27230 bucket5/dat.1566861764-4517
                   60922   10.96.101.125       object        26118 bucket3/dat.1566861764-4520
                   35588   10.96.101.125       object        11311 bucket3/dat.1566861764-6616
                   23897   10.96.101.125       object        10692 bucket3/dat.1566861764-4516

      From this example output, you can see that the three slowest S3 GET requests were for objects about 5 GB in size, which is much larger than the other objects. The large size accounts for the slow worst-case retrieval times.

  3. If you want to determine what sizes of objects are being ingested into and retrieved from your grid, use the size option (-s): audit-sum -s audit.log
      message group           count       min(B)          max(B)      average(B)
      =============           =====     ========        ========    ============
      IDEL                      274        0.004        5000.000        1654.502
      SDEL                   213371        0.000          10.504           1.695
      SGET                   201906        0.000        5000.000          14.920
      SHEA                    22716        0.001          10.504           2.967
      SPUT                  1771398        0.000        5000.000           2.495

    In this example, the average object size for SPUT is under 2.5 bytes, but the average size for SGET is much larger. The number of SPUT messages is much higher than the number of SGET messages, indicating that most objects are never retrieved.

  4. If you want to determine if retrievals were slow yesterday:
    1. Issue the command on the appropriate audit log and use the group-by-time option (-gt), followed by the time period (for example, 15M, 1H, 10S):grep SGET audit.log | audit-sum -gt 1H
        message group           count     min(sec)        max(sec)    average(sec)
        =============           =====     ========        ========    ============
        2019-09-05T00            7591        0.010        1481.867           1.254
        2019-09-05T01            4173        0.011        1740.290           1.115
        2019-09-05T02           20142        0.011        1274.961           1.562
        2019-09-05T03           57591        0.010        1383.867           1.254
        2019-09-05T04          124171        0.013        1740.290           1.405
        2019-09-05T05          420182        0.021        1274.511           1.562
        2019-09-05T06         1220371        0.015        6274.961           5.562
        2019-09-05T07          527142        0.011        1974.228           2.002
        2019-09-05T08          384173        0.012        1740.290           1.105
        2019-09-05T09           27591        0.010        1481.867           1.354

      These results show that S3 GET traffic spiked between 06:00 and 07:00. The max and average times are both considerably higher at these times as well, and they did not ramp up gradually as the count increased. This suggests that capacity was exceeded somewhere, perhaps in the network or in the grid's ability to process requests.

    2. To determine what size objects were being retrieved each hour yesterday, add the size option (-s) to the command:grep SGET audit.log | audit-sum -gt 1H -s
        message group           count       min(B)          max(B)      average(B)
        =============           =====     ========        ========    ============
        2019-09-05T00            7591        0.040        1481.867           1.976
        2019-09-05T01            4173        0.043        1740.290           2.062
        2019-09-05T02           20142        0.083        1274.961           2.303
        2019-09-05T03           57591        0.912        1383.867           1.182
        2019-09-05T04          124171        0.730        1740.290           1.528
        2019-09-05T05          420182        0.875        4274.511           2.398
        2019-09-05T06         1220371        0.691  5663711385.961          51.328
        2019-09-05T07          527142        0.130        1974.228           2.147
        2019-09-05T08          384173        0.625        1740.290           1.878
        2019-09-05T09           27591        0.689        1481.867           1.354

      These results indicate that some very large retrievals occurred when the overall retrieval traffic was at its maximum.

    3. To see more detail, use the audit-explain tool to review all the SGET operations during that hour: grep 2019-09-05T06 audit.log | grep SGET | audit-explain | less
      If the output of the grep command is expected to be many lines, add the less command to show the contents of the audit log file one page (one screen) at a time.
  5. If you want to determine if SPUT operations on buckets are slower than SPUT operations for objects:
    1. Start by using the -go option, which groups messages for object and bucket operations separately: grep SPUT sample.log | audit-sum -go
        message group           count     min(sec)        max(sec)    average(sec)
        =============           =====     ========        ========    ============
        SPUT.bucket                 1        0.125           0.125           0.125
        SPUT.object                12        0.025           1.019           0.236

      The results show that SPUT operations for buckets have different performance characteristics than SPUT operations for objects.

    2. To determine which buckets have the slowest SPUT operations, use the -gb option, which groups messages by bucket: grep SPUT audit.log | audit-sum -gb
        message group                  count     min(sec)        max(sec)    average(sec)
        =============                  =====     ========        ========    ============
        SPUT.cho-non-versioning        71943        0.046        1770.563           1.571
        SPUT.cho-versioning            54277        0.047        1736.633           1.415
        SPUT.cho-west-region           80615        0.040          55.557           1.329
        SPUT.ldt002                  1564563        0.011          51.569           0.361
    3. To determine which buckets have the largest SPUT object size, use both the -gb and the -s options: grep SPUT audit.log | audit-sum -gb -s
        message group                  count       min(B)          max(B)      average(B)
        =============                  =====     ========        ========    ============
        SPUT.cho-non-versioning        71943        2.097        5000.000          21.672
        SPUT.cho-versioning            54277        2.097        5000.000          21.120
        SPUT.cho-west-region           80615        2.097         800.000          14.433
        SPUT.ldt002                  1564563        0.000         999.972           0.352