Investigating lost objects

When the Objects lost alert and the legacy LOST (Lost Objects) alarm are triggered, you must investigate immediately. Collect information about the affected objects and contact technical support.

Before you begin

  • You must be signed in to the Grid Manager using a supported browser.
  • You must have specific access permissions.
  • You must have the Passwords.txt file.

About this task

The Objects lost alert and the LOST alarm indicate that StorageGRID believes that there are no copies of an object in the grid. Data might have been permanently lost.

Investigate lost object alarms or alerts immediately. You might need to take action to prevent further data loss. In some cases, you might be able to restore a lost object if you take prompt action.

The number of Lost Objects can be seen in the Grid Manager.

Procedure

  1. Select Nodes.
  2. Select Storage Node > Objects.
  3. Review the number of Lost Objects shown in the Object Counts table.
    This number indicates the total number of objects this grid node detects as missing from the entire StorageGRID system. The value is the sum of the Lost Objects counters of the Data Store component within the LDR and DDS services.
    Nodes Storage Nodes Object Page lost object
  4. From an Admin Node, access the audit log to determine the unique identifier (UUID) of the object that triggered the Objects lost alert and the LOST alarm:
    1. Log in to the grid node:
      1. Enter the following command: ssh admin@grid_node_IP
      2. Enter the password listed in the Passwords.txt file.
      3. Enter the following command to switch to root: su -
      4. Enter the password listed in the Passwords.txt file.
      When you are logged in as root, the prompt changes from $ to #.
    2. Change to the directory where the audit logs are located. Enter: cd /var/local/audit/export/
    3. Use grep to extract the Object Lost (OLST) audit messages. Enter: grep OLST audit_file_name
    4. Note the UUID value included in the message.
      >Admin: # grep OLST audit.log
      2020-02-12T19:18:54.780426 [AUDT:[CBID(UI64):0x38186FE53E3C49A5][UUID(CSTR):"926026C4-00A4-449B-AC72-BCCA72DD1311"]
      [PATH(CSTR):"source/cats"][NOID(UI32):12288733][VOLI(UI64):3222345986][RSLT(FC32):NONE][AVER(UI32):10]
      [ATIM(UI64):1581535134780426][ATYP(FC32):OLST][ANID(UI32):12448208][AMID(FC32):ILMX][ATID(UI64):7729403978647354233]]
  5. Use the ObjectByUUID command to find the object by its identifier (UUID), and then determine if data is at risk.
    1. Telnet to localhost 1402 to access the LDR console.
    2. Enter: /proc/OBRP/ObjectByUUID UUID_value
      In this first example, the object with UUID 926026C4-00A4-449B-AC72-BCCA72DD1311 has two locations listed.
      ade 12448208: /proc/OBRP > ObjectByUUID 926026C4-00A4-449B-AC72-BCCA72DD1311
      
      {
          "TYPE(Object Type)": "Data object",
          "CHND(Content handle)": "926026C4-00A4-449B-AC72-BCCA72DD1311",
          "NAME": "cats",
          "CBID": "0x38186FE53E3C49A5",
          "PHND(Parent handle, UUID)": "221CABD0-4D9D-11EA-89C3-ACBB00BB82DD",
          "PPTH(Parent path)": "source",
          "META": {
              "BASE(Protocol metadata)": {
                  "PAWS(S3 protocol version)": "2",
                  "ACCT(S3 account ID)": "44084621669730638018",
                  "*ctp(HTTP content MIME type)": "binary/octet-stream"
              },
              "BYCB(System metadata)": {
                  "CSIZ(Plaintext object size)": "5242880",
                  "SHSH(Supplementary Plaintext hash)": "MD5D 0xBAC2A2617C1DFF7E959A76731E6EAF5E",
                  "BSIZ(Content block size)": "5252084",
                  "CVER(Content block version)": "196612",
                  "CTME(Object store begin timestamp)": "2020-02-12T19:16:10.983000",
                  "MTME(Object store modified timestamp)": "2020-02-12T19:16:10.983000",
                  "ITME": "1581534970983000"
              },
              "CMSM": {
                  "LATM(Object last access time)": "2020-02-12T19:16:10.983000"
              },
              "AWS3": {
                  "LOCC": "us-east-1"
              }
          },
          "CLCO(Locations)": [
              {
                  "Location Type": "CLDI(Location online)",
                  "NOID(Node ID)": "12448208",
                  "VOLI(Volume ID)": "3222345473",
                  "Object File Path": "/var/local/rangedb/1/p/17/11/00rH0%DkRt78Ila#3udu",
                  "LTIM(Location timestamp)": "2020-02-12T19:36:17.880569"
              },
              {
                  "Location Type": "CLDI(Location online)",
                  "NOID(Node ID)": "12288733",
                  "VOLI(Volume ID)": "3222345984",
                  "Object File Path": "/var/local/rangedb/0/p/19/11/00rH0%DkRt78Rrb#3s;L",
                  "LTIM(Location timestamp)": "2020-02-12T19:36:17.934425"
              }
          ]
      }
      
      In the second example, the object with UUID 926026C4-00A4-449B-AC72-BCCA72DD1311 has no locations listed.
      ade 12448208: / > /proc/OBRP/ObjectByUUID 926026C4-00A4-449B-AC72-BCCA72DD1311
      
      {
          "TYPE(Object Type)": "Data object",
          "CHND(Content handle)": "926026C4-00A4-449B-AC72-BCCA72DD1311",
          "NAME": "cats",
          "CBID": "0x38186FE53E3C49A5",
          "PHND(Parent handle, UUID)": "221CABD0-4D9D-11EA-89C3-ACBB00BB82DD",
          "PPTH(Parent path)": "source",
          "META": {
              "BASE(Protocol metadata)": {
                  "PAWS(S3 protocol version)": "2",
                  "ACCT(S3 account ID)": "44084621669730638018",
                  "*ctp(HTTP content MIME type)": "binary/octet-stream"
              },
              "BYCB(System metadata)": {
                  "CSIZ(Plaintext object size)": "5242880",
                  "SHSH(Supplementary Plaintext hash)": "MD5D 0xBAC2A2617C1DFF7E959A76731E6EAF5E",
                  "BSIZ(Content block size)": "5252084",
                  "CVER(Content block version)": "196612",
                  "CTME(Object store begin timestamp)": "2020-02-12T19:16:10.983000",
                  "MTME(Object store modified timestamp)": "2020-02-12T19:16:10.983000",
                  "ITME": "1581534970983000"
              },
              "CMSM": {
                  "LATM(Object last access time)": "2020-02-12T19:16:10.983000"
              },
              "AWS3": {
                  "LOCC": "us-east-1"
              }
          }
      }
      
    3. Review the output of /proc/OBRP/ObjectByUUID, and take the appropriate action:
      Metadata Conclusion
      No object found ("ERROR":"" ) If the object is not found, the message "ERROR":"" is returned.

      If the object is not found, it is safe to ignore the alarm. The lack of an object indicates that the object was intentionally deleted.

      Locations > 0 If there are locations listed in the output, the Lost Objects alarm might be a false positive.

      Confirm that the objects exist. Use the Node ID and filepath listed in the output to confirm that the object file is in the listed location.

      (The procedure for finding potentially lost objects explains how to use the Node ID to find the correct Storage Node.)

      Searching for and restoring potentially lost objects

      If the objects exist, you can reset the count of Lost Objects to clear the alarm and the alert.
      Locations = 0 If there are no locations listed in the output, the object is potentially missing.

      You can try to find and restore the object yourself, or you can contact technical support.

      Searching for and restoring potentially lost objects

      Technical support might ask you to determine if there is a storage recovery procedure in progress. That is, has a repair-data command been issued on any Storage Node and is the recovery still in progress? See Restoring object data to a storage volume in the recovery and maintenance instructions.