Investigating lost objects

When the LOST (Lost Objects) alarm and the Objects lost alert are triggered, you must investigate immediately. Collect information about the affected objects and contact technical support.

Before you begin

  • You must be signed in to the Grid Manager using a supported browser.
  • You must have specific access permissions.
  • You must have the Passwords.txt file.

About this task

The LOST (Lost Objects) alarm and the Objects lost alert indicate that StorageGRID believes that there are no copies of an object in the grid. Data might have been permanently lost.

You must investigate lost object alarms immediately. You might need to take action to prevent further data loss. In some cases, you might be able to restore a lost object if you take prompt action.

The Lost Objects attribute appears on the following pages:
  • Select Nodes. Then, select Storage Node > Objects. The Lost Objects entry in the Object Counts table indicates the total number of objects this grid node detects as missing from the StorageGRID system. This value is the sum of the Lost Objects counters of the Data Store component within the LDR and DDS services.
  • Select Support > Grid Topology. Then, select site > Storage Node > LDR > Data Store > Overview > Main.
  • Select Support > Grid Topology. Then, select site > Storage Node > DDS > Data Store > Overview > Main.

This procedure shows the Lost Objects attribute on the LDR > Data Store page.

Steps

  1. Select Support > Grid Topology.
  2. Select site > Storage Node > LDR > Data Store > Overview > Main.
  3. Review the Lost Objects attribute to see how many lost objects have been identified.

    [Screenshot: Overview: DDS: Data Store page]
  4. From an Admin Node, access the audit log to determine the identifier (CBID) of the object that triggered the LOST (Lost Objects) alarm:
    1. Log in to the grid node:
      1. Enter the following command: ssh admin@grid_node_IP
      2. Enter the password listed in the Passwords.txt file.
      3. Enter the following command to switch to root: su -
      4. Enter the password listed in the Passwords.txt file.
      When you are logged in as root, the prompt changes from $ to #.
    2. Change to the directory where the audit logs are located. Enter: cd /var/local/audit/export/
    3. Use grep to extract the Object Lost (OLST) audit messages. Enter: grep OLST audit_file_name
    4. Note the CBID value included in the message.
      Admin: # grep OLST audit.log
      2012-01-14T11:03:27.362483 [AUDT:[CBID(UI64):0x498D8A1F681F05B3][UUID(CSTR):"6213A021-91FC-49C0-AF44-EC6BF377D264"]
      [NOID(UI32):12088241][VOLI(UI64):2][RSLT(FC32):NONE][AVER(UI32):10][ATYP(FC32):OLST][ATIM(UI64):1350613602969243]
      [ATID(UI64):16956755694216746320][ANID(UI32):13959984]]
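
When an audit file contains many OLST messages, the CBID values noted in the previous substeps can be collected with a short script instead of by eye. The following Python sketch is one possible way to do that. It assumes each audit message occupies a single line in the file (the sample above is wrapped for display) and that fields follow the [NAME(TYPE):value] layout shown in the sample. The names FIELD_RE, parse_audit_fields, and lost_object_cbids are hypothetical helpers, not part of StorageGRID.

```python
import re

# One audit field as it appears in the sample message: [NAME(TYPE):value].
# Assumption: field names are four word characters; string values are quoted.
FIELD_RE = re.compile(r'\[(\w{4})\((\w+)\):("[^"]*"|[^\[\]]*)\]')

# The sample OLST message from this procedure, joined onto one line.
SAMPLE = (
    '2012-01-14T11:03:27.362483 [AUDT:[CBID(UI64):0x498D8A1F681F05B3]'
    '[UUID(CSTR):"6213A021-91FC-49C0-AF44-EC6BF377D264"]'
    '[NOID(UI32):12088241][VOLI(UI64):2][RSLT(FC32):NONE][AVER(UI32):10]'
    '[ATYP(FC32):OLST][ATIM(UI64):1350613602969243]'
    '[ATID(UI64):16956755694216746320][ANID(UI32):13959984]]'
)


def parse_audit_fields(message):
    """Return a dict mapping field name to value for one audit message line."""
    fields = {}
    for name, _field_type, value in FIELD_RE.findall(message):
        fields[name] = value.strip('"')  # drop the quotes around CSTR values
    return fields


def lost_object_cbids(lines):
    """Return the CBID of every line whose ATYP field is OLST."""
    cbids = []
    for line in lines:
        fields = parse_audit_fields(line)
        if fields.get("ATYP") == "OLST" and "CBID" in fields:
            cbids.append(fields["CBID"])
    return cbids
```

For example, lost_object_cbids(open("audit.log")) would return the list of CBID strings to pass to ObjectByCBID in the next step, provided the file matches the assumptions above.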
  5. Use the ObjectByCBID command to find the object by its identifier (CBID), and then determine if data is at risk.
    1. To access the LDR console, enter: telnet localhost 1402
    2. Enter: /proc/OBRP/ObjectByCBID -h hexadecimal_CBID_value
      In the following example, the object with CBID 0xFE1C42ABD3CD2AC0 has a UUID, but it has no locations listed.
      ade 21511404: / > /proc/OBRP/ObjectByCBID -h 0xFE1C42ABD3CD2AC0
       
      {
          "OID": "00006FFD00198494009DC7E0C02DEA4CC7BCFB513B11B81B8A",
          "TYPE(Object Type)": "Data object",
          "CHND(Content handle)": "9DC7E0C0-2DEA-4CC7-BCFB-513B11B81B8A",
          "NAME": "lost/testau.dat",
          "CBID": "0xFE1C42ABD3CD2AC0",
          "PHND(Parent handle, UUID)": "402BC3FE-1BB4-11E7-8FCB-18EB00C226D9",
          "PPTH(Parent path)": "LOST",
          "META": {
              "BASE(Protocol metadata)": {
                  "ISIA(Source client ip address)": "10.55.72.90",
                  "PHTP(HTTP protocol handler version)": "1",
                  "PAWS(S3 protocol version)": "1",
                  "ACCT(S3 account ID)": "10699577065449838288",
                  "*ctp(HTTP content MIME type)": "application/octet-stream"
              },
              "AWS3": {
                  "USDM(User-defined metadata)": "{\"s3b-last-modified\":[\"20161117T230402Z\"]}"
              },
              "BYCB(System metadata)": {
                  "SHSH(Supplementary Plaintext hash)": "MD5D 0xC9B110581DAC712BFAE0D1D8EF36CB7E",
                  "CSIZ(Plaintext object size)": "8204",
                  "BSIZ(Content block size)": "8886",
                  "CVER(Content block version)": "196612",
                  "CFLG(Content block flags)": "256",
                  "CTME(Object store begin timestamp)": "2017-04-10T20:01:58.399632",
                  "CTYP(Compression algorithm type)": "NONE",
                  "CHSH(Object hash)": "SHA1 0x7973967630676847CEB60C4C0D9384075F81A3C6",
                  "MTME(Object store modified timestamp)": "2017-04-10T20:01:58.406157"
              },
              "CMSM": {
                  "OWNR(ILM owner node ID)": "13895688",
                  "LATM(Object last access time)": "2017-04-10T20:01:58.399632"
              }
          }
      }
      
    3. Review the output of /proc/OBRP/ObjectByCBID, and take the appropriate action:
      Metadata: No object found ("ERROR":""), or an object was found with no UUID metadata
      Conclusion: If the object is not found, the message "ERROR":"" is returned. If the object is not found, or if there is no UUID metadata, it is safe to ignore the alarm. The lack of an object, or the absence of a UUID, indicates that the object was intentionally deleted.

      Metadata: UUID is present, Locations > 0
      Conclusion: If there is a UUID and there are locations listed in the output, the Lost Objects alarm was a false positive. There are other object locations in the grid. You can reset the Lost Objects alarm.

      Metadata: UUID is present, Locations = 0
      Conclusion: If there is a UUID but there are no locations listed in the output, the object is potentially missing.

      If the ILM policy does not include an ILM rule with only one active content placement instruction, contact technical support. You could also try to find and restore the object yourself.

      Technical support might ask you to determine if there is a storage recovery procedure in progress. That is, has a repair-data command been issued on any Storage Node and is the recovery still in progress? See "Restoring object data to a storage volume" in the recovery and maintenance instructions.
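
The review rules in step 5 can be summarized as a small decision function. The following Python sketch only restates those rules and is not a StorageGRID tool. The names Conclusion and classify_lost_object are hypothetical, and the three inputs (whether the object was found, whether UUID metadata is present, and how many locations are listed) are assumed to have been read from the ObjectByCBID output by hand.

```python
from enum import Enum


class Conclusion(Enum):
    """Possible outcomes of reviewing the ObjectByCBID output."""
    SAFE_TO_IGNORE = "object was intentionally deleted; ignore the alarm"
    FALSE_POSITIVE = "other locations exist; reset the Lost Objects alarm"
    POTENTIALLY_MISSING = "no locations listed; object is potentially missing"


def classify_lost_object(object_found, has_uuid, location_count):
    """Map the reviewed metadata to a conclusion.

    object_found   -- False when the command returned "ERROR":""
    has_uuid       -- True when UUID metadata is present in the output
    location_count -- number of locations listed in the output
    """
    # No object, or no UUID metadata: the object was intentionally deleted.
    if not object_found or not has_uuid:
        return Conclusion.SAFE_TO_IGNORE
    # UUID present and at least one location: the alarm was a false positive.
    if location_count > 0:
        return Conclusion.FALSE_POSITIVE
    # UUID present but no locations: the object is potentially missing.
    return Conclusion.POTENTIALLY_MISSING
```

For the example output shown in step 5, which has a UUID but no locations, classify_lost_object(True, True, 0) returns Conclusion.POTENTIALLY_MISSING, the case in which you contact technical support or try to find and restore the object.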