Determining if objects are permanently unavailable

You can determine if objects are permanently unavailable by making a request using the TSM administrative console.

Before you begin

About this task

This example is provided for your information only; this procedure cannot help you identify all failure conditions that might result in unavailable objects or tape volumes. For information about TSM administration, see the TSM server documentation.


  1. Log in to an Admin Node:
    1. Enter the following command: ssh admin@Admin_Node_IP
    2. Enter the password listed in the Passwords.txt file.
  2. Identify the object or objects that could not be retrieved by the Archive Node:
    1. Go to the directory containing the audit log files: cd /var/local/audit/export
      The active audit log file is named audit.log. Once a day, the active audit.log file is saved, and a new audit.log file is started. The name of the saved file indicates when it was saved, in the format yyyy-mm-dd.txt. After a day, the saved file is compressed and renamed, in the format yyyy-mm-dd.txt.gz, which preserves the original date.
    2. Search the relevant audit log file for messages indicating that a retrieval failure occurred. For example, enter: grep ARCE audit.log | less -n
      When a retrieval fails, the ARCE audit message (Archive Object Retrieve End) displays ARUN (archive middleware unavailable) or GERR (general error) in the result field. The following example line from the audit log shows that the ARCE message terminated with the result ARUN for CBID 498D8A1F681F05B3.

      See the instructions for understanding audit messages.

    3. Record the CBID of each object with a request failure.
      You might also want to record the following additional information, which TSM uses to identify objects saved by the Archive Node:
      • File Space Name: Select Support > Grid Topology. Then, select Archive Node > ARC > Target > Overview.

        The file space name is the Archive Node's node ID.

      • High Level Name: Equivalent to the volume ID assigned to the object by the Archive Node. The volume ID takes the form of a date (20091127), and is recorded as the VLID of the object in archive audit messages.
      • Low Level Name: Equivalent to the CBID assigned to an object by the StorageGRID system.
    4. Log out of the command shell: exit
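
The search in step 2 can be sketched as a small shell function, which might help when many retrievals have failed. This is a minimal sketch and not part of the documented procedure: the function name failed_arce_cbids is hypothetical, and the field layout it matches ([ATYP(FC32):ARCE], [RSLT(FC32):ARUN], [CBID(UI64):0x...]) is an assumption about how audit messages are written. Compare it with a real line from your audit log before relying on it, and run it on the Admin Node before you log out.

```shell
#!/bin/sh
# Print the unique CBIDs of ARCE messages whose result is ARUN or GERR.
# Reads audit log lines on standard input.
# ASSUMPTION: fields look like [ATYP(FC32):ARCE], [RSLT(FC32):ARUN] and
# [CBID(UI64):0x498D8A1F681F05B3]; adjust the patterns to match your logs.
failed_arce_cbids() {
    grep 'ATYP(FC32):ARCE' \
        | grep -E 'RSLT\(FC32\):(ARUN|GERR)' \
        | sed -n 's/.*\[CBID([A-Z0-9]*):\(0x\)\{0,1\}\([0-9A-Fa-f]*\)].*/\2/p' \
        | sort -u
}
```

For the active log, the function could be used as failed_arce_cbids < audit.log. For a compressed log, the decompressed output can be piped in, for example zcat yyyy-mm-dd.txt.gz | failed_arce_cbids.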
  3. Check the TSM server to see if the objects identified in step 2 are permanently unavailable:
    1. Log in to the administrative console of the TSM server: dsmadmc
      Use the administrative user name and password that are configured for the ARC service, that is, the credentials entered in the Grid Manager. (Select Support > Grid Topology. Then, select Archive Node > ARC > Target > Configuration.)
    2. Determine if the object is permanently unavailable.
      For example, you might search the TSM activity log for a data integrity error for that object. The following example shows a search of the activity log for the past day for an object with CBID 498D8A1F681F05B3.
      > query actlog begindate=-1 search=498D8A1F681F05B3
      12/21/2008 05:39:15 ANR0548W Retrieve or restore 
      failed for session 9139359 for node DEV-ARC-20 (Bycast ARC) 
      processing file space /19130020 4 for file /20081002/ 
      498D8A1F681F05B3 stored as Archive - data 
      integrity error detected. (SESSION: 9139359)

      Note that depending on the nature of the error, the CBID might not be recorded in the TSM activity log. You might need to search the log for other TSM errors around the time of the request failure.

    3. If an entire tape is permanently unavailable, identify the CBIDs for all objects stored on that volume: query content TSM_Volume_Name
      where TSM_Volume_Name is the TSM name for the unavailable tape. The following is an example of the output for this command:
      > query content TSM_Volume_Name
      Node Name       Type Filespace  FSID Client's Name for File Name
      --------------- ---- ---------- ---- --------------------------------
      DEV-ARC-20      Arch /19130020  216  /20081201/ C1D172940E6C7E12
      DEV-ARC-20      Arch /19130020  216  /20081201/ F1D7FBC2B4B0779E

      The “Client’s Name for File Name” is the Archive Node volume ID (TSM “high level name”) followed by the object’s CBID (TSM “low level name”). That is, it takes the form /Archive Node volume ID/ CBID or, in the first line of this example: /20081201/ C1D172940E6C7E12

      Recall also that the Filespace is the node ID of the Archive Node.

      You will need the CBID of each object stored on the volume and the node ID of the Archive Node to cancel the retrieval request in the next step.
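
When a whole tape is unavailable, the query content output can list many objects. The following is a minimal sketch of how that output could be reduced to the CBID and node ID pairs needed in the next step. The function name content_to_pairs is hypothetical. It assumes that data rows have exactly the six columns shown in the example above and that the node ID is the Filespace value without its leading slash; verify both against your own output.

```shell
#!/bin/sh
# Convert "query content" output (read from standard input) into
# "CBID node_ID" pairs, one per line.
# ASSUMPTION: data rows have six whitespace-separated columns (node name,
# type, filespace, FSID, volume ID, CBID) and the node ID is the filespace
# without its leading slash. Header and separator rows are skipped because
# their second column is not "Arch".
content_to_pairs() {
    awk '$2 == "Arch" && NF == 6 {
        node_id = $3
        sub(/^\//, "", node_id)   # "/19130020" becomes "19130020"
        print $6, node_id
    }'
}
```

The function only reformats text that you have already saved from the TSM administrative console; it does not contact the TSM server.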

  4. For each object that is permanently unavailable, cancel the retrieval request and inform the StorageGRID system that the object copy was lost:
    Attention: Use the ADE Console with caution. If the console is used improperly, it is possible to interrupt system operations and corrupt data. Enter commands carefully, and only use the commands documented in this procedure.
    1. If you are not already logged in to the Archive Node, log in as follows:
      1. Enter the following command: ssh admin@grid_node_IP
      2. Enter the password listed in the Passwords.txt file.
      3. Enter the following command to switch to root: su -
      4. Enter the password listed in the Passwords.txt file.
    2. Access the ADE console of the ARC service: telnet localhost 1409
    3. Cancel the request for the object: /proc/BRTR/cancel -c CBID

      where CBID is the identifier of the object that cannot be retrieved from the TSM server.

      If the only copies of the object are on tape, the “bulk retrieval” request is canceled with the message “1 requests canceled”. If copies of the object exist elsewhere in the system, the object retrieval is processed by a different module, so the response is “0 requests canceled”.

    4. Notify the StorageGRID system that an object copy has been lost and an additional copy must be made of the indicated object: /proc/CMSI/Object_Lost CBID node_ID

      where CBID is the identifier of the object that cannot be retrieved from the TSM server, and node_ID is the node ID of the Archive Node where the retrieval failed.

      For Archive Nodes, you cannot use a range of CBIDs; enter the command once for each object.

      In most cases, the StorageGRID system immediately begins to make additional copies of object data to ensure that the system's ILM policy is followed. However, in a StorageGRID system configured to use an ILM rule with only one active content placement instruction, no additional copies of an object are made, so a lost object cannot be recovered. In this case, running the Object_Lost command purges the lost object’s metadata from the StorageGRID system.

      When the Object_Lost command completes successfully, it returns the message CLOC_LOST_ANS returned result ‘SUCS’.

    5. Exit the ADE Console: exit
    6. Log out of the Archive Node: exit
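
Because step 4 must be repeated for every lost object, it can help to prepare the exact command lines before opening the ADE console. The following minimal sketch only prints the two commands from step 4 for each CBID and node ID pair so that they can be reviewed and entered by hand. The function name print_ade_commands and the input format of one "CBID node_ID" pair per line are assumptions made for illustration.

```shell
#!/bin/sh
# Read "CBID node_ID" pairs from standard input and print the ADE console
# commands from step 4 for each pair. Nothing is executed or sent to the
# console. Lines that do not contain exactly two fields are skipped.
print_ade_commands() {
    while read -r cbid node_id extra; do
        [ -n "$cbid" ] && [ -n "$node_id" ] && [ -z "$extra" ] || continue
        printf '/proc/BRTR/cancel -c %s\n' "$cbid"
        printf '/proc/CMSI/Object_Lost %s %s\n' "$cbid" "$node_id"
    done
}
```

Printing rather than sending the commands is deliberate: given the caution that applies to the ADE console, each line should still be checked and entered carefully by an operator.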
  5. Reset the value of Request Failures in the StorageGRID system:
    1. Go to Archive Node > ARC > Retrieve > Configuration, and select Reset Request Failure Count.
    2. Click Apply Changes.