Run device failure diagnostics

Contributors thrisun netapp-martyh

Running diagnostics can help you determine why access to a specific device becomes intermittent or why the device becomes unavailable in your storage system.

  1. At the storage system prompt, switch to the LOADER prompt: halt

  2. Enter the following command at the LOADER prompt: boot_diags

    Note You must run this command from the LOADER prompt for system-level diagnostics to function properly. The boot_diags command starts special drivers designed specifically for system-level diagnostics.
  3. Run diagnostics on the device causing problems by entering the following command: sldiag device run [-dev devtype|mb|slotslotnum] [-name device]

    • -dev devtype specifies the type of device to be tested.

      • ata is an Advanced Technology Attachment device.

      • bootmedia is the system booting device..

      • cna is a Converged Network Adapter not connected to a network or storage device.

      • env is motherboard environmentals.

      • fcache is the Flash Cache adapter, also known as the Performance Acceleration Module 2.

      • fcal is a Fibre Channel-Arbitrated Loop device not connected to a storage device or Fibre Channel network.

      • fcvi is the Fiber Channel Virtual Interface not connected to a Fibre Channel network.

      • interconnect or nvram-ib is the high-availability interface.

      • mem is system memory.

      • nic is a Network Interface Card not connected to a network.

      • nvram is nonvolatile RAM.

      • nvmem is a hybrid of NVRAM and system memory.

      • sas is a Serial Attached SCSI device not connected to a disk shelf.

      • serviceproc is the Service Processor.

      • storage is an ATA, FC-AL, or SAS interface that has an attached disk shelf.

      • toe is a TCP Offload Engine, a type of NIC.

    • mb specifies that all the motherboard devices are to be tested.

    • `slot`slotnum specifies that a device in a specific slot number is to be tested.

    • -name device specifies a given device class and type.

  4. View the status of the test by entering the following command: sldiag device status

    Your storage system provides the following output while the tests are still running:

    There are still test(s) being processed.

    After all the tests are complete, the following response appears by default:

  5. Identify any hardware problems by entering the following command: sldiag device status [-dev devtype|mb|slotslotnum] [-name device] -long -state failed

    The following example shows how the full status of failures resulting from testing the FC-AL adapter are displayed:

    *> **sldiag device status fcal -long -state failed**
    TEST START ------------------------------------------
    DEVTYPE: fcal
    NAME: Fcal Loopback Test
    START DATE: Sat Jan  3 23:10:56 GMT 2009
    STATUS: Completed
    Starting test on Fcal Adapter: 0b
    Started gathering adapter info.
    Adapter get adapter info OK
    Adapter fc_data_link_rate: 1Gib
    Adapter name: QLogic 2532
    Adapter firmware rev: 4.5.2
    Adapter hardware rev: 2
    Started adapter get WWN string test.
    Adapter get WWN string OK wwn_str: 5:00a:098300:035309
    Started adapter interrupt test
    Adapter interrupt test OK
    Started adapter reset test.
    Adapter reset OK
    Started Adapter Get Connection State Test.
    Connection State: 5
    Loop on FC Adapter 0b is OPEN
    Started adapter Retry LIP test
    Adapter Retry LIP OK
    ERROR: failed to init adaptor port for IOCTL call
    ioctl_status.class_type = 0x1
    ioctl_status.subclass = 0x3 = 0x0
    Error Count: 2  Run Time: 70 secs
    >>>>> ERROR, please ensure the port has a shelf or plug.
    END DATE: Sat Jan  3 23:12:07 GMT 2009
    LOOP: 1/1
    TEST END --------------------------------------------
    If the system-level diagnostics tests…​ Then…​

    Resulted in some test failures

    Determine the cause of the problem.

    1. Exit Maintenance mode by entering the following command: halt

    2. Perform a clean shutdown and disconnect the power supplies.

    3. Verify that you have observed all the considerations identified for running system-level diagnostics, that cables are securely connected, and that hardware components are properly installed in the storage system.

    4. Reconnect the power supplies and power on the storage system.

    5. Repeat Steps 1 through 5 of Running device failure diagnostics.

    Resulted in the same test failures

    Technical support might recommend modifying the default settings on some of the tests to help identify the problem.

    1. Modify the selection state of a specific device or type of device on your storage system by entering the following command: sldiag device modify [-dev devtype|mb|slot_slotnum_] [-name device] [-selection enable|disable|default|only]
      -selection enable|disable|default|only allows you to enable, disable, accept the default selection of a specified device type or named device, or only enable the specified device or named device by disabling all others first.

    2. Verify that the tests were modified by entering the following command: sldiag option show

    3. Repeat Steps 3 through 5 of Running device failure diagnostics.

    4. After you identify and resolve the problem, reset the tests to their default states by repeating substeps 1 and 2.

    5. Repeat Steps 1 through 5 of Running device failure diagnostics.

    Were completed without any failures

    There are no hardware problems and your storage system returns to the prompt.

    1. Clear the status logs by entering the following command: sldiag device clearstatus [-dev devtype|mb|slot_slotnum_]

    2. Verify that the log is cleared by entering the following command: sldiag device status [-dev devtype|mb|slot_slotnum_]

      The following default response is displayed:

      SLDIAG: No log messages are present.
    3. Exit Maintenance mode by entering the following command: halt

    4. Enter the following command at the Loader prompt to boot the storage system: boot_ontap You have completed system-level diagnostics.

If the failures persist after repeating the steps, you need to replace the hardware.