Restore and verify the system configuration - AFF A900

Contributors dougthomp netapp-martyh

After completing the hardware replacement, you verify the low-level system configuration of the replacement controller, reconfigure system settings as necessary, and then run system-level diagnostics.

Step 1: Set and verify the system time after replacing the controller module

You should check the time and date on the replacement controller module against the healthy controller module in an HA pair, or against a reliable time server in a stand-alone configuration. If the time and date do not match, you must reset them on the replacement controller module to prevent possible outages on clients due to time differences.

About this task

It is important that you apply the commands in the steps on the correct systems:

  • The replacement node is the new node that replaced the impaired node as part of this procedure.

  • The healthy node is the HA partner of the replacement node.

Steps
  1. If the replacement node is not at the LOADER prompt, halt the system to the LOADER prompt.

  2. On the healthy node, check the system time: show date

    The date and time are given in GMT.

  3. At the LOADER prompt, check the date and time on the replacement node: show date

    The date and time are given in GMT.

  4. If necessary, set the date in GMT on the replacement node: set date mm/dd/yyyy

  5. If necessary, set the time in GMT on the replacement node: set time hh:mm:ss

  6. At the LOADER prompt, confirm the date and time on the replacement node: show date

    The date and time are given in GMT.
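
As a minimal sketch of steps 4 and 5, the following Python helper formats a UTC timestamp into the two LOADER commands in the mm/dd/yyyy and hh:mm:ss layouts shown above. The function name loader_time_commands is hypothetical and is not part of any NetApp tool. It only builds the command strings; you would still read the reference time from the healthy node and type the commands at the LOADER prompt yourself.

```python
from datetime import datetime, timezone


def loader_time_commands(now_utc: datetime) -> list[str]:
    """Build the LOADER 'set date' and 'set time' commands for a timestamp.

    Hypothetical helper: it only formats strings in the mm/dd/yyyy and
    hh:mm:ss layouts used in the steps above.
    """
    if now_utc.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Normalize to UTC, because the LOADER date and time are given in GMT.
    utc = now_utc.astimezone(timezone.utc)
    return [
        utc.strftime("set date %m/%d/%Y"),
        utc.strftime("set time %H:%M:%S"),
    ]


if __name__ == "__main__":
    # Print the commands for the current time of the machine running the script.
    for command in loader_time_commands(datetime.now(timezone.utc)):
        print(command)
```

The script uses the clock of the machine it runs on, so it is only useful if that clock is itself synchronized to a reliable time source.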

Step 2: Verify and set the HA state of the controller module

You must verify the HA state of the controller module and, if necessary, update the state to match your system configuration.

  1. In Maintenance mode from the replacement controller module, verify that all components display the same HA state: ha-config show

    If your system is in…                                      The HA state for all components should be…
    An HA pair                                                 ha
    A MetroCluster FC configuration with four or more nodes    mcc
    A MetroCluster IP configuration                            mccip

  2. If the displayed system state of the controller module does not match your system configuration, set the HA state for the controller module: ha-config modify controller ha-state

    ha-state is the value from the preceding table that matches your configuration (ha, mcc, or mccip).

  3. If the displayed system state of the chassis does not match your system configuration, set the HA state for the chassis: ha-config modify chassis ha-state
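
The decision in steps 2 and 3 can be sketched in Python as follows. This is an illustrative helper under stated assumptions: the dictionary keys and the function name ha_config_fixes are hypothetical, and the controller and chassis states are values you read from the ha-config show output yourself. The sketch does not parse that output.

```python
# Expected HA state per configuration, taken from the table in this step.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPECTED_HA_STATE = {
    "HA pair": "ha",
    "MetroCluster FC": "mcc",  # four or more nodes
    "MetroCluster IP": "mccip",
}


def ha_config_fixes(config: str, controller_state: str, chassis_state: str) -> list[str]:
    """Return the ha-config modify commands needed to match the configuration.

    An empty list means both components already show the expected state.
    """
    expected = EXPECTED_HA_STATE[config]
    commands = []
    if controller_state != expected:
        commands.append(f"ha-config modify controller {expected}")
    if chassis_state != expected:
        commands.append(f"ha-config modify chassis {expected}")
    return commands


if __name__ == "__main__":
    # Example: a MetroCluster IP system whose controller still reports "ha".
    for command in ha_config_fixes("MetroCluster IP", "ha", "mccip"):
        print(command)
```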

Step 3: Run system-level diagnostics

You should run comprehensive or focused diagnostic tests for specific components and subsystems whenever you replace the controller.

All commands in the diagnostic procedures are issued from the controller where the component is being replaced.

  1. If the controller to be serviced is not at the LOADER prompt, halt the controller: halt

    After you issue the command, you should wait until the system stops at the LOADER prompt.

  2. At the LOADER prompt, load the special drivers that system-level diagnostics requires to function properly: boot_diags

    During the boot process, you can safely respond y to the prompts until the Maintenance mode prompt (*>) appears.

  3. Display and note the available devices on the controller module: sldiag device show -dev mb

    The controller module devices and ports displayed can be any one or more of the following:

    • bootmedia is the system booting device.

    • cna is a Converged Network Adapter or interface not connected to a network or storage device.

    • fcal is a Fibre Channel-Arbitrated Loop device not connected to a Fibre Channel network.

    • env is motherboard environmentals.

    • mem is system memory.

    • nic is a network interface card.

    • nvram is nonvolatile RAM.

    • nvmem is a hybrid of NVRAM and system memory.

    • sas is a Serial Attached SCSI device not connected to a disk shelf.

  4. Run diagnostics as desired.

    If you want to run diagnostic tests on…​ Then…​

    Individual components

    1. Clear the status logs: sldiag device clearstatus

    2. Display the available tests for the selected devices: sldiag device show -dev dev_name

      dev_name can be any one of the ports and devices identified in the preceding step.

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Run the selected tests: sldiag device run -dev dev_name

      After the test is complete, the following message is displayed:

      *> <SLDIAG:_ALL_TESTS_COMPLETED>

    5. Verify that no tests failed: sldiag device status -dev dev_name -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

    Multiple components at the same time

    1. Review the enabled and disabled devices in the output from the preceding procedure and determine which ones you want to run concurrently.

    2. List the individual tests for the device: sldiag device show -dev dev_name

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Verify that the tests were modified: sldiag device show

    5. Repeat these substeps for each device that you want to run concurrently.

    6. Run diagnostics on all of the devices: sldiag device run

      Important Do not add to or modify your entries after you start running diagnostics.

      After the test is complete, the following message is displayed:

      *> <SLDIAG:_ALL_TESTS_COMPLETED>

    7. Verify that there are no hardware problems on the controller: sldiag device status -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

  5. Proceed based on the result of the preceding step:

    If the system-level diagnostics tests…​ Then…​

    Were completed without any failures

    1. Clear the status logs: sldiag device clearstatus

    2. Verify that the log was cleared: sldiag device status

      The following default response is displayed:

      SLDIAG: No log messages are present.

    3. Exit Maintenance mode: halt

      The controller displays the LOADER prompt.

    4. Boot the controller from the LOADER prompt: bye

    5. Return the controller to normal operation:

      If your controller is in… Then…

      An HA pair

      Perform a giveback: storage failover giveback -ofnode replacement_node_name

      Note: If you disabled automatic giveback, re-enable it with the storage failover modify command.

    Resulted in some test failures

    Determine the cause of the problem:

    1. Exit Maintenance mode: halt

      After you issue the command, wait until the system stops at the LOADER prompt.

    2. Turn off or leave on the power supplies, depending on how many controller modules are in the chassis.

      If the chassis contains another controller module, leave the power supplies turned on so that it continues to receive power.

    3. Verify that you have observed all the considerations identified for running system-level diagnostics, that cables are securely connected, and that hardware components are properly installed in the storage system.

    4. Boot the controller module you are servicing, and interrupt the boot by pressing Ctrl-C when prompted to reach the Boot menu.

      The controller module boots up when fully seated.

    5. Select Boot to maintenance mode from the menu.

    6. Exit Maintenance mode by entering the following command: halt

      After you issue the command, wait until the system stops at the LOADER prompt.

    7. Rerun the system-level diagnostic test.