Restore and verify the system configuration - AFF A200

After completing the hardware replacement and booting to Maintenance mode, you verify the low-level system configuration of the replacement controller and reconfigure system settings as necessary.

Step 1: Set and verify system time after replacing the controller

You should check the time and date on the replacement controller module against the healthy controller module in an HA pair, or against a reliable time server in a stand-alone configuration. If the time and date do not match, you must reset them on the replacement controller module to prevent possible outages on clients due to time differences.

About this task

It is important that you apply the commands in the steps on the correct systems:

  • The replacement node is the new node that replaced the impaired node as part of this procedure.

  • The healthy node is the HA partner of the replacement node.

Steps
  1. If the replacement node is not at the LOADER prompt, halt the system to the LOADER prompt.

  2. On the healthy node, check the system time: show date

    The date and time are given in GMT.

  3. At the LOADER prompt, check the date and time on the replacement node: show date

    The date and time are given in GMT.

  4. If necessary, set the date in GMT on the replacement node: set date mm/dd/yyyy

  5. If necessary, set the time in GMT on the replacement node: set time hh:mm:ss

  6. At the LOADER prompt, confirm the date and time on the replacement node: show date

    The date and time are given in GMT.
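
The following transcript is a minimal sketch of steps 2 through 6, with LOADER-A standing in for the healthy node and LOADER-B for the replacement node. The prompt names, the wording of the show date output, and the timestamps are illustrative only and vary by platform and firmware release.

    LOADER-A> show date
    Current date & time is: 05/17/2023 09:42:10 GMT

    LOADER-B> show date
    Current date & time is: 01/01/2020 00:00:07 GMT

    LOADER-B> set date 05/17/2023
    LOADER-B> set time 09:42:30
    LOADER-B> show date
    Current date & time is: 05/17/2023 09:42:35 GMT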

Step 2: Verify and set the HA state of the controller module

You must verify the HA state of the controller module and, if necessary, update the state to match your system configuration.

Steps
  1. In Maintenance mode from the new controller module, verify that all components display the same HA state: ha-config show

    The value for ha-state can be one of the following:

    • ha

    • non-ha

  2. If the displayed system state of the controller module does not match your system configuration, set the HA state for the controller module: ha-config modify controller ha-state

    For ha-state, enter the value that matches your configuration: ha or non-ha.

  3. Confirm that the setting has changed: ha-config show
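
For an HA pair such as the AFF A200, the exchange might look like the following sketch. The output format is illustrative and can differ between ONTAP releases; here the controller is corrected from non-ha to ha.

    *> ha-config show
    Chassis HA configuration: ha
    Controller HA configuration: non-ha

    *> ha-config modify controller ha
    *> ha-config show
    Chassis HA configuration: ha
    Controller HA configuration: ha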

Step 3: Run system-level diagnostics

You should run comprehensive or focused diagnostic tests for specific components and subsystems whenever you replace the controller.

About this task

All commands in the diagnostic procedures are issued from the node where the component is being replaced.

Steps
  1. If the node to be serviced is not at the LOADER prompt, halt the node: halt

    After you issue the command, you should wait until the system stops at the LOADER prompt.

  2. At the LOADER prompt, boot the special drivers designed for system-level diagnostics: boot_diags

    During the boot process, you can safely respond y to the prompts until the Maintenance mode prompt (*>) appears.

  3. Display and note the available devices on the controller module: sldiag device show -dev mb

    The controller module devices and ports displayed can be any one or more of the following:

    • bootmedia is the system boot device.

    • cna is a Converged Network Adapter or interface not connected to a network or storage device.

    • fcal is a Fibre Channel-Arbitrated Loop device not connected to a Fibre Channel network.

    • env is motherboard environmentals.

    • mem is system memory.

    • nic is a network interface card.

    • nvram is nonvolatile RAM.

    • nvmem is a hybrid of NVRAM and system memory.

    • sas is a Serial Attached SCSI device not connected to a disk shelf.

  4. Run diagnostics as desired. Example transcripts for both options appear after this procedure.

    If you want to run diagnostic tests on… Then…

    Individual components

    1. Clear the status logs: sldiag device clearstatus

    2. Display the available tests for the selected devices: sldiag device show -dev dev_name

      dev_name can be any one of the ports and devices identified in the preceding step.

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Run the selected tests: sldiag device run -dev dev_name

      After the test is complete, the following message is displayed:

      *> <SLDIAG:_ALL_TESTS_COMPLETED>
    5. Verify that no tests failed: sldiag device status -dev dev_name -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

    Multiple components at the same time

    1. Review the enabled and disabled devices in the output from the preceding procedure and determine which ones you want to run concurrently.

    2. List the individual tests for the device: sldiag device show -dev dev_name

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Verify that the tests were modified: sldiag device show

    5. Repeat these substeps for each device that you want to run concurrently.

    6. Run diagnostics on all of the devices: sldiag device run

      Note: Do not add to or modify your entries after you start running diagnostics.

      After the test is complete, the following message is displayed:

      *> <SLDIAG:_ALL_TESTS_COMPLETED>
    7. Verify that there are no hardware problems on the node: sldiag device status -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

  5. Proceed based on the result of the preceding step.

    If the system-level diagnostics tests… Then…

    Were completed without any failures

    1. Clear the status logs: sldiag device clearstatus

    2. Verify that the log was cleared: sldiag device status

      The following default response is displayed:

      SLDIAG: No log messages are present.
    3. Exit Maintenance mode: halt

      The system displays the LOADER prompt.

      You have completed system-level diagnostics.

    Resulted in some test failures

    Determine the cause of the problem.

    1. Exit Maintenance mode: halt

    2. Perform a clean shutdown, and then disconnect the power supplies.

    3. Verify that you have observed all of the considerations identified for running system-level diagnostics, that cables are securely connected, and that hardware components are properly installed in the storage system.

    4. Reconnect the power supplies, and then power on the storage system.

    5. Rerun the system-level diagnostics test.
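
The following sketch runs tests on an individual component, following the first set of substeps in step 4. The mem device name is one example from the list in step 3; output other than the completion message is omitted because its format varies by release, and a clean status check simply returns you to the prompt.

    *> sldiag device clearstatus
    *> sldiag device show -dev mem
    *> sldiag device modify -dev mem -selection only
    *> sldiag device run -dev mem
    *> <SLDIAG:_ALL_TESTS_COMPLETED>
    *> sldiag device status -dev mem -long -state failed
    *>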
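To run tests on several devices concurrently, following the second set of substeps in step 4 and then the cleanup in step 5 after a clean run, the sequence might look like this sketch; mem and nvram are example device names from step 3, and the output shown is illustrative only.

    *> sldiag device modify -dev mem -selection only
    *> sldiag device modify -dev nvram -selection only
    *> sldiag device show
    *> sldiag device run
    *> <SLDIAG:_ALL_TESTS_COMPLETED>
    *> sldiag device status -long -state failed
    *> sldiag device clearstatus
    *> sldiag device status
    SLDIAG: No log messages are present.
    *> halt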