Restore and verify the system configuration - AFF A700 and FAS9000


After completing the hardware replacement and booting to Maintenance mode, you verify the low-level system configuration of the replacement controller and reconfigure system settings as necessary.

Step 1: Set and verify system time after replacing the controller

You should check the time and date on the replacement controller module against the healthy controller module in an HA pair, or against a reliable time server in a stand-alone configuration. If the time and date do not match, you must reset them on the replacement controller module to prevent possible outages on clients due to time differences.

About this task

It is important that you apply the commands in the steps on the correct systems:

  • The replacement node is the new node that replaced the impaired node as part of this procedure.

  • The healthy node is the HA partner of the replacement node.

Steps
  1. If the replacement node is not at the LOADER prompt, halt the system to the LOADER prompt.

  2. On the healthy node, check the system time: show date

    The date and time are given in GMT.

  3. At the LOADER prompt, check the date and time on the replacement node: show date

    The date and time are given in GMT.

  4. If necessary, set the date in GMT on the replacement node: set date mm/dd/yyyy

  5. If necessary, set the time in GMT on the replacement node: set time hh:mm:ss

  6. At the LOADER prompt, confirm the date and time on the replacement node: show date

    The date and time are given in GMT.

Step 2: Verify and set the HA state of the controller module

You must verify the HA state of the controller module and, if necessary, update the state to match your system configuration.

Steps
  1. From Maintenance mode on the new controller module, verify that all components display the same HA state: ha-config show

    The value for HA-state can be one of the following:

    • ha

    • mcc

    • mcc-2n

    • mccip

    • non-ha

  2. If the displayed HA state does not match your system configuration, set the HA state for the controller module: ha-config modify controller ha-state

  3. Confirm that the setting has changed: ha-config show
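As a quick aid when checking the `ha-config show` output, the expected HA state per system configuration can be sketched as a lookup table. The configuration key names below are this sketch's own labels, not CLI values; only the right-hand values are the states listed above.

```python
# Illustrative mapping from system configuration to the HA state that
# 'ha-config show' should report. Key names are this sketch's own labels.
EXPECTED_HA_STATE = {
    "ha_pair": "ha",
    "metrocluster_fc": "mcc",
    "metrocluster_2node": "mcc-2n",
    "metrocluster_ip": "mccip",
    "standalone": "non-ha",
}

def ha_state_matches(config: str, reported: str) -> bool:
    """True if the reported HA state matches the expected one; a mismatch
    means the state must be corrected before returning the node to service."""
    return EXPECTED_HA_STATE.get(config) == reported

print(ha_state_matches("metrocluster_ip", "mccip"))  # True
```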

Step 3: Run system-level diagnostics

You should run comprehensive or focused diagnostic tests for specific components and subsystems whenever you replace the controller.

All commands in the diagnostic procedures are issued from the node where the component is being replaced.

Steps
  1. If the node to be serviced is not at the LOADER prompt, reboot the node: halt

    After you issue the command, you should wait until the system stops at the LOADER prompt.

  2. At the LOADER prompt, access the special drivers designed for system-level diagnostics: boot_diags

    During the boot process, you can safely respond y to the prompts until the Maintenance mode prompt (*>) appears.

  3. Display and note the available devices on the controller module: sldiag device show -dev mb

    The controller module devices and ports displayed can be any one or more of the following:

    • bootmedia is the system booting device.

    • cna is a Converged Network Adapter or interface not connected to a network or storage device.

    • fcal is a Fibre Channel-Arbitrated Loop device not connected to a Fibre Channel network.

    • env is motherboard environmentals.

    • mem is system memory.

    • nic is a network interface card.

    • nvram is nonvolatile RAM.

    • nvmem is a hybrid of NVRAM and system memory.

    • sas is a Serial Attached SCSI device not connected to a disk shelf.

  4. Run diagnostics as desired.

    If you want to run diagnostic tests on… Then…

    Individual components

    1. Clear the status logs: sldiag device clearstatus

    2. Display the available tests for the selected devices: sldiag device show -dev dev_name

      dev_name can be any one of the ports and devices identified in the preceding step.

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Run the selected tests: sldiag device run -dev dev_name

      After the test is complete, the following message is displayed:

      *> <SLDIAG:_ALL_TESTS_COMPLETED>
    5. Verify that no tests failed: sldiag device status -dev dev_name -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

    Multiple components at the same time

    1. Review the enabled and disabled devices in the output from the preceding procedure and determine which ones you want to run concurrently.

    2. List the individual tests for the device: sldiag device show -dev dev_name

    3. Examine the output and, if applicable, select only the tests that you want to run: sldiag device modify -dev dev_name -selection only

      -selection only disables all other tests that you do not want to run for the device.

    4. Verify that the tests were modified: sldiag device show

    5. Repeat these substeps for each device that you want to run concurrently.

    6. Run diagnostics on all of the devices: sldiag device run

      Note Do not add to or modify your entries after you start running diagnostics.

      After the test is complete, the following message is displayed:

      *> <SLDIAG:_ALL_TESTS_COMPLETED>
    7. Verify that there are no hardware problems on the node: sldiag device status -long -state failed

      System-level diagnostics returns you to the prompt if there are no test failures, or lists the full status of failures resulting from testing the component.

  5. Proceed based on the result of the preceding step:

    If the system-level diagnostics tests… Then…

    Were completed without any failures

    1. Clear the status logs: sldiag device clearstatus

    2. Verify that the log was cleared: sldiag device status

      The following default response is displayed:

      SLDIAG: No log messages are present.
    3. Exit Maintenance mode: halt

      The node displays the LOADER prompt.

    4. Boot the node from the LOADER prompt: bye

    5. Return the node to normal operation, depending on your configuration:

      • An HA pair: perform a giveback: storage failover giveback -ofnode replacement_node_name

        Note If you disabled automatic giveback, re-enable it with the storage failover modify command.

      • A two-node MetroCluster configuration: proceed to the next step.

        The MetroCluster switchback procedure is done in the next task in the replacement process.

      • A stand-alone configuration: no action is required.

        You have completed system-level diagnostics.

    Resulted in some test failures

    Determine the cause of the problem:

    1. Exit Maintenance mode: halt

      After you issue the command, wait until the system stops at the LOADER prompt.

    2. Turn off or leave on the power supplies, depending on how many controller modules are in the chassis:

      • If you have two controller modules in the chassis, leave the power supplies turned on to provide power to the other controller module.

      • If you have one controller module in the chassis, turn off the power supplies and unplug them from the power sources.

    3. Verify that you have observed all the considerations identified for running system-level diagnostics, that cables are securely connected, and that hardware components are properly installed in the storage system.

    4. Boot the controller module you are servicing, interrupting the boot by pressing Ctrl-C when prompted to get to the Boot menu:

      • If you have two controller modules in the chassis, fully seat the controller module you are servicing in the chassis.

        The controller module boots up when fully seated.

      • If you have one controller module in the chassis, connect the power supplies, and then turn them on.

    5. Select Boot to maintenance mode from the menu.

    6. Exit Maintenance mode by entering the following command: halt

      After you issue the command, wait until the system stops at the LOADER prompt.

    7. Rerun the system-level diagnostic test.
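Both branches of step 4 in the diagnostics procedure follow the same cycle: clear status or list tests, narrow the selection with -selection only, run, then check -state failed. As a memory aid, the command sequences can be sketched with a small helper; this builds the command strings only (the real commands are typed at the Maintenance mode prompt):

```python
def sldiag_commands(devices, concurrent=False):
    """Build the ordered sldiag command strings mirroring step 4 above.
    This sketch only assembles strings; it does not execute anything."""
    cmds = []
    if concurrent:
        # Multiple components at the same time: select per device, run once.
        for dev in devices:
            cmds.append(f"sldiag device show -dev {dev}")
            cmds.append(f"sldiag device modify -dev {dev} -selection only")
        cmds.append("sldiag device show")   # verify the modified selections
        cmds.append("sldiag device run")    # run all selected devices together
        cmds.append("sldiag device status -long -state failed")
    else:
        # Individual components: the full cycle per device.
        for dev in devices:
            cmds.append("sldiag device clearstatus")
            cmds.append(f"sldiag device show -dev {dev}")
            cmds.append(f"sldiag device modify -dev {dev} -selection only")
            cmds.append(f"sldiag device run -dev {dev}")
            cmds.append(f"sldiag device status -dev {dev} -long -state failed")
    return cmds

print("\n".join(sldiag_commands(["mem"])))
```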