
Troubleshoot


Before troubleshooting any NVMe-oF failures on RHEL, OL, or SLES hosts, verify that you are running a configuration that is compliant with the Interoperability Matrix Tool (IMT) specifications, and then proceed with the following steps to debug any host-side issues.

Note: The troubleshooting instructions are not applicable to AIX, Windows, and ESXi hosts.

Enable verbose logging

If you have an issue with your configuration, verbose logging can provide essential information for troubleshooting.

The procedure for setting verbose logging for QLogic (qla2xxx) differs from the procedure for setting LPFC verbose logging.

LPFC

Set the lpfc driver for NVMe/FC.

Steps
  1. Set the lpfc_log_verbose driver setting to any of the following values to log NVMe/FC events.

    #define LOG_NVME 0x00100000 /* NVME general events. */
    #define LOG_NVME_DISC 0x00200000 /* NVME Discovery/Connect events. */
    #define LOG_NVME_ABTS 0x00400000 /* NVME ABTS events. */
    #define LOG_NVME_IOERR 0x00800000 /* NVME IO Error events. */
  2. After setting one or more of these values, run the dracut -f command and reboot the host (see the sketch after these steps for an example).

  3. Verify the settings.

    # cat /etc/modprobe.d/lpfc.conf
    options lpfc lpfc_log_verbose=0xf00083

    # cat /sys/module/lpfc/parameters/lpfc_log_verbose
    15728771
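
For example, the following is a minimal sketch of setting the option and rebuilding the initramfs. The mask 0xf00000 is simply the bitwise OR of the four NVMe flags listed in step 1; the value 0xf00083 verified above additionally enables a few non-NVMe lpfc log categories. Note that this sketch overwrites any existing /etc/modprobe.d/lpfc.conf, so merge it with your existing options if you have them.

    # Sketch: enable all four NVMe/FC log categories in one mask.
    # 0x00100000 | 0x00200000 | 0x00400000 | 0x00800000 = 0x00f00000
    echo "options lpfc lpfc_log_verbose=0xf00000" > /etc/modprobe.d/lpfc.conf

    # Rebuild the initramfs so the option persists across boots, then reboot.
    dracut -f
    reboot
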
Qla2xxx

Unlike the lpfc driver, the qla2xxx driver has no logging specific to NVMe/FC. Instead, set the general qla2xxx logging level.

Steps
  1. Append the ql2xextended_error_logging=0x1e400000 value to the corresponding modprobe qla2xxx conf file (a sketch follows these steps).

  2. Execute the dracut -f command and then reboot the host.

  3. After reboot, verify that the verbose logging has been enabled:

    # cat /etc/modprobe.d/qla2xxx.conf

    Example output:

    options qla2xxx ql2xnvmeenable=1 ql2xextended_error_logging=0x1e400000
    # cat /sys/module/qla2xxx/parameters/ql2xextended_error_logging
    507510784
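
For illustration, a minimal sketch of appending the options and rebuilding the initramfs; ql2xnvmeenable=1 is carried over from the example output above, and 0x1e400000 is the logging mask from step 1 (507510784 in decimal, matching the sysfs check). If the conf file already contains an options qla2xxx line, edit that line instead of appending a second one.

    # Sketch: append the qla2xxx options to the modprobe conf file.
    echo "options qla2xxx ql2xnvmeenable=1 ql2xextended_error_logging=0x1e400000" >> /etc/modprobe.d/qla2xxx.conf

    # Rebuild the initramfs so the options persist across boots, then reboot.
    dracut -f
    reboot
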

Common nvme-cli errors and workarounds

The errors displayed by nvme-cli during nvme discover, nvme connect, or nvme connect-all operations, together with their probable causes and workarounds, are listed below:

Error message: Failed to write to /dev/nvme-fabrics: Invalid argument

Probable cause: Incorrect command syntax.

Workaround: Verify that you are using the correct syntax for the nvme discover, nvme connect, and nvme connect-all commands.
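
For reference, the NVMe/FC discovery syntax used elsewhere in this section looks like the following; the traddr and host-traddr WWNN/WWPN pairs are example values that you replace with your own target LIF and initiator port addresses:

    nvme discover --transport=fc \
      --traddr=nn-0x200a00a098c80f09:pn-0x200b00a098c80f09 \
      --host-traddr=nn-0x20000090fae0ec9d:pn-0x10000090fae0ec9d
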

Error message: Failed to write to /dev/nvme-fabrics: No such file or directory

Probable cause: Multiple issues can trigger this error; passing incorrect arguments to the NVMe commands is one of the most common causes.

Workaround:

  • Verify that you have passed the correct arguments (such as the correct WWNN and WWPN strings) to the commands.

  • If the arguments are correct but you still see this error, check that the output of the cat /sys/class/scsi_host/host*/nvme_info command is valid, that the NVMe initiator is displayed as Enabled, and that the NVMe/FC target LIFs are correctly listed under the remote port (RPORT) sections.
    Example:

    # cat /sys/class/scsi_host/host*/nvme_info
    NVME Initiator Enabled
    NVME LPORT lpfc0 WWPN x10000090fae0ec9d WWNN x20000090fae0ec9d DID x012000 ONLINE
    NVME RPORT WWPN x200b00a098c80f09 WWNN x200a00a098c80f09 DID x010601 TARGET DISCSRVC ONLINE
    NVME Statistics
    LS: Xmt 0000000000000006 Cmpl 0000000000000006
    FCP: Rd 0000000000000071 Wr 0000000000000005 IO 0000000000000031
    Cmpl 00000000000000a6 Outstanding 0000000000000001
    NVME Initiator Enabled
    NVME LPORT lpfc1 WWPN x10000090fae0ec9e WWNN x20000090fae0ec9e DID x012400 ONLINE
    NVME RPORT WWPN x200900a098c80f09 WWNN x200800a098c80f09 DID x010301 TARGET DISCSRVC ONLINE
    NVME Statistics
    LS: Xmt 0000000000000006 Cmpl 0000000000000006
    FCP: Rd 0000000000000073 Wr 0000000000000005 IO 0000000000000031
    Cmpl 00000000000000a8 Outstanding 0000000000000001
  • If the target LIFs are not displayed as above in the nvme_info output, check the /var/log/messages and dmesg outputs for any suspicious NVMe/FC failures (see the sketch below), and report or fix them accordingly.
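
A minimal sketch of scanning those logs for NVMe/FC messages; the grep patterns are illustrative, not an exhaustive filter:

    # Sketch: scan the kernel ring buffer and system log for NVMe/FC messages.
    dmesg | grep -iE "nvme|lpfc|qla2xxx"
    grep -iE "nvme|lpfc|qla2xxx" /var/log/messages
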

Error message: No discovery log entries to fetch

Probable cause: Generally observed when the /etc/nvme/hostnqn string has not been added to the corresponding subsystem on the NetApp array, or when an incorrect hostnqn string has been added to that subsystem.

Workaround: Verify that the exact /etc/nvme/hostnqn string is added to the corresponding subsystem on the NetApp array (verify using the vserver nvme subsystem host show command).
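
A minimal sketch of comparing the two sides; vs1 and subsystem1 are placeholder SVM and subsystem names for this example:

    # On the host: display the host NQN string.
    cat /etc/nvme/hostnqn

    # On the ONTAP array: list the host NQNs added to the subsystem.
    vserver nvme subsystem host show -vserver vs1 -subsystem subsystem1
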

Error message: Failed to write to /dev/nvme-fabrics: Operation already in progress

Probable cause: Observed when the controller associations or the specified operation is already created or is in the process of being created. This can happen as part of the auto-connect scripts.

Workaround: None. Try running the nvme discover command again after some time. For nvme connect and nvme connect-all, run the nvme list command to verify that the namespace devices are already created and displayed on the host.

When to contact technical support

If you are still facing issues, collect the following files and command outputs and contact NetApp support for further triage:

  • cat /sys/class/scsi_host/host*/nvme_info output

  • /var/log/messages

  • dmesg output

  • nvme discover output, as in:

    nvme discover --transport=fc --traddr=nn-0x200a00a098c80f09:pn-0x200b00a098c80f09 --host-traddr=nn-0x20000090fae0ec9d:pn-0x10000090fae0ec9d

  • nvme list output

  • nvme list-subsys /dev/nvmeXnY output
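
To package all of this for a support case, the following is a minimal sketch of a collection script; the output directory name and the traddr/host-traddr values are assumptions you replace for your environment:

    #!/bin/bash
    # Sketch: gather NVMe/FC troubleshooting data for NetApp support.
    OUT=/tmp/nvme_support_$(date +%Y%m%d%H%M%S)
    mkdir -p "$OUT"

    cat /sys/class/scsi_host/host*/nvme_info > "$OUT/nvme_info.txt" 2>&1
    cp /var/log/messages "$OUT/" 2>/dev/null
    dmesg > "$OUT/dmesg.txt"

    # Replace the traddr/host-traddr pairs with your own target and initiator addresses.
    nvme discover --transport=fc \
        --traddr=nn-0x200a00a098c80f09:pn-0x200b00a098c80f09 \
        --host-traddr=nn-0x20000090fae0ec9d:pn-0x10000090fae0ec9d > "$OUT/nvme_discover.txt" 2>&1

    nvme list > "$OUT/nvme_list.txt" 2>&1
    # Run list-subsys for each namespace device that nvme list reports, for example:
    nvme list-subsys /dev/nvme0n1 > "$OUT/nvme_list_subsys.txt" 2>&1
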