Use maintenance mode on SolidFire eSDS clusters

Contributors: netapp-amitha

If you need to take a storage node offline for maintenance such as software upgrades or host repairs, you can minimize the I/O impact to the rest of the storage cluster by enabling maintenance mode for that node.

To verify the current status of maintenance mode on a node, use the ListActiveNodes API method. Each node object includes a maintenanceMode parameter, which indicates the current status of maintenance mode on that node.
Perform the maintenance as soon as maintenance mode is enabled. Do not leave the node in maintenance mode any longer than necessary.
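
As a rough illustration of that status check, the following Python sketch builds a ListActiveNodes request body and reads maintenanceMode from each node object in a response. The JSON-RPC envelope (method, params, id), the result.nodes layout, the nodeID and name fields, and the sample response are assumptions made for illustration; confirm them against the API reference for your cluster version.

```python
import json


def build_list_active_nodes_request(request_id=1):
    """Return a JSON-RPC body for ListActiveNodes (assumed to take no params)."""
    return {"method": "ListActiveNodes", "params": {}, "id": request_id}


def maintenance_mode_by_node(response):
    """Map nodeID -> maintenanceMode from a parsed ListActiveNodes response."""
    nodes = response.get("result", {}).get("nodes", [])
    return {node["nodeID"]: node.get("maintenanceMode") for node in nodes}


# Hypothetical, heavily trimmed response used only for illustration.
sample_response = {
    "id": 1,
    "result": {
        "nodes": [
            {"nodeID": 1, "name": "node-a", "maintenanceMode": "Disabled"},
            {"nodeID": 2, "name": "node-b", "maintenanceMode": "ReadyForMaintenance"},
        ]
    },
}

print(json.dumps(build_list_active_nodes_request()))
print(maintenance_mode_by_node(sample_response))
```

Sending the request body to the cluster is left out on purpose, because the endpoint and authentication details depend on your environment.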

You can transition a storage node to maintenance mode only if the node is healthy (has no blocking cluster faults) and the storage cluster is tolerant to a single node failure. After you enable maintenance mode for a healthy and tolerant node, the node is not immediately transitioned; it is monitored until the following conditions are true:

  • All volumes hosted on the node have failed over, and the node is no longer the primary for any volume.

  • A temporary standby node is assigned for every volume being failed over.

After these criteria are met, the node is transitioned to maintenance mode. If these criteria are not met within a five-minute period, the node will not enter maintenance mode.
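
Because the transition is not immediate, a client might poll the node's maintenance mode until it reports ReadyForMaintenance or the five-minute window passes. The sketch below is one possible approach, not a documented procedure: get_mode is a hypothetical callable that returns the node's current mode string (for example, by calling ListActiveNodes), and the polling interval is an arbitrary choice.

```python
import time

FIVE_MINUTES = 5 * 60  # seconds; the window described above


def wait_for_mode(get_mode, target, timeout_s=FIVE_MINUTES, interval_s=5.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_mode() until it returns target or timeout_s elapses.

    Returns True if the target mode was observed, False on timeout.
    clock and sleep are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_mode() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated node (hypothetical) that becomes ready on the third poll.
states = iter(["PreparingForMaintenance",
               "PreparingForMaintenance",
               "ReadyForMaintenance"])
ready = wait_for_mode(lambda: next(states), "ReadyForMaintenance",
                      sleep=lambda seconds: None)
print(ready)
```

The five-minute limit is enforced by the cluster itself; the client-side deadline here only keeps the caller from waiting longer than the cluster would.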

When you disable maintenance mode for a storage node, the node is monitored until the following conditions are true:

  • All data is fully replicated to the node.

  • All blocking cluster faults are resolved.

  • All temporary standby node assignments for the volumes hosted on the node have been inactivated.

After these criteria are met, the node is transitioned out of maintenance mode. If these criteria are not met within one hour, the node will fail to transition out of maintenance mode.

Possible scenarios while using maintenance mode

  • If a node is in maintenance mode but is still up (either it has not yet been rebooted or serviced, or it has been serviced and is back up and healthy, but you have not disabled maintenance mode) and another node goes down, maintenance mode on the first node is disabled automatically.

  • If one of your nodes is in maintenance mode and another node goes down at the same time, there will be an outage. You must wait until the node that is in maintenance mode comes back online.

  • If you put a node that is a member of an ensemble in maintenance mode for a long period of time, the system automatically removes it from the ensemble if other nodes are available to take its place.

Enable maintenance mode

You can enable maintenance mode using the EnableMaintenanceMode API method. This method has the following input parameters:

| Name | Description | Type | Default value | Required |
| --- | --- | --- | --- | --- |
| forceWithUnresolvedFaults | Force maintenance mode to be enabled for this node even with blocking cluster faults present. | boolean | False | No |
| nodes | The list of node IDs to put in maintenance mode. Only one node at a time is supported. | integer array | None | Yes |
| perMinutePrimarySwapLimit | The number of primary slices to swap per minute. If not specified, all primary slices will be swapped at once. | integer | None | No |
| timeout | Specifies how long maintenance mode should remain enabled before it is automatically disabled. Formatted as a time string (for example, HH:mm:ss). If not specified, maintenance mode will remain enabled until explicitly disabled. | string | None | No |
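
To make the input parameters concrete, here is a minimal Python sketch that assembles an EnableMaintenanceMode request body. The parameter names come from the table above; the JSON-RPC envelope, the helper name build_enable_request, and the strict HH:mm:ss check are assumptions of this example rather than documented behavior.

```python
import re

# Strict HH:mm:ss check. The parameter description gives HH:mm:ss only as an
# example format, so rejecting other forms is an assumption of this sketch.
_HH_MM_SS = re.compile(r"^\d{2}:[0-5]\d:[0-5]\d$")


def build_enable_request(node_id, per_minute_primary_swap_limit=None,
                         timeout=None, force_with_unresolved_faults=False,
                         request_id=1):
    """Return a JSON-RPC body for EnableMaintenanceMode for a single node."""
    # Only one node at a time is supported, so the list holds exactly one ID.
    params = {"nodes": [node_id]}
    if per_minute_primary_swap_limit is not None:
        params["perMinutePrimarySwapLimit"] = per_minute_primary_swap_limit
    if timeout is not None:
        if not _HH_MM_SS.match(timeout):
            raise ValueError(f"timeout must look like HH:mm:ss, got {timeout!r}")
        params["timeout"] = timeout
    if force_with_unresolved_faults:
        # Omitted by default, which leaves the documented default (False).
        params["forceWithUnresolvedFaults"] = True
    return {"method": "EnableMaintenanceMode", "params": params, "id": request_id}


# Node ID 4 and the limits below are made-up example values.
request = build_enable_request(4, per_minute_primary_swap_limit=10,
                               timeout="01:30:00")
print(request)
```

Optional parameters are left out of the body unless set, so the cluster applies its own defaults.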

This method has the following return values:

| Name | Description | Type |
| --- | --- | --- |
| asyncHandle | Use this asyncHandle with the GetAsyncResult method to determine when the maintenance mode transition is complete. | integer |
| currentMode | The current maintenance mode state of the node. | MaintenanceMode (string) |
| requestedMode | The requested maintenance mode state of the node. | MaintenanceMode (string) |

Possible values for currentMode and requestedMode:

  • Disabled: No maintenance has been requested.

  • FailedToRecover: The node failed to recover from maintenance mode.

  • RecoveringFromMaintenance: The node is in the process of recovering from maintenance mode.

  • PreparingForMaintenance: Actions are being taken to prepare a node to have maintenance performed.

  • ReadyForMaintenance: The node is ready for maintenance to be performed.
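
One way a client might interpret these return values is sketched below in Python. The MaintenanceMode enum mirrors the values listed for this method; the summarize_enable_result helper, the "in transition" heuristic (current and requested modes differ), and the sample result are assumptions made for illustration.

```python
from enum import Enum


class MaintenanceMode(str, Enum):
    """Mode strings listed for the EnableMaintenanceMode return values."""
    DISABLED = "Disabled"
    FAILED_TO_RECOVER = "FailedToRecover"
    RECOVERING = "RecoveringFromMaintenance"
    PREPARING = "PreparingForMaintenance"
    READY = "ReadyForMaintenance"


def summarize_enable_result(result):
    """Summarize the result object of an EnableMaintenanceMode response.

    Raises ValueError if a mode string is not one of the values above.
    """
    current = MaintenanceMode(result["currentMode"])
    requested = MaintenanceMode(result["requestedMode"])
    return {
        "async_handle": result.get("asyncHandle"),
        "current": current,
        "requested": requested,
        # Service the node only once it reports ReadyForMaintenance.
        "safe_to_service": current is MaintenanceMode.READY,
        # Heuristic (assumption): differing modes mean a transition is pending.
        "in_transition": current is not requested,
    }


# Hypothetical result object, not captured from a real cluster.
sample_result = {
    "asyncHandle": 1,
    "currentMode": "PreparingForMaintenance",
    "requestedMode": "ReadyForMaintenance",
}
summary = summarize_enable_result(sample_result)
print(summary["safe_to_service"], summary["in_transition"])
```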

Disable maintenance mode

You can disable maintenance mode using the DisableMaintenanceMode API method. This method has the following input parameter:

| Name | Description | Type | Default value | Required |
| --- | --- | --- | --- | --- |
| nodes | List of storage node IDs to take out of maintenance mode. | integer array | None | Yes |

This method has the following return values:

| Name | Description | Type |
| --- | --- | --- |
| asyncHandle | Use this asyncHandle with the GetAsyncResult method to determine when the maintenance mode transition is complete. | integer |
| currentMode | The current maintenance mode state of the node. | MaintenanceMode (string) |
| requestedMode | The requested maintenance mode state of the node. | MaintenanceMode (string) |

Possible values for currentMode and requestedMode:

  • Disabled: No maintenance has been requested.

  • FailedToRecover: The node failed to recover from maintenance mode.

  • Unexpected: The node was found to be offline, but was in the Disabled mode.

  • RecoveringFromMaintenance: The node is in the process of recovering from maintenance mode.

  • PreparingForMaintenance: Actions are being taken to prepare a node to have maintenance performed.

  • ReadyForMaintenance: The node is ready for maintenance to be performed.
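
The disable flow can be sketched end to end in Python: send DisableMaintenanceMode, then poll GetAsyncResult with the returned asyncHandle. This is an illustrative outline under several assumptions: call is a hypothetical transport that takes a request body and returns the parsed response, the JSON-RPC envelope is assumed, and the "status" field with the value "complete" in the GetAsyncResult result should be checked against the API reference before you rely on it.

```python
def build_disable_request(node_ids, request_id=1):
    """Return a JSON-RPC body for DisableMaintenanceMode."""
    return {"method": "DisableMaintenanceMode",
            "params": {"nodes": list(node_ids)}, "id": request_id}


def build_get_async_result_request(async_handle, request_id=2):
    """Return a JSON-RPC body for GetAsyncResult."""
    return {"method": "GetAsyncResult",
            "params": {"asyncHandle": async_handle}, "id": request_id}


def disable_and_wait(call, node_ids, max_polls=10):
    """Disable maintenance mode, then poll until the async job completes.

    Returns the final GetAsyncResult result, or None if max_polls is reached.
    A real client would pause between polls; that is omitted here.
    """
    handle = call(build_disable_request(node_ids))["result"]["asyncHandle"]
    for _ in range(max_polls):
        status = call(build_get_async_result_request(handle))["result"]
        if status.get("status") == "complete":  # assumed field and value
            return status
    return None


def make_fake_transport(polls_until_complete=2):
    """Build a stand-in for a real HTTPS client (hypothetical responses)."""
    remaining = [polls_until_complete]

    def call(body):
        if body["method"] == "DisableMaintenanceMode":
            return {"id": body["id"],
                    "result": {"asyncHandle": 7,
                               "currentMode": "RecoveringFromMaintenance",
                               "requestedMode": "Disabled"}}
        remaining[0] -= 1
        status = "complete" if remaining[0] <= 0 else "running"
        return {"id": body["id"], "result": {"status": status}}

    return call


final = disable_and_wait(make_fake_transport(), [4])
print(final)
```

Injecting the transport keeps the sketch independent of any particular HTTP library and lets the polling logic be exercised without a cluster.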