Performing maintenance procedures

You perform various maintenance procedures to keep your StorageGRID system up to date and performing efficiently. The Grid Manager provides tools and options to help you perform these maintenance tasks.

Software upgrades

When a new StorageGRID feature release is available, the Software Upgrade page guides you through the process of uploading the required file and upgrading your StorageGRID system. You must upgrade all grid nodes for all data center sites from the primary Admin Node.

During an upgrade, client applications can continue to ingest and retrieve object data.

Hotfixes

If issues with the software are detected and resolved between feature releases, you might need to apply a hotfix to your StorageGRID system.

StorageGRID hotfixes contain software changes that are made available outside of a feature or patch release. The same changes are included in a future release.

Applying a hotfix is similar to upgrading the software. The Apply Hotfix page, shown below, allows you to upload a hotfix file and monitor progress as the hotfix is installed.


screenshot showing the Grid Node Status part of the Apply Hotfix page

As with a software upgrade, the hotfix is applied first to the primary Admin Node and then to all other grid nodes in your StorageGRID system. Although every grid node is updated to the new hotfix version, the actual changes in a hotfix might affect only specific services on specific types of nodes. For example, a hotfix might affect only the LDR service on Storage Nodes.
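The ordering described above can be modeled in a few lines of Python. This is only an illustrative sketch, not StorageGRID code: the GridNode class, its fields, the hotfix_order function, and the node names are all hypothetical. The source states only that the primary Admin Node comes first, so the order of the remaining nodes here is an arbitrary choice (input order).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridNode:
    """Hypothetical stand-in for a grid node; not a StorageGRID type."""
    name: str
    node_type: str  # for example "Admin Node", "Storage Node", "Gateway Node"
    is_primary_admin: bool = False


def hotfix_order(nodes):
    """Return nodes with the primary Admin Node first, then all others.

    The relative order of the other nodes is not specified by the
    documentation; this sketch simply keeps their input order.
    """
    primaries = [n for n in nodes if n.is_primary_admin]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary Admin Node")
    others = [n for n in nodes if not n.is_primary_admin]
    return primaries + others


# Example grid with made-up node names.
nodes = [
    GridNode("dc1-s1", "Storage Node"),
    GridNode("dc1-adm1", "Admin Node", is_primary_admin=True),
    GridNode("dc1-gw1", "Gateway Node"),
]
order = [n.name for n in hotfix_order(nodes)]
print(order)
```

In the real system, the Apply Hotfix page handles this sequencing for you; the sketch only makes the "primary Admin Node first" rule explicit.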

Expansion procedures

You can expand a StorageGRID system by adding storage volumes to Storage Nodes, adding new grid nodes to an existing site, or adding a new data center site. You can perform expansions without interrupting the operation of your current system. When you add nodes or a site, you first deploy the new nodes and then perform the expansion procedure from the Grid Expansion page.



Node recovery procedures

Grid nodes can fail if a hardware, virtualization, operating system, or software fault renders the node inoperable or unreliable.

The steps to recover a grid node depend on the platform where the grid node is hosted and on the type of grid node. Each type of grid node has a specific recovery procedure, which you must follow exactly. Generally, you try to preserve data from the failed grid node where possible, repair or replace the failed node, use the Recovery page to configure the replacement node, and restore the node's data.

For example, this flowchart shows the recovery procedure for a software-based Storage Node that has a failed system drive. If the system drive has failed, the Storage Node is not available to the StorageGRID system.


Flowchart overview of non-appliance Storage Node recovery with system drive failure

Decommission procedures

You might want to permanently remove grid nodes from your StorageGRID system. For example, you might want to decommission nodes in these cases:
  • You have added a larger Storage Node to the system and you want to remove one or more smaller Storage Nodes while preserving objects.
  • You require less total storage.
  • You no longer require a Gateway Node or a non-primary Admin Node.
  • Your grid includes a disconnected node that you cannot recover or bring back online.
You can use the Decommission page in the Grid Manager to remove the following types of grid nodes:
  • Storage Nodes, provided that enough nodes would remain at the site to support certain requirements
  • Gateway Nodes
  • Non-primary Admin Nodes
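
The eligibility rules in the list above can be summarized as a small Python check. This is a rough sketch under stated assumptions, not how the Decommission page is implemented: the function name, the node type strings, and the min_storage_nodes parameter are hypothetical. In particular, min_storage_nodes is only a placeholder for the site requirements that the text mentions but does not specify.

```python
# Node types that the Decommission page can remove, per the list above.
DECOMMISSIONABLE_TYPES = {
    "Storage Node",
    "Gateway Node",
    "Non-primary Admin Node",
}


def can_decommission(node_type, storage_nodes_at_site=0, min_storage_nodes=0):
    """Return True if a node of this type could be decommissioned.

    min_storage_nodes is a hypothetical placeholder for the site
    requirements that must still be met after a Storage Node is removed.
    """
    if node_type not in DECOMMISSIONABLE_TYPES:
        return False
    if node_type == "Storage Node":
        # One node is removed, so check what would remain at the site.
        return storage_nodes_at_site - 1 >= min_storage_nodes
    return True


print(can_decommission("Gateway Node"))
print(can_decommission("Primary Admin Node"))
print(can_decommission("Storage Node", storage_nodes_at_site=4, min_storage_nodes=3))
print(can_decommission("Storage Node", storage_nodes_at_site=3, min_storage_nodes=3))
```

With these made-up numbers, the Gateway Node and the first Storage Node case are allowed, while the primary Admin Node and the second Storage Node case are not.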


screenshot of Decommission page

Network maintenance procedures

Some of the network maintenance procedures you might need to perform include the following:
  • Updating the subnets on the Grid Network
  • Using the Change IP tool to change the networking configuration that was initially set during grid deployment
  • Adding, removing, or updating Domain Name System (DNS) servers
  • Adding, removing, or updating Network Time Protocol (NTP) servers to ensure that data is synchronized accurately between grid nodes
  • Restoring network connectivity to nodes that might have become isolated from the rest of the grid

Host-level and middleware procedures

Some maintenance procedures are specific to Linux or VMware deployments of StorageGRID, or are specific to other components of the StorageGRID solution. For example, you might want to migrate a grid node to a different Linux host or perform maintenance on an Archive Node that is connected to Tivoli Storage Manager (TSM).

Grid node procedures

You might need to perform certain procedures on a specific grid node. For example, you might need to reboot a grid node or manually stop and restart a specific grid node service. Some grid node procedures can be performed from the Grid Manager; others require you to log in to the grid node and use the node's command line.