You perform various maintenance procedures to keep your StorageGRID system up to date and to ensure it is performing efficiently. The Grid Manager provides tools and options to help you perform maintenance tasks.
When a new StorageGRID feature release is available, the Software Upgrade page guides you through the process of uploading the required file and upgrading your StorageGRID system. You must upgrade all grid nodes for all data center sites from the primary Admin Node.
During an upgrade, client applications can continue to ingest and retrieve object data.
If issues with the software are detected and resolved between feature releases, you might need to apply a hotfix to your StorageGRID system.
StorageGRID hotfixes contain software changes that are made available outside of a feature or patch release. The same changes are included in a future release.
Applying a hotfix is similar to upgrading the software. The Apply Hotfix page, shown below, allows you to upload a hotfix file and monitor progress as the hotfix is installed.
As with a software upgrade, the hotfix is applied first to the primary Admin Node. Then, the hotfix is applied to all other grid nodes in your StorageGRID system. However, while all grid nodes are updated with the new hotfix version, the actual changes in a hotfix might affect only specific services on specific types of nodes. For example, a hotfix might affect only the LDR service on Storage Nodes.
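The ordering described above can be illustrated with a minimal Python sketch. It is not part of StorageGRID and does not call any StorageGRID API. The GridNode class, its fields, and both function names are hypothetical and exist only for illustration. The sketch assumes exactly one primary Admin Node, and it keeps the remaining nodes in their input order because the text does not specify an order among them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridNode:
    """Hypothetical model of a grid node, for illustration only."""
    name: str
    node_type: str                 # for example "Admin", "Storage", "Gateway"
    is_primary_admin: bool = False


def hotfix_order(nodes):
    """Return nodes with the primary Admin Node first, then all others.

    Assumption: the order among the other nodes is not specified by the
    documentation, so the input order is preserved here.
    """
    primaries = [n for n in nodes if n.is_primary_admin]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary Admin Node")
    others = [n for n in nodes if not n.is_primary_admin]
    return primaries + others


def nodes_with_changed_services(nodes, affected_types):
    """Return the nodes whose services the hotfix actually changes.

    Every node receives the new hotfix version; this only filters the
    node types that the hotfix content touches.
    """
    return [n for n in nodes if n.node_type in affected_types]


grid = [
    GridNode("dc1-s1", "Storage"),
    GridNode("dc1-adm1", "Admin", is_primary_admin=True),
    GridNode("dc1-g1", "Gateway"),
    GridNode("dc2-s1", "Storage"),
]
ordered = hotfix_order(grid)
changed = nodes_with_changed_services(grid, {"Storage"})
```

In this sketch, ordered contains every node in the grid, because all nodes receive the new version, while changed contains only the two Storage Nodes, mirroring the LDR example.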
You can expand a StorageGRID system by adding storage volumes to Storage Nodes, adding new grid nodes to an existing site, or adding a new data center site. You can perform expansions without interrupting the operation of your current system. When you add nodes or a site, you first deploy the new nodes and then perform the expansion procedure from the Grid Expansion page.
Grid nodes can fail if a hardware, virtualization, operating system, or software fault renders the node inoperable or unreliable.
The steps to recover a grid node depend on the platform where the grid node is hosted and on the type of grid node. Each type of grid node has a specific recovery procedure, which you must follow exactly. Generally, you try to preserve data from the failed grid node where possible, repair or replace the failed node, use the Recovery page to configure the replacement node, and restore the node's data.
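Because each recovery procedure must be followed exactly, it can help to think of the general sequence as an ordered checklist. The following Python sketch is only an illustration of that idea. The RecoveryChecklist class and the phase strings are hypothetical, the phases paraphrase the general outline above, and the actual procedure for a given node type and platform can differ.

```python
# General phases paraphrased from the text; real procedures vary by
# node type and platform (assumption: this list is illustrative only).
GENERAL_RECOVERY_PHASES = (
    "preserve data from the failed node where possible",
    "repair or replace the failed node",
    "configure the replacement node from the Recovery page",
    "restore the node's data",
)


class RecoveryChecklist:
    """Hypothetical helper that refuses out-of-order phase completion."""

    def __init__(self, phases=GENERAL_RECOVERY_PHASES):
        self._phases = tuple(phases)
        self._done = 0  # number of phases completed so far

    def next_phase(self):
        """Return the next pending phase, or None when all are complete."""
        if self._done < len(self._phases):
            return self._phases[self._done]
        return None

    def complete(self, phase):
        """Mark a phase complete only if it is the next expected one."""
        expected = self.next_phase()
        if expected is None:
            raise ValueError("all phases are already complete")
        if phase != expected:
            raise ValueError(f"out of order: expected {expected!r}, got {phase!r}")
        self._done += 1

    @property
    def finished(self):
        return self._done == len(self._phases)
```

A caller would complete each phase in turn and treat a ValueError as a sign that a step was skipped.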
For example, this flowchart shows the recovery procedure for a software-based Storage Node that has a failed system drive. If the system drive has failed, the Storage Node is not available to the StorageGRID system.
Some maintenance procedures are specific to Linux or VMware deployments of StorageGRID, or are specific to other components of the StorageGRID solution. For example, you might want to migrate a grid node to a different Linux host or perform maintenance on an Archive Node that is connected to Tivoli Storage Manager (TSM).
You might need to perform certain procedures on a specific grid node. For example, you might need to reboot a grid node or manually stop and restart a specific grid node service. Some grid node procedures can be performed from the Grid Manager; others require you to log in to the grid node and use the node's command line.