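You must manually run two scripts to remount preserved storage volumes and reformat any failed storage volumes. The first script remounts volumes that are properly formatted as StorageGRID Webscale storage volumes. The second script reformats any unmounted volumes and, if necessary, rebuilds the Cassandra database on the Storage Node.
Running these scripts might identify additional failed storage volumes.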
Before you run the scripts, review the warnings for Storage Node system drive recovery.
root@SG:~ # sn-remount-volumes
The configured LDR noid is 12632740
====== Device /dev/sdb ======
Mount and unmount device /dev/sdb and checking file system consistency:
The device is consistent.
Check rangedb structure on device /dev/sdb:
Mount device /dev/sdb to /tmp/sdb-654321 with rangedb mount options
This device has all rangedb directories.
Found LDR node id 12632740, volume number 0 in the volID file
Attempting to remount /dev/sdb
Device /dev/sdb remounted successfully
====== Device /dev/sdc ======
Mount and unmount device /dev/sdc and checking file system consistency:
Error: File system consistency check retry failed on device /dev/sdc.
You can see the diagnosis information in the /var/local/log/sn-remount-volumes.log.
This volume could be new or damaged. If you run sn-recovery-postinstall.sh,
this volume and any data on this volume will be deleted. If you only had two
copies of object data, you will temporarily have only a single copy.
StorageGRID Webscale will attempt to restore data redundancy by making
additional replicated copies or EC fragments, according to the rules in
the active ILM policy.
Do not continue to the next step if you believe that the data remaining on
this volume cannot be rebuilt from elsewhere in the grid (for example, if
your ILM policy uses a rule that makes only one copy or if volumes have
failed on multiple nodes). Instead, contact support to determine how to
recover your data.
====== Device /dev/sdd ======
Mount and unmount device /dev/sdd and checking file system consistency:
Failed to mount device /dev/sdd
This device could be an uninitialized disk or has corrupted superblock.
File system check might take a long time. Do you want to continue? (y or n) [y/N]? y
Error: File system consistency check retry failed on device /dev/sdd.
You can see the diagnosis information in the /var/local/log/sn-remount-volumes.log.
This volume could be new or damaged. If you run sn-recovery-postinstall.sh,
this volume and any data on this volume will be deleted. If you only had two
copies of object data, you will temporarily have only a single copy.
StorageGRID Webscale will attempt to restore data redundancy by making
additional replicated copies or EC fragments, according to the rules in
the active ILM policy.
Do not continue to the next step if you believe that the data remaining on
this volume cannot be rebuilt from elsewhere in the grid (for example, if
your ILM policy uses a rule that makes only one copy or if volumes have
failed on multiple nodes). Instead, contact support to determine how to
recover your data.
====== Device /dev/sde ======
Mount and unmount device /dev/sde and checking file system consistency:
The device is consistent.
Check rangedb structure on device /dev/sde:
Mount device /dev/sde to /tmp/sde-654321 with rangedb mount options
This device has all rangedb directories.
Found LDR node id 12000078, volume number 9 in the volID file
Error: This volume does not belong to this node. Fix the attached volume and re-run this script.
In this example output, one storage volume was remounted successfully and three storage volumes reported errors.
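When a node has many volumes, it can help to tally the per-device results from saved output instead of reading them line by line. The following Python sketch is an illustration, not a NetApp tool: it assumes the sn-remount-volumes output has been saved as text, splits it on the "====== Device ... ======" headers, and matches only message strings that appear in the example output above. The function name classify_devices and the status labels are hypothetical, and the exact messages may differ between StorageGRID Webscale releases.

```python
import re
from typing import Dict

# Hypothetical status labels (our own naming), keyed by message text seen in
# the example sn-remount-volumes output. Rules are checked in order.
RULES = [
    ("remounted successfully", "remounted"),
    ("does not belong to this node", "wrong node"),
    ("File system consistency check retry failed", "fsck failed"),
]


def classify_devices(output: str) -> Dict[str, str]:
    """Map each device in saved sn-remount-volumes output to a coarse status."""
    # With one capture group, re.split returns
    # [preamble, device1, body1, device2, body2, ...].
    parts = re.split(r"^====== Device (\S+) ======$", output, flags=re.MULTILINE)
    status: Dict[str, str] = {}
    for device, body in zip(parts[1::2], parts[2::2]):
        status[device] = "unknown"  # no recognized message in this device's section
        for marker, label in RULES:
            if marker in body:
                status[device] = label
                break
    return status


# Abridged excerpt of the example output above.
SAMPLE = """The configured LDR noid is 12632740
====== Device /dev/sdb ======
Device /dev/sdb remounted successfully
====== Device /dev/sdc ======
Error: File system consistency check retry failed on device /dev/sdc.
====== Device /dev/sdd ======
Failed to mount device /dev/sdd
Error: File system consistency check retry failed on device /dev/sdd.
====== Device /dev/sde ======
Found LDR node id 12000078, volume number 9 in the volID file
Error: This volume does not belong to this node. Fix the attached volume and re-run this script.
"""

if __name__ == "__main__":
    # {'/dev/sdb': 'remounted', '/dev/sdc': 'fsck failed', '/dev/sdd': 'fsck failed', '/dev/sde': 'wrong node'}
    print(classify_devices(SAMPLE))
```

On this excerpt the tally matches the summary: one volume remounted and three with errors.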
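If the script reports "Error: This volume does not belong to this node. Fix the attached volume and re-run this script.", the LDR node ID recorded in the volume's volID file differs from the node ID of the Storage Node being recovered. In the example output, /dev/sde reports node ID 12000078 instead of 12632740.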
The node ID of the Storage Node you are recovering is shown at the top of the script output ("configured LDR noid"). To look up the node IDs of other Storage Nodes, use the Grid Manager: select [Support] > [Grid Topology] > [Site] > [Storage Node] > [LDR] > [Overview].
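The comparison described above, configured node ID versus the ID in each volume's volID file, can be done mechanically on saved output. This Python sketch assumes the line formats shown in this example ("The configured LDR noid is N" and "Found LDR node id N, volume number M in the volID file"); the function name foreign_volumes is hypothetical. It only reports which volumes carry a foreign node ID; finding which Storage Node owns that ID still requires the Grid Manager lookup.

```python
import re
from typing import List, Tuple


def foreign_volumes(output: str) -> List[Tuple[str, int, int]]:
    """Return (device, node_id, volume_number) for every volume whose volID
    file names an LDR node other than the configured one."""
    configured = re.search(r"The configured LDR noid is (\d+)", output)
    if configured is None:
        raise ValueError("'configured LDR noid' line not found")
    own_id = int(configured.group(1))

    # Split into per-device sections: [preamble, dev1, body1, dev2, body2, ...].
    parts = re.split(r"^====== Device (\S+) ======$", output, flags=re.MULTILINE)
    foreign = []
    for device, body in zip(parts[1::2], parts[2::2]):
        found = re.search(r"Found LDR node id (\d+), volume number (\d+)", body)
        # Volumes without a volID line (e.g. failed fsck) are not reported here.
        if found and int(found.group(1)) != own_id:
            foreign.append((device, int(found.group(1)), int(found.group(2))))
    return foreign


# Abridged excerpt of the example sn-remount-volumes output.
SAMPLE = """The configured LDR noid is 12632740
====== Device /dev/sdb ======
Found LDR node id 12632740, volume number 0 in the volID file
Device /dev/sdb remounted successfully
====== Device /dev/sde ======
Found LDR node id 12000078, volume number 9 in the volID file
Error: This volume does not belong to this node. Fix the attached volume and re-run this script.
"""

if __name__ == "__main__":
    print(foreign_volumes(SAMPLE))  # [('/dev/sde', 12000078, 9)]
```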
root@SG:~ # sn-recovery-postinstall.sh
Starting Storage Node recovery post installation.
Reformatting all unmounted disks as rangedbs
Formatting devices that are not in use...
Skipping in use device /dev/sdb
Successfully formatted /dev/sdc with UUID d6533e5f-7dfe-4a45-af7c-08ae6864960a
Successfully formatted /dev/sdd with UUID a2534c4b-6bcc-3a12-fa4e-88ee8621452c
Skipping in use device /dev/sde
All devices processed
Creating Object Stores for LDR
Generating Grid Interface Configuration file
LDR initialization complete
Cassandra data directory is empty.
Cassandra needs rebuilding.
Rebuild the Cassandra database for this Storage Node.
ATTENTION: Do not execute this script when two or more Storage Nodes have failed or been offline at the same time. Doing so may result in data loss. Contact technical support.
ATTENTION: Do not rebuild more than a single node within a 15 day period. Rebuilding 2 or more nodes within 15 days of each other may result in data loss.
Enter 'y' to rebuild the Cassandra database for this Storage Node. [y/N]? y
Cassandra is down.
Rebuilding may take 12-24 hours. Do not stop or pause the rebuild.
If the rebuild was stopped or paused, re-run this command.
Cassandra node needs to be bootstrapped.
Cleaning Cassandra directories for node.
Adding replace_address_first_boot flag.
Starting ntp service.
Starting nginx service.
Starting dynip service.
Starting cassandra service.
Cassandra mode is NORMAL.
No bootstrap resume required.
Rebuild was successful.
Not starting services due to --do-not-start-services argument.
Updating progress on primary Admin Node
Starting services
#######################################
STARTING SERVICES
#######################################
Starting Syslog daemon
Stopping system logging: syslog-ng.
Starting system logging: syslog-ng.
Starting SSH
Starting OpenBSD Secure Shell server: sshd.
No hotfix to install
starting persistence ... done
remove all error states
starting all services
services still stopped: acct adc ade-exporter cms crr dds idnt kstn ldr net-monitor nginx node-exporter ssm
starting ade-exporter
Starting service ade-exporter in background
starting cms
Starting service cms in background
starting crr
Starting service crr in background
starting net-monitor
Starting service net-monitor in background
starting nginx
Starting service nginx in background
starting node-exporter
Starting service node-exporter in background
starting ssm
Starting service ssm in background
services still stopped: acct adc dds idnt kstn ldr
starting adc
Starting service adc in background
starting dds
Starting service dds in background
starting ldr
Starting service ldr in background
services still stopped: acct idnt kstn
starting acct
Starting service acct in background
starting idnt
Starting service idnt in background
starting kstn
Starting service kstn in background
all services started
Starting service servermanager in background
Restarting SNMP services:: snmpd
#######################################
SERVICES STARTED
#######################################
Loaded node_id from server-config
node_id=5611d3c9-45e9-47e4-99ac-cd16ef8a20b9
Storage Node recovery post installation complete.
Object data must be restored to the storage volumes.
Triggering bare-metal reboot of SGA to complete installation.
SGA install phase 2: awaiting chainboot.
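The two ATTENTION messages in this output amount to a simple pre-check before answering y to the Cassandra rebuild prompt. The sketch below encodes them in Python as an illustration only: the function name rebuild_allowed, its parameters, and its inputs (how many Storage Nodes are currently failed or offline, and when another node's Cassandra database was last rebuilt) are assumptions that you would have to gather yourself. It treats a gap of exactly 15 days as still "within 15 days", which is a conservative reading of the message rather than a documented boundary.

```python
from datetime import date, timedelta
from typing import Optional

# From the ATTENTION message: do not rebuild more than one node within 15 days.
REBUILD_WINDOW = timedelta(days=15)


def rebuild_allowed(nodes_down: int, last_other_rebuild: Optional[date], today: date) -> bool:
    """Hypothetical pre-check mirroring the two ATTENTION warnings.

    nodes_down: Storage Nodes currently failed or offline, counting the node
        being recovered (assumption: so 1 means only this node is down).
    last_other_rebuild: date another node's Cassandra database was last
        rebuilt, or None if there has been no recent rebuild.
    """
    if nodes_down >= 2:
        # Two or more failed/offline nodes at the same time: contact technical support.
        return False
    if last_other_rebuild is not None and today - last_other_rebuild <= REBUILD_WINDOW:
        # Conservative reading: exactly 15 days apart still counts as "within 15 days".
        return False
    return True


if __name__ == "__main__":
    print(rebuild_allowed(1, None, date(2024, 1, 1)))               # True
    print(rebuild_allowed(1, date(2024, 1, 1), date(2024, 1, 10)))  # False: 9 days apart
```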
Restore object data to the storage volumes that sn-recovery-postinstall.sh formatted, as described in the next procedure.
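To build the list of volumes that need object data restored, you can extract the reformatted devices from the saved sn-recovery-postinstall.sh output. This Python sketch is hypothetical and relies only on the "Successfully formatted ... with UUID ..." and "Skipping in use device ..." lines shown in the example output; devices reported as in use were not reformatted.

```python
import re
from typing import Dict, List

FORMATTED = re.compile(r"Successfully formatted (\S+) with UUID ([0-9A-Fa-f-]+)")
SKIPPED = re.compile(r"Skipping in use device (\S+)")


def reformatted_volumes(output: str) -> Dict[str, str]:
    """Map each device reformatted by sn-recovery-postinstall.sh to its new UUID.
    These are the volumes whose object data must be restored."""
    # findall with two groups yields (device, uuid) tuples; dict() pairs them up.
    return dict(FORMATTED.findall(output))


def skipped_volumes(output: str) -> List[str]:
    """Devices the script reported as in use and therefore did not reformat."""
    return SKIPPED.findall(output)


# Abridged excerpt of the example sn-recovery-postinstall.sh output.
SAMPLE = """Skipping in use device /dev/sdb
Successfully formatted /dev/sdc with UUID d6533e5f-7dfe-4a45-af7c-08ae6864960a
Successfully formatted /dev/sdd with UUID a2534c4b-6bcc-3a12-fa4e-88ee8621452c
Skipping in use device /dev/sde
"""

if __name__ == "__main__":
    for device, uuid in reformatted_volumes(SAMPLE).items():
        print(f"restore object data to {device} (UUID {uuid})")
    print("not reformatted:", skipped_volumes(SAMPLE))
```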