Monitor repair-data jobs

You can monitor the status of repair jobs by using the repair-data script from the command line.

These include jobs that you initiated manually and jobs that StorageGRID initiated automatically as part of a decommission procedure.

Note: If you are running volume restoration jobs, monitor their progress and view their history in the Grid Manager instead.

Monitor the status of repair-data jobs based on whether you use replicated data, erasure-coded (EC) data, or both.

  • To get an estimated percent completion for the replicated repair, add the show-replicated-repair-status option to the repair-data command.

    repair-data show-replicated-repair-status

  • To determine if repairs are complete:

    1. Select NODES > Storage Node being repaired > ILM.

    2. Review the attributes in the Evaluation section. When repairs are complete, the Awaiting - All attribute indicates 0 objects.

  • To monitor the repair in more detail:

    1. Select SUPPORT > Tools > Grid topology.

    2. Select grid > Storage Node being repaired > LDR > Data Store.

    3. Use a combination of the following attributes to determine, as well as possible, if replicated repairs are complete.

      Note: Cassandra inconsistencies might be present, and failed repairs aren't tracked.
      • Repairs Attempted (XRPA): Use this attribute to track the progress of replicated repairs. This attribute increases each time a Storage Node tries to repair a high-risk object. When this attribute does not increase for a period longer than the current scan period (provided by the Scan Period — Estimated attribute), it means that ILM scanning found no high-risk objects that need to be repaired on any nodes.

        Note: High-risk objects are objects that are at risk of being completely lost. This does not include objects that don't satisfy their ILM configuration.
      • Scan Period — Estimated (XSCM): Use this attribute to estimate when a policy change will be applied to previously ingested objects. If the Repairs Attempted attribute does not increase for a period longer than the current scan period, it is probable that replicated repairs are done. Note that the scan period can change. The Scan Period — Estimated (XSCM) attribute applies to the entire grid and is the maximum of all node scan periods. You can query the Scan Period — Estimated attribute history for the grid to determine an appropriate time frame.
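The rule of thumb described for the Repairs Attempted (XRPA) and XSCM attributes can be sketched in a few lines of Python. This is only an illustration of the comparison, not a StorageGRID tool or API: the function name `repairs_probably_complete` and the sample readings are hypothetical, and the sketch assumes you have noted the attribute values by hand from the Grid Manager along with the time of each reading.

```python
from datetime import datetime, timedelta


def repairs_probably_complete(samples, scan_period, now):
    """Apply the "no increase for longer than one scan period" heuristic.

    samples: list of (datetime, int) readings of Repairs Attempted (XRPA),
             ordered oldest first. Hypothetical, manually recorded values.
    scan_period: timedelta for the current estimated scan period (XSCM).
    now: datetime at which the check is made.
    """
    if not samples:
        # Without any readings there is nothing to base a conclusion on.
        return False

    # Start from the first reading; the value is only known to have been
    # stable since then, which keeps the estimate conservative.
    last_increase = samples[0][0]
    for (_, previous), (timestamp, value) in zip(samples, samples[1:]):
        if value > previous:
            last_increase = timestamp

    # Repairs are probably done if the counter has been flat for longer
    # than the current scan period.
    return now - last_increase > scan_period


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    readings = [
        (start, 10),
        (start + timedelta(hours=1), 12),
        (start + timedelta(hours=2), 12),
    ]
    check_time = start + timedelta(hours=10)
    print(repairs_probably_complete(readings, timedelta(hours=6), check_time))
```

Because the scan period can change, the value passed as `scan_period` should reflect the current estimate, and a `True` result only indicates that repairs are probably done. Failed repairs aren't tracked, so it is not a guarantee.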