Monitoring the recovery point objective through ILM

You can track ILM evaluation attributes to determine the recovery point objective (RPO) of the StorageGRID Webscale system as defined by the ILM policy. The RPO defines the maximum tolerable period in which data might be lost because of a site failure, a Storage Node failure, or both.

Before you begin

You must be signed in to the Grid Manager using a supported browser.

About this task

The StorageGRID Webscale system manages objects by applying the defined ILM policy. The ILM policy and associated ILM rules determine how many copies are made, how those copies are made, the appropriate placement, and the length of time each copy is retained.

ILM rules are processed asynchronously after specific operations such as ingest or delete. ILM processing classifies objects into four categories and operates simultaneously on all four categories to ensure fairness and prioritization:
  • Repair of replicated copies with only one copy remaining.
  • Awaiting - Background: excludes repair of replicated copies with only one copy remaining.
  • Awaiting - Client: includes new ingests and metadata updates; excludes deletions.
  • Deletions.

Ingest or other activity can exceed the rate at which the system can process ILM. When this occurs, the system begins to queue objects whose ILM can no longer be fulfilled in near real time. In the example shown, the chart of the Awaiting - Client attribute indicates that the number of objects awaiting ILM evaluation temporarily increases in an unsustainable manner, then eventually decreases. Such a trend indicates that ILM was temporarily not fulfilled in near real time.

Figure: Awaiting - Client vs. Time chart
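The rise and fall described above can be illustrated with a small sketch. It is only a simplified model, not how StorageGRID Webscale implements ILM queuing: the function name simulate_backlog, the fixed evaluation rate, and all the numbers are hypothetical and chosen only to show the shape of the curve.

```python
def simulate_backlog(ingested_per_interval, evaluated_per_interval, start=0):
    """Return the number of objects awaiting ILM after each interval.

    Simplified model (an assumption, not the product's algorithm):
    each interval adds the ingested objects to the queue and removes
    up to `evaluated_per_interval` objects from it.
    """
    backlog = start
    history = []
    for ingested in ingested_per_interval:
        # The queue cannot go below zero.
        backlog = max(0, backlog + ingested - evaluated_per_interval)
        history.append(backlog)
    return history


# Hypothetical workload: a burst of ingest in intervals 3 to 5.
ingest = [50, 50, 200, 200, 200, 50, 50, 50, 50, 50]
print(simulate_backlog(ingest, evaluated_per_interval=100))
# prints [0, 0, 100, 200, 300, 250, 200, 150, 100, 50]
```

While the burst lasts, the backlog grows by the difference between the ingest rate and the evaluation rate. After the burst ends, the backlog drains at the spare evaluation capacity, which produces the same rise-then-fall pattern as the chart.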

Steps

  1. Select Support > Grid Topology.
  2. Select deployment > Overview > Main.
  3. In the ILM Activity section, review the key attributes for ILM evaluations:
    Awaiting - All
    The total number of objects awaiting ILM evaluation.
    Awaiting - Client
    The total number of objects awaiting ILM evaluation from client operations (for example, ingest).
    Scan Rate
    The rate at which objects in the grid are scanned and queued for ILM.
    Scan Period - Estimated
    The estimated time to complete a full ILM scan of all objects.
    Note: A full scan does not guarantee that ILM has been applied to all objects.
    Awaiting - Evaluation Rate
    The current rate at which objects are evaluated against the ILM policy in the grid.
    Repairs Attempted
    The total number of object repair operations for replicated data that have been attempted. This count increments each time an LDR tries to repair a high-risk object.
    Note: The count might increment again for the same object if replication failed after the repair.
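As a rough aid when reviewing these attributes, the following sketch estimates how long a backlog might take to clear. It is not a StorageGRID Webscale feature or formula: the function name estimate_drain_seconds is hypothetical, the values are assumed to be read manually from the ILM Activity section, and both rates are assumed to be in objects per second and to stay constant.

```python
from typing import Optional


def estimate_drain_seconds(awaiting: int,
                           eval_rate: float,
                           incoming_rate: float = 0.0) -> Optional[float]:
    """Estimate the seconds needed to clear the ILM backlog.

    awaiting      -- value read from Awaiting - All (objects)
    eval_rate     -- value read from Awaiting - Evaluation Rate
                     (assumed objects per second)
    incoming_rate -- assumed rate at which new objects join the queue

    Returns None when the backlog is not shrinking.
    """
    if awaiting <= 0:
        return 0.0
    net_rate = eval_rate - incoming_rate
    if net_rate <= 0:
        # Objects arrive at least as fast as they are evaluated.
        return None
    return awaiting / net_rate


# Hypothetical readings: 120,000 objects waiting, 400 evaluated per
# second, 100 new objects queued per second.
print(estimate_drain_seconds(120_000, 400.0, 100.0))  # prints 400.0
```

A result of None corresponds to the unsustainable trend described earlier: the number of objects awaiting evaluation keeps growing until ingest slows down or evaluation speeds up.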