Understanding the search integration service

You can enable search integration for S3 buckets to use an external search and data analysis service with your object metadata.

The search integration service is a custom StorageGRID Webscale service that automatically and asynchronously sends S3 object metadata to a destination endpoint whenever an object or its metadata is updated. You can then use sophisticated search, data analysis, visualization, or machine learning tools provided by the destination service to search, analyze, and gain insights from your object data.

You can enable the search integration service for any versioned or unversioned bucket. To configure search integration, you associate metadata notification configuration XML with the bucket. The configuration XML specifies which objects to act on and the destination for the object metadata.

Each notification is a JSON document named with the bucket name, object name, and version ID, if any. Each metadata notification contains a standard set of system metadata for the object in addition to all of the object's tags and user metadata.

Notifications are generated and queued for delivery whenever an object is created or deleted, including when objects are deleted as a result of the operation of the grid's Information Lifecycle Management (ILM) policy. A notification is also generated when object metadata or tags are added, updated, or deleted. On an update, the complete set of metadata and tags is always sent, not just the changed values.
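The following Python sketch illustrates the shape of a metadata notification as described above. It is not the service's actual wire format: the function name build_notification, the underscore separator in the document name, and the field names "size", "metadata", and "tags" are all assumptions chosen for illustration.

```python
import json


def build_notification(bucket, key, version_id, system_metadata, user_metadata, tags):
    """Return (document_name, json_body) for one metadata notification.

    Hypothetical layout: the separator and field names are assumptions,
    not the format the search integration service actually emits.
    """
    # The document is named with the bucket name, object name, and
    # version ID, if any (unversioned objects have no version ID).
    parts = [bucket, key]
    if version_id:
        parts.append(version_id)
    name = "_".join(parts)

    # System metadata plus ALL user metadata and ALL tags are included,
    # even when only one value has changed.
    body = dict(system_metadata)
    body["metadata"] = dict(user_metadata)
    body["tags"] = dict(tags)
    return name, json.dumps(body, sort_keys=True)


# An update to a single tag still produces a document with the full set.
user_metadata = {"department": "finance"}
tags = {"status": "draft"}
name, created = build_notification(
    "reports", "q1.pdf", None, {"size": 1024}, user_metadata, tags
)

tags["status"] = "final"
name, updated = build_notification(
    "reports", "q1.pdf", None, {"size": 1024}, user_metadata, tags
)
```

Because every notification carries the complete set of values, the destination can overwrite the stored document for an object instead of merging partial updates.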

After you add metadata notification configuration XML to a bucket, notifications are not sent for any objects that were already in the bucket. Notifications are sent for any new objects that you create after you add the configuration XML, and for any object that you modify by updating its data, user metadata, or tags. To make sure that S3 object metadata for all objects in a bucket is sent to the destination, do one of the following: configure the search integration service immediately after creating the bucket and before adding any objects, or perform an action on every object already in the bucket that triggers a metadata notification to be sent to the destination.
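One possible way to perform such an action on existing objects is to rewrite each object's user metadata in place. The sketch below assumes that a copy of an object onto itself with the metadata directive set to REPLACE counts as a metadata update for notification purposes; confirm this on a test bucket before relying on it. The function name retrigger_notifications is hypothetical, and the s3 argument is assumed to be an S3 client that offers the boto3-style calls get_paginator, head_object, and copy_object.

```python
def retrigger_notifications(s3, bucket, prefix=""):
    """Rewrite user metadata in place for each object; return the keys touched.

    Assumption: an in-place copy with MetadataDirective="REPLACE" is treated
    as a metadata update and therefore queues a metadata notification.
    """
    touched = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for item in page.get("Contents", []):
            key = item["Key"]
            # Read the current values so the rewrite preserves them.
            head = s3.head_object(Bucket=bucket, Key=key)
            s3.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                Metadata=head.get("Metadata", {}),
                ContentType=head.get("ContentType", "binary/octet-stream"),
                MetadataDirective="REPLACE",
            )
            touched.append(key)
    return touched
```

With boto3 installed, the client could be created with boto3.client("s3", endpoint_url=...) pointing at the grid's S3 endpoint. Note that list_objects_v2 returns only current versions, that an in-place copy in a versioned bucket creates a new object version, and that this sketch does not carry over object tags or ACLs explicitly.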

The StorageGRID Webscale search integration service supports an Elasticsearch cluster as a destination. As with the other platform services, the destination is specified in the endpoint whose URN is used in the configuration XML for the service.
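As a rough illustration of how the endpoint URN ties into the configuration XML, the Python sketch below assembles a configuration document with one rule per destination. The element names (MetadataNotificationConfiguration, Rule, ID, Status, Prefix, Destination, Urn) are assumptions that are not defined in this section, and the URN value is a placeholder; check both against the configuration reference for your release.

```python
import xml.etree.ElementTree as ET


def build_metadata_notification_config(rules):
    """Build configuration XML from a list of rule dictionaries.

    Each rule needs "id" and "urn"; "prefix" is optional and selects
    which objects the rule acts on. Element names are assumed.
    """
    root = ET.Element("MetadataNotificationConfiguration")
    for rule in rules:
        rule_el = ET.SubElement(root, "Rule")
        ET.SubElement(rule_el, "ID").text = rule["id"]
        ET.SubElement(rule_el, "Status").text = "Enabled"
        # An empty prefix is intended to match every object in the bucket.
        ET.SubElement(rule_el, "Prefix").text = rule.get("prefix", "")
        destination = ET.SubElement(rule_el, "Destination")
        # The URN identifies the endpoint that holds the destination details.
        ET.SubElement(destination, "Urn").text = rule["urn"]
    return ET.tostring(root, encoding="unicode")


config_xml = build_metadata_notification_config(
    [{"id": "Rule-1", "prefix": "images/", "urn": "urn:example:placeholder-endpoint"}]
)
```

Keeping the destination details in the endpoint means the bucket configuration only has to reference the URN, so the same endpoint can serve several buckets.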

Search integration notifications are sent directly from the site where a metadata update is triggered to the destination endpoint. This means that a grid administrator must configure networking and firewall rules at each data center site so that documents can be sent to the destination Elasticsearch index.
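Because each site sends notifications on its own, it can help to confirm from a host at every site that the destination accepts TCP connections. The sketch below is a generic reachability check, not a StorageGRID Webscale tool; the function name can_reach and the host name in the usage comment are hypothetical, and port 9200 is only the Elasticsearch default, so your endpoint may use a different port.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example, run from a host at each data center site:
#   can_reach("search.example.com", 9200)
```

A successful connection shows only that routing and firewall rules allow the traffic; it does not confirm that the endpoint credentials or the destination index are valid.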

See the Interoperability Matrix Tool for information on the supported versions of Elasticsearch.