Skip to main content

Manage S3 Select for tenant accounts

Contributors netapp-pcelmer ssantho3 netapp-perveilerk netapp-madkat netapp-lhalbert

You can allow certain S3 tenants to use S3 Select to issue SelectObjectContent requests on individual objects.

S3 Select provides an efficient way to search through large amounts of data without having to deploy a database and associated resources to enable searches. It also reduces the cost and latency of retrieving data.

What is S3 Select?

S3 Select allows S3 clients to use SelectObjectContent requests to filter and retrieve only the data needed from an object. The StorageGRID implementation of S3 Select includes a subset of S3 Select commands and features.

Considerations and requirements for using S3 Select

Grid administration requirements

The grid administrator must grant tenants S3 Select ability. Select Allow S3 Select when creating a tenant or editing a tenant.

Object format requirements

The object you want to query must be in one of the following formats:

  • CSV. Can be used as is or compressed into GZIP or BZIP2 archives.

  • Parquet. Additional requirements for Parquet objects:

    • S3 Select supports only columnar compression using GZIP or Snappy. S3 Select doesn't support whole-object compression for Parquet objects.

    • S3 Select doesn't support Parquet output. You must specify the output format as CSV or JSON.

    • The maximum uncompressed row group size is 512 MB.

    • You must use the data types specified in the object's schema.

    • You can't use INTERVAL, JSON, LIST, TIME, or UUID logical types.

Endpoint requirements

The SelectObjectContent request must be sent to a StorageGRID load balancer endpoint.

The Admin and Gateway Nodes used by the endpoint must be one of the following:

  • An SG100 or SG1000 appliance node

  • A VMware-based software node

  • A bare metal node running a kernel with cgroup v2 enabled

General considerations

Queries can't be sent directly to Storage Nodes.

Caution SelectObjectContent requests can decrease load-balancer performance for all S3 clients and all tenants. Enable this feature only when required and only for trusted tenants.

To view Grafana charts for S3 Select operations over time, select SUPPORT > Tools > Metrics in the Grid Manager.