Configure Dremio data source with StorageGRID
Dremio supports a varity of data sources, including cloud-based or on-premises object storage. You can configure Dremio to use StorageGRID as object storage data source.
Configure Dremio data source
Prerequisites
-
A StorageGRID S3 endpoint URL, a tenant s3 access key ID, and secret access key.
-
StorageGRID configuration recommendation: disable compression (disabled by default).
Dremio uses byte range GET to fetch different byte ranges from within the same object concurrently during query. Typical size for byte-range requests is 1MB. Compressed object degrades byte-range GET performance.
Instruction
-
On Dremio Datasets page, click + sign to add a source, select 'Amazon S3'.
-
Enter a name for this new data source, StorageGRID S3 tenant access key ID and secret access key.
-
Check the box 'Encrypt connection' if using https for connection to StorageGRID S3 endpoint.
If using self-signed CA cert for this s3 endpoint, follow Dremio guide instrution to add this CA cert into Dremio server's <JAVA_HOME>/jre/lib/security
Sample screenshot -
Click 'Advanced Options', check 'Enable compatibility mode'
-
Under Connection properties, click + Add Properties and add these s3a properties.
-
fs.s3a.connection.maximum default is 100. If your s3 datasets include large Parquet files with 100 or more columns, must enter a value greater than 100. Refer to Dremio guide for this setting.
Name Value fs.s3a.endpoint
<StorageGRID S3 endpoint:port>
fs.s3a.path.style.access
true
fs.s3a.connection.maximum
<a value greater than 100>
Sample screenshot
-
Configure other Dremio options as per your organization or application requirements.
-
Click the Save button to create this new data source.
-
Once StorageGRID data source is added successfully, a list of buckets will be displayed on the left panel.
Sample screenshot
By Angela Cheng