SelectObjectContent
You can use the S3 SelectObjectContent request to filter the contents of an S3 object based on a simple SQL statement.
Before you begin
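- The tenant account has the S3 Select permission.
- You have `s3:GetObject` permission for the object you want to query.
- The object you want to query must be in one of the following formats:
  - CSV, either uncompressed or compressed as a GZIP or BZIP2 file.
  - Parquet. Additional requirements for Parquet objects:
    - S3 Select supports only columnar compression using GZIP or Snappy. S3 Select does not support whole-object compression for Parquet objects.
    - S3 Select does not support Parquet output. You must specify the output format as CSV or JSON.
    - The maximum uncompressed row group size is 512 MB.
    - You must use the data types specified in the object's schema.
    - You cannot use the INTERVAL, JSON, LIST, TIME, or UUID logical types.
- Your SQL expression can be at most 256 KB long.
- Any record in the input or results can be at most 1 MiB long.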
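The size limits above can be checked on the client before a request is sent. The following Python sketch is illustrative only: `check_select_limits` and the two constants are hypothetical names, and the sketch assumes that 256 KB means 256 × 1024 bytes and that the expression length is measured in UTF-8 bytes. The server remains the authority on what it accepts.

```python
MAX_EXPRESSION_BYTES = 256 * 1024  # assumption: 256 KB = 256 x 1024 bytes
MAX_RECORD_BYTES = 1024 * 1024  # 1 MiB


def check_select_limits(expression, records=()):
    """Return a list of limit violations (empty if none were found).

    expression: the SQL text; its length is measured in UTF-8 bytes (assumption).
    records: iterable of raw input records as bytes.
    """
    problems = []
    expression_size = len(expression.encode("utf-8"))
    if expression_size > MAX_EXPRESSION_BYTES:
        problems.append(
            f"SQL expression is {expression_size} bytes; the limit is {MAX_EXPRESSION_BYTES}"
        )
    for index, record in enumerate(records):
        if len(record) > MAX_RECORD_BYTES:
            problems.append(
                f"record {index} is {len(record)} bytes; the limit is {MAX_RECORD_BYTES}"
            )
    return problems


print(check_select_limits("SELECT * FROM S3Object"))  # prints []
```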
CSV request syntax example
POST /{Key+}?select&select-type=2 HTTP/1.1
Host: Bucket.s3.abc-company.com
x-amz-expected-bucket-owner: ExpectedBucketOwner

<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Expression>string</Expression>
<ExpressionType>string</ExpressionType>
<RequestProgress>
<Enabled>boolean</Enabled>
</RequestProgress>
<InputSerialization>
<CompressionType>GZIP</CompressionType>
<CSV>
<AllowQuotedRecordDelimiter>boolean</AllowQuotedRecordDelimiter>
<Comments>#</Comments>
<FieldDelimiter>\t</FieldDelimiter>
<FileHeaderInfo>USE</FileHeaderInfo>
<QuoteCharacter>'</QuoteCharacter>
<QuoteEscapeCharacter>\\</QuoteEscapeCharacter>
<RecordDelimiter>\n</RecordDelimiter>
</CSV>
</InputSerialization>
<OutputSerialization>
<CSV>
<FieldDelimiter>string</FieldDelimiter>
<QuoteCharacter>string</QuoteCharacter>
<QuoteEscapeCharacter>string</QuoteEscapeCharacter>
<QuoteFields>string</QuoteFields>
<RecordDelimiter>string</RecordDelimiter>
</CSV>
</OutputSerialization>
<ScanRange>
<End>long</End>
<Start>long</Start>
</ScanRange>
</SelectObjectContentRequest>
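Building the XML body with a library, rather than by string concatenation, ensures that characters such as `<` in the SQL expression are escaped correctly. A minimal Python sketch using the standard `xml.etree.ElementTree` module follows. `build_csv_select_body` is a hypothetical helper; it omits the optional RequestProgress element and assumes the caller passes every element value (including booleans such as `false`) as a string.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_csv_select_body(expression, input_csv, output_csv,
                          compression="NONE", scan_range=None):
    """Return a SelectObjectContentRequest body (UTF-8 bytes) for CSV in and out.

    input_csv / output_csv map child element names (for example "FileHeaderInfo")
    to string values. scan_range is an optional (start, end) pair of byte offsets.
    """
    root = ET.Element("SelectObjectContentRequest", xmlns=S3_NS)
    ET.SubElement(root, "Expression").text = expression  # <, >, & escaped on output
    ET.SubElement(root, "ExpressionType").text = "SQL"

    inp = ET.SubElement(root, "InputSerialization")
    ET.SubElement(inp, "CompressionType").text = compression
    csv_in = ET.SubElement(inp, "CSV")
    for name, value in input_csv.items():
        ET.SubElement(csv_in, name).text = value

    out = ET.SubElement(root, "OutputSerialization")
    csv_out = ET.SubElement(out, "CSV")
    for name, value in output_csv.items():
        ET.SubElement(csv_out, name).text = value

    if scan_range is not None:
        start, end = scan_range
        rng = ET.SubElement(root, "ScanRange")
        ET.SubElement(rng, "End").text = str(end)
        ET.SubElement(rng, "Start").text = str(start)

    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


body = build_csv_select_body(
    "SELECT STNAME FROM S3Object WHERE NAME = STNAME",
    {"FileHeaderInfo": "USE", "FieldDelimiter": ",", "QuoteCharacter": '"'},
    {"QuoteFields": "ASNEEDED"},
)
print(body.decode("utf-8"))
```

The resulting bytes would be sent as the body of the POST request shown above.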
Parquet request syntax example
POST /{Key+}?select&select-type=2 HTTP/1.1
Host: Bucket.s3.abc-company.com
x-amz-expected-bucket-owner: ExpectedBucketOwner

<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Expression>string</Expression>
<ExpressionType>string</ExpressionType>
<RequestProgress>
<Enabled>boolean</Enabled>
</RequestProgress>
<InputSerialization>
<CompressionType>NONE</CompressionType>
<Parquet>
</Parquet>
</InputSerialization>
<OutputSerialization>
<CSV>
<FieldDelimiter>string</FieldDelimiter>
<QuoteCharacter>string</QuoteCharacter>
<QuoteEscapeCharacter>string</QuoteEscapeCharacter>
<QuoteFields>string</QuoteFields>
<RecordDelimiter>string</RecordDelimiter>
</CSV>
</OutputSerialization>
<ScanRange>
<End>long</End>
<Start>long</Start>
</ScanRange>
</SelectObjectContentRequest>
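Because Parquet input carries the extra restrictions listed under Before you begin, a client can check a request body before sending it. This Python sketch (`parquet_request_problems` and `local_name` are hypothetical helpers) flags a whole-object CompressionType other than NONE and an output format other than CSV or JSON. Treating NONE as "no whole-object compression" is an assumption based on the CompressionType value used in the CSV CLI example. Element names are matched case-insensitively so that both `<Parquet>` and `<PARQUET>` are recognized.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop ElementTree's '{namespace}' prefix and upper-case the tag name."""
    return tag.rsplit("}", 1)[-1].upper()


def parquet_request_problems(body):
    """Return reasons a Parquet SelectObjectContent body breaks the listed rules."""
    root = ET.fromstring(body)
    sections = {local_name(child.tag): child for child in root}
    inp = sections.get("INPUTSERIALIZATION")
    out = sections.get("OUTPUTSERIALIZATION")
    if inp is None or out is None:
        return ["request body lacks InputSerialization or OutputSerialization"]

    problems = []
    inputs = {local_name(child.tag): child for child in inp}
    if "PARQUET" not in inputs:
        problems.append("InputSerialization does not declare Parquet input")
    compression = inputs.get("COMPRESSIONTYPE")
    # Assumption: NONE (or an absent element) means no whole-object compression.
    if compression is not None and (compression.text or "").strip().upper() != "NONE":
        problems.append("whole-object compression is not supported for Parquet objects")
    outputs = {local_name(child.tag) for child in out}
    if not outputs & {"CSV", "JSON"}:
        problems.append("output format must be CSV or JSON")
    return problems
```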
SQL query example
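This query gets the state name, the 2010 population, the estimated 2015 population, and the percentage change from US census data. The WHERE clause keeps only rows in which NAME equals STNAME, so records in the file that are not states are ignored.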
SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 FROM S3Object WHERE NAME = STNAME
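The fourth column of the SELECT list is the percentage change from the 2010 census count to the 2015 estimate. Worked through with Alabama's figures (4779736 in 2010, 4854803 estimated for 2015), it gives the value that appears in the first row of the output:

```latex
\Delta\% = \frac{\text{POPESTIMATE2015} - \text{CENSUS2010POP}}{\text{CENSUS2010POP}} \times 100
\qquad
\frac{4854803 - 4779736}{4779736} \times 100 = \frac{75067}{4779736} \times 100 \approx 1.5705
```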
The first few lines of the file being queried, SUB-EST2020_ALL.csv, look like this:
SUMLEV,STATE,COUNTY,PLACE,COUSUB,CONCIT,PRIMGEO_FLAG,FUNCSTAT,NAME,STNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015,POPESTIMATE2016,POPESTIMATE2017,POPESTIMATE2018,POPESTIMATE2019,POPESTIMATE042020,POPESTIMATE2020
040,01,000,00000,00000,00000,0,A,Alabama,Alabama,4779736,4780118,4785514,4799642,4816632,4831586,4843737,4854803,4866824,4877989,4891628,4907965,4920706,4921532
162,01,000,00124,00000,00000,0,A,Abbeville city,Alabama,2688,2705,2699,2694,2645,2629,2610,2602,2587,2578,2565,2555,2555,2553
162,01,000,00460,00000,00000,0,A,Adamsville city,Alabama,4522,4487,4481,4474,4453,4430,4399,4371,4335,4304,4285,4254,4224,4211
162,01,000,00484,00000,00000,0,A,Addison town,Alabama,758,754,751,750,745,744,742,734,734,728,725,723,719,717
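To see what the query does to these rows, the following Python sketch applies the same WHERE filter and percentage calculation locally with the standard `csv` and `decimal` modules. It is an illustration, not how S3 Select evaluates the query: `local_select` is a hypothetical helper, and S3 Select's DECIMAL arithmetic may carry a different number of digits than Python's default 28-digit decimal context.

```python
import csv
import io
from decimal import Decimal

# Header and first four data rows of SUB-EST2020_ALL.csv, copied from the sample above.
SAMPLE_CSV = """\
SUMLEV,STATE,COUNTY,PLACE,COUSUB,CONCIT,PRIMGEO_FLAG,FUNCSTAT,NAME,STNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015,POPESTIMATE2016,POPESTIMATE2017,POPESTIMATE2018,POPESTIMATE2019,POPESTIMATE042020,POPESTIMATE2020
040,01,000,00000,00000,00000,0,A,Alabama,Alabama,4779736,4780118,4785514,4799642,4816632,4831586,4843737,4854803,4866824,4877989,4891628,4907965,4920706,4921532
162,01,000,00124,00000,00000,0,A,Abbeville city,Alabama,2688,2705,2699,2694,2645,2629,2610,2602,2587,2578,2565,2555,2555,2553
162,01,000,00460,00000,00000,0,A,Adamsville city,Alabama,4522,4487,4481,4474,4453,4430,4399,4371,4335,4304,4285,4254,4224,4211
162,01,000,00484,00000,00000,0,A,Addison town,Alabama,758,754,751,750,745,744,742,734,734,728,725,723,719,717
"""


def local_select(csv_text):
    """Mimic the example query locally on CSV text (illustration only)."""
    # Like FileHeaderInfo=USE: the first line supplies the column names.
    reader = csv.DictReader(io.StringIO(csv_text))
    results = []
    for row in reader:
        # WHERE NAME = STNAME keeps only the state-level rows.
        if row["NAME"] != row["STNAME"]:
            continue
        pop2010 = Decimal(row["CENSUS2010POP"])
        pop2015 = Decimal(row["POPESTIMATE2015"])
        # (POPESTIMATE2015 - CENSUS2010POP) / CENSUS2010POP * 100.0
        pct = (pop2015 - pop2010) / pop2010 * Decimal("100.0")
        results.append((row["STNAME"], row["CENSUS2010POP"], row["POPESTIMATE2015"], pct))
    return results


for name, pop2010, pop2015, pct in local_select(SAMPLE_CSV):
    print(name, pop2010, pop2015, pct, sep=",")
```

Only the Alabama row survives the filter; the three city and town rows have a NAME that differs from STNAME.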
AWS CLI usage example (CSV)
aws s3api select-object-content \
  --endpoint-url https://10.224.7.44:10443 \
  --no-verify-ssl \
  --bucket 619c0755-9e38-42e0-a614-05064f74126d \
  --key SUB-EST2020_ALL.csv \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE", "Comments": "#", "QuoteEscapeCharacter": "\"", "RecordDelimiter": "\n", "FieldDelimiter": ",", "QuoteCharacter": "\"", "AllowQuotedRecordDelimiter": false}, "CompressionType": "NONE"}' \
  --output-serialization '{"CSV": {"QuoteFields": "ASNEEDED", "QuoteEscapeCharacter": "#", "RecordDelimiter": "\n", "FieldDelimiter": ",", "QuoteCharacter": "\""}}' \
  --expression "SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 FROM S3Object WHERE NAME = STNAME" \
  changes.csv
The first few lines of the output file, changes.csv, look like this:
Alabama,4779736,4854803,1.5705260708959658022953568983726297854
Alaska,710231,738430,3.9703983633493891424057806544631253775
Arizona,6392017,6832810,6.8959922978928247531256565807005832431
Arkansas,2915918,2979732,2.1884703204959810255295244928012378949
California,37253956,38904296,4.4299724839960620557988526104449148971
Colorado,5029196,5454328,8.4532796097030221132761578590295546246
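Hand-quoting the JSON serialization arguments is error-prone. If you generate the CSV command from a script, a Python sketch like the following lets `json.dumps` produce the escaped JSON and `shlex.join` (Python 3.8 or later) produce a shell-safe command line. `build_select_cli_args` and the two constants are hypothetical names; the serialization settings are copied from the CSV example, and `--no-verify-ssl` disables TLS certificate verification, as in that example.

```python
import json
import shlex

# Serialization settings copied from the CSV CLI example.
INPUT_SERIALIZATION = {
    "CSV": {
        "FileHeaderInfo": "USE",
        "Comments": "#",
        "QuoteEscapeCharacter": '"',
        "RecordDelimiter": "\n",
        "FieldDelimiter": ",",
        "QuoteCharacter": '"',
        "AllowQuotedRecordDelimiter": False,
    },
    "CompressionType": "NONE",
}
OUTPUT_SERIALIZATION = {
    "CSV": {
        "QuoteFields": "ASNEEDED",
        "QuoteEscapeCharacter": "#",
        "RecordDelimiter": "\n",
        "FieldDelimiter": ",",
        "QuoteCharacter": '"',
    }
}


def build_select_cli_args(endpoint_url, bucket, key, expression, outfile, verify_ssl=True):
    """Return an argv list for `aws s3api select-object-content`."""
    args = ["aws", "s3api", "select-object-content", "--endpoint-url", endpoint_url]
    if not verify_ssl:
        args.append("--no-verify-ssl")  # skips TLS certificate verification
    args += [
        "--bucket", bucket,
        "--key", key,
        "--expression-type", "SQL",
        # json.dumps emits the \n and \" escapes, so nothing is escaped by hand.
        "--input-serialization", json.dumps(INPUT_SERIALIZATION),
        "--output-serialization", json.dumps(OUTPUT_SERIALIZATION),
        "--expression", expression,
        outfile,  # positional argument: local file that receives the results
    ]
    return args


args = build_select_cli_args(
    "https://10.224.7.44:10443",
    "619c0755-9e38-42e0-a614-05064f74126d",
    "SUB-EST2020_ALL.csv",
    "SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, "
    "CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 "
    "FROM S3Object WHERE NAME = STNAME",
    "changes.csv",
    verify_ssl=False,
)
print(shlex.join(args))  # a shell-quoted command line
```

Passing `args` directly to `subprocess.run()` avoids shell quoting altogether.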
AWS CLI usage example (Parquet)
aws s3api select-object-content \
  --endpoint-url https://10.224.7.44:10443 \
  --bucket 619c0755-9e38-42e0-a614-05064f74126d \
  --key SUB-EST2020_ALL.parquet \
  --expression "SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 FROM S3Object WHERE NAME = STNAME" \
  --expression-type 'SQL' \
  --input-serialization '{"Parquet":{}}' \
  --output-serialization '{"CSV": {}}' \
  changes.csv
The first few lines of the output file, changes.csv, look like this:
Alabama,4779736,4854803,1.5705260708959658022953568983726297854
Alaska,710231,738430,3.9703983633493891424057806544631253775
Arizona,6392017,6832810,6.8959922978928247531256565807005832431
Arkansas,2915918,2979732,2.1884703204959810255295244928012378949
California,37253956,38904296,4.4299724839960620557988526104449148971
Colorado,5029196,5454328,8.4532796097030221132761578590295546246
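From a script, the AWS SDK for Python (boto3) exposes the same operation as the S3 client's `select_object_content` method, which returns the results as an event stream rather than writing a file. The sketch below assumes boto3 is installed and credentials are configured; `collect_records` and `select_to_file` are hypothetical helpers. `collect_records` joins the Records payloads and raises an error if the stream stops before its End event, because the result may then be incomplete.

```python
def collect_records(event_stream):
    """Join the Records payloads of a SelectObjectContent event stream.

    Raises RuntimeError if the stream stops before an End event.
    """
    chunks = []
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "End" in event:
            return b"".join(chunks)
        # Stats, Progress, and Cont events carry no result rows and are skipped.
    raise RuntimeError("event stream ended without an End event")


def select_to_file(endpoint_url, bucket, key, expression, outfile, verify=True):
    """Run a Parquet-to-CSV SelectObjectContent request and save the result."""
    import boto3  # imported here so collect_records() works without boto3

    client = boto3.client("s3", endpoint_url=endpoint_url, verify=verify)
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        Expression=expression,
        ExpressionType="SQL",
        InputSerialization={"Parquet": {}},
        OutputSerialization={"CSV": {}},
    )
    data = collect_records(response["Payload"])
    with open(outfile, "wb") as f:
        f.write(data)
    return len(data)
```

Calling `select_to_file` with the endpoint, bucket, key, and expression from the Parquet CLI example should write an equivalent changes.csv, assuming the endpoint accepts the request.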