本繁體中文版使用機器翻譯,譯文僅供參考,若與英文版本牴觸,應以英文版本為準。
選取物件內容
貢獻者
建議變更
您可以使用S3 SelectObjectContent要求、根據簡單的SQL陳述來篩選S3物件的內容。
開始之前
-
租戶帳戶具有S3 Select權限。
-
您有 `s3:GetObject`查詢物件的權限。
-
您要查詢的物件必須採用下列其中一種格式:
-
* CSV* 。可依原樣使用、也可壓縮至 GZIP 或 bzip2 歸檔。
-
* 硬地板 * 。硬地板物件的其他需求:
-
S3 Select 僅支援使用 GZIP 或 Snappy 進行柱式壓縮。S3 Select 不支援 Parquet 物件的全物件壓縮。
-
S3 Select 不支援硬地板輸出。您必須將輸出格式指定為 CSV 或 JSON 。
-
最大未壓縮列群組大小為 512 MB 。
-
您必須使用物件架構中指定的資料類型。
-
您無法使用時間間隔、 JSON 、清單、時間或 UUID 邏輯類型。
-
-
-
SQL運算式的最大長度為256 KB。
-
輸入或結果中的任何記錄最大長度為1個mib。
CSV 要求語法範例
POST /{Key+}?select&select-type=2 HTTP/1.1
Host: Bucket.s3.abc-company.com
x-amz-expected-bucket-owner: ExpectedBucketOwner
<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Expression>string</Expression>
<ExpressionType>string</ExpressionType>
<RequestProgress>
<Enabled>boolean</Enabled>
</RequestProgress>
<InputSerialization>
<CompressionType>GZIP</CompressionType>
<CSV>
<AllowQuotedRecordDelimiter>boolean</AllowQuotedRecordDelimiter>
<Comments>#</Comments>
<FieldDelimiter>\t</FieldDelimiter>
<FileHeaderInfo>USE</FileHeaderInfo>
<QuoteCharacter>'</QuoteCharacter>
<QuoteEscapeCharacter>\\</QuoteEscapeCharacter>
<RecordDelimiter>\n</RecordDelimiter>
</CSV>
</InputSerialization>
<OutputSerialization>
<CSV>
<FieldDelimiter>string</FieldDelimiter>
<QuoteCharacter>string</QuoteCharacter>
<QuoteEscapeCharacter>string</QuoteEscapeCharacter>
<QuoteFields>string</QuoteFields>
<RecordDelimiter>string</RecordDelimiter>
</CSV>
</OutputSerialization>
<ScanRange>
<End>long</End>
<Start>long</Start>
</ScanRange>
</SelectObjectContentRequest>
拼花地板要求語法範例
POST /{Key+}?select&select-type=2 HTTP/1.1
Host: Bucket.s3.abc-company.com
x-amz-expected-bucket-owner: ExpectedBucketOwner
<?xml version="1.0" encoding="UTF-8"?>
<SelectObjectContentRequest xmlns=http://s3.amazonaws.com/doc/2006-03-01/>
<Expression>string</Expression>
<ExpressionType>string</ExpressionType>
<RequestProgress>
<Enabled>boolean</Enabled>
</RequestProgress>
<InputSerialization>
<CompressionType>GZIP</CompressionType>
<PARQUET>
</PARQUET>
</InputSerialization>
<OutputSerialization>
<CSV>
<FieldDelimiter>string</FieldDelimiter>
<QuoteCharacter>string</QuoteCharacter>
<QuoteEscapeCharacter>string</QuoteEscapeCharacter>
<QuoteFields>string</QuoteFields>
<RecordDelimiter>string</RecordDelimiter>
</CSV>
</OutputSerialization>
<ScanRange>
<End>long</End>
<Start>long</Start>
</ScanRange>
</SelectObjectContentRequest>
SQL查詢範例
此查詢會取得州名、2010年人口、2015年估計人口、以及美國統計資料的變更百分比。檔案中非狀態的記錄會被忽略。
SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 FROM S3Object WHERE NAME = STNAME
要查詢的檔案的前幾行 SUB-EST2020_ALL.csv
、如下所示:
SUMLEV,STATE,COUNTY,PLACE,COUSUB,CONCIT,PRIMGEO_FLAG,FUNCSTAT,NAME,STNAME,CENSUS2010POP, ESTIMATESBASE2010,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014, POPESTIMATE2015,POPESTIMATE2016,POPESTIMATE2017,POPESTIMATE2018,POPESTIMATE2019,POPESTIMATE042020, POPESTIMATE2020 040,01,000,00000,00000,00000,0,A,Alabama,Alabama,4779736,4780118,4785514,4799642,4816632,4831586, 4843737,4854803,4866824,4877989,4891628,4907965,4920706,4921532 162,01,000,00124,00000,00000,0,A,Abbeville city,Alabama,2688,2705,2699,2694,2645,2629,2610,2602, 2587,2578,2565,2555,2555,2553 162,01,000,00460,00000,00000,0,A,Adamsville city,Alabama,4522,4487,4481,4474,4453,4430,4399,4371, 4335,4304,4285,4254,4224,4211 162,01,000,00484,00000,00000,0,A,Addison town,Alabama,758,754,751,750,745,744,742,734,734,728, 725,723,719,717
AWS-CLI 使用範例( CSV )
aws s3api select-object-content --endpoint-url https://10.224.7.44:10443 --no-verify-ssl --bucket 619c0755-9e38-42e0-a614-05064f74126d --key SUB-EST2020_ALL.csv --expression-type SQL --input-serialization '{"CSV": {"FileHeaderInfo": "USE", "Comments": "#", "QuoteEscapeCharacter": "\"", "RecordDelimiter": "\n", "FieldDelimiter": ",", "QuoteCharacter": "\"", "AllowQuotedRecordDelimiter": false}, "CompressionType": "NONE"}' --output-serialization '{"CSV": {"QuoteFields": "ASNEEDED", "QuoteEscapeCharacter": "#", "RecordDelimiter": "\n", "FieldDelimiter": ",", "QuoteCharacter": "\""}}' --expression "SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 FROM S3Object WHERE NAME = STNAME" changes.csv
輸出文件的前幾行 `changes.csv`如下所示:
Alabama,4779736,4854803,1.5705260708959658022953568983726297854 Alaska,710231,738430,3.9703983633493891424057806544631253775 Arizona,6392017,6832810,6.8959922978928247531256565807005832431 Arkansas,2915918,2979732,2.1884703204959810255295244928012378949 California,37253956,38904296,4.4299724839960620557988526104449148971 Colorado,5029196,5454328,8.4532796097030221132761578590295546246
AWS-CLI 使用範例( Parquet )
aws s3api select-object-content -endpoint-url https://10.224.7.44:10443 --bucket 619c0755-9e38-42e0-a614-05064f74126d --key SUB-EST2020_ALL.parquet --expression "SELECT STNAME, CENSUS2010POP, POPESTIMATE2015, CAST((POPESTIMATE2015 - CENSUS2010POP) AS DECIMAL) / CENSUS2010POP * 100.0 FROM S3Object WHERE NAME = STNAME" --expression-type 'SQL' --input-serialization '{"Parquet":{}}' --output-serialization '{"CSV": {}}' changes.csv
輸出檔案的前幾行: changes .csv 、如下所示:
Alabama,4779736,4854803,1.5705260708959658022953568983726297854 Alaska,710231,738430,3.9703983633493891424057806544631253775 Arizona,6392017,6832810,6.8959922978928247531256565807005832431 Arkansas,2915918,2979732,2.1884703204959810255295244928012378949 California,37253956,38904296,4.4299724839960620557988526104449148971 Colorado,5029196,5454328,8.4532796097030221132761578590295546246