La versione in lingua italiana fornita proviene da una traduzione automatica. Per eventuali incoerenze, fare riferimento alla versione in lingua inglese.

Connettore confluente s3

12/15/2025 Collaboratori

PDF

Il connettore Amazon S3 Sink esporta i dati dagli argomenti Apache Kafka agli oggetti S3 nei formati Avro, JSON o Bytes. Il connettore sink Amazon S3 interroga periodicamente i dati da Kafka e a sua volta li carica su S3. Un partizionatore viene utilizzato per suddividere i dati di ogni partizione Kafka in blocchi. Ogni blocco di dati è rappresentato come un oggetto S3. Il nome della chiave codifica l'argomento, la partizione Kafka e l'offset iniziale di questo blocco di dati.

In questa configurazione, ti mostriamo come leggere e scrivere argomenti nell'archiviazione di oggetti da Kafka direttamente utilizzando il connettore sink Kafka s3. Per questo test abbiamo utilizzato un cluster Confluent autonomo, ma questa configurazione è applicabile anche a un cluster distribuito.

Scarica Confluent Kafka dal sito web di Confluent.
Decomprimi il pacchetto in una cartella sul tuo server.

Esporta due variabili.

Export CONFLUENT_HOME=/data/confluent/confluent-6.2.0
export PATH=$PATH:/data/confluent/confluent-6.2.0/bin

Per una configurazione Confluent Kafka autonoma, il cluster crea una cartella radice temporanea in /tmp Crea inoltre Zookeeper, Kafka, un registro di schemi, connect, un server ksql e cartelle del centro di controllo e copia i rispettivi file di configurazione da $CONFLUENT_HOME . Vedere il seguente esempio:

root@stlrx2540m1-108:~# ls -ltr /tmp/confluent.406980/
total 28
drwxr-xr-x 4 root root 4096 Oct 29 19:01 zookeeper
drwxr-xr-x 4 root root 4096 Oct 29 19:37 kafka
drwxr-xr-x 4 root root 4096 Oct 29 19:40 schema-registry
drwxr-xr-x 4 root root 4096 Oct 29 19:45 kafka-rest
drwxr-xr-x 4 root root 4096 Oct 29 19:47 connect
drwxr-xr-x 4 root root 4096 Oct 29 19:48 ksql-server
drwxr-xr-x 4 root root 4096 Oct 29 19:53 control-center
root@stlrx2540m1-108:~#

Configura Zookeeper. Se si utilizzano i parametri predefiniti non è necessario apportare alcuna modifica.

root@stlrx2540m1-108:~# cat  /tmp/confluent.406980/zookeeper/zookeeper.properties  | grep -iv ^#
dataDir=/tmp/confluent.406980/zookeeper/data
clientPort=2181
maxClientCnxns=0
admin.enableServer=false
tickTime=2000
initLimit=5
syncLimit=2
server.179=controlcenter:2888:3888
root@stlrx2540m1-108:~#

Nella configurazione di cui sopra, abbiamo aggiornato il server. xxx proprietà. Di default, per la selezione del leader Kafka sono necessari tre Zookeeper.

Abbiamo creato un file myid in /tmp/confluent.406980/zookeeper/data con un ID univoco:
```
root@stlrx2540m1-108:~# cat /tmp/confluent.406980/zookeeper/data/myid
179
root@stlrx2540m1-108:~#
```
Abbiamo utilizzato l'ultimo numero di indirizzi IP per il file myid. Abbiamo utilizzato valori predefiniti per le configurazioni Kafka, connect, control-center, Kafka, Kafka-rest, ksql-server e schema-registry.

Avviare i servizi Kafka.

root@stlrx2540m1-108:/data/confluent/confluent-6.2.0/bin# confluent local services  start
The local commands are intended for a single-node development environment only,
NOT for production usage.
 
Using CONFLUENT_CURRENT: /tmp/confluent.406980
ZooKeeper is [UP]
Kafka is [UP]
Schema Registry is [UP]
Kafka REST is [UP]
Connect is [UP]
ksqlDB Server is [UP]
Control Center is [UP]
root@stlrx2540m1-108:/data/confluent/confluent-6.2.0/bin#

Per ogni configurazione è presente una cartella di registro, utile per risolvere i problemi. In alcuni casi, l'avvio dei servizi richiede più tempo. Assicurarsi che tutti i servizi siano attivi e funzionanti.

Installa Kafka Connect utilizzando confluent-hub .

root@stlrx2540m1-108:/data/confluent/confluent-6.2.0/bin# ./confluent-hub install confluentinc/kafka-connect-s3:latest
The component can be installed in any of the following Confluent Platform installations:
  1. /data/confluent/confluent-6.2.0 (based on $CONFLUENT_HOME)
  2. /data/confluent/confluent-6.2.0 (where this tool is installed)
Choose one of these to continue the installation (1-2): 1
Do you want to install this into /data/confluent/confluent-6.2.0/share/confluent-hub-components? (yN) y

Component's license:
Confluent Community License
http://www.confluent.io/confluent-community-license
I agree to the software license agreement (yN) y
Downloading component Kafka Connect S3 10.0.3, provided by Confluent, Inc. from Confluent Hub and installing into /data/confluent/confluent-6.2.0/share/confluent-hub-components
Do you want to uninstall existing version 10.0.3? (yN) y
Detected Worker's configs:
  1. Standard: /data/confluent/confluent-6.2.0/etc/kafka/connect-distributed.properties
  2. Standard: /data/confluent/confluent-6.2.0/etc/kafka/connect-standalone.properties
  3. Standard: /data/confluent/confluent-6.2.0/etc/schema-registry/connect-avro-distributed.properties
  4. Standard: /data/confluent/confluent-6.2.0/etc/schema-registry/connect-avro-standalone.properties
  5. Based on CONFLUENT_CURRENT: /tmp/confluent.406980/connect/connect.properties
  6. Used by Connect process with PID 15904: /tmp/confluent.406980/connect/connect.properties
Do you want to update all detected configs? (yN) y
Adding installation directory to plugin path in the following files:
  /data/confluent/confluent-6.2.0/etc/kafka/connect-distributed.properties
  /data/confluent/confluent-6.2.0/etc/kafka/connect-standalone.properties
  /data/confluent/confluent-6.2.0/etc/schema-registry/connect-avro-distributed.properties
  /data/confluent/confluent-6.2.0/etc/schema-registry/connect-avro-standalone.properties
  /tmp/confluent.406980/connect/connect.properties
  /tmp/confluent.406980/connect/connect.properties

Completed
root@stlrx2540m1-108:/data/confluent/confluent-6.2.0/bin#

È anche possibile installare una versione specifica utilizzando confluent-hub install confluentinc/kafka-connect-s3:10.0.3 .

Per impostazione predefinita, confluentinc-kafka-connect-s3 è installato in /data/confluent/confluent-6.2.0/share/confluent-hub-components/confluentinc-kafka-connect-s3 .

Aggiorna il percorso del plug-in con il nuovo confluentinc-kafka-connect-s3 .

root@stlrx2540m1-108:~# cat /data/confluent/confluent-6.2.0/etc/kafka/connect-distributed.properties | grep plugin.path
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java,/data/zookeeper/confluent/confluent-6.2.0/share/confluent-hub-components,/data/confluent/confluent-6.2.0/share/confluent-hub-components,/data/confluent/confluent-6.2.0/share/confluent-hub-components/confluentinc-kafka-connect-s3
root@stlrx2540m1-108:~#

Arrestare i servizi Confluent e riavviarli.

confluent local services  stop
confluent local services  start
root@stlrx2540m1-108:/data/confluent/confluent-6.2.0/bin# confluent local services  status
The local commands are intended for a single-node development environment only,
NOT for production usage.
 
Using CONFLUENT_CURRENT: /tmp/confluent.406980
Connect is [UP]
Control Center is [UP]
Kafka is [UP]
Kafka REST is [UP]
ksqlDB Server is [UP]
Schema Registry is [UP]
ZooKeeper is [UP]
root@stlrx2540m1-108:/data/confluent/confluent-6.2.0/bin#

Configurare l'ID di accesso e la chiave segreta in /root/.aws/credentials file.

root@stlrx2540m1-108:~# cat /root/.aws/credentials
[default]
aws_access_key_id = xxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxx
root@stlrx2540m1-108:~#

Verificare che il bucket sia raggiungibile.

root@stlrx2540m4-01:~# aws s3 –endpoint-url http://kafkasgd.rtpppe.netapp.com:10444 ls kafkasgdbucket1-2
2021-10-29 21:04:18       1388 1
2021-10-29 21:04:20       1388 2
2021-10-29 21:04:22       1388 3
root@stlrx2540m4-01:~#

Configurare il file delle proprietà s3-sink per la configurazione s3 e bucket.

root@stlrx2540m1-108:~# cat /data/confluent/confluent-6.2.0/share/confluent-hub-components/confluentinc-kafka-connect-s3/etc/quickstart-s3.properties | grep -v ^#
name=s3-sink
connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=1
topics=s3_testtopic
s3.region=us-west-2
s3.bucket.name=kafkasgdbucket1-2
store.url=http://kafkasgd.rtpppe.netapp.com:10444/
s3.part.size=5242880
flush.size=3
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.avro.AvroFormat
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
schema.compatibility=NONE
root@stlrx2540m1-108:~#

Importa alcuni record nel bucket s3.

kafka-avro-console-producer --broker-list localhost:9092 --topic s3_topic \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'
{"f1": "value1"}
{"f1": "value2"}
{"f1": "value3"}
{"f1": "value4"}
{"f1": "value5"}
{"f1": "value6"}
{"f1": "value7"}
{"f1": "value8"}
{"f1": "value9"}

Caricare il connettore s3-sink.

root@stlrx2540m1-108:~# confluent local services connect connector load s3-sink  --config /data/confluent/confluent-6.2.0/share/confluent-hub-components/confluentinc-kafka-connect-s3/etc/quickstart-s3.properties
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html
{
  "name": "s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "flush.size": "3",
    "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "s3.bucket.name": "kafkasgdbucket1-2",
    "s3.part.size": "5242880",
    "s3.region": "us-west-2",
    "schema.compatibility": "NONE",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "store.url": "http://kafkasgd.rtpppe.netapp.com:10444/",
    "tasks.max": "1",
    "topics": "s3_testtopic",
    "name": "s3-sink"
  },
  "tasks": [],
  "type": "sink"
}
root@stlrx2540m1-108:~#

Controllare lo stato del sink s3.

root@stlrx2540m1-108:~# confluent local services connect connector status s3-sink
The local commands are intended for a single-node development environment only,
NOT for production usage. https://docs.confluent.io/current/cli/index.html
{
  "name": "s3-sink",
  "connector": {
    "state": "RUNNING",
    "worker_id": "10.63.150.185:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "10.63.150.185:8083"
    }
  ],
  "type": "sink"
}
root@stlrx2540m1-108:~#

Controllare il registro per assicurarsi che s3-sink sia pronto ad accettare argomenti.
```
root@stlrx2540m1-108:~# confluent local services connect log
```

Consulta gli argomenti in Kafka.

kafka-topics --list --bootstrap-server localhost:9092
…
connect-configs
connect-offsets
connect-statuses
default_ksql_processing_log
s3_testtopic
s3_topic
s3_topic_new
root@stlrx2540m1-108:~#

Controllare gli oggetti nel bucket s3.

root@stlrx2540m1-108:~# aws s3 --endpoint-url http://kafkasgd.rtpppe.netapp.com:10444 ls --recursive kafkasgdbucket1-2/topics/
2021-10-29 21:24:00        213 topics/s3_testtopic/partition=0/s3_testtopic+0+0000000000.avro
2021-10-29 21:24:00        213 topics/s3_testtopic/partition=0/s3_testtopic+0+0000000003.avro
2021-10-29 21:24:00        213 topics/s3_testtopic/partition=0/s3_testtopic+0+0000000006.avro
2021-10-29 21:24:08        213 topics/s3_testtopic/partition=0/s3_testtopic+0+0000000009.avro
2021-10-29 21:24:08        213 topics/s3_testtopic/partition=0/s3_testtopic+0+0000000012.avro
2021-10-29 21:24:09        213 topics/s3_testtopic/partition=0/s3_testtopic+0+0000000015.avro
root@stlrx2540m1-108:~#

Per verificare il contenuto, copia ciascun file da S3 al tuo file system locale eseguendo il seguente comando:

root@stlrx2540m1-108:~# aws s3 --endpoint-url http://kafkasgd.rtpppe.netapp.com:10444 cp s3://kafkasgdbucket1-2/topics/s3_testtopic/partition=0/s3_testtopic+0+0000000000.avro  tes.avro
download: s3://kafkasgdbucket1-2/topics/s3_testtopic/partition=0/s3_testtopic+0+0000000000.avro to ./tes.avro
root@stlrx2540m1-108:~#

Per stampare i record, utilizzare avro-tools-1.11.0.1.jar (disponibile nel "Archivi Apache" ).

root@stlrx2540m1-108:~# java -jar /usr/src/avro-tools-1.11.0.1.jar tojson tes.avro
21/10/30 00:20:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
{"f1":"value1"}
{"f1":"value2"}
{"f1":"value3"}
root@stlrx2540m1-108:~#

Connettori Instaclustr Kafka Connect

Instaclustr supporta i connettori Kafka Connect e i loro dettagli - "Maggiori dettagli". Instaclustr fornisce connettori aggiuntivi "i loro dettagli"

Connettore confluente s3

Creating your file...

Connettori Instaclustr Kafka Connect