# Kafka Connect

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems.

It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. 

Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. 

An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.

## Features

**A common framework for Kafka connectors** 

Kafka Connect standardizes the integration of other data systems with Kafka, simplifying connector development, deployment, and management.

![](https://images.ctfassets.net/8vofjvai1hpv/4io4iF1i7C6vaHt3w0EIal/b608a11ae2613cd91a226680c6796322/blog_IntroducingConfluentHub.png)
https://www.confluent.io/hub/

**Distributed and standalone modes** 

Scale up to a large, centrally managed service supporting an entire organization, or scale down to development, testing, and small production deployments.

![](https://cdn.confluent.io/wp-content/uploads/kafka-connect-2.png)
https://www.confluent.io/blog/create-dynamic-kafka-connect-source-connectors/

**REST interface**

Submit and manage connectors on your Kafka Connect cluster via an easy-to-use REST API.

![](https://img.youtube.com/vi/4xWPDXhBi3g/maxresdefault.jpg)
https://developer.confluent.io/learn-kafka/kafka-connect/rest-api/

**Automatic offset management** 

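With just a little information from connectors, Kafka Connect can manage the offset commit process automatically, so connector developers do not need to worry about this error-prone part of connector development. A source connector only attaches a source partition and offset (for example, a position in a file) to each record; the framework commits them periodically, to a local file in standalone mode or to a Kafka topic in distributed mode.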

![](https://static.wikia.nocookie.net/fa4cdf31-15f4-492a-ad62-4cec313ba39b/scale-to-width/370)

https://harrypotter.fandom.com/f/p/4400000000003402981/r/4400000000011097584

**Distributed and scalable by default** 

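Kafka Connect builds on Kafka's existing group management protocol: workers that share the same `group.id` form a cluster and rebalance connectors and tasks among themselves. More workers can be added to scale up a Kafka Connect cluster.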
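In distributed mode every worker runs with the same `group.id` and shares three internal Kafka topics for connector configurations, offsets, and status, instead of the local offsets file used in standalone mode. The script below is a minimal sketch that writes such a worker file. It assumes a broker reachable at `kafkaServer:9092` (as in the standalone demo) and a single-broker development setup, hence replication factor 1; the group and topic names are illustrative choices, not required values.

```bash
#!/usr/bin/env bash
# Sketch: write a distributed-mode worker configuration file.
# Assumptions: broker at kafkaServer:9092, a single broker (replication
# factor 1), and illustrative group/topic names chosen for this example.
set -eu

CONFIG_FILE="${1:-connect-distributed.properties}"

cat > "$CONFIG_FILE" <<'EOF'
bootstrap.servers=kafkaServer:9092
# Workers with the same group.id form one Connect cluster
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Internal topics shared by all workers of the cluster
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Single-broker development setup (the defaults expect 3 replicas)
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
# REST API endpoint of this worker
listeners=HTTP://:8083
plugin.path=/opt/kafka/libs/connect-file-3.4.0.jar
EOF

echo "wrote $CONFIG_FILE"
```

Each worker is then started with `bin/connect-distributed.sh connect-distributed.properties`. In distributed mode connectors are not passed on the command line but submitted through the REST API, and any additional worker started with the same `group.id` joins the cluster.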

![](https://www.oreilly.com/api/v2/epubs/9781787122765/files/assets/842ab4a5-f79c-43d4-bd61-96b2eb53676a.png)
https://www.oreilly.com/library/view/modern-big-data/9781787122765/

**Streaming/batch integration** 

Leveraging Kafka's existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems.

![](https://i.imgflip.com/7icsk4.jpg)
[NicsMeme](https://imgflip.com/i/7icsk4)

## Connect Standalone Demo

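A standalone worker is started with the worker configuration followed by one or more connector configurations:
```bash
bin/connect-standalone.sh config/connect-standalone.properties [connector1.properties connector2.properties ...]
# e.g. for the file-to-file example below
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
```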

The list of the few 

### A File-to-File Example

Goal: create a Kafka Connect pipeline that reads lines from a file into the `connect-test` topic and writes them from that topic to another file.

#### Worker configuration (`connect-standalone.properties`)

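```properties
# Kafka broker(s) the worker connects to
bootstrap.servers=kafkaServer:9092
# Convert record keys and values to/from JSON; with schemas enabled,
# each message carries its schema alongside the payload
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
# Standalone mode keeps source offsets in a local file, flushed every 10 s
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
# Recent Kafka releases no longer put the FileStream connectors on the
# default classpath, so their jar is listed explicitly
plugin.path=/opt/kafka/libs/connect-file-3.4.0.jar
```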
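With this configuration the FileStream connectors are loaded from the jar listed in `plugin.path`; if that path is wrong, the problem only surfaces when the worker cannot find the connector class at startup. A small preflight check can catch it earlier. This is a sketch: `check_plugin_path` is a hypothetical helper, and it assumes the worker file uses plain single-line `key=value` entries.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: verify that every entry of plugin.path
# in a worker properties file exists on disk.
# Assumes simple single-line key=value entries (no line continuations).
check_plugin_path() {
  local props="$1" value entry missing=0
  local -a entries
  # Take the last plugin.path line, since a later definition overrides an earlier one
  value=$(grep -E '^[[:space:]]*plugin\.path[[:space:]]*=' "$props" | tail -n 1 | cut -d= -f2-)
  if [[ -z "$value" ]]; then
    echo "plugin.path not set in $props" >&2
    return 1
  fi
  # plugin.path is a comma-separated list of directories or jars
  IFS=',' read -ra entries <<< "$value"
  for entry in "${entries[@]}"; do
    entry="${entry#"${entry%%[![:space:]]*}"}"   # trim leading whitespace
    entry="${entry%"${entry##*[![:space:]]}"}"   # trim trailing whitespace
    if [[ ! -e "$entry" ]]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_plugin_path config/connect-standalone.properties && bin/connect-standalone.sh ...` only launches the worker when every plugin path exists.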

#### Source connector (`connect-file-source.properties`)

```properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
# File to read; each line becomes one record on the topic below
file=test.txt
topic=connect-test
```

#### Sink connector (`connect-file-sink.properties`)

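```properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
# File the records are written to
file=test.sink.txt
# Sink connectors take a comma-separated list of topics (note the plural key)
topics=connect-test
```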

```bash
# Assuming the Kafka server and ZooKeeper are up and running

# Create the topic shared by the source and the sink connector
docker exec -it kafkaServer kafka-topics.sh --bootstrap-server kafkaServer:9092 --create --topic connect-test

# Start a new container running Connect in standalone mode
# (note that the referenced files are adjusted to work in Docker)
docker run --rm -e KAFKA_ACTION=connect-standalone -e KAFKA_WORKER_PROPERTIES=connect-standalone.properties -e KAFKA_CONNECTOR_PROPERTIES="config/connect-file-source.properties config/connect-file-sink.properties" --network tap --name kafkaConnect tap:kafka

# In another tab, open a consumer on the topic; with schemas enabled, each line
# arrives wrapped in a JSON envelope, e.g. {"schema":{"type":"string","optional":false},"payload":"hello"}
docker run --rm -e KAFKA_ACTION=consumer -e KAFKA_TOPIC=connect-test --network tap -it tap:kafka

# In another tab, open a shell inside the kafkaConnect container
docker exec -it kafkaConnect /bin/bash
cd /tmp/
# Append a line to the file read by the source connector
echo "hello" >> my-test.txt

# In another tab, open another shell inside the kafkaConnect container
docker exec -it kafkaConnect /bin/bash
cd /tmp/
# Follow the file written by the sink connector
tail -f test.sink.txt
```
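The manual check above (append a line, then watch the sink file) can also be scripted. A minimal sketch, assuming it runs inside the `kafkaConnect` container in `/tmp` and that the file names passed in match the `file` properties of the connector configurations actually used there; `wait_for_line` and `check_pipeline` are hypothetical helpers, not part of Kafka.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll FILE until it contains LINE verbatim,
# giving up after TIMEOUT seconds. Returns 0 on success, 1 on timeout.
wait_for_line() {
  local file="$1" line="$2" timeout="${3:-30}"
  local waited=0
  while (( waited < timeout )); do
    # -F: fixed string, -x: whole-line match, -q: quiet
    if [[ -f "$file" ]] && grep -Fxq -- "$line" "$file"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Hypothetical end-to-end check of the file-to-file pipeline:
# append a unique line to the source file and wait for it in the sink file.
check_pipeline() {
  local source_file="$1" sink_file="$2"
  local marker
  marker="smoke-test-$(date +%s)-$$"
  echo "$marker" >> "$source_file"
  if wait_for_line "$sink_file" "$marker" 30; then
    echo "OK: '$marker' reached $sink_file"
  else
    echo "FAIL: '$marker' not found in $sink_file" >&2
    return 1
  fi
}
```

With the file names of the demo, this would be `check_pipeline my-test.txt test.sink.txt`. Using a unique marker per run avoids false positives from lines already present in the sink file.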

## REST API
https://kafka.apache.org/documentation/#connect_rest


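```bash
# In a new tab, open a shell inside the kafkaConnect container
docker exec -it kafkaConnect /bin/bash
# List the active connectors (the REST API listens on port 8083 by default)
curl -s -X GET http://localhost:8083/connectors
```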
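Beyond listing connectors, the same REST API can create, inspect, and delete them, which is how connectors are managed in distributed mode. A minimal sketch, assuming the worker's REST listener is reachable at `localhost:8083` and that `curl` is available; the helper function names and the `CONNECT_URL` variable are hypothetical, while the endpoints (`POST /connectors`, `GET /connectors/{name}/status`, `DELETE /connectors/{name}`, `GET /connector-plugins`) come from the Kafka Connect REST API documentation linked above. The payload mirrors the file source configuration used earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the Kafka Connect REST API.
# Assumption: the worker REST listener is reachable at CONNECT_URL.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Print the JSON body for POST /connectors, mirroring the file source
# connector configuration (no JSON escaping: fine for simple names only).
file_source_payload() {
  local name="$1" file="$2" topic="$3"
  printf '{"name": "%s", "config": {"connector.class": "FileStreamSource", "tasks.max": "1", "file": "%s", "topic": "%s"}}\n' \
    "$name" "$file" "$topic"
}

# Create a connector from a JSON body passed as the first argument.
create_connector() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data "$1" "$CONNECT_URL/connectors"
}

# Show the state of a connector and of each of its tasks.
connector_status() {
  curl -s "$CONNECT_URL/connectors/$1/status"
}

# Stop and remove a connector.
delete_connector() {
  curl -s -X DELETE "$CONNECT_URL/connectors/$1"
}

# List the connector plugins installed on the worker (useful to check plugin.path).
list_plugins() {
  curl -s "$CONNECT_URL/connector-plugins"
}
```

For example, `create_connector "$(file_source_payload my-source /tmp/my-test.txt connect-test)"` followed by `connector_status my-source` registers a second file source and reports whether its task is running.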

## Bibliography
- https://blog.softwaremill.com/do-not-reinvent-the-wheel-use-kafka-connect-4bcabb143292
- https://dev.to/thegroo/kafka-connect-crash-course-1chd
- https://data-flair.training/blogs/kafka-connect/
- https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/