# Kafka Connect{background-color="white" background-image="https://images.ctfassets.net/gt6dp23g0g38/5vGOBwLiNaRedNyB0yaiIu/529a29a059d8971541309f7f57502dd2/ingest-data-upstream-systems.jpg" background-size="80%" background-opacity="0.5"}


## Overview
<https://kafka.apache.org/documentation.html#connect_overview>

- Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems.
- Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency.
- An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.

## Features

**A common framework for Kafka connectors** 

:::: {.columns}

::: {.fragment .column width="50%"}
Kafka Connect standardizes integration of other data systems with Kafka, simplifying connector development, deployment, and management
::: 

::: {.fragment .column width="50%"}
![](https://images.ctfassets.net/8vofjvai1hpv/4io4iF1i7C6vaHt3w0EIal/b608a11ae2613cd91a226680c6796322/blog_IntroducingConfluentHub.png)
<https://www.confluent.io/hub/>
:::
::::



**Distributed and standalone modes** 

:::: {.columns}

::: {.fragment .column width="50%"}
Scale up to a large, centrally managed service supporting an entire organization or scale down to development, testing, and small production deployments
::: 

::: {.fragment .column width="50%"}
![](https://cdn.confluent.io/wp-content/uploads/kafka-connect-2.png)
<https://www.confluent.io/blog/create-dynamic-kafka-connect-source-connectors/>
:::
::::
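For comparison with the standalone worker config used in the demo, a distributed worker keeps connector configs, offsets, and status in Kafka topics instead of a local offsets file, and all workers that share a `group.id` form one cluster. A minimal sketch that writes such a config; the file path, group id, and topic names are illustrative choices, and the replication factors of 1 assume a single-broker setup like this demo's:

```bash
# Write a minimal distributed-mode worker config.
# Path, group id and topic names are illustrative, not required values.
cat > /tmp/connect-distributed.properties <<'EOF'
bootstrap.servers=kafkaServer:9092
# Workers sharing this group.id form one Connect cluster
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Connector configs, offsets and status are stored in Kafka topics
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Replication factor 1 only because the demo runs a single broker
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
EOF

# Each worker would then be started with:
#   bin/connect-distributed.sh /tmp/connect-distributed.properties
```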




**REST interface**

:::: {.columns}

::: {.fragment .column width="50%"}
Submit and manage connectors on your Kafka Connect cluster via an easy-to-use REST API
::: 

::: {.fragment .column width="50%"}
![](https://img.youtube.com/vi/4xWPDXhBi3g/maxresdefault.jpg)
<https://developer.confluent.io/learn-kafka/kafka-connect/rest-api/>
:::
::::



**Automatic offset management** 

:::: {.columns}

::: {.fragment .column width="50%"}
With just a little information from connectors, Kafka Connect can manage the offset commit process automatically, so connector developers do not need to worry about this error-prone part of connector development
::: 

::: {.fragment .column width="50%"}

Example: File Source Connector

The File Source Connector uses offsets to keep track of the last byte position read in the file. As it reads new data, it periodically commits the latest byte position as the offset. If the connector restarts, it resumes reading from the last committed byte position. This prevents re-reading the entire file from the beginning, which could lead to duplicate data in Kafka.

[See Source Code](https://github.com/a0x8o/kafka/blob/master/connect/file/src/main/java/org/apache/kafka/connect/file/FileStreamSourceTask.java)
:::
::::
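To make the resume-from-offset idea concrete, here is a toy bash sketch, not Kafka Connect code: it remembers the last byte position in a hypothetical offset file and, on the next call, emits only the bytes appended since then. The function name `read_new_bytes` and the offset-file layout are illustrative assumptions; the real task records its position through Connect's offset storage.

```bash
# Toy illustration only (not Kafka Connect code): resume reading a file
# from the last "committed" byte position stored in an offset file.
read_new_bytes() {
  local src="$1" offset_file="$2"
  local pos=0 size

  # Load the last committed position, if any
  if [ -f "$offset_file" ]; then
    pos=$(cat "$offset_file")
  fi

  # Arithmetic expansion normalises any padding that wc may add
  size=$(( $(wc -c < "$src") ))

  # If the file shrank (e.g. it was truncated), start over from byte 0
  if [ "$size" -lt "$pos" ]; then
    pos=0
  fi

  # tail -c +N starts at byte N (1-based), so this skips the first $pos bytes;
  # head caps the output at the size measured above
  tail -c +"$((pos + 1))" "$src" | head -c "$((size - pos))"

  # "Commit" the new position so the next call resumes from here
  echo "$size" > "$offset_file"
}
```

Calling it again after appending to the file prints only the new bytes, which is the duplicate-avoidance behavior described above.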



**Distributed and scalable by default** 

:::: {.columns}

::: {.fragment .column width="50%"}
Kafka Connect builds on Kafka's existing group management protocol: adding more workers scales up a Kafka Connect cluster, and connectors and tasks are rebalanced across the workers.
::: 

::: {.fragment .column width="50%"}

![](https://www.oreilly.com/api/v2/epubs/9781787122765/files/assets/842ab4a5-f79c-43d4-bd61-96b2eb53676a.png)
<https://www.oreilly.com/library/view/modern-big-data/9781787122765/>
:::
::::
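In a distributed cluster, the status endpoint `GET /connectors/<name>/status` reports a `worker_id` for the connector and for each of its tasks, which is a quick way to see how work is spread across workers. A small helper sketch, assuming the status JSON is piped in on stdin; `worker_ids` is a hypothetical name:

```bash
# worker_ids: read the JSON returned by GET /connectors/<name>/status on stdin
# and print each distinct worker_id (connector and tasks), one per line.
worker_ids() {
  grep -o '"worker_id": *"[^"]*"' | cut -d'"' -f4 | sort -u
}

# Example (assumes the REST API is reachable on port 8083):
#   curl -s http://localhost:8083/connectors/local-file-source/status | worker_ids
```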



**Streaming/batch integration** 

:::: {.columns}

::: {.fragment .column width="50%"}
Leveraging Kafka's existing capabilities, Kafka Connect is an ideal solution for bridging streaming and batch data systems
::: 

::: {.fragment .column width="50%"}
![](https://i.imgflip.com/7icsk4.jpg)
[NicsMeme](https://imgflip.com/i/7icsk4)
:::
::::




# Connect Standalone Demo


## Start a Kafka Connect worker

A standalone worker is started with the worker configuration followed by one or more connector configurations:
```bash
bin/connect-standalone.sh config/connect-standalone.properties [connector1.properties connector2.properties ...]
```


## File to File

Goal: create a Kafka Connect pipeline that reads lines from a file into a Kafka topic (source connector) and writes them from that topic into another file (sink connector).

::: {.fragment}
**worker properties**

```properties
bootstrap.servers=kafkaServer:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/opt/kafka/libs/connect-file-3.8.0.jar
```
::: 

::: {.fragment}
**source properties**

```properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/my-test.txt
topic=connect-test
```
::: 

::: {.fragment}
**sink properties**

```properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/tmp/test.sink.txt
topics=connect-test
```
::: 


```bash
# Go to the kafka-connect directory of the repo
cd kafka-connect

# Start the Kafka server (runs in the foreground; the current directory is mounted at /connect)
docker run --rm -p 9092:9092 --network tap --name kafkaServer -v "$(pwd)":/connect apache/kafka:3.8.0

# In a second tab, create the topic (optional: by default the broker auto-creates it on first write)
docker exec -it --workdir /opt/kafka/bin/ kafkaServer ./kafka-topics.sh --create --bootstrap-server kafkaServer:9092 --topic connect-test --partitions 1

# Start Connect in standalone mode with the worker, source and sink configs
docker exec -it --workdir /opt/kafka/bin/ kafkaServer ./connect-standalone.sh /connect/connect-standalone.properties /connect/connect-file-source.properties /connect/connect-file-sink.properties

# In another tab, append a line to the source file
docker exec -it kafkaServer sh -c "echo hello >> /tmp/my-test.txt"

# In another tab, open a consumer on the topic
docker exec --workdir /opt/kafka/bin/ -it kafkaServer ./kafka-console-consumer.sh --topic connect-test --from-beginning --bootstrap-server localhost:9092

# In another tab, follow the destination file
docker exec -it kafkaServer sh -c "tail -f /tmp/test.sink.txt"

```
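Because the worker config sets `value.converter.schemas.enable=true`, JsonConverter wraps each value in a schema/payload envelope, so the console consumer prints lines such as `{"schema":{"type":"string","optional":false},"payload":"hello"}` rather than the bare text. A rough helper sketch to strip the envelope, assuming one record per line and string payloads without escaped quotes; `extract_payload` is a hypothetical name:

```bash
# extract_payload: read JsonConverter envelopes (one per line) on stdin
# and print only the string payload of each record.
extract_payload() {
  sed -n 's/.*"payload":"\(.*\)"}$/\1/p'
}

# Example, run from the host (no -t so the output can be piped):
#   docker exec kafkaServer /opt/kafka/bin/kafka-console-consumer.sh \
#     --topic connect-test --from-beginning --bootstrap-server localhost:9092 | extract_payload
```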

## REST API
<https://kafka.apache.org/documentation/#connect_rest>

```bash
# In a new tab, list the connectors loaded by the worker (REST API on port 8083)
docker exec -it kafkaServer wget -qO- http://localhost:8083/connectors

```
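Beyond listing connectors, the same REST API creates, inspects, and removes them. A sketch of those calls as shell functions, assuming `curl` is installed and the worker's REST port 8083 is reachable at `CONNECT_URL` (the demo's `docker run` publishes only 9092, so calling from the host would need an extra `-p 8083:8083`). The connector name `rest-file-source` and the file `/tmp/rest-test.txt` are hypothetical, chosen so they do not clash with the demo's connectors:

```bash
# Sketch of common Connect REST calls. Assumes curl is installed and the
# worker's REST API is reachable at CONNECT_URL (default port 8083).
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Body for POST /connectors: a connector name plus its config map.
# Name and file are hypothetical, chosen not to clash with the demo.
connector_payload() {
  cat <<'EOF'
{"name":"rest-file-source","config":{"connector.class":"FileStreamSource","tasks.max":"1","file":"/tmp/rest-test.txt","topic":"connect-test"}}
EOF
}

# GET /connectors: names of all connectors
list_connectors() { curl -s "$CONNECT_URL/connectors"; }

# POST /connectors: create the connector described by connector_payload
create_connector() {
  connector_payload | curl -s -X POST -H "Content-Type: application/json" --data @- "$CONNECT_URL/connectors"
}

# GET /connectors/<name>/status: connector and task states
connector_status() { curl -s "$CONNECT_URL/connectors/$1/status"; }

# DELETE /connectors/<name>: stop and remove a connector
delete_connector() { curl -s -X DELETE "$CONNECT_URL/connectors/$1"; }
```

A typical session would be `create_connector`, then `connector_status rest-file-source`, and finally `delete_connector rest-file-source` to clean up.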