# Apache Kafka

Further reading: [Kafka architecture (Cloudurable blog)](http://cloudurable.com/blog/kafka-architecture/index.html)

The official Kafka documentation is available at [https://kafka.apache.org/](https://kafka.apache.org/).


Apache Kafka is a distributed event store and stream-processing platform.

It is primarily used to build real-time streaming data pipelines and applications that adapt to the data streams. It combines messaging, storage, and stream processing, so that both historical and real-time data can be stored and analyzed.
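Kafka can serve both historical and real-time data because each partition of a topic is stored as an append-only log. Records are never modified, and each one receives an increasing offset. Reading a record does not remove it; old records are deleted only by a retention policy.

The toy class below illustrates this. It is a conceptual sketch only: `ToyLog`, `append` and `read` are hypothetical names, not a Kafka client API. Real Kafka retention is based on time or size rather than a record count, and real logs are persisted to disk.

```python
from typing import List, Tuple


class ToyLog:
    """Hypothetical append-only log: each record keeps the offset it was given."""

    def __init__(self, max_records: int) -> None:
        self.max_records = max_records  # stand-in for Kafka's time/size-based retention
        self.start_offset = 0           # offset of the oldest record still stored
        self.records: List[str] = []

    def append(self, value: str) -> int:
        """Add a record at the end of the log and return its offset."""
        offset = self.start_offset + len(self.records)
        self.records.append(value)
        if len(self.records) > self.max_records:
            # Retention drops the oldest record; the offsets of the others do not change.
            self.records.pop(0)
            self.start_offset += 1
        return offset

    def read(self, offset: int) -> List[Tuple[int, str]]:
        """Return (offset, value) pairs from `offset` on; reading removes nothing.

        Simplification: an offset older than the oldest stored record is clamped to it.
        """
        first = max(offset, self.start_offset)
        end = self.start_offset + len(self.records)
        return [(o, self.records[o - self.start_offset]) for o in range(first, end)]


log = ToyLog(max_records=3)
for event in ["a", "b", "c", "d"]:
    log.append(event)
print(log.read(0))  # [(1, 'b'), (2, 'c'), (3, 'd')]
```

Record `a` (offset 0) was removed by the toy retention rule. The remaining records kept their original offsets, so a reader can replay history or resume from any stored position.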

### Running a Kafka Broker

#### Installation

The first step is to download the archive containing the Kafka binaries and extract it:
```
wget https://archive.apache.org/dist/kafka/2.7.0/kafka_2.12-2.7.0.tgz
tar zxvf kafka_2.12-2.7.0.tgz
```

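It is better to run these commands in a terminal than in notebook cells. The Zookeeper and Kafka servers started below are long-running processes that keep the terminal (or cell) busy.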

If you do use a Jupyter notebook, prefix each command with `!`:

```
!wget https://archive.apache.org/dist/kafka/2.7.0/kafka_2.12-2.7.0.tgz
!tar zxvf kafka_2.12-2.7.0.tgz
```

Extracting the archive creates a `kafka_2.12-2.7.0` folder in the current directory. At this point, Kafka is ready to be used.
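The archive name encodes two versions. In `kafka_2.12-2.7.0`, `2.12` is the Scala version the binaries were built with and `2.7.0` is the Kafka version. Kafka releases are usually published for more than one Scala version.

If you want to script the download of another release, the small Python sketch below splits such a name and rebuilds the URL pattern used above. The helper names `parse_kafka_archive` and `download_url` are hypothetical. The sketch only handles names of the form `kafka_<scala>-<kafka>.tgz`.

```python
import re
from typing import NamedTuple


class KafkaArchive(NamedTuple):
    scala_version: str
    kafka_version: str


def parse_kafka_archive(name: str) -> KafkaArchive:
    """Split a name such as 'kafka_2.12-2.7.0.tgz' into its Scala and Kafka versions."""
    match = re.fullmatch(r"kafka_(\d+\.\d+)-(\d+\.\d+\.\d+)\.tgz", name)
    if match is None:
        raise ValueError(f"unexpected archive name: {name!r}")
    return KafkaArchive(scala_version=match.group(1), kafka_version=match.group(2))


def download_url(kafka_version: str, scala_version: str) -> str:
    """Build an archive.apache.org URL following the pattern of the wget command above."""
    return (f"https://archive.apache.org/dist/kafka/{kafka_version}/"
            f"kafka_{scala_version}-{kafka_version}.tgz")


print(parse_kafka_archive("kafka_2.12-2.7.0.tgz"))
# KafkaArchive(scala_version='2.12', kafka_version='2.7.0')
```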

#### Starting Zookeeper

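In Kafka 2.7, the version used here, Kafka relies on Zookeeper to reliably store the metadata of a Kafka cluster. This includes which brokers are alive, the configuration of topics and their partitions, which broker acts as controller, and access control lists. Older Kafka consumers also stored their consumer offsets (which messages they had already received) in Zookeeper; modern consumers keep them in an internal Kafka topic instead.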

In a realistic setup, Zookeeper should run on several nodes, separate from the ones running Kafka. In this lab, however, we only test a local deployment, so we run a single-node Zookeeper instance. From the root of the `kafka_2.12-2.7.0` folder, run:

```
bin/zookeeper-server-start.sh config/zookeeper.properties
```

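Alternatively, from inside the `bin` folder:

```
sh zookeeper-server-start.sh ../config/zookeeper.properties
```

Zookeeper then accepts client connections on port 2181 on all interfaces (`0.0.0.0:2181`), as set by `clientPort` in `config/zookeeper.properties`. The command runs in the foreground, so keep this terminal open.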
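Why run Zookeeper on several nodes in a realistic setup? A Zookeeper ensemble keeps working only while a strict majority (a quorum) of its nodes is up. An ensemble of n nodes therefore tolerates ⌊(n − 1)/2⌋ failures.

The helper below is just that arithmetic; its name is hypothetical. It shows why the single node used in this lab tolerates no failure at all.

```python
def zk_fault_tolerance(ensemble_size: int) -> int:
    """Number of Zookeeper nodes that can fail while a strict majority remains."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one node")
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


for n in (1, 3, 4, 5):
    print(n, "nodes -> tolerates", zk_fault_tolerance(n), "failure(s)")
```

Four nodes tolerate no more failures than three, which is why odd ensemble sizes such as 3 or 5 are the usual choice.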

#### Starting a Kafka Broker

Here, we work with a single Kafka broker. To launch it, run the following in a new terminal, from the root of the Kafka folder:

```
bin/kafka-server-start.sh config/server.properties
```

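Alternatively, from inside the `bin` folder:

```
sh kafka-server-start.sh ../config/server.properties
```

With the default `config/server.properties`, the broker connects to Zookeeper at `localhost:2181` (`zookeeper.connect`) and accepts clients on port 9092. The commands below use that address.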
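The broker reads its settings from `config/server.properties`, a Java-style properties file. In the default file, `broker.id=0`, `log.dirs=/tmp/kafka-logs` and `zookeeper.connect=localhost:2181`.

To inspect such a file from a notebook, a minimal Python sketch could look like the one below. `parse_properties` is a hypothetical helper. It handles only simple `key=value` lines and `#`/`!` comments, not the full Java properties syntax (no `:` separators, no line continuations).

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '=' in this simplified parser
            props[key.strip()] = value.strip()
    return props


sample = """
# excerpt in the style of config/server.properties
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
"""
settings = parse_properties(sample)
print(settings["zookeeper.connect"])  # localhost:2181
# With a real installation: parse_properties(open("config/server.properties").read())
```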

### Working with Kafka

Messages in Kafka are published to topics. The following command lists the topics that already exist in a Kafka cluster:
```sh
bin/kafka-topics.sh --list --zookeeper localhost:2181
```

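From inside the `bin` folder, the equivalent command is:

```sh
sh kafka-topics.sh --list --zookeeper localhost:2181
```

In Kafka 2.7, the `--zookeeper` option still works but is deprecated. The `--bootstrap-server localhost:9092` option queries the broker directly instead.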

Initially, no topics exist, so we create one:

```
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --replication-factor 1 --partitions 1 \
    --topic kafka-topic
```

Or, from inside the `bin` folder:

```
sh kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic kafka-topic
```
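`--replication-factor` is the number of copies kept of each partition, and each copy must live on a different broker. With the single broker started above, it therefore cannot exceed 1: Kafka refuses to create a topic whose replication factor is larger than the number of available brokers.

The hypothetical helper below mimics that check and places replicas with a plain round-robin. This placement is a simplification, not Kafka's actual assignment algorithm.

```python
from typing import Dict, List


def plan_topic(partitions: int, replication_factor: int, brokers: List[int]) -> Dict[int, List[int]]:
    """Map each partition to the broker ids holding its replicas (simplified round-robin)."""
    if partitions < 1 or replication_factor < 1:
        raise ValueError("partitions and replication factor must be at least 1")
    if replication_factor > len(brokers):
        # Kafka likewise rejects a replication factor larger than the broker count.
        raise ValueError(
            f"replication factor {replication_factor} > {len(brokers)} available broker(s)")
    plan: Dict[int, List[int]] = {}
    for p in range(partitions):
        # Start each partition on a different broker and take the next ones in order,
        # so the replicas of one partition always sit on distinct brokers.
        plan[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return plan


# The topic created above: 1 partition, replication factor 1, one broker with id 0.
print(plan_topic(1, 1, [0]))  # {0: [0]}
```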

You can then verify that the new topic exists, here from inside the `bin` folder:

```
sh kafka-topics.sh --list --zookeeper localhost:2181
```

The output should now contain `kafka-topic`.

It is now time to publish the first messages on the topic. To do so, we use the console producer that ships with Kafka. To start it, run:

```
bin/kafka-console-producer.sh --broker-list localhost:9092 \
    --topic kafka-topic
```

From this point on, each line you type in the console is published as a message on the topic. However, the setup is not very interesting yet, as no process is reading the messages published on the topic.

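Or, from inside the `bin` folder:

```
sh kafka-console-producer.sh --broker-list localhost:9092 --topic kafka-topic
```

In recent Kafka versions (2.5 and later), `--bootstrap-server localhost:9092` is also accepted here and is the preferred spelling; `--broker-list` is deprecated.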
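Each message sent by the producer is appended to one of the topic's partitions. For records that carry a key, Kafka's default partitioner hashes the key (with murmur2) modulo the number of partitions. Records with the same key therefore always land in the same partition, in the order they were sent. By default, the console producer sends records without a key.

The sketch below substitutes the standard library's `zlib.crc32` for murmur2, so it does not reproduce Kafka's actual partition numbers; it only illustrates the idea. The helper names are hypothetical. With the single partition created above, every record goes to partition 0.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def choose_partition(key: str, num_partitions: int) -> int:
    """Simplified keyed partitioning: a stable hash of the key modulo the partition count.

    crc32 stands in for Kafka's murmur2 here, so the numbers differ from real Kafka.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(messages: List[Tuple[str, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Group (key, value) pairs by the partition they would be appended to."""
    layout: Dict[int, List[str]] = defaultdict(list)
    for key, value in messages:
        layout[choose_partition(key, num_partitions)].append(value)
    return dict(layout)


msgs = [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]
print(distribute(msgs, 1))  # {0: ['login', 'login', 'logout']}
```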

To read the messages published on the topic, start a console consumer in another terminal:

```
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic kafka-topic --from-beginning
```

Since we used the `--from-beginning` flag, the consumer receives all the messages published since the topic was created, as long as they are still retained. To create a client that only receives messages published after it connected, simply remove this flag.
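The flag only chooses where the consumer starts reading. With `--from-beginning`, it starts at the earliest offset still stored. Without it, it starts at the current end of the log, so only later messages are seen. In consumer-configuration terms, this corresponds to `auto.offset.reset=earliest` versus the default `latest` for a consumer with no committed offsets.

The toy model below shows the difference. `ToyConsumer` and `poll` are hypothetical names, and no Kafka client is involved.

```python
from typing import List


class ToyConsumer:
    """Hypothetical console-consumer stand-in that tracks its own read position."""

    def __init__(self, log: List[str], from_beginning: bool) -> None:
        self.log = log
        # earliest -> offset 0; latest -> current end of the log
        self.position = 0 if from_beginning else len(log)

    def poll(self) -> List[str]:
        """Return the records appended since the last poll and advance the position."""
        new = self.log[self.position:]
        self.position = len(self.log)
        return new


topic = ["hello", "world"]  # messages published before the consumers start
replay = ToyConsumer(topic, from_beginning=True)
live = ToyConsumer(topic, from_beginning=False)
topic.append("again")       # published after both consumers connected
print(replay.poll())  # ['hello', 'world', 'again']
print(live.poll())    # ['again']
```

Once both have caught up, the two consumers behave identically: each new message is delivered to both.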

### Cleaning


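To reset the environment, for example before creating a new one:

- first, stop the Kafka broker by sending it a `SIGINT` (`Ctrl + C` in its terminal);
- then, stop Zookeeper the same way. Stopping the broker first matters: a broker whose Zookeeper has disappeared may hang during shutdown while it keeps trying to reconnect;
- finally, delete the data of Zookeeper and Kafka. These folders are set by `dataDir` in `config/zookeeper.properties` and `log.dirs` in `config/server.properties`, and default to the paths below. The command removes their contents but keeps the folders themselves:
```
rm -rf /tmp/zookeeper/* /tmp/kafka-logs/*
```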
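For notebook users, the last step can also be done from Python. The sketch below assumes both services are already stopped; `clear_directory` is a hypothetical helper. Like the `rm` command, it deletes what is inside each data directory but keeps the directory itself. Unlike the shell glob `*`, `Path.iterdir()` also matches hidden entries.

```python
import shutil
from pathlib import Path


def clear_directory(path: str) -> int:
    """Delete everything inside `path` but keep `path` itself; return the number of entries removed."""
    root = Path(path)
    if not root.is_dir():
        return 0  # nothing to clean, e.g. the service never ran
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a real sub-directory: remove it recursively
        else:
            entry.unlink()        # a file or a symlink: remove just that entry
        removed += 1
    return removed


# Default data directories from config/zookeeper.properties and config/server.properties.
for data_dir in ("/tmp/zookeeper", "/tmp/kafka-logs"):
    print(data_dir, "->", clear_directory(data_dir), "entries removed")
```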