# Apache Kafka in Python

### Setting up and running

#### Previous Class Materials

Start ZooKeeper: 
- `bin/zookeeper-server-start.sh config/zookeeper.properties` 

Start Kafka Server: 
- `bin/kafka-server-start.sh config/server.properties` 

List Kafka topics: 
- `bin/kafka-topics.sh --list --bootstrap-server localhost:9092` 

Create Kafka Topic:
- `bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic kafka-topic`

Start Producer: 
- `bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafka-topic`

Start Consumer: 
- `bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic kafka-topic --from-beginning`

Let's create three Python files: 
- `consumer.py`: Kafka Consumer 
- `data.py`: Data Generator 
- `producer.py`: Kafka Producer 

Requirements: 
```
Faker
kafka-python
```
How to install: 
```
pip install Faker
pip install kafka-python
```

### data.py: 

```python 
from faker import Faker

fake = Faker()


def get_registered_user():
    """Return one fake user record as a JSON-serializable dict."""
    return {
        "name": fake.name(),
        "address": fake.address(),
        "created_at": fake.year()
    }


if __name__ == "__main__":
    print(get_registered_user())
```

### consumer.py: 

```python 
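from kafka import KafkaConsumer
import json

if __name__ == "__main__":
    # Subscribe to the topic the producer writes to. If this group has no
    # committed offset yet, start from the earliest available message.
    consumer = KafkaConsumer(
        "registered_user",
        bootstrap_servers='localhost:9092',
        auto_offset_reset='earliest',
        group_id="consumer-group-a")
    print("starting the consumer")
    # Iterating over the consumer blocks and yields messages as they arrive.
    for msg in consumer:
        # msg.value holds the raw bytes; decode the JSON payload.
        print("Registered User = {}".format(json.loads(msg.value)))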
```

### producer.py:

```python
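from kafka import KafkaProducer
import json
from data import get_registered_user
import time


def json_serializer(data):
    # Kafka transports bytes, so encode each record as UTF-8 JSON.
    return json.dumps(data).encode("utf-8")


if __name__ == "__main__":
    # Create the producer only when run as a script, so importing this
    # module does not open a connection to the broker.
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                             value_serializer=json_serializer)
    while True:
        registered_user = get_registered_user()
        print(registered_user)
        # send() is asynchronous; records are batched and sent in the background.
        producer.send("registered_user", registered_user)
        time.sleep(4)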
```
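
The producer encodes each record to bytes and the consumer decodes it again. The sketch below isolates that round trip so it can be checked without a running broker. `json_deserializer` and the `sample` record are illustrative names that do not appear in the files above.

```python
import json


def json_serializer(data):
    """Encode a dict as UTF-8 JSON bytes (same logic as in producer.py)."""
    return json.dumps(data).encode("utf-8")


def json_deserializer(raw):
    """Decode UTF-8 JSON bytes back into a Python object (hypothetical helper)."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical record with the same keys as get_registered_user().
sample = {"name": "Jane Doe", "address": "1 Main St", "created_at": "2015"}

payload = json_serializer(sample)      # bytes, as handed to the broker
restored = json_deserializer(payload)  # dict, as the consumer should see it

print(type(payload).__name__, restored == sample)
```

If you prefer, a function like this can be passed to `KafkaConsumer` as `value_deserializer=json_deserializer`, in which case `msg.value` arrives already decoded and the `json.loads` call in the loop is no longer needed.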

## Anomaly Detection

![alt](https://raw.githubusercontent.com/tnurbek/ds702/main/Lab9/scheme.png)

### Task: 

- Create a data generator and a producer. The data should be numeric and suitable for a machine learning model such as linear regression.
- Train your model on the generated data and save it with pickle or joblib. 
- Create a second producer whose data follows a distribution similar to the training data but contains some outliers (be creative).
- Create an anomaly detector that loads the saved ML model and flags records whose error exceeds a specific threshold (e.g. based on the MSE score).
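
One possible way to approach the steps above, sketched with the standard library only so that it runs without Kafka. Everything here is an assumption made for illustration: the linear relationship `y = 3x + 2`, the noise level, the threshold value, and the helper names `generate_point`, `fit_line`, and `is_anomaly`. It uses the squared error of a single point as the score, which is one simple reading of an "MSE threshold".

```python
import pickle
import random


def generate_point(rng, outlier_prob=0.0):
    """Return (x, y) with y = 3x + 2 + noise; sometimes inject an outlier."""
    x = rng.uniform(0.0, 10.0)
    y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.5)
    if rng.random() < outlier_prob:
        # Shift y far away from the line, in a random direction.
        y += rng.choice([-1.0, 1.0]) * rng.uniform(15.0, 30.0)
    return x, y


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def is_anomaly(model, x, y, threshold):
    """Flag a point whose squared prediction error exceeds the threshold."""
    predicted = model["slope"] * x + model["intercept"]
    return (y - predicted) ** 2 > threshold


rng = random.Random(0)

# 1. Train on clean data and serialize the model with pickle.
#    (Use pickle.dump(model, file) instead to write it to disk.)
train = [generate_point(rng) for _ in range(500)]
model_bytes = pickle.dumps(fit_line(train))

# 2. Load the model in the detector and score a stream containing outliers.
model = pickle.loads(model_bytes)
THRESHOLD = 9.0  # assumed value: squared error of 3, i.e. six noise std devs
stream = [generate_point(rng, outlier_prob=0.1) for _ in range(200)]
flagged = [(x, y) for x, y in stream if is_anomaly(model, x, y, THRESHOLD)]

print("slope=%.2f intercept=%.2f" % (model["slope"], model["intercept"]))
print("flagged %d of %d points" % (len(flagged), len(stream)))
```

In the Kafka version of the task, `generate_point` would live in the producer (sending each point as JSON, as `producer.py` does) and `is_anomaly` would run inside the consumer loop on each decoded message.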