# Data Understanding

In [1]:
import polars as pl

In [2]:
df = pl.read_csv("datasets/armut_data.csv", try_parse_dates=True)

In [3]:
df.head()

| UserId | ServiceId | CategoryId | CreateDate |
| --- | --- | --- | --- |
| i64 | i64 | i64 | datetime[μs] |
| 25446 | 4 | 5 | 2017-08-06 16:11:00 |
| 22948 | 48 | 5 | 2017-08-06 16:12:00 |
| 10618 | 0 | 8 | 2017-08-06 16:13:00 |
| 7256 | 9 | 4 | 2017-08-06 16:14:00 |
| 25446 | 48 | 5 | 2017-08-06 16:16:00 |


In [4]:
df.shape

(162523, 4)

In [5]:
df.null_count()

| UserId | ServiceId | CategoryId | CreateDate |
| --- | --- | --- | --- |
| u32 | u32 | u32 | u32 |
| 0 | 0 | 0 | 0 |


In [6]:
df.columns

['UserId', 'ServiceId', 'CategoryId', 'CreateDate']

# Run Tools
## Run Hadoop

![image.png](attachment:image.png)

## Run Hive
![image.png](attachment:image.png)

## Run Kafka
docker-compose.yml:
```
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-kafka:7.3.0
    container_name: broker
    ports:
    # To learn about configuring Kafka for access across networks see
    # https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://broker:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
```

In [1]:
#!docker-compose up -d

![image.png](attachment:image.png)
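
Once the containers are up, a quick way to confirm that the published port is accepting connections is a plain TCP check. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that the broker is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 9092 is the broker port published in docker-compose.yml above.
print("broker port reachable:", port_is_open("localhost", 9092))
```

The same helper could be pointed at other published ports, provided they are reachable from the host.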

## Run NiFi

In [4]:
#!.\nifi.cmd start

UI:
![image.png](attachment:image.png)

## Run PostgreSQL
![image.png](attachment:image.png)

# Tools Preparation
## PostgreSQL Preparation
### Create Table
```
CREATE TABLE IF NOT EXISTS armut (
	UserId INT,
	ServiceId INT,
	CategoryId INT,
	CreateDate TEXT,
	CreateTime TEXT
);
```
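
The table keeps the timestamp in two text columns, `CreateDate` and `CreateTime`, while the CSV has a single `CreateDate` value such as `2017-08-06 16:11:00`. Below is a minimal sketch of one way that split could be done with the standard library. The function name `split_create_date` and the two output formats are assumptions based on the sample rows, not something taken from the pipeline itself.

```python
from datetime import datetime


def split_create_date(value: str) -> tuple[str, str]:
    """Split 'YYYY-MM-DD HH:MM:SS' into (date, time) strings.

    Hypothetical helper: the input format is assumed from the sample rows.
    """
    # Parsing first means a malformed timestamp raises ValueError
    # instead of being written to the table as-is.
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d"), parsed.strftime("%H:%M:%S")


create_date, create_time = split_create_date("2017-08-06 16:11:00")
print(create_date, create_time)
```

Keeping both parts as strings matches the `TEXT` columns above; converting them to proper date and time types would be a separate design decision.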
## Hive Preparation

### Create Database
```
CREATE DATABASE nifi;
```
### Create Table
```
CREATE TABLE IF NOT EXISTS nifi.armut (
	UserId INT,
	ServiceId INT,
	CategoryId INT,
	CreateDate STRING,
	CreateTime STRING
);
```
DBeaver:
![image.png](attachment:image.png)

## Kafka Preparation

In [2]:
#!docker exec -it broker kafka-topics --create --topic hive --bootstrap-server localhost:9092 \
#    --partitions 3 --replication-factor 1

In [3]:
#!docker exec -it broker kafka-topics --list --bootstrap-server localhost:9092

![image.png](attachment:image.png)
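
The two `kafka-topics` calls above can also be assembled as argument lists, which makes them easy to run from Python with `subprocess.run(cmd, check=True)` instead of a shell magic. This is a sketch only: `kafka_topics_cmd` is a hypothetical helper, and it drops the `-it` flags on the assumption that no interactive terminal is needed for these non-interactive commands.

```python
import shlex


def kafka_topics_cmd(*args: str, container: str = "broker",
                     bootstrap: str = "localhost:9092") -> list[str]:
    """Build a `docker exec <container> kafka-topics ...` argument list."""
    return ["docker", "exec", container, "kafka-topics",
            *args, "--bootstrap-server", bootstrap]


# Same topic settings as the commands above: topic "hive", 3 partitions.
create_cmd = kafka_topics_cmd("--create", "--topic", "hive",
                              "--partitions", "3", "--replication-factor", "1")
list_cmd = kafka_topics_cmd("--list")

# Only print here; pass a list to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(create_cmd))
print(shlex.join(list_cmd))
```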