


kafka_connect_spooldir_pipeline

Diagram

[Architecture diagram]

Requirements

Tools/Services Used

  • Terraform
  • Ansible
  • Apache Maven
  • AWS EC2
  • Confluent Kafka
    • Zookeeper
    • Kafka Broker
    • Schema Registry
    • Control Center
    • Kafka Connect Worker
  • Docker
  • Kafka-Connect-Spooldir Connector

Short Description

A dockerized Confluent Kafka cluster running on AWS EC2 instances that uses the Spooldir Connector (https://github.com/jcustenborder/kafka-connect-spooldir) to spool a directory, serialize the data to Avro using the Schema Registry, and publish it to the Kafka brokers. The project also makes Confluent Control Center available for visualization.

Process Description

This project uses two infrastructure-management tools, Terraform and Ansible, to prepare the infrastructure.

Terraform

  1. Creates a VPC
  2. Creates a Subnet inside that VPC
  3. Defines a Security Group for later use
  4. Defines an Internet Gateway and configures the routing to use it
  5. Spins up EC2 instances inside the Subnet created above, behind the defined Security Group
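
The Terraform steps above can be sketched roughly as follows. This is an illustrative sketch only: resource names, CIDR ranges, the instance count, and the instance type are assumptions, not the project's actual values.

```hcl
# Minimal sketch of the network + EC2 layout described above (illustrative values)
resource "aws_vpc" "kafka" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "kafka" {
  vpc_id     = aws_vpc.kafka.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_security_group" "kafka" {
  vpc_id = aws_vpc.kafka.id
  # ingress/egress rules omitted for brevity
}

resource "aws_internet_gateway" "kafka" {
  vpc_id = aws_vpc.kafka.id
}

resource "aws_route_table" "kafka" {
  vpc_id = aws_vpc.kafka.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.kafka.id
  }
}

resource "aws_instance" "kafka" {
  count                  = 3
  ami                    = var.ami_id   # static AMI; dynamic lookup is a listed to-do
  instance_type          = "t2.large"
  subnet_id              = aws_subnet.kafka.id
  vpc_security_group_ids = [aws_security_group.kafka.id]
}
```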

Ansible

  1. Installs Docker on the EC2 instances created above
  2. Starts a Zookeeper container using Confluent's Zookeeper image
  3. Starts a Kafka container using Confluent's Enterprise Kafka image
  4. On one of the EC2 instances, starts a Schema Registry container using Confluent's Schema Registry image
  5. On one of the EC2 instances, starts a Control Center container using Confluent's Enterprise Control Center image
  6. On one of the EC2 instances, starts a Kafka Connect container using Confluent's Kafka Connect image and adds the Kafka-Connect-Spooldir connector to its plugin path
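
A hedged sketch of what the container tasks (steps 2–3 above) might look like in an Ansible playbook. Image tags, ports, environment values, and the `zookeeper_host` variable are assumptions; the project's actual playbook may differ.

```yaml
# Illustrative Ansible tasks using the docker_container module (values are assumptions)
- name: Start Zookeeper container
  docker_container:
    name: zookeeper
    image: confluentinc/cp-zookeeper:latest
    published_ports:
      - "2181:2181"
    env:
      ZOOKEEPER_CLIENT_PORT: "2181"

- name: Start Kafka broker container
  docker_container:
    name: kafka
    image: confluentinc/cp-enterprise-kafka:latest
    published_ports:
      - "9092:9092"
    env:
      KAFKA_ZOOKEEPER_CONNECT: "{{ zookeeper_host }}:2181"
      KAFKA_ADVERTISED_LISTENERS: "PLAINTEXT://{{ ansible_host }}:9092"
```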

To Do

  • Modify Kafka to auto-enable topic creation
  • Make the Kafka Connect topic visible in Control Center
  • [Not a High Priority] Switch to a dynamic EC2 AMI lookup

Execution

In order to execute, issue:
  1. terraform apply -var-file=variables.tfvars && terraform output ec2_ips > output.txt
  2. ansible-playbook ansible.yml
The above 2 steps should provide a fully functional pipeline that takes a CSV file as input, Avro-serializes it, and publishes it to a topic.
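
For reference, a spooldir CSV source connector for a pipeline like this is typically configured with properties along these lines. The connector name, topic, and directory paths are assumptions; see the connector repository linked above for the full option list.

```properties
# Illustrative SpoolDir CSV source connector config (name, topic, and paths are assumptions)
name=csv-spooldir-source
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
topic=spooldir-csv-topic
input.path=/tmp/spooldir/input
finished.path=/tmp/spooldir/finished
error.path=/tmp/spooldir/error
input.file.pattern=.*\.csv
csv.first.row.as.header=true
schema.generation.enabled=true
```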

Test run

In order to test the pipeline, copy the provided file to the /tmp/.. location on the EC2 instance where the Kafka Connect container is running. You should see the topic populated using either Control Center or a Kafka consumer.
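
Concretely, a test run might look like the following. The hostnames are placeholders, and the spool directory and topic name are assumptions that would need to match the connector configuration.

```shell
# Copy a sample CSV into the spooled directory on the Connect host (path is an assumption)
scp sample.csv ec2-user@<connect-host>:/tmp/spooldir/input/

# Watch the Avro-serialized records arrive (topic name is an assumption)
kafka-avro-console-consumer \
  --bootstrap-server <broker-host>:9092 \
  --property schema.registry.url=http://<schema-registry-host>:8081 \
  --topic spooldir-csv-topic \
  --from-beginning
```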

Observations

  • It would be very useful to have a central repository of Kafka Connect connectors that could be accessed programmatically to download connectors and place them on the Connect worker's plugin path

Warnings

  • The current configuration of this project uses AWS services that are beyond the Free Tier!
