This repository has been archived by the owner on May 3, 2024. It is now read-only.

Join MySQL, Kafka and Minio data in PrestoDB using Docker

wheresalice/presto-mysql-kafka-minio

Presto with MySQL, Minio and Kafka

Presto, MySQL, Minio and Kafka in Docker. Query your MySQL data and join it to Kafka and Minio data.

This repository includes a docker-compose setup for joining MySQL, Minio and Kafka data using Presto, along with notes on how to load the data and run the queries. It is deliberately not fully automated, so that you work through each step by hand.

img/Presto.png

Usage

Launch everything (Presto, Zookeeper, Kafka, MySQL, Minio):

docker-compose up
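
The containers take a while to start, so the later steps can fail if run too early. A minimal sketch for checking that the published ports accept connections, assuming Presto is on port 8080 and Minio on port 9000 of localhost, as the URLs further down suggest. The names `wait_for_port`, `SERVICES` and `report_services` are illustrative and not part of this repository. An open port only shows that the process is listening, not that the service has finished initialising.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Ports taken from the URLs used later in this README (assumed published to the host).
SERVICES = {"presto": 8080, "minio": 9000}


def report_services(host: str = "localhost") -> None:
    """Print one status line per service in SERVICES."""
    for name, port in SERVICES.items():
        state = "up" if wait_for_port(host, port) else "not reachable"
        print(f"{name} ({port}): {state}")
```

Calling `report_services()` in a second terminal after `docker-compose up` prints one line per service.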

Open a MySQL shell to load some data (./data is mounted at /tmp/data):

docker-compose exec mysql mysql -uuser -ppassword wheresalice

Load the data:

source /tmp/data/load.sql

Load some data into Kafka:

docker-compose exec kafka /bin/bash
curl -o kafka-tpch https://repo1.maven.org/maven2/de/softwareforge/kafka_tpch_0811/1.0/kafka_tpch_0811-1.0.sh
chmod 755 kafka-tpch
./kafka-tpch load --brokers localhost:9092 --prefix tpch. --tpch-type tiny
exit

Get access to Presto:

docker-compose exec presto presto

Query MySQL data in Presto:

use mysql.wheresalice;
show tables;

Query Kafka data in Presto:

SELECT _message FROM kafka.tpch.customer LIMIT 5;
SELECT sum(account_balance) FROM kafka.tpch.customer;

Join the two together:

SELECT customer.account_balance, contacts.email FROM kafka.tpch.customer, mysql.wheresalice.contacts contacts WHERE customer.customer_key = contacts.customer_key;
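
To make clear what this query returns, here is a small in-memory sketch of the same inner join on `customer_key`. The rows are made up for illustration and are not taken from the TPC-H data or from load.sql. The function `inner_join` is hypothetical and only mirrors the semantics of the query, not how Presto executes it.

```python
from collections import defaultdict


def inner_join(customers, contacts):
    """Pair account_balance with email for rows sharing a customer_key."""
    # Index the contacts by key so each customer row is matched in one lookup.
    by_key = defaultdict(list)
    for contact in contacts:
        by_key[contact["customer_key"]].append(contact["email"])
    return [
        (customer["account_balance"], email)
        for customer in customers
        for email in by_key.get(customer["customer_key"], [])
    ]


# Made-up rows for illustration only.
kafka_customers = [
    {"customer_key": 1, "account_balance": 711.56},
    {"customer_key": 2, "account_balance": 121.65},
]
mysql_contacts = [
    {"customer_key": 1, "email": "alice@example.com"},
    {"customer_key": 3, "email": "carol@example.com"},
]

print(inner_join(kafka_customers, mysql_contacts))
# prints [(711.56, 'alice@example.com')]
```

Only customer 1 appears in both sources, so customers without a contact and contacts without a customer are dropped, as in the SQL above.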

View what's happening through the Presto UI: http://localhost:8080/ui/
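
The coordinator that serves the UI also accepts queries over HTTP, so they can be run from a script instead of the Presto shell. Below is a minimal sketch using only the Python standard library, assuming the coordinator is reachable at http://localhost:8080 without authentication. It posts the statement to `/v1/statement` and follows the `nextUri` links in each response until none is left. The helper names `http_fetch` and `run_query` and the user name `readme-example` are illustrative.

```python
import json
import urllib.request
from typing import Callable, Optional

Fetch = Callable[[str, str, Optional[bytes]], dict]


def http_fetch(method: str, url: str, body: Optional[bytes] = None) -> dict:
    """Send one request to the coordinator and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"X-Presto-User": "readme-example"},  # assumed: no auth configured
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_query(sql: str, base_url: str = "http://localhost:8080", fetch: Fetch = http_fetch) -> list:
    """Submit sql and collect rows from every result page."""
    page = fetch("POST", f"{base_url}/v1/statement", sql.encode("utf-8"))
    rows = []
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:
            # No further page: the query has finished.
            return rows
        page = fetch("GET", next_uri, None)
```

With the stack running, `run_query("SELECT sum(account_balance) FROM kafka.tpch.customer")` should return the rows as a list of lists. The `fetch` parameter exists so the paging logic can be exercised without a live coordinator.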

Minio/S3

Minio is included in this stack to mock out S3. It currently needs a little manual setup: create the directories and a sample CSV file inside the Minio container:

docker-compose exec minio /bin/sh
mkdir -p /data/catalog/ && mkdir -p /data/csvdata
echo "alice@example.com,alice" > /data/csvdata/data.csv
exit

Then create the table in Presto shell to query:

create schema s3.default;
create table s3.default.users (email varchar, username varchar) WITH (external_location='s3a://csvdata/',format = 'csv');
select * from s3.default.users;
SELECT users.username, contacts.customer_key FROM s3.default.users, mysql.wheresalice.contacts WHERE users.email = contacts.email;
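
The CSV file above has no header line, and its fields follow the column order of the `create table` statement (email first, then username). For more than one row it may be easier to generate the file with a script. A minimal sketch, in which the function `write_users_csv` and the sample users are illustrative:

```python
import csv
from pathlib import Path

COLUMNS = ("email", "username")  # same order as the create table statement


def write_users_csv(path, users):
    """Write one header-less line per user, fields in COLUMNS order."""
    with Path(path).open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        for user in users:
            writer.writerow([user[column] for column in COLUMNS])
```

For example, `write_users_csv("data.csv", [{"username": "alice", "email": "alice@example.com"}])` produces the same single line as the `echo` command above. The file then needs to end up in the csvdata location for Presto to see it.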

You can also upload data into Minio using a web browser via http://localhost:9000

Access Key: minio
Secret Key: minio123

Known Issues

  • Data in Kafka must be JSON or plain Avro for Presto to parse it. Confluent Avro with Schema Registry is not currently supported.
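
Messages written by Confluent's Avro serializer are framed with a magic byte (0) and a 4-byte big-endian schema ID ahead of the Avro body, so a plain Avro decoder cannot read them as they are. The sketch below is a rough heuristic for checking what a topic contains before pointing Presto at it. The function names are illustrative. Plain Avro cannot be recognised reliably from the bytes alone and is reported as `unknown`, and a plain Avro payload that happens to start with a zero byte would be misclassified.

```python
import json
import struct


def classify_payload(payload: bytes) -> str:
    """Guess a Kafka message encoding: 'json', 'confluent-avro' or 'unknown'."""
    try:
        json.loads(payload.decode("utf-8"))
        return "json"
    except (UnicodeDecodeError, ValueError):
        pass
    # Confluent framing: magic byte 0, then a 4-byte schema ID, then the Avro body.
    if len(payload) > 5 and payload[0] == 0:
        return "confluent-avro"
    return "unknown"


def confluent_schema_id(payload: bytes) -> int:
    """Read the 4-byte big-endian schema ID that follows the magic byte."""
    (schema_id,) = struct.unpack(">I", payload[1:5])
    return schema_id
```

Messages classified as `confluent-avro` would need to be re-encoded as JSON or plain Avro before Presto can query them in this setup.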
