A dataplatform demo with Clickhouse, Kafka, Postgres and Jupyterlab

timselier/dataplatform-demo

Data-Platform based on ClickHouse demo

This project demonstrates how to build a data platform that is scalable by design. When the volume of data increases, the number of nodes and of partitions or shards can be increased.

Design

Architectural overview

Prerequisites:

Installation

Create Kind (K8s In Docker) cluster

# Create cluster with 4 worker nodes
kind create cluster --name kind-dataplatform --config=kind.yaml

# Install nginx ingress controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
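The contents of `kind.yaml` are not shown in this README. As a rough sketch, a config with one control-plane node and four workers could look like the file written below. The file name `kind-example.yaml`, the `ingress-ready` label and the port mappings are assumptions based on the usual kind ingress setup, not the repository's actual file.

```shell
# Write a hypothetical kind config. kind-example.yaml is an assumed name,
# chosen so that the repository's own kind.yaml is not overwritten.
cat > kind-example.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  # Control plane, labelled and port-mapped for the nginx ingress controller
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
  # Four worker nodes, matching the comment on the create command
  - role: worker
  - role: worker
  - role: worker
  - role: worker
EOF
```

Such a file would be passed to `kind create cluster` with `--config=kind-example.yaml` in place of `kind.yaml`.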

Use cluster:

kubectl config use-context kind-kind-dataplatform 

Install operators

A Kubernetes Operator deploys workloads based on the Custom Resource Definition (CRD) that describes them. Updates to those resources are also managed by the operator.

In this demo, the operators are installed cluster-wide.
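To make the operator idea concrete, here is a minimal sketch of a custom resource for the CloudNativePG operator. The resource name `pg-example`, the file name and the sizes are hypothetical. The resources this demo actually deploys are defined in its Helm chart.

```shell
# Write a hypothetical custom resource describing a 3-instance PostgreSQL cluster.
cat > pg-example.yaml <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-example
spec:
  instances: 3      # the operator creates and maintains this many instances
  storage:
    size: 1Gi
EOF

# Once the CloudNativePG operator is installed, applying the file asks the
# operator to create the cluster. Editing `instances` and re-applying lets
# the operator reconcile the change:
#   kubectl apply -f pg-example.yaml
```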

# Install Altinity Clickhouse Operator
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml

# Install Strimzi Kafka operator
helm repo add strimzi https://strimzi.io/charts/
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator

# Install CloudNativePG PostgreSQL Operator
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.20/releases/cnpg-1.20.0.yaml
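After installing the operators, it can help to confirm that their CRDs are registered before deploying anything that depends on them. The helper below is a sketch and not part of the repository. The function name `check_operator_crds` is made up for illustration, and the CRD names are the ones each operator project is understood to publish.

```shell
# CRDs expected from the ClickHouse, Strimzi and CloudNativePG operators.
OPERATOR_CRDS="clickhouseinstallations.clickhouse.altinity.com kafkas.kafka.strimzi.io clusters.postgresql.cnpg.io"

# check_operator_crds: print one line per CRD and return non-zero if any is missing.
check_operator_crds() {
  missing=0
  for crd in $OPERATOR_CRDS; do
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "found:   $crd"
    else
      echo "missing: $crd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage, against the current kubectl context:
#   check_operator_crds
```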

Custom images

The JupyterLab image is based on the datascience-notebook image. It comes with default notebooks and installs the required dependencies.

docker build ./jupyterlab -t dataplatform-jupyterlab:latest
kind load docker-image dataplatform-jupyterlab:latest --name kind-dataplatform

docker build ./setup-data -t setup-data:latest
kind load docker-image setup-data:latest --name kind-dataplatform

docker build ./data-generator -t data-generator:latest
kind load docker-image data-generator:latest --name kind-dataplatform
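The three build-and-load pairs follow the same pattern, so they can be wrapped in a small helper when images are rebuilt often. This is a convenience sketch. The function names `build_and_load` and `build_all_images` are hypothetical and do not exist in the repository.

```shell
# build_and_load DIR TAG: build the image in ./DIR, then load it into the kind cluster.
build_and_load() {
  docker build "./$1" -t "$2" &&
    kind load docker-image "$2" --name kind-dataplatform
}

# build_all_images: the same three images as the individual commands, stopping at the first failure.
build_all_images() {
  build_and_load jupyterlab dataplatform-jupyterlab:latest &&
    build_and_load setup-data setup-data:latest &&
    build_and_load data-generator data-generator:latest
}

# Usage, from the repository root:
#   build_all_images
```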

Install or upgrade Data-Platform

This demo is contained in a Helm chart.

helm dependency build ./dataplatform-chart
helm upgrade --install dataplatform ./dataplatform-chart --set jupyter.image=dataplatform-jupyterlab:latest
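The `--set jupyter.image=...` override can also be kept in a values file, which is easier to extend if more settings are overridden later. A minimal sketch, assuming a local file named `values-local.yaml` (a hypothetical name, not part of the repository):

```shell
# Write a values file equivalent to --set jupyter.image=dataplatform-jupyterlab:latest
cat > values-local.yaml <<'EOF'
jupyter:
  image: dataplatform-jupyterlab:latest
EOF

# Then pass it with -f instead of --set:
#   helm upgrade --install dataplatform ./dataplatform-chart -f values-local.yaml
```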
