
timodechau/building_event_driven_datastack


Data sandbox projects: Building an event-driven data stack

A new weekend side project. It's a great one (at least, one I should have started some time ago).

We all know that data engineering (analytics, tracking) is evolving very fast. New products, services and platforms appear literally every week (listening to the weekly episode of Tobias Macey's Data Engineering Podcast is basically enough to see that).

And I want to try them all! But how, and where? Wait for the next setup project? Nah.

I am building my own data sandbox, starting this weekend. I will use it mostly for https://deepskydata.com and https://datatasks.dev, and basically for all my business operations.

And I want it to be event-driven from start to end.

This is my initially planned line-up:

Avo (https://www.avo.app/) - Data sourcing planning (aka Tracking, Measurement plan)

Google Analytics 4 - yes, still; I need it to test some things for most of my client projects

elbwalker (https://www.elbwalker.com/) - Privacy-focused analytics and a great new tool for testing sourcing

StreamProcessor (https://streamprocessor.io) - Stream pipelines and real-time analytics

RudderStack (https://rudderstack.com/) - test case for Segment replacements

Klaro (https://heyklaro.com/) - the only open source consent management I know of

GTM - well, yeah, client and server-side - mostly for testing funky things for clients

Airbyte (https://airbyte.io/) - I'd love to see how this works when self-hosted

BigQuery - I love not having to think about storage, but there will also be some Firebolt or ClickHouse testing in the future

dbt Labs (https://www.getdbt.com/) - no brainer

Activity Schema - https://www.activityschema.com/ - the brainchild of Ahmed Elsamadisi - so keen to play around with this idea

Datahub - https://github.com/linkedin/datahub - has been on my testing list for quite a while. Curious about the integration part.

Castor - https://www.castordoc.com/ - so curious to test out the automatic source schemas and the magic automations

Soda - https://www.soda.io/ - I really want to spend more time on setting up and learning about test-driven data modelling.

Great Expectations - https://greatexpectations.io/ - same here, it has been on my list for a long time

Lightdash - https://www.lightdash.com/ - heard about it on the podcast - really curious to see the deep dbt integration and the metrics layer

Preset - https://preset.io/ - setting up a Superset instance has always been on my list, I never got around to it, and now I can simply sign up

Census - https://www.getcensus.com/ - I already know it from a different project but never really tested it out (especially the analytics data enrichment part).

I will try to write a bit about my experiences and learnings (so just follow me).

Maybe you'd like to create a data sandbox for yourself as well. So do it!

Want to follow my learnings and updates? Follow me on LinkedIn: https://www.linkedin.com/in/timo-dechau/

Video Tutorials

I am also documenting all the different steps as educational videos on my YouTube channel: https://www.youtube.com/channel/UCQSHdIS2YdFa6wleYHjK6Jw

01 - How to create a Metric Framework https://www.youtube.com/watch?v=oDImXY8J4Oc
