A new weekend side project. It's a great one (or at least one I should have started some time ago).
We all know the whole area of data engineering (analytics, tracking) is evolving fast. You can find new products, services and platforms literally every week (it's basically sufficient to listen to the weekly episode of Tobias Macey's Data Engineering Podcast).
And I want to try them all! But how, and where? Wait for the next setup project? Nah.
I am building my own data sandbox, starting this weekend. I will use it mostly for https://deepskydata.com and https://datatasks.dev, and for basically all my business operations.
And I want to do it event-driven from start to finish. Here are the tools I want to try:
Avo (https://www.avo.app/) - Data sourcing planning (aka Tracking, Measurement plan)
Google Analytics 4 - yes, still; I need it to test some things for most of my client projects
elbwalker (https://www.elbwalker.com/) - Privacy-focused analytics and a great new tool for testing sourcing
StreamProcessor (https://streamprocessor.io) - Stream pipelines and real-time analytics
RudderStack (https://rudderstack.com/) - test case for Segment replacements
Klaro (https://heyklaro.com/) - the only open source consent management I know of
GTM - well, yeah, client and server-side - mostly for testing funky things for clients
Airbyte (https://airbyte.io/) - love to see how this works when self-hosted
BigQuery - I love not having to think about storage, but there will also be some Firebolt or ClickHouse testing in the future
dbt Labs (https://www.getdbt.com/) - a no-brainer
Activity Schema (https://www.activityschema.com/) - the brainchild of Ahmed Elsamadisi; so keen to play around with this idea
DataHub (https://github.com/linkedin/datahub) - on my testing list for a long time already. Curious about the integration part.
Castor (https://www.castordoc.com/) - so curious to test out the automatic source schemas and the magic automations
Soda (https://www.soda.io/) - I really want to spend more time on setting up and learning about test-driven data modelling
Great Expectations (https://greatexpectations.io/) - same here, on my list for a long time already
Lightdash (https://www.lightdash.com/) - heard about it on the podcast; really curious to see the deep dbt integration and the metrics layer
Preset (https://preset.io/) - I always had "setting up a Superset instance" on my list and never made it; now I can simply sign up
Census (https://www.getcensus.com/) - I already know it from a different project but never really tested it out (especially the analytics data enrichment part)
I will try to write a bit about my experiences and learnings (so just follow me).
Maybe you would like to create a data sandbox for yourself, too. So do it!
Want to follow my learnings and updates? Follow me on LinkedIn: https://www.linkedin.com/in/timo-dechau/
I am also turning all the different steps into educational videos on my YouTube channel: https://www.youtube.com/channel/UCQSHdIS2YdFa6wleYHjK6Jw
01 - How to create a Metric Framework https://www.youtube.com/watch?v=oDImXY8J4Oc