Skip to content

Data Engineering pet-project covering GCP, Docker, workflow orchestration with Mage, data transforming with dbt, batch processing via Spark and data streaming using Kafka

Notifications You must be signed in to change notification settings

yelzha/data-engineering-zoomcamp

Repository files navigation

Data Engineering Zoomcamp

Syllabus

2024 Cohort

Syllabus

Note: NYC TLC changed the format of the data we use to parquet. In the course we still use the CSV files accessible here.

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Homework

More details

  • Data Lake
  • Workflow orchestration
  • Workflow orchestration with Mage
  • Homework

More details

More details

  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • BigQuery Machine Learning

More details

  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualizing the data with google data studio and metabase

More details

  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins

More details

  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL

More details

More details

Putting everything we learned to practice

  • Week 1 and 2: working on your project
  • Week 3: reviewing your peers

More details

About

Data Engineering pet-project covering GCP, Docker, workflow orchestration with Mage, data transforming with dbt, batch processing via Spark and data streaming using Kafka

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages