BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Python ⋅ Updated Feb 6, 2023
Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a « serverless architecture [that] lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Client libraries are available for widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources without first loading it.
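To make the client-library point concrete, here is a minimal Python sketch. It assumes the `google-cloud-bigquery` package is installed and Application Default Credentials are configured; the public table `bigquery-public-data.new_york_citibike.citibike_trips`, its `start_station_name` column, and the helper names `build_top_stations_query` and `run_query` are illustrative assumptions. The client import is deferred into `run_query` so the SQL builder works without credentials.

```python
def build_top_stations_query(table: str, limit: int) -> str:
    """Return standard SQL counting trips per start station (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Backticks are required around fully qualified table IDs that contain hyphens.
    return (
        "SELECT start_station_name, COUNT(*) AS trips\n"
        f"FROM `{table}`\n"
        "GROUP BY start_station_name\n"
        "ORDER BY trips DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql: str) -> list:
    """Execute `sql` with the BigQuery Python client and return rows as dicts."""
    # Deferred import: needs google-cloud-bigquery and valid credentials.
    from google.cloud import bigquery

    client = bigquery.Client()
    # query() starts a job; result() waits for it and iterates over Row objects.
    return [dict(row.items()) for row in client.query(sql).result()]
```

Usage might look like `run_query(build_top_stations_query("bigquery-public-data.new_york_citibike.citibike_trips", 5))`. The table ID is interpolated rather than passed as a query parameter because BigQuery parameters can stand in for values but not for identifiers such as table names, so only trusted table IDs should be passed in.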
📖 A well-regarded reference on it is the book « Google BigQuery: The Definitive Guide » by Valliappa Lakshmanan and Jordan Tigani.
Another worthwhile read is the article by BigQuery's founding product manager, written for its 10th anniversary, which tells the inside story of the product.
Analysis of NYC's Citi Bike data. Technologies: Python, Prefect, dbt, Terraform, Looker Studio (formerly Data Studio)
End-to-end ETL pipeline project covering data ingestion, transformation, and analytics with Google Looker Studio
Uses SQL in Google BigQuery to write and run queries against the Google Analytics dataset in order to answer business questions
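A hedged sketch of this kind of business query, written as a Python string and run through the `google-cloud-bigquery` client: it assumes the data is stored as daily `ga_sessions_YYYYMMDD` tables with nested `trafficSource` and `totals` records (the layout of Google's public Google Analytics sample), and the helper names are hypothetical. The date range is bound with query parameters against the wildcard table's `_TABLE_SUFFIX` pseudo-column.

```python
def build_ga_query(table_prefix: str) -> str:
    """Return SQL summing visits and transactions per traffic source.

    `table_prefix` is the shared prefix of the daily tables, e.g.
    'project.dataset.ga_sessions_' (assumed naming). The date range is
    left to the @start_date / @end_date parameters (YYYYMMDD strings).
    """
    return (
        "SELECT trafficSource.source AS source,\n"
        "       SUM(totals.visits) AS visits,\n"
        "       SUM(totals.transactions) AS transactions\n"
        f"FROM `{table_prefix}*`\n"
        "WHERE _TABLE_SUFFIX BETWEEN @start_date AND @end_date\n"
        "GROUP BY source\n"
        "ORDER BY transactions DESC"
    )


def run_ga_query(table_prefix: str, start_date: str, end_date: str) -> list:
    """Run the query with named parameters and return rows as dicts."""
    # Deferred import: needs google-cloud-bigquery and valid credentials.
    from google.cloud import bigquery

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "STRING", start_date),
            bigquery.ScalarQueryParameter("end_date", "STRING", end_date),
        ]
    )
    client = bigquery.Client()
    job = client.query(build_ga_query(table_prefix), job_config=job_config)
    return [dict(row.items()) for row in job.result()]
```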
Java v11 ⋅ Spring v2 ⋅ Gradle ⋅ BigQuery
Runs queries on the 59 million records in the New York Citibike BigQuery public dataset and builds data visualizations on Google Cloud Platform (GCP), using Cloud SQL (MySQL), Vertex AI, Cloud Shell, and Cloud Storage buckets.
A Python script, built on Docker, for uploading online files to Google Cloud Storage (GCS)
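The upload step can be sketched as follows, assuming the `google-cloud-storage` package and default credentials. The `raw/<file name>` object-naming scheme and the helper names are hypothetical, and the Docker side is not shown, since a container would simply run a script like this.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse


def object_name_for(url: str, prefix: str = "raw") -> str:
    """Derive a GCS object name '<prefix>/<file name>' from a URL (hypothetical scheme)."""
    # urlparse drops the query string, so '.../b.csv?x=1' yields 'b.csv'.
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"URL has no file name: {url!r}")
    return f"{prefix}/{name}" if prefix else name


def upload_url_to_gcs(url: str, bucket_name: str, prefix: str = "raw") -> str:
    """Download `url` into memory, upload it to `bucket_name`, return the gs:// URI."""
    # Deferred import: needs google-cloud-storage and valid credentials.
    from google.cloud import storage

    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    blob = storage.Client().bucket(bucket_name).blob(object_name_for(url, prefix))
    blob.upload_from_string(data)
    return f"gs://{bucket_name}/{blob.name}"
```

Reading the whole response into memory keeps the sketch short; for large files, streaming to a temporary file and calling `blob.upload_from_filename` would be the safer choice.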
An Apache Beam program that reads nginx access logs from Google Cloud Pub/Sub, parses them, and writes them to BigQuery.
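The parsing step is the core of such a pipeline. Below is a minimal sketch of a parser that turns one access-log line into a flat dict, i.e. one BigQuery row. It assumes nginx's default `combined` log format (a custom `log_format` would need a different pattern), and the output field names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed: nginx's default "combined" format, i.e.
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_nginx_line(line: str) -> Optional[dict]:
    """Turn one access-log line into a row dict, or None if it does not match."""
    m = COMBINED_RE.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    # $time_local looks like 10/Oct/2023:13:55:36 +0000; normalise to UTC ISO-8601.
    ts = datetime.strptime(row.pop("time_local"), "%d/%b/%Y:%H:%M:%S %z")
    row["timestamp"] = ts.astimezone(timezone.utc).isoformat()
    row["status"] = int(row["status"])
    row["body_bytes_sent"] = int(row["body_bytes_sent"])
    # nginx writes "-" for empty fields; store those as NULL instead.
    for key in ("remote_user", "referer"):
        if row[key] == "-":
            row[key] = None
    return row
```

In an Apache Beam pipeline, a function like this would typically run inside `beam.Map` after `beam.io.ReadFromPubSub` (whose messages arrive as bytes and need decoding first), followed by a `beam.Filter` that drops `None` results and then `beam.io.WriteToBigQuery`. Returning `None` instead of raising keeps one malformed line from failing the whole bundle.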
This project aims to automatically migrate data from MongoDB to Google Cloud Storage (GCS) and BigQuery, so businesses can easily transfer and analyze their data in the cloud while improving data and cost management.
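One step any MongoDB-to-BigQuery migration needs is turning documents into a format BigQuery can load. A minimal sketch, assuming the documents have already been fetched (for example with `pymongo`) and are written out as newline-delimited JSON, one of the formats BigQuery load jobs accept: values JSON cannot represent directly, such as dates, `Decimal`, and driver types like `bson.ObjectId`, are converted to strings. The helper names and the string fallback are assumptions, not the project's method.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Iterable


def to_bq_value(value: Any) -> Any:
    """Recursively convert a MongoDB-style value into something json.dumps can emit."""
    if isinstance(value, dict):
        return {str(k): to_bq_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_bq_value(v) for v in value]
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return str(value)  # keep full precision instead of rounding to float
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    # Fallback for driver-specific types such as bson.ObjectId: use their string form.
    return str(value)


def to_ndjson(docs: Iterable[dict]) -> str:
    """Serialise documents as newline-delimited JSON, one document per line."""
    return "".join(json.dumps(to_bq_value(doc), sort_keys=True) + "\n" for doc in docs)
```

The resulting file could be staged in GCS and loaded with a load job whose source format is `NEWLINE_DELIMITED_JSON`; converting `Decimal` to a string rather than a float avoids silently losing precision.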
This project focuses on building a pipeline in GCP for data from the LA parking citations website.
A simple service for BigQuery streaming using Google Pub/Sub
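A streaming service of this kind mostly decodes messages and batches them into streaming inserts. The sketch below assumes each Pub/Sub message body is a UTF-8 JSON object and uses the `google-cloud-bigquery` client's `insert_rows_json`; the helper names and the batch size of 500 are hypothetical choices, and the Pub/Sub subscriber loop itself is omitted.

```python
import json
from typing import Iterable, Iterator, List, Optional


def decode_message(data: bytes) -> Optional[dict]:
    """Decode a Pub/Sub payload (assumed UTF-8 JSON object) into a row dict."""
    try:
        row = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # malformed payload: skipped here (a real service might dead-letter it)
    return row if isinstance(row, dict) else None


def batched(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group rows into lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def stream_rows(table_id: str, payloads: Iterable[bytes], batch_size: int = 500) -> list:
    """Decode payloads, stream them to BigQuery, and return any per-row insert errors."""
    # Deferred import: needs google-cloud-bigquery and valid credentials.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = (r for r in map(decode_message, payloads) if r is not None)
    errors: list = []
    for batch in batched(rows, batch_size):
        errors.extend(client.insert_rows_json(table_id, batch))
    return errors
```

In a real service, `stream_rows` would be fed from a `pubsub_v1.SubscriberClient` callback, and messages would be acknowledged only after their batch is inserted without errors.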
BigQuery was released on May 19, 2010.