Implementing best practices for PySpark ETL jobs and applications.
Updated Jan 1, 2023 - Python
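A widely cited best practice for PySpark ETL jobs is to keep transformation logic in small, pure functions separated from the extract/load I/O, so the logic can be unit-tested without a live cluster. A minimal pure-Python sketch of that pattern (the record fields and function names are illustrative, not taken from any specific repository):

```python
from datetime import datetime


def extract(rows):
    """Extract step: a real job would read from files, JDBC, or Kafka;
    here it simply yields raw records."""
    yield from rows


def transform(record):
    """Pure transformation: parse, clean, and enrich one record.
    Keeping this free of I/O makes it trivially unit-testable."""
    return {
        "name": record["name"].strip().title(),
        "signup_date": datetime.strptime(record["signup_date"], "%Y-%m-%d").date(),
        "active": record.get("status", "").lower() == "active",
    }


def load(records, sink):
    """Load step: append transformed records to a sink (a list here,
    a table or data-lake path in a real job)."""
    for rec in records:
        sink.append(rec)
    return sink


def run_job(raw_rows):
    """Orchestrate extract -> transform -> load."""
    return load((transform(r) for r in extract(raw_rows)), sink=[])
```

In a real PySpark job the same structure holds: `transform` becomes a function from DataFrame to DataFrame, and only `extract`/`load` touch the SparkSession.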
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-life work. Development uses PySpark and Spark SQL. At the end of the course we also cover a few case studies.
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Near-real-time ETL to populate a dashboard.
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
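"DataFrame validation" in libraries like this usually means asserting up front that a DataFrame has the columns a transformation expects, so a pipeline fails fast with a clear message instead of deep inside a job. A hedged pure-Python sketch of that idea, using plain dicts in place of Spark DataFrames (all names here are illustrative, not the library's API):

```python
class DataFrameValidationError(ValueError):
    """Raised when required columns are missing before a transform runs."""


def validate_presence_of_columns(df_columns, required_columns):
    """Report ALL missing columns at once rather than failing on the first."""
    missing = [c for c in required_columns if c not in df_columns]
    if missing:
        raise DataFrameValidationError(
            f"Missing columns: {missing}. Available: {sorted(df_columns)}"
        )


def with_full_name(row):
    """Example transform that assumes first_name/last_name exist."""
    return {**row, "full_name": f"{row['first_name']} {row['last_name']}"}


def run_transform(rows, required_columns=("first_name", "last_name")):
    """Validate the schema once up front, then apply the transform."""
    columns = set(rows[0]) if rows else set()
    validate_presence_of_columns(columns, required_columns)
    return [with_full_name(r) for r in rows]
```

With Spark the check is the same shape, just run against `df.columns` before the transformation chain.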
Conductor OSS SDK for Python programming language
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management.
🚹 💾 Script to import issues from a JIRA instance into a database.
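Importing issues into a database generally reduces to fetching JSON from the JIRA REST API and upserting rows so the script can be re-run safely. A minimal sketch of the load side using stdlib `sqlite3` (the fetch step is stubbed out; the table layout and issue fields are assumptions for illustration, not JIRA's full schema):

```python
import sqlite3


def create_schema(conn):
    """Issue key is the natural primary key, which enables upserts."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS issues (
               key TEXT PRIMARY KEY,
               summary TEXT,
               status TEXT
           )"""
    )


def import_issues(conn, issues):
    """Upsert each issue dict so re-running the import is idempotent."""
    conn.executemany(
        """INSERT INTO issues (key, summary, status)
           VALUES (:key, :summary, :status)
           ON CONFLICT(key) DO UPDATE SET
               summary = excluded.summary,
               status = excluded.status""",
        issues,
    )
    conn.commit()
```

Running the import twice with an updated status leaves a single row per issue key, which is the property that makes scheduled re-imports safe.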
Airflow DAGs for the Stellar ETL project
Data Engineering/Scraping Project. Creating a detailed Sports Relational Database for the Top European Soccer Leagues.
Solution for IBM Data Engineer Professional Certificate
ETL pipeline combined with supervised learning and grid search to classify text messages sent during a disaster event
A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and Delta Lake.
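The core of such a pipeline is a per-record enrichment step applied to a stream of messages. A deliberately tiny sketch of that shape in pure Python, with the Kafka consumer and Delta Lake writer replaced by plain iterators (the word lists and field names are illustrative; a real pipeline would use a trained model, e.g. via Spark ML, rather than a hand-made lexicon):

```python
# Tiny illustrative lexicon; an assumption for this sketch only.
POSITIVE = {"great", "love", "good", "awesome"}
NEGATIVE = {"bad", "hate", "terrible", "awful"}


def score_sentiment(text):
    """Return +1, -1, or 0 from a naive word count over the lexicons."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def process_stream(tweets):
    """Stand-in for the streaming stage: consume records one by one
    (a Kafka consumer in the real pipeline) and emit enriched events
    (a Delta Lake write in the real pipeline)."""
    for tweet in tweets:
        yield {**tweet, "sentiment": score_sentiment(tweet["text"])}
```

The structure — consume, enrich per record, emit — is what Spark Structured Streaming parallelizes across partitions.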
A tutorial to setup and deploy a simple Serverless Python workflow with REST API endpoints in AWS Lambda.
ETL process which downloads, transforms, and loads Freddie Mac/Fannie Mae mortgage data
Building a Data Warehouse on BigQuery that takes flat files as the data source, with Airflow as the orchestrator.