Implementing best practices for PySpark ETL jobs and applications.
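A minimal sketch of the kind of job such a repo implies, with the extract/transform/load steps split into testable functions; the paths, schema, and app name here are illustrative, not taken from the repo:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def extract(spark, path):
    """Read raw JSON records from the given path."""
    return spark.read.json(path)

def transform(df):
    """Drop rows without an id and normalize the name column."""
    return (
        df.filter(F.col("id").isNotNull())
          .withColumn("name", F.trim(F.lower(F.col("name"))))
    )

def load(df, path):
    """Write the result as Parquet, overwriting previous output."""
    df.write.mode("overwrite").parquet(path)

if __name__ == "__main__":
    spark = SparkSession.builder.appName("etl_job").getOrCreate()
    load(transform(extract(spark, "data/input/*.json")), "data/output/")
    spark.stop()
```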
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
An end-to-end Twitter Data Pipeline that extracts data from Twitter and loads it into AWS S3.
Airflow POC demo: 1) env setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
Built a data pipeline for a retail store using AWS services. It collects data from the store's transactional database (OLTP) in Snowflake, transforms the raw data with Apache Spark (the ETL process) to meet business requirements, and enables data analysts to create visualizations with Superset. Airflow orchestrates the pipeline.
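A hedged sketch of what the transform step might look like: read an OLTP table through the Snowflake Spark connector, aggregate into a business-facing table, and write it out for the BI layer. The connection options, table, and bucket are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("retail_etl").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder account URL
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "RETAIL",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

# Read the raw orders table from Snowflake.
orders = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .load()
)

# Daily revenue per store -- the kind of curated table Superset would chart.
daily = (
    orders.groupBy("store_id", F.to_date("order_ts").alias("day"))
          .agg(F.sum("amount").alias("revenue"))
)

daily.write.mode("overwrite").parquet("s3a://retail-curated/daily_revenue/")
```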
Introduction to data pipeline management with Airflow. Airflow schedules and maintains numerous ETL processes running on a large-scale enterprise data warehouse.
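For context, a minimal Airflow 2-style DAG that puts one ETL task on a daily schedule; the DAG id and the callable body are illustrative only:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_etl():
    # Placeholder for the actual extract/transform/load logic.
    print("extract, transform, load")

with DAG(
    dag_id="warehouse_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # run once per day
    catchup=False,               # don't backfill missed runs
) as dag:
    etl = PythonOperator(task_id="run_etl", python_callable=run_etl)
```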
Sentiment analysis of tweets using an ETL process and Elasticsearch.
Example project and best practices for Python-based Spark ETL jobs and applications.
An ETL pipeline where data is captured from REST APIs (Remotive, Adzuna & GitHub) and RSS feeds (StackOverflow). The data collected from the APIs is stored on local disk, the files are preprocessed, and the ETL jobs are written in Spark and scheduled in Prefect to run every week. Transformed data is loaded into PostgreSQL.
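A Prefect 2-style sketch of the extraction and staging half of such a flow; the endpoint, response shape, and file paths are assumptions, and the weekly cron schedule would be attached when the flow is deployed:

```python
import json
import requests
from prefect import flow, task

@task(retries=2)
def fetch(url: str) -> list:
    """Pull job postings from a REST API (response shape assumed)."""
    return requests.get(url, timeout=30).json().get("jobs", [])

@task
def stage(records: list, path: str) -> str:
    """Persist raw records to local disk for the Spark job to pick up."""
    with open(path, "w") as f:
        json.dump(records, f)
    return path

@flow
def jobs_etl():
    raw = fetch("https://remotive.com/api/remote-jobs")  # assumed endpoint
    stage(raw, "data/remotive.json")
    # downstream: spark-submit the transform job, then load into PostgreSQL

if __name__ == "__main__":
    jobs_etl()  # a weekly schedule would be configured on the deployment
```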
Event-Driven Python on AWS #CloudGuruChallenge
This project uses the Reddit API to extract data, processes it on EC2 instances, and stores the output as CSV files in an AWS S3 bucket, with Airflow managing the overall workflow orchestration.
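A sketch of the extract-and-store steps using PRAW and boto3 (the library choices, credentials, subreddit, bucket, and key are all placeholders, not the repo's code):

```python
import csv
import boto3
import praw

# Reddit API client; real credentials come from the Reddit app settings.
reddit = praw.Reddit(
    client_id="...", client_secret="...", user_agent="etl-demo"
)

# Extract hot posts from a subreddit into a local CSV file.
with open("/tmp/posts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "title", "score"])
    for post in reddit.subreddit("dataengineering").hot(limit=100):
        writer.writerow([post.id, post.title, post.score])

# Load the CSV into S3 for downstream consumers.
boto3.client("s3").upload_file("/tmp/posts.csv", "my-bucket", "reddit/posts.csv")
```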
Code for an unofficial API for Fundamentus, a Brazilian stocks data website. Uses requests and bs4 for scraping.
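A minimal requests + BeautifulSoup scraping sketch in the spirit of that repo; the page layout (first table holds the data) is an assumption, not the repo's actual selectors:

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}  # some sites reject bare clients

def fetch_rows(url: str) -> list[list[str]]:
    """Return the cell text of every row in the first table on the page."""
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    table = soup.find("table")  # assumed: data lives in the first table
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows
```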
Data pipeline using S3, Glue, Athena, Lambda, and QuickSight to analyze a YouTube dataset.
Utilities for declarative specification of data download pipelines for ETL jobs.
Apache Airflow installation with Docker 🌬️
ETL (Extract, Transform, Load) job using PySpark - submodule
This is a command-line application demonstrating a sample ETL pipeline in Python. It takes the PostgreSQL dataset provided at https://raw.githubusercontent.com/cdvx/etl-python/movies-sql/movielens.sql and transfers the data to MongoDB.
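A hedged sketch of such a PostgreSQL-to-MongoDB transfer using psycopg2 and pymongo; the connection strings, table schema, and genre encoding are assumptions for illustration:

```python
import psycopg2
from pymongo import MongoClient

pg = psycopg2.connect("dbname=movielens user=postgres password=postgres")
movies = MongoClient("mongodb://localhost:27017")["movielens"]["movies"]

# Extract from PostgreSQL; the transaction is committed when the block exits.
with pg, pg.cursor() as cur:
    cur.execute("SELECT id, title, genres FROM movies")  # assumed schema
    docs = [
        {"_id": movie_id, "title": title, "genres": genres.split("|")}
        for movie_id, title, genres in cur  # assumed pipe-separated genres
    ]

# Load the transformed documents into MongoDB.
if docs:
    movies.insert_many(docs)
pg.close()
```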
A data pipeline with Docker to perform sentiment analysis on tweets and post the results to a Slack channel via a bot.
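The final notification step might look like the sketch below, using slack_sdk (the library choice is an assumption; the token, channel, and message text are placeholders):

```python
from slack_sdk import WebClient

client = WebClient(token="xoxb-...")  # bot token from the Slack app (placeholder)
client.chat_postMessage(
    channel="#sentiment",
    text="Tweets processed: 500 | positive: 62% | negative: 18%",  # example summary
)
```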