datalake

This project builds a data lake and an ETL pipeline in Spark that loads data from Amazon S3, processes it into analytics tables, and loads them back into S3.

python spark apache-spark hadoop ec2 s3 aws-cli hdfs mapreduce amazon-web-services datalake aws-athena spark-sql emr-cluster etl-pipeline

Updated Jun 15, 2021
Python

javi-domi / aws-datalake

Datalake on AWS

python spark aws-lambda aws-s3 aws-emr datalake etl-pipeline

Updated Oct 18, 2022
Python

neuro-ml / tarn

An insanely customizable framework for key-value storage 💾

python memoization storage cache persistent datalake

Updated Apr 7, 2024
Python

UcheIgbokwe / FormulaOneDataETL

Collection of data on Formula One Racing

python spark databricks datalake azuredatabricks azuredatalakegen2

Updated Dec 21, 2022
Python

mxdara / Data-lake-with-pyspark-in-S3

Specifically, I build an ETL pipeline that extracts their data from S3, processes it using Spark, and loads the data into a new S3 bucket as a set of dimensional tables.

s3-bucket pyspark datalake

Updated Jun 20, 2023
Python

KirillZhul / de-project-sprint-7

PySpark, DataLake

python airflow pyspark datalake

Updated Dec 19, 2023
Python

OMR5221 / esbi_stream

Application to ingest data into DB from API

api docker cli sqlalchemy docker-compose multiprocessing logging multithreading api-client python3 keyring pyinstaller datalake exe

Updated May 21, 2019
Python

gfelot / DEND-DateLake-Spark

Uses Spark to read data from S3, wrangle it, and make it available back in S3 with a better schema.

aws spark python3 udacity-nanodegree datalake udacity-data-engineer-nanodegree

Updated Dec 8, 2022
Python

neelriyer / spark-datalake

Spark, EMR, S3, EC2, Python

python emr aws spark ec2 datalake

Updated Apr 21, 2022
Python

brfulu / us-accidents-data-engineering

Udacity Data Engineer Nanodegree - Capstone project

aws airflow spark athena s3 datalake

Updated Dec 19, 2019
Python

ylder / 20230514_historicoBolsasCapes

Collection, storage, and analysis of historical data on the distribution of CAPES scholarships.

python sql vscode jupyter-notebook sqlite3 datalake dataexploration

Updated May 28, 2023
Python

leehuwuj / lake-inspector

Inspect your lakehouse data by using PyArrow

arrow datalake pyarrow lakehouse

Updated Jan 9, 2024
Python

xpertdev / tdameritrade-streaming-deleteme

Streaming order book data from TD Ameritrade API

real-time websocket realtime azure-storage datalake orderbook level1 level2 tdameritrade timesale tda-api

Updated May 2, 2021
Python

postpayio / ness

A Python datalake client.

s3 pandas datalake

Updated Dec 16, 2022
Python

CharlieSergeant / airflow-minio-postgres-fastapi

Sample data store project to be hosted on a remote server or cluster. CI/CD using GitHub Actions for SSH deployment to a remote server running Docker Compose.

python airflow docker-compose postgresql jupyter-notebook minio traefik datalake selenium-python data-engineering-pipeline fastapi

Updated Jul 25, 2023
Python

pactera-ai / data2lake

A tool to form a data lake on AWS from your data

aws data automation datalake

Updated Jan 6, 2023
Python

Tanay0510 / Data-Lake-with-Spark

Loads data from S3, processes it into analytics tables using Spark, and loads them back into S3. The Spark process is deployed on a cluster using AWS EMR.

spark s3 datalake emr-cluster etl-pipeline

Updated Aug 17, 2021
Python

datalake

Here are 66 public repositories matching this topic...

dbbatalha / human-resources-analytics

MalondaClement / DataLake

dd-Splunk / splunk-datalake

donjude / data-lakes-with-spark
