Elastic Data Factory
-
Updated
Oct 26, 2023 - Python
Elastic Data Factory
A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing
Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.
Load data from S3, process the data into analytics tables using Spark and load them back into S3. Deployed this Spark process on a cluster using AWS EMR
Full code for UDACITY's Data Engineer Nano Degree project. Implementing a Data Lake in Amazon's cloud with AWS S3, AWS EMR and Spark.
This repository aims to capture and clean data from the twitter API in order to perform a sentiment analysis on an EMR cluster.
Generic python library that enables to provision emr clusters with yaml config files (Configuration as Code)
A template for creating Amazon EMR clusters using either Amazon MWAA or a Dockerized Airflow Container as a workflow environment
Implementation of Random Forest algorithm using pyspark on AWS to classify the wines and deployment on Docker Container.
Player Unknown's Battlegrounds (PUBG), is a first person shooter game where the goal is to be the last player standing. You are placed on a giant circular map that shrinks as the game goes on, and you must find weapons, armor, and other supplies in order to kill other players / teams and survive.
running zeppelin on EMR and launching tasks on it with task runner.
Cloud Computing Tutorials for AWS
This repository contains a definition of standar structure for Machine Learning and Data Pipelines Projects
The EMR Helper library tries to help when setting up and managing an EMR cluster.
This project is about building a data lake and creating an ETL pipeline in Spark that loads data from Amazon S3, processes the data into analytics tables, and loads them back into S3
Database Schema & ETL pipeline for Song Play Analysis | Bosch AI Talent Accelerator Scholarship Program
👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift) Terraform to setup the infrastructure and Integration Airflow to automate workflows🥊
Add a description, image, and links to the emr-cluster topic page so that developers can more easily learn about it.
To associate your repository with the emr-cluster topic, visit your repo's landing page and select "manage topics."