Developing a Flow with EMR and Airflow
Updated Aug 7, 2024 - Python
This project provides a detailed overview of building an automated data engineering pipeline using Airflow, AWS services, Spark, Snowflake, and Tableau.
This project demonstrates data cleaning and processing with Apache Spark and Apache Flink, both locally and on AWS EMR.
👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to provision the infrastructure and Airflow to automate workflows 🥊
Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.
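A Lambda-driven EMR launch like the one described above can be sketched as below. This is a minimal illustration, not the project's actual code: the cluster name, instance types, release label, and event keys are all assumptions, while `run_job_flow` and its request fields are the real boto3 EMR API.

```python
import json

def build_job_flow_request(name, log_uri, script_s3_uri):
    """Assemble a run_job_flow request for a transient EMR cluster that
    runs one Spark step and then terminates itself."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",           # assumed release
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # No keep-alive: the cluster shuts down after the step.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster",
                         script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def lambda_handler(event, context):
    # boto3 ships with the Lambda runtime; imported here so the request
    # builder above stays testable without AWS credentials.
    import boto3
    emr = boto3.client("emr")
    request = build_job_flow_request(
        name=event.get("name", "lambda-emr-demo"),   # hypothetical default
        log_uri=event["log_uri"],
        script_s3_uri=event["script_s3_uri"],
    )
    response = emr.run_job_flow(**request)
    return {"statusCode": 200,
            "body": json.dumps({"JobFlowId": response["JobFlowId"]})}
```

Wiring this handler to an S3 event or a schedule is what makes the automation "hands-off": new data arrives, a cluster spins up, processes it, and disappears.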
A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing
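For an Airflow-orchestrated EMR batch pipeline like this one, the core artifacts are the cluster spec and step list handed to the Amazon provider's operators (`EmrCreateJobFlowOperator`, `EmrAddStepsOperator`, `EmrStepSensor`). A sketch of those dictionaries follows; bucket names, script paths, and instance sizing are placeholders, not values from this project.

```python
# Cluster spec for EmrCreateJobFlowOperator(job_flow_overrides=...).
JOB_FLOW_OVERRIDES = {
    "Name": "airflow-batch-emr",            # hypothetical cluster name
    "ReleaseLabel": "emr-6.15.0",
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Keep the cluster alive; a terminate task in the DAG ends it.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

def spark_step(name: str, script: str, *args: str) -> dict:
    """One EMR step that spark-submits a PySpark script from S3, in the
    shape EmrAddStepsOperator(steps=...) expects."""
    return {
        "Name": name,
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script, *args],
        },
    }

SPARK_STEPS = [
    spark_step("clean", "s3://my-bucket/jobs/clean.py"),
    spark_step("aggregate", "s3://my-bucket/jobs/aggregate.py",
               "--output", "s3://my-bucket/marts/"),
]
```

In the DAG itself, an `EmrStepSensor` would poll each step's state before the terminate task runs, giving the pipeline its batch cadence.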
This project demonstrates the use of Amazon Elastic Map Reduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
Elastic Data Factory
Performed business operations using big data technologies: AWS EMR, AWS RDS (MySQL), Hadoop, Apache Sqoop, Apache HBase, and MapReduce.
Guide: Executing a Python script on AWS EMR for big data analysis.
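The usual way to run a Python script on a live EMR cluster is to submit it as a step via `aws emr add-steps`. The helper below builds that CLI invocation; the cluster ID and S3 path are hypothetical, while the `add-steps` flags and the `Type=Spark` step syntax are the real AWS CLI interface.

```python
import shlex

def add_steps_command(cluster_id: str, script_s3_uri: str) -> list:
    """Build the `aws emr add-steps` invocation that spark-submits a
    Python script already uploaded to S3."""
    step = (
        "Type=Spark,Name=pyspark-analysis,ActionOnFailure=CONTINUE,"
        f"Args=[--deploy-mode,cluster,{script_s3_uri}]"
    )
    return ["aws", "emr", "add-steps",
            "--cluster-id", cluster_id,
            "--steps", step]

# Hypothetical cluster ID and bucket, for illustration only.
cmd = add_steps_command("j-ABC123DEF456", "s3://my-bucket/analysis.py")
print(shlex.join(cmd))
```

Building the command as a list and joining with `shlex` keeps the bracketed `Args=[...]` value intact when the command is pasted into a shell.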
An end-to-end data pipeline for building a data lake and supporting reports using Apache Spark.
Implementation of the Random Forest algorithm using PySpark on AWS to classify wines, with deployment in a Docker container.
Generic Python library that provisions EMR clusters from YAML config files (Configuration as Code).
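The configuration-as-code idea boils down to translating a small declarative file into boto3's `run_job_flow` arguments. A sketch, assuming the config has already been parsed (e.g. with `yaml.safe_load`) and assuming invented keys like `master_type` and `core_type` — the real library's schema will differ:

```python
def config_to_job_flow(cfg: dict) -> dict:
    """Translate a declarative cluster config (parsed from YAML) into
    boto3 run_job_flow keyword arguments."""
    return {
        "Name": cfg["name"],
        "ReleaseLabel": cfg.get("release", "emr-6.15.0"),
        "Applications": [{"Name": app}
                         for app in cfg.get("applications", ["Spark"])],
        "Instances": {
            "MasterInstanceType": cfg["master_type"],
            "SlaveInstanceType": cfg["core_type"],
            "InstanceCount": cfg.get("instance_count", 3),
            "KeepJobFlowAliveWhenNoSteps": cfg.get("keep_alive", False),
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

The payoff is that cluster changes become a YAML diff under code review rather than a console click, which is what "Configuration as Code" promises.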
Database Schema & ETL pipeline for Song Play Analysis | Bosch AI Talent Accelerator Scholarship Program
The goal of this project is to offer a ready-to-use AWS EMR template built on Spot Fleet and On-Demand Instances, so you can just focus on writing PySpark code.
Data pipeline for analyzing the effects of economic indicators on cryptocurrencies
Orchestrating Cloud ETL Workloads
Load data from S3, process it into analytics tables using Spark, and load them back into S3. The Spark job is deployed on an AWS EMR cluster.
Implements a data lake using S3 and Spark on an EMR cluster from an AWS Cloud9 environment, with an ETL pipeline that extracts data from S3, processes it with Spark, and loads it back into S3 as a set of dimensional tables.
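The dimensional-table step in such a pipeline is essentially a projection plus de-duplication. The plain-Python sketch below illustrates the logic with made-up song-play records; in the real job the same shape would be a Spark `df.select(*columns).dropDuplicates([key])` followed by a partitioned Parquet write back to S3.

```python
def dimension_table(rows, key, columns):
    """Project `columns` from raw event rows and de-duplicate on `key`,
    mirroring select(...).dropDuplicates([key]) in the Spark job."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append({c: row[c] for c in columns})
    return out

# Invented sample events: two plays of the same song, one of another.
events = [
    {"song_id": "S1", "title": "A", "artist_id": "AR1", "ts": 1},
    {"song_id": "S1", "title": "A", "artist_id": "AR1", "ts": 2},
    {"song_id": "S2", "title": "B", "artist_id": "AR2", "ts": 3},
]
songs = dimension_table(events, "song_id", ["song_id", "title", "artist_id"])
```

The fact table keeps every event row, while each dimension keeps one row per entity — that split is what makes the S3 output queryable as a star schema.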