emr-cluster

Here are 35 public repositories matching this topic...

JennaFar / elastic-data-factory

Elastic Data Factory

aws data-science machine-learning sql presto deployment athena data-acquisition data-visualization pyspark data-processing emr-cluster sagemaker sagemaker-deployment

Updated Oct 26, 2023
Python

jashshah-dev / AWS-Big-Data-Pipeline-orchestrated-with-Airflow

A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing

distributed-computing snowflake pyspark amazon-s3 emr-cluster airflow-dags transient-cluster

Updated Jan 1, 2024
Python

nogueira-ric / emr-6.4-spark-3.1.2

AWS EMR 6.4 - Spark 3.1.2 - Python3.7.5

spark emr-cluster

Updated Feb 26, 2022
Python

jashshah-dev / Automating-EMR-Cluster-using-AWS-Lambda

Automate Amazon EMR clusters using Lambda for streamlined and scalable data processing workflows. Unlock the full potential of your data pipeline with LambdaEMR Automator.

lambda-functions pyspark boto3 pyspark-notebook emr-cluster transient-cluster

Updated Jan 1, 2024
Python
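Several entries on this page (for example the Lambda-based automator above and the Airflow-orchestrated pipeline) revolve around *transient* EMR clusters: a cluster is started, runs a Spark step, and then terminates itself. As a rough illustration only, and not the code from any listed repository, the sketch below builds a boto3 `run_job_flow` request for such a cluster inside a Lambda-style handler. The event keys (`script_s3_path`, `log_uri`), the release label, the instance type, and the use of the default EMR roles are all assumptions chosen for the example.

```python
"""Sketch: a Lambda handler that launches a transient EMR cluster.

Assumptions (hypothetical, not taken from any repository above):
- the triggering event carries "script_s3_path" and "log_uri" keys;
- the default roles EMR_EC2_DefaultRole / EMR_DefaultRole already exist.
"""
from typing import Any, Dict, Optional


def build_transient_cluster_request(
    script_s3_path: str,
    log_uri: str,
    release_label: str = "emr-6.4.0",  # placeholder release label
    instance_type: str = "m5.xlarge",  # placeholder instance type
    core_count: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow API."""
    return {
        "Name": "transient-spark-cluster",
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            # Transient cluster: terminate once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-submit job",
            # Shut the cluster down if the step fails instead of leaving it idle.
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_path],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def lambda_handler(
    event: Dict[str, Any], context: Any, emr_client: Optional[Any] = None
) -> Dict[str, str]:
    """Entry point invoked by AWS Lambda; emr_client can be injected for testing."""
    if emr_client is None:
        import boto3  # imported lazily so the module loads without boto3 installed
        emr_client = boto3.client("emr")
    request = build_transient_cluster_request(
        script_s3_path=event["script_s3_path"],
        log_uri=event["log_uri"],
    )
    response = emr_client.run_job_flow(**request)
    return {"cluster_id": response["JobFlowId"]}
```

Keeping the request construction in a pure function makes it easy to inspect or unit-test without touching AWS; the handler itself only wires the event into that request and returns the new cluster ID.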

Tanay0510 / Data-Lake-with-Spark

Load data from S3, process it into analytics tables using Spark, and load the tables back into S3. The Spark process is deployed on a cluster using AWS EMR.

spark s3 datalake emr-cluster etl-pipeline

Updated Aug 17, 2021
Python

jpsalado92 / Udacity-DEND_DataLake-AWSEMR

Full code for Udacity's Data Engineering Nanodegree project: implementing a data lake in Amazon's cloud with AWS S3, AWS EMR and Spark.

s3-bucket data-warehouse aws-emr data-lake emr-cluster

Updated Jul 22, 2020
Python

carlossanchezvega / twitter

This repository aims to capture and clean data from the twitter API in order to perform a sentiment analysis on an EMR cluster.

aws cloud sentiment-analysis twitter-api emr-cluster

Updated Dec 30, 2017
Python

AmandaJunqueira / BigData

Sentiment Analysis using Common Crawl data

emr ec2 ems emr-cluster

Updated May 5, 2020
Python

HarshadRanganathan / aws-emr-launcher

Generic Python library for provisioning EMR clusters from YAML config files (Configuration as Code).

aws aws-emr emr-cluster

Updated Dec 8, 2022
Python

BrightEmah123 / emr-on-airflow-toolkit

A template for creating Amazon EMR clusters using either Amazon MWAA or a Dockerized Airflow Container as a workflow environment

aws airflow emr-cluster mwaa

Updated Aug 10, 2021
Python

a-Imantha / Mahout-Tutorial

Building a Recommender with Apache Mahout on Amazon Elastic MapReduce (EMR) Tutorial

emr s3-bucket mahout hdfs awscli emr-cluster

Updated Mar 20, 2021
Python

tejaskenjale / Wine-quality-prediction-aws

Implementation of the Random Forest algorithm using PySpark on AWS to classify wines, with deployment in a Docker container.

docker random-forest aws-s3 aws-ec2 emr-cluster

Updated Dec 11, 2022
Python

nileshsingal / PUBG-DATA-ANALYSIS

PlayerUnknown's Battlegrounds (PUBG) is a first-person shooter game where the goal is to be the last player standing. You are placed on a giant circular map that shrinks as the game goes on, and you must find weapons, armor, and other supplies to kill other players or teams and survive.

spark hive aws-lambda api-gateway bigdata s3-bucket tableau aws-cloudformation emr-cluster

Updated Jan 27, 2021
Python

darkhipo / emr-example

Running Zeppelin on EMR and launching tasks on it with Task Runner.

emr diff merge hdfs emr-cluster

Updated Dec 28, 2018
Python

deepakag5 / Cloud-Computing-AWS

Cloud Computing Tutorials for AWS

s3-bucket load-balancer vpc hadoop-cluster aws-rds disaster-recovery hadoop-streaming rds-database iam-users emr-cluster

Updated Nov 14, 2019
Python

JohnnyLVP / Project-Standar-Documentation

This repository defines a standard structure for Machine Learning and Data Pipeline projects.

python emr aws documentation machine-learning ec2 project pyspark standard redshift boto3 emr-cluster

Updated Apr 28, 2020
Python

JevyanJ / emr-helper

The EMR Helper library helps with setting up and managing an EMR cluster.

python emr aws emr-cluster

Updated Sep 2, 2020
Python

donjude / data-lakes-with-spark

This project is about building a data lake and creating an ETL pipeline in Spark that loads data from Amazon S3, processes the data into analytics tables, and loads them back into S3

python spark apache-spark hadoop ec2 s3 aws-cli hdfs mapreduce amazon-web-services datalake aws-athena spark-sql emr-cluster etl-pipeline

Updated Jun 15, 2021
Python

arfatmateen / Data_Lake_and_ETL_Pipeline_on_AWS_using_Spark

Database Schema & ETL pipeline for Song Play Analysis | Bosch AI Talent Accelerator Scholarship Program

python aws sql jupyter-notebook s3-bucket pyspark emr-cluster etl-pipeline

Updated Sep 18, 2022
Python

longNguyen010203 / Spark-Processing-AWS

👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and integrating Airflow to automate workflows 🥊

aws apache-spark terraform aws-s3 iam pyspark cloud-computing aws-ec2 redshift data-pipeline aws-services apache-airflow emr-cluster spark-cluster spark-master spark-worker

Updated Jul 7, 2024
Python