Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
📓 Repository/Tutorial for initializing Jupyter Notebook and a Spark cluster on Amazon EMR
Samples related to data engineering, e.g. spark, embulk, airflow, etc.
Project files for the post: Installing Apache Superset on Amazon EMR: Add data exploration and visualization to your analytics cluster.
Udacity Data Engineering Nanodegree Program
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
Uses Amazon Elastic MapReduce to rank the top 20 nodes by PageRank in graphs with over 100,000 nodes. Assignment: http://courses.cms.caltech.edu/cs144/homeworks/rankmaniac.pdf
Udacity Data Engineering Capstone project
Unofficial Ansible module for Amazon EMR