apachespark

Here are 7 public repositories matching this topic...

payamrastogi / SparkCourse

Updated Sep 15, 2017
Python

gilga001 / HPCandBigDataPipeline

Paper published at PEARC18: Combining HPC and Big Data Infrastructures in Large-Scale Post-Processing of Simulation Data: A Case Study

python simulation hpc bigdata mdtraj postprocessing apachespark

Updated Jul 23, 2018
Python

Thapep / ApacheSpark

Apache Spark project for Advanced Topics on Databases course

databases ntua spark-sql dataframes-api apachespark apachespark-rdd

Updated Mar 19, 2021
Python

urvashiforreal / Retail-Data-Analysis

Developed a real-time streaming analytics pipeline using Apache Spark to calculate and store KPIs for e-commerce sales data, including total volume of sales, orders per minute, rate of return, and average transaction size. Used Spark Streaming to read data from Kafka, Spark SQL to calculate KPIs, and Spark DataFrame to write KPIs to JSON files.

sparksql sparkstreaming apachespark sparkdataframe

Updated Oct 15, 2023
Python

ZeroTwoDataRW / DE-Stream-Project-Random-Generated-User-Data

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

python docker airflow kafka cassandra-database apachespark postgesql

Updated Mar 22, 2024
Python

martandsingh / ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

sql database spark hive hadoop etl pyspark data-engineering spark-streaming data-analysis databricks datalake spark-sql timetravel apachespark etl-pipeline deltalake

Updated Jul 28, 2024
Python

ravishankar324 / Washington-state-electric-vehicles-ETL-pipeline

ETL data pipeline that processes Washington State EV data using Apache Spark, Docker, Snowflake, Airflow, and AWS services, and visualizes the transformed Parquet data in Tableau dashboards.

python emr docker airflow ec2 s3 iam snowflake pyspark sparksql tableau apachespark

Updated Aug 24, 2024
Python
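The Retail-Data-Analysis entry above names four KPIs computed over streaming e-commerce sales data: total volume of sales, orders per minute, rate of return, and average transaction size. The following is a minimal pure-Python sketch of that per-window arithmetic, not the project's Spark code. The `Invoice` record, its `kind` labels ("ORDER"/"RETURN"), the one-minute tumbling window, and the KPI definitions (returns subtract from sales volume; rate of return is returns divided by all transactions) are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical record shape; the project's actual Kafka message schema is not described here.
    timestamp: int      # event time in seconds since the epoch
    kind: str           # assumed labels: "ORDER" or "RETURN"
    total_cost: float   # invoice amount


def compute_kpis(invoices, window_seconds=60):
    """Group invoices into tumbling time windows and compute KPIs for each window."""
    windows = defaultdict(list)
    for inv in invoices:
        # Align each event to the start of its window.
        start = inv.timestamp - inv.timestamp % window_seconds
        windows[start].append(inv)

    kpis = {}
    for start, batch in sorted(windows.items()):
        orders = [i for i in batch if i.kind == "ORDER"]
        returns = [i for i in batch if i.kind == "RETURN"]
        # Assumed definition: returns reduce the net sales volume.
        total_volume = sum(i.total_cost for i in orders) - sum(i.total_cost for i in returns)
        n = len(batch)  # never zero: a window exists only if it received an invoice
        kpis[start] = {
            "total_volume": total_volume,
            "orders_per_minute": len(orders) * 60 / window_seconds,
            "rate_of_return": len(returns) / n,
            "avg_transaction_size": total_volume / n,
        }
    return kpis
```

In the project itself, the description says Spark Streaming reads the events from Kafka, Spark SQL performs the aggregations, and DataFrames write the results to JSON; this sketch only mirrors the arithmetic on an in-memory list.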
