Updated Sep 15, 2017 - Python
apachespark
Here are 7 public repositories matching this topic...
A published paper in PEARC18: Combining HPC and Big Data Infrastructures in Large-Scale Post-Processing of Simulation Data: A Case Study
Updated Jul 23, 2018 - Python
Apache Spark project for Advanced Topics on Databases course
Updated Mar 19, 2021 - Python
Developed a real-time streaming analytics pipeline using Apache Spark to calculate and store KPIs for e-commerce sales data: total sales volume, orders per minute, rate of return, and average transaction size. Used Spark Streaming to read data from Kafka, Spark SQL to calculate the KPIs, and the Spark DataFrame API to write them to JSON files.
Updated Oct 15, 2023 - Python
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Updated Mar 22, 2024 - Python
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
Updated Jul 28, 2024 - Python