O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Updated Jun 26, 2023 - Python
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Sentiment Analysis and Data Visualization
Ophelia, a PySpark analytics wrapper.
Evaluates execution-time differences between RDDs (Resilient Distributed Datasets) and DataFrames in Apache Spark, also taking into account the file format being used, such as CSV or Parquet.
All in one
PySpark RDD and DataFrame Examples
Streaming data in Spark and doing data analytics
PageRank - Pig vs PySpark comparison https://madoc.univ-nantes.fr/mod/assign/view.php?id=1511791
Project: Spark SQL & DataFrames - Course: Advanced Topics in Databases (9th semester) NTUA
[ECE NTUA] Advanced Topics in Databases - Course project (2022-2023)
Solved various big data problems using PySpark. A variety of transformations and actions are applied to RDDs and DataFrames to extract insights from several datasets, with file sizes in the gigabyte range.
Repo containing the assignments for the DSCI 553: Foundations and Applications of Data Mining course at USC