Apache Spark™ and Scala Workshops
Updated Jul 29, 2024 - HTML
Provides a CherryPy dashboard that gives access to Spark SQL.
Code from Ralph Winters's book, Packt/Practical-Predictive-Analytics
Big Data Analytics for amazon.com using the Spark Framework and the Scala Programming Language
Music prediction using PySpark
Ralph Winters's website
End-to-end real-time credit card transactions application. Made with Kafka, Spark, Bootstrap, ECharts, RxJS.
Recommendation system written in Python, using the PySpark framework and other data science libraries
This project extracts lists, information, and statistics from the Wikipedia articles of current and past NBA players. I used Spark SQL to extract information from HTML documents and save it to a CSV file. In the near future, I will post the same objective achieved using Pig.
Capstone Project in the Udacity Data Scientist Nanodegree program. We manipulate large and realistic datasets with Spark to engineer relevant features for predicting churn. We'll learn how to use Spark MLlib to build machine learning models with large datasets, far beyond what could be done with non-distributed technologies like scikit-learn.
Analysis of streaming daily retail data using Spark Structured Streaming, querying the data to get insights
An investigative analysis of restaurant sales data using Apache Spark, aimed at finding ways to boost sales of less frequently sold items. This is a real-world dataset from an actual restaurant.
Exploring World Development Indicators: identifying relationships between health indicators using linear regression, and classifying income group based on health indicators using logistic regression.
This project shows an auto-updating map of people's interactions during COVID-19 in the US, using big data technologies to analyze a real-time stream of Twitter data.
Created a Spark ML RandomForest model to predict total employee compensation. Queried data with Spark SQL, then ran PySpark scripts to perform EDA, pre-process the data, and train a model that achieved an R2 score of 0.98.