Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Updated May 19, 2021 - Scala
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Various stream and batch data processing demos with Apache Spark in Scala 🚀
Repository for Spark structured streaming use case implementations.
Big Data - Split a large CSV file into N smaller ones and save them to the local disk
Spark BigQuery Parallel
Makes columnar Spark files easier to use
Calculate user sessions and statistics on top of them for an imaginary e-commerce site using Spark SQL and aggregations