Code for "Efficient Data Processing in Spark" Course
Updated May 15, 2024 - Python
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
Pyspark Notebook With Docker
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
Continuous delivery tool for PySpark notebook-based jobs on Databricks
Loading different types of dataset files using Flume and PySpark
PySpark RDD, DataFrame, and Dataset examples in Python
Automate Amazon EMR clusters with AWS Lambda for streamlined, scalable data-processing workflows (LambdaEMR Automator).
An anime recommendation engine, built with PySpark, that recommends titles based on a given anime or a given user.
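The core of a title-based recommender of this kind is an item-similarity lookup. Below is a minimal plain-Python sketch using cosine similarity over hypothetical genre-weight vectors (the titles, feature values, and `recommend` helper are illustrative assumptions, not the repository's actual implementation, which runs on PySpark):

```python
from math import sqrt

# Hypothetical anime feature vectors (e.g., genre weights) -- illustrative data only
anime_features = {
    "Naruto":    [1.0, 0.8, 0.1],   # action, adventure, romance
    "One Piece": [0.9, 1.0, 0.0],
    "Your Name": [0.1, 0.2, 1.0],
}

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(title, k=1):
    # Rank all other titles by similarity to the given one, return the top k
    target = anime_features[title]
    scores = [(other, cosine_similarity(target, feats))
              for other, feats in anime_features.items() if other != title]
    return [name for name, _ in sorted(scores, key=lambda s: s[1], reverse=True)[:k]]

print(recommend("Naruto"))  # the action/adventure-heavy title ranks first
```

User-based recommendation works the same way, with user rating vectors in place of genre vectors.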
A simple PySpark tool that compares new data to historical records and tags rows as duplicate or NULL. Designed by a team of interns in Jupyter Notebook on Microsoft Fabric as a practice exercise within Lexmark Research and Development Corporation's Digital Transformation program.
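The tagging logic such a tool applies can be sketched in a few lines. This is a plain-Python illustration under assumed rules (exact-match rows against history are "duplicate", rows with a missing field are "NULL"); the actual tool does the equivalent with PySpark DataFrame joins, and the records and `tag_row` helper here are hypothetical:

```python
# Historical records and a batch of new rows -- illustrative data only
historical = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
new_rows = [
    {"id": 2, "name": "Bob"},   # already in history -> duplicate
    {"id": 3, "name": None},    # missing value     -> NULL
    {"id": 4, "name": "Carol"}, # genuinely new     -> ok
]

# Hashable fingerprints of the historical rows for O(1) membership checks
historical_keys = {tuple(sorted(row.items())) for row in historical}

def tag_row(row):
    # Assumed rule order: history duplicates first, then missing values
    if tuple(sorted(row.items())) in historical_keys:
        return "duplicate"
    if any(value is None for value in row.values()):
        return "NULL"
    return "ok"

tags = [tag_row(row) for row in new_rows]
print(tags)  # ['duplicate', 'NULL', 'ok']
```

In PySpark the duplicate check would typically be a left anti-join (or a join with a marker column) against the historical table rather than an in-memory set.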
spark247-jupyter-dockerized
Scaling sentiment analysis with AWS Glue and Amazon Comprehend.