This repo contains PySpark code
Updated Feb 15, 2025 - Jupyter Notebook
A simulated Kafka data pipeline that generates fake customer and order data, processes it through Kafka, and stores it in PostgreSQL for real-time analysis with PySpark. Includes Kafdrop UI for monitoring. 🚀
Leveraging NYC Open Data, this repository contains Databricks notebooks for analyzing motor vehicle collisions. We perform EDA, spatial clustering, and predictive modeling on collision, vehicle, and person datasets to understand accident trends and predict potential risks.
Report on Advanced Database NTUA 2024-25
Sample PySpark notebook
This project demonstrates how to perform Exploratory Data Analysis (EDA) on the Netflix dataset using PySpark in a Jupyter Notebook environment. It involves setting up Spark, loading a dataset, performing basic data cleaning, and visualizing the results. Everything runs in a Docker container.
This repository contains a data analytics and data warehouse project for a bike store
Calendar table for a Fabric lakehouse, built from a Spark notebook
References for building custom IDEs
PySpark complete tutorial
Code for "Efficient Data Processing in Spark" Course
Big Data: Spark Lab and ClickHouse Lab Solutions
📈📊 Big Data Notebooks. ▫️ Large-scale data analysis with PySpark. ▫️ Data ingestion. ▫️ Machine learning algorithms on large-scale data. ▫️ Real-time message processing with Kafka.
This repo is for Structured Streaming and projects
This repo is built for learning and practicing Databricks and PySpark. It is the practice repo for the Databricks Data Engineering Associate certification.
APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis
spark247-jupyter-dockerized
AEMO Aggregated price and demand data
Tracking tweet sentiment at scale using a pretrained Hugging Face transformer (classifier) model