Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions
-
Updated
Jun 1, 2024 - Jupyter Notebook
Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions
This script builds a linear regression model using PySpark to predict student admissions at Unicorn University.
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
Big data management with PySpark
PySpark House Price Prediction features a PySpark-based Linear Regression model for predicting median house prices. It showcases data preprocessing, model training, and evaluation, yielding an RMSE of around 0.11. The code offers insights into building robust predictive models using PySpark.
Working with pyspark module in python and using google colab environment in order to apply some queries to the dataset. The dataset consist of two csv files listening.csv and genre.csv. Also, visualizing query results using matplotlib.
Inventory value is also important for determining a company's liquidity, or its ability to meet its short-term financial obligations. A high inventory value can indicate that a company has too much money tied up in inventory, which could make it difficult for the company to pay its bills.
Nifi - Kafka - Pyspark merupakan sarana belajar saya untuk mengeksplorasi lebih dalam terkait penggunaan tools tersebut
Data analysis project with Pyspark on Jupyter Notebook
The notebook shows how tools of the PySpark SQL module work in practice.
Worked on Pyspark file streaming
twitter real-time sentiment analysis
Project based on application of azure databricks
Repositorio para realizar el curso en Udemy llamado "Airflow2.0 De 0 a Héroe", de la academia "Datapath".
Problems on Hadoop-MapReduce, Hive and PySparkSQL
All updated cheat sheets regarding data science, data analysis provided by Datacamp are here. These cheat sheets cover quick reads on Machine Learning, Deep Learning, Python, R, SQL and more. Perfect cheat sheets when you want to revise some topics in less time.
Add a description, image, and links to the pyspark-sql topic page so that developers can more easily learn about it.
To associate your repository with the pyspark-sql topic, visit your repo's landing page and select "manage topics."