pyspark

Here are 104 public repositories matching this topic...

databrickslabs / automl-toolkit

Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

scala spark apache-spark ml pyspark machinelearning feature-engineering

Updated Jun 1, 2021
HTML

lbdeoliveira / song-playlist-recommendation

This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion, using a 1M-playlist dataset from Spotify.

music spotify mongodb deep-learning word2vec distributed-computing pyspark recommendation pyspark-mllib

Updated May 18, 2023
HTML

mahmoudparsian / big-data-mapreduce-course

Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

Updated Oct 11, 2024
HTML

Spratiher9 / Sparkora

Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟

python open-source data opensource apache-spark toolkit exploratory-data-analysis apache eda python3 pyspark data-analytics data-analysis easy-to-use data-analysis-python

Updated Jan 8, 2022
HTML

arjones / bigdata-workshop-es

Big Data workshop in Spanish

docker postgres machine-learning scala kafka spark apache-spark postgresql jupyter-notebook superset pyspark pyspark-notebook

Updated Nov 9, 2023
HTML

cassiobolba / Data-Engineering

Projects and studies in the area of data engineering

python git sql lambda-functions pyspark apache-beam gitci

Updated May 27, 2024
HTML

andreichiro / data_engineer_end2end

End-to-end data engineer project

python docker ansible-playbook sql aws-lambda terraform aws-s3 pandas pyspark api-rest databricks aws-api-gateway aws-glue aws-ecs-fargate dbt-cloud vitrinedev

Updated Aug 17, 2023
HTML

airscholar / Japan-visa-data-engineering

This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark clusters are set up within a Docker container on Azure.

python docker azure spark-clusters japan pyspark master-worker-architecture

Updated Oct 11, 2023
HTML

prakhar21 / spark-streaming

Twitter Spark Streaming using PySpark

twitter apache pyspark spark-streaming tweepy-api

Updated Jan 23, 2020
HTML

CrossNox / 7506-OD2

Resources for 7506 (FIUBA)

pandas pyspark fiuba 7506

Updated Feb 14, 2022
HTML

shivankurkapoor / raijin

Movie Recommendation Engine using PySpark

mongodb pyspark recommender-system recommend-movies movie-recommendation-engine

Updated Apr 22, 2018
HTML

YJiangcm / MSBD-5003-project

Parallel implementation of a hierarchical clustering algorithm based on PySpark

pyspark

Updated Dec 18, 2020
HTML

jpacerqueira-zz / Akamai-log-Analysis-SparkML-H2o

Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML

h2o sparkml pyspark pysparkling h2oai pyspark-notebook pyspark-mllib h2o-automl

Updated Feb 28, 2019
HTML

brunowdev / sparkify

This is the final project for the Data Scientist Nanodegree, where our goal is to predict churn for a fictional streaming service called Sparkify.

udacity pyspark pyspark-mllib data-science-capstone sparkify

Updated Jul 6, 2023
HTML

easonlai / Samples_for_Azure_Databricks_Orientation

Samples for Azure Databricks Orientation

Updated Nov 3, 2020
HTML

DunnBC22 / Spark_Projects

This is the repository for all of my Spark projects, which include Spark NLP & Computer Vision projects.

python scala spark pyspark image-classification spark-nlp

Updated Oct 9, 2023
HTML

arjunsingh88 / Big-Data-Pyspark

The goal was to perform predictive maintenance on a commercial turbofan engine. The approach is data-driven: data collected from the operational jet engine is used for predictive maintenance modeling. Specifically, the aim is to build a predictive model to estimate the Remaining Useful Life (RUL) of a jet engine ba…

big-data pyspark predictive-modeling regression-models classification-models