pyspark

Here are 4,492 public repositories matching this topic...

ibis-project / ibis

the portable Python dataframe library

mysql python bigquery sql database clickhouse sqlite impala postgresql snowflake pandas pyspark mssql trino pyarrow datafusion duckdb polars

Updated Nov 28, 2025
Python

microsoft / SynapseML

Simple and Distributed Machine Learning

Updated Nov 27, 2025
Scala

JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing

Updated Nov 27, 2025
Scala

apache / linkis

Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.

Updated Nov 23, 2025
Java

AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

python data-science spark etl pyspark data-engineering etl-pipeline etl-job

Updated Jan 1, 2023
Python

uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

machine-learning deep-learning tensorflow pytorch pyspark parquet parquet-files sysml pyarrow

Updated Oct 31, 2025
Python

awesome-spark / awesome-spark

A curated list of awesome Apache Spark packages and resources.

awesome apache-spark pyspark sparkr

Updated Oct 24, 2024
Shell

jadianes / spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

python data-science machine-learning big-data spark notebook ipython bigdata ipython-notebook pyspark mllib data-analysis

Updated Mar 16, 2024
Jupyter Notebook

ptyadana / SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

mysql python postgres sql apache-spark sqlite postgresql challenges pyspark mysql-database data-analysis exercises tableau sql-queries pgadmin mysqlworkbench mysql-notes digital-music-store sql-data-analysis

Updated Jul 18, 2022
Jupyter Notebook

hi-primus / optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

data-science machine-learning spark bigdata data-transformation pyspark data-extraction data-analysis data-wrangling dask data-exploration data-preparation data-cleaning data-profiling data-cleansing big-data-cleaning data-cleaner cudf dask-cudf

Updated Dec 2, 2024
Python

narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!

pandas pyspark dask ibis pyarrow cudf duckdb polars

Updated Nov 27, 2025
Python

jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

magic spark kernel jupyter notebook cluster pandas-dataframe jupyter-notebook sql-query pyspark kerberos livy

Updated Sep 9, 2025
Python

mahmoudparsian / pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark

big-data spark pyspark dataframes spark-dataframes pyspark-tutorial big-data-analytics data-algorithms spark-rdd rdds pyspark-sql ranking-functions

Updated May 26, 2025
Jupyter Notebook

logicalclocks / hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

python aws data-science machine-learning serverless azure gcp ml pyspark feature-engineering governance model-serving mlops feature-store feature-management hopsworks kserve

Updated Feb 10, 2025
Java

graphframes / graphframes

GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs

big-data spark apache-spark graph-algorithms graphs pyspark graph-theory networks dataframe dataframes graph-analysis network-motifs connected-components network-motif

Updated Nov 20, 2025
Scala

mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

python java machine-learning scala apache-spark distributed-computing design-patterns pyspark mapreduce reducers partitioning hadoop-mapreduce distributed-algorithms mappers data-algorithms apache-hadoop

Updated Oct 14, 2024
Java

lakehq / sail

LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive AI workloads.

python rust data machine-learning sql big-data spark arrow distributed-computing artificial-intelligence pyspark data-engineering datafusion

Updated Nov 27, 2025
Rust

h2oai / sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster

machine-learning scala big-data spark integration h2o pyspark pysparkling rsparkling

Updated Nov 5, 2025
Scala

lyhue1991 / eat_pyspark_in_10_days

pyspark🍒🥭 is delicious, just eat it!😋😋

spark pyspark

Updated Sep 22, 2022
Python

WeBankFinTech / Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

scala sql spark hive ide pyspark udf hue zeppelin hql hive-table resouce-management linkis errorcode

Updated Dec 11, 2024
Vue
