pyspark
Here are 1,119 public repositories matching this topic...
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated or synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
Updated Jun 6, 2024 - Python
The portable Python dataframe library
Updated Jun 6, 2024 - Python
ORM for Apache Spark and DataFrames schema manager
Updated Jun 6, 2024 - Python
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
Updated Jun 6, 2024 - Python
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Updated Jun 5, 2024 - Python
Open Source Contributor Index
Updated Jun 6, 2024 - Python
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Updated Jun 4, 2024 - Python
Dataproc templates and pipelines for solving simple in-cloud data tasks
Updated Jun 5, 2024 - Python
A tool for building feature stores.
Updated Jun 6, 2024 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Updated Jun 3, 2024 - Python
Possibly the fastest DataFrame-agnostic quality check library in town.
Updated Jun 3, 2024 - Python
This project demonstrates data engineering tasks using basketball data within a Databricks environment. The main goals are to process raw data, cleanse it, and analyze it using SQL to gain insights into player demographics across different teams.
Updated Jun 1, 2024 - Python