Skip to content
#

pyspark

Here are 1,120 public repositories matching this topic...

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

  • Updated Jun 4, 2024
  • Python

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

  • Updated Jun 8, 2024
  • Python

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

  • Updated Oct 2, 2019
  • Python

Improve this page

Add a description, image, and links to the pyspark topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the pyspark topic, visit your repo's landing page and select "manage topics."

Learn more