pyspark

Here are 35 public repositories matching this topic...

microsoft / SynapseML

Simple and Distributed Machine Learning

Updated Jun 6, 2024
Scala

JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing

Updated Jun 6, 2024
Scala

G-Research / spark-dgraph-connector

A connector for Apache Spark and PySpark to Dgraph databases.

spark pyspark dgraph gr-oss

Updated Jun 5, 2024
Scala

mohankrishna02 / interview-scenerios-spark-sql

This repository collects scenario-based questions that I have encountered in interviews. The questions are designed to simulate real-world situations and test your problem-solving and technical skills. Working through them gives insight into common interview topics and helps you prepare for similar challenges.

sql spark pyspark spark-sql

Updated May 29, 2024
Scala

G-Research / spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

python java scala spark pyspark gr-oss

Updated May 31, 2024
Scala

h2oai / sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster

machine-learning scala big-data spark integration h2o pyspark pysparkling rsparkling

Updated May 27, 2024
Scala

Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB

spark apache-spark connector jupyter-notebook pyspark databricks changefeed lambda-architecture azure-cosmos-db databricks-notebooks cosmos-db azure-databricks

Updated May 20, 2024
Scala

archivesunleashed / aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

scala big-data spark apache-spark hadoop analysis python3 text-extraction pyspark digital-humanities dataframe big-data-analytics webarchives network-graphing

Updated Feb 27, 2024
Scala

paypal / gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

python elasticsearch paypal scala kafka big-data spark cassandra jdbc hbase restapi pyspark spark-streaming aerospike teradata data-api gimel streaming-sql

Updated Nov 23, 2023
Scala

ttiimmothy / spark

Apache Spark - A unified analytics engine for large-scale data processing

scala spark pyspark

Updated Nov 17, 2023
Scala

codyle50 / spark-bigquery-parallel

bigquery spark apache-spark pyspark google-cloud-platform spark-sql spark-dataframes pyspark-python

Updated Oct 23, 2023
Scala

mohankrishna02 / Walmart-Stock-Analysis-Spark

This GitHub repository contains code that analyzes a Walmart stock dataset using Spark, a fast, distributed data processing engine. The code uses various Spark functions to explore and manipulate the dataset and computes statistics to gain insight into the stock's performance.

python scala spark bigdata pyspark