Process Common Crawl data with Python and Spark
Demo applications that show how to deploy offline feature engineering solutions online in one minute with fedb and nativespark.
Data mining of Census ECON data using Apache Spark.
A SparkSQL formatter based on https://github.com/zeroturnaround/sql-formatter, with customizations and extra features.
Spark application using the Python API to run analytics on CSV and JSON data.
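For that CSV-and-JSON analytics pattern, a minimal PySpark sketch might look like the following; the file paths and the customer_id/country columns are placeholders, not the repository's actual schema.

```python
# A minimal sketch, not the repository's code: read CSV and JSON with the
# PySpark DataFrame API and run a simple aggregation. File paths and the
# customer_id/country columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-json-analytics").getOrCreate()

csv_df = (
    spark.read.option("header", "true")
         .option("inferSchema", "true")
         .csv("data/orders.csv")
)
json_df = spark.read.json("data/customers.json")

# Join the two sources and count orders per customer country.
(
    csv_df.join(json_df, on="customer_id", how="inner")
          .groupBy("country")
          .count()
          .show()
)
```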
Big Data Project - SSML - Spark Streaming for Machine Learning
Data pipeline built by scraping the artsy.net website.
Extract, Load, Transform (ELT) data from S3 to S3 using Spark on AWS.
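For an S3-to-S3 ELT job like this, a minimal sketch could look like the block below, assuming illustrative bucket names and an environment (EMR, Glue, or similar) where s3a:// credentials are already configured.

```python
# A minimal sketch, assuming placeholder bucket names and preconfigured
# s3a:// credentials (e.g. on EMR or Glue).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-to-s3-elt").getOrCreate()

# Extract: load raw CSV files from the source bucket.
raw = spark.read.option("header", "true").csv("s3a://source-bucket/raw/events/")

# Transform: light cleanup before landing in the curated bucket.
curated = raw.dropDuplicates().withColumn("ingest_date", F.current_date())

# Load: write back to S3 as Parquet, partitioned by ingest date.
curated.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3a://target-bucket/curated/events/"
)
```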
Spark Structured Streaming with Apache Kafka and Twitter.
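A minimal Structured Streaming sketch for reading a Kafka topic follows; the broker address, topic name, and console sink are placeholder choices, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# A minimal sketch of reading a Kafka topic with Spark Structured Streaming.
# Broker address and topic name are placeholders; the spark-sql-kafka-0-10
# connector package must be available on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-tweet-stream").getOrCreate()

# Kafka delivers key/value as binary columns; decode the value to a string.
tweets = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "tweets")
         .load()
         .select(col("value").cast("string").alias("tweet"))
)

# Write the decoded messages to the console sink for inspection.
query = tweets.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```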
Designed a machine learning model that takes the newsgroups dataset and performs binary classification to predict whether a given document expresses atheist or Christian sentiment. Used the LIME library and PySpark, and performed feature selection to improve the classifier's performance.
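The PySpark side of such a classifier can be sketched as TF-IDF features feeding logistic regression; the toy DataFrame, column names, and parameters below are assumptions for illustration rather than the repository's actual pipeline, and the LIME explanation step is omitted.

```python
# A minimal sketch, assuming a toy two-class corpus; the real project used the
# newsgroups dataset and added LIME explanations on top of the trained model.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("newsgroup-binary").getOrCreate()

# Placeholder documents: label 1.0 ~ religious text, 0.0 ~ atheism/science text.
docs = spark.createDataFrame(
    [("god faith church scripture", 1.0), ("space science atheism reason", 0.0)] * 20,
    ["text", "label"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf", numFeatures=1 << 16),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(labelCol="label", featuresCol="features"),
])

train, test = docs.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Area under the ROC curve as a quick quality check on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"AUC: {auc:.3f}")
```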
The project uses an ETL multi-hop (medallion) architecture, ingesting data from the Ergast API into storage backed by Azure Data Lake. Bronze-layer data is ingested weekly as cutover and delta files, and raw data in varied formats is transformed with Azure Databricks PySpark notebooks into enriched Silver and Gold layers.
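A bronze-to-silver step in a Databricks PySpark notebook might look like the sketch below; the mount points, column names, and Ergast "results" files are illustrative rather than the repository's actual layout, and `spark` is provided by the Databricks runtime.

```python
# A minimal sketch of one bronze-to-silver notebook step; mount points, file
# names, and columns are illustrative. In a Databricks notebook, `spark` is
# already provided by the runtime.
from pyspark.sql import functions as F

# Bronze: raw weekly files (cutover + delta) landed in the lake as-is.
bronze_df = spark.read.json("/mnt/formula1dl/bronze/results/")

# Silver: rename columns to a consistent convention and stamp ingestion time.
silver_df = (
    bronze_df.withColumnRenamed("resultId", "result_id")
             .withColumn("ingestion_date", F.current_timestamp())
)

# Write as Delta so later weekly loads can merge into the same table.
silver_df.write.format("delta").mode("overwrite").save("/mnt/formula1dl/silver/results/")
```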
Repository for processing and dimensional modeling of election data using Spark on Databricks Community.
Copies data from an Amazon S3 bucket to an Azure Blob container using an Azure Data Factory pipeline. The data is then mounted in Databricks and analyzed further with Spark SQL.
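The Databricks side of this workflow, mounting the blob container and querying it with Spark SQL, can be sketched as follows; the storage account, container, secret scope, and sales schema are placeholders, and `dbutils`/`spark` come from the Databricks runtime.

```python
# A minimal sketch; storage account, container, secret scope, and the sales
# schema are placeholders. `dbutils` and `spark` come from the Databricks runtime.
dbutils.fs.mount(
    source="wasbs://landing@examplestorage.blob.core.windows.net",
    mount_point="/mnt/landing",
    extra_configs={
        "fs.azure.account.key.examplestorage.blob.core.windows.net":
            dbutils.secrets.get(scope="demo-scope", key="storage-key")
    },
)

# Expose the mounted files as a temporary view and analyze them with Spark SQL.
sales_df = spark.read.option("header", "true").csv("/mnt/landing/sales/")
sales_df.createOrReplaceTempView("sales")

spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY total_amount DESC
""").show()
```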