Process Common Crawl data with Python and Spark
Demo applications that show how to deploy offline feature engineering solutions online in one minute with fedb and nativespark.
Data mining of Census ECON data using Apache Spark.
A SparkSQL formatter based on https://github.com/zeroturnaround/sql-formatter, with customizations and extra features.
Spark application using the Python API to run analytics on CSV and JSON data.
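For that CSV-and-JSON analytics pattern, a minimal PySpark sketch might look like the following; the file paths and the customer_id/country columns are placeholders, not the repository's actual schema.

```python
# A minimal sketch, not the repository's code: read CSV and JSON with the
# PySpark DataFrame API and run a simple aggregation. File paths and the
# customer_id/country columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-json-analytics").getOrCreate()

csv_df = (
    spark.read.option("header", "true")
         .option("inferSchema", "true")
         .csv("data/orders.csv")
)
json_df = spark.read.json("data/customers.json")

# Join the two sources and count orders per customer country.
(
    csv_df.join(json_df, on="customer_id", how="inner")
          .groupBy("country")
          .count()
          .show()
)
```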
Big Data Project - SSML - Spark Streaming for Machine Learning
Data pipeline built by scraping the artsy.net website.
Extract, Load, Transform (ELT) data from S3 to S3 using Spark on AWS.
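For an S3-to-S3 ELT job like this, a minimal sketch could look like the block below, assuming illustrative bucket names and an environment (EMR, Glue, or similar) where s3a:// credentials are already configured.

```python
# A minimal sketch, assuming placeholder bucket names and preconfigured
# s3a:// credentials (e.g. on EMR or Glue).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-to-s3-elt").getOrCreate()

# Extract: load raw CSV files from the source bucket.
raw = spark.read.option("header", "true").csv("s3a://source-bucket/raw/events/")

# Transform: light cleanup before landing in the curated bucket.
curated = raw.dropDuplicates().withColumn("ingest_date", F.current_date())

# Load: write back to S3 as Parquet, partitioned by ingest date.
curated.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3a://target-bucket/curated/events/"
)
```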
Spark Structured Streaming with Apache Kafka and Twitter.
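A minimal Structured Streaming sketch for reading a Kafka topic follows; the broker address, topic name, and console sink are placeholder choices, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# A minimal sketch of reading a Kafka topic with Spark Structured Streaming.
# Broker address and topic name are placeholders; the spark-sql-kafka-0-10
# connector package must be available on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-tweet-stream").getOrCreate()

# Kafka delivers key/value as binary columns; decode the value to a string.
tweets = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("subscribe", "tweets")
         .load()
         .select(col("value").cast("string").alias("tweet"))
)

# Write the decoded messages to the console sink for inspection.
query = tweets.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```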
Designed a machine learning model that takes the newsgroups dataset and performs binary classification to predict whether a given document expresses atheist or Christian sentiment. Used the LIME library and PySpark, and performed feature selection to improve the classifier's performance.
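The PySpark side of such a classifier can be sketched as TF-IDF features feeding logistic regression; the toy DataFrame, column names, and parameters below are assumptions for illustration rather than the repository's actual pipeline, and the LIME explanation step is omitted.

```python
# A minimal sketch, assuming a toy two-class corpus; the real project used the
# newsgroups dataset and added LIME explanations on top of the trained model.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("newsgroup-binary").getOrCreate()

# Placeholder documents: label 1.0 ~ religious text, 0.0 ~ atheism/science text.
docs = spark.createDataFrame(
    [("god faith church scripture", 1.0), ("space science atheism reason", 0.0)] * 20,
    ["text", "label"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf", numFeatures=1 << 16),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(labelCol="label", featuresCol="features"),
])

train, test = docs.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Area under the ROC curve as a quick quality check on the held-out split.
auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print(f"AUC: {auc:.3f}")
```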
The project uses an ETL multi-hop (medallion) architecture, ingesting data from the Ergast API into storage backed by Azure Data Lake. Bronze-layer data is ingested weekly as cutover and delta files, and raw data in varied formats is transformed with Azure Databricks PySpark notebooks into enriched Silver and Gold layers.
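A bronze-to-silver step in a Databricks PySpark notebook might look like the sketch below; the mount points, column names, and Ergast "results" files are illustrative rather than the repository's actual layout, and `spark` is provided by the Databricks runtime.

```python
# A minimal sketch of one bronze-to-silver notebook step; mount points, file
# names, and columns are illustrative. In a Databricks notebook, `spark` is
# already provided by the runtime.
from pyspark.sql import functions as F

# Bronze: raw weekly files (cutover + delta) landed in the lake as-is.
bronze_df = spark.read.json("/mnt/formula1dl/bronze/results/")

# Silver: rename columns to a consistent convention and stamp ingestion time.
silver_df = (
    bronze_df.withColumnRenamed("resultId", "result_id")
             .withColumn("ingestion_date", F.current_timestamp())
)

# Write as Delta so later weekly loads can merge into the same table.
silver_df.write.format("delta").mode("overwrite").save("/mnt/formula1dl/silver/results/")
```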
Repository for processing and dimensional modeling of election data using Spark on Databricks Community.
Copies data from an Amazon S3 bucket to an Azure Blob container using an Azure Data Factory pipeline. The data is then mounted in Databricks and analyzed further with Spark SQL.
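The Databricks side of this workflow, mounting the blob container and querying it with Spark SQL, can be sketched as follows; the storage account, container, secret scope, and sales schema are placeholders, and `dbutils`/`spark` come from the Databricks runtime.

```python
# A minimal sketch; storage account, container, secret scope, and the sales
# schema are placeholders. `dbutils` and `spark` come from the Databricks runtime.
dbutils.fs.mount(
    source="wasbs://landing@examplestorage.blob.core.windows.net",
    mount_point="/mnt/landing",
    extra_configs={
        "fs.azure.account.key.examplestorage.blob.core.windows.net":
            dbutils.secrets.get(scope="demo-scope", key="storage-key")
    },
)

# Expose the mounted files as a temporary view and analyze them with Spark SQL.
sales_df = spark.read.option("header", "true").csv("/mnt/landing/sales/")
sales_df.createOrReplaceTempView("sales")

spark.sql("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    ORDER BY total_amount DESC
""").show()
```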