Simple and Distributed Machine Learning
Updated Jun 6, 2024 - Scala
State of the Art Natural Language Processing
This repository collects scenario-based questions that I have encountered in interviews. The questions simulate real-world scenarios and test your problem-solving and technical skills. Working through them gives you insight into common interview topics and helps you prepare for similar challenges.
Sparkling Water provides H2O functionality inside a Spark cluster
Apache Spark Connector for Azure Cosmos DB
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
This GitHub repository contains code that analyzes a Walmart stock dataset using Spark, a fast, distributed data processing engine. The code uses Spark functions to explore and manipulate the dataset and computes statistics that describe the stock's performance.
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Big Data Recipes
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Online latent state estimation with Spark
Isolation Forest on Spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark