Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
Updated Apr 1, 2022 - Python
Efficient and scalable parallelism using the Message Passing Interface (MPI) to handle big data and computationally intensive problems.
RedisGears python client
Python 2.7 code and learning notes for Spark 2.1.1
Code for paper "Locally Distributed Deep Learning Inference on Edge Device Clusters"
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations on functions, as an alternative to map-reduce and join-groupby.
Iterable Java 8-style Streams for Python
Learn Big Data tools and frameworks through examples, POCs, and small projects.
A case study on mining association rules between different factors related to deaths of people in the United States
Distributed encoding, second generation.
A TF-IDF calculator for a Shakespearean play dataset
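As an illustration of the technique this entry names, here is a minimal TF-IDF sketch in plain Python. The function name, the toy corpus, and the particular weighting (raw term frequency times log(N/df)) are assumptions for illustration, not the repo's actual implementation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses term frequency tf(t, d) = count(t, d) / len(d) and
    idf(t) = log(N / df(t)); terms occurring in every document
    get idf log(1) = 0 and thus zero weight.
    """
    n = len(docs)
    df = Counter()  # document frequency: in how many docs each term appears
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores

# Toy corpus standing in for the play texts (hypothetical data).
corpus = [
    "to be or not to be".split(),
    "all the world is a stage".split(),
]
weights = tf_idf(corpus)
```

Each element of `weights` maps a term to its score within that document; a real calculator over the plays would tokenize and normalize the texts first.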
🎓 Repository for master's labs at FCSN, BSUIR
Dirt simple map/reduce
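In the spirit of this entry, a "dirt simple" in-process map/reduce can be written in a few lines of Python: map each input to (key, value) pairs, group by key, then reduce each group. The function and parameter names here are illustrative, not taken from the repo.

```python
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    """Minimal single-machine map/reduce.

    mapper(item) yields (key, value) pairs; after grouping the
    values by key (the shuffle step), reducer(key, values)
    collapses each group to one result.
    """
    groups = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word-count example.
lines = ["hello world", "hello map reduce"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda key, values: sum(values),
)
# counts == {"hello": 2, "world": 1, "map": 1, "reduce": 1}
```

Frameworks like Hadoop or mrjob follow the same three-phase shape, but distribute the map and reduce phases across machines and spill the shuffle to disk.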
A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI's API (ChatGPT backend), Langchain for text processing, and Pinecone as the vector database.
ETH Data Mining Class
Big Data Processing with Hadoop
Laboratory exercise built with Apache Spark for the "Advanced Topics in Database Systems" course at NTUA
Assignment for the "Big Data" course at PESU.