- discoproject/disco 1,367 a Map/Reduce framework for distributed computing
- cloudera/flume 857 WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log…
- erikfrey/bashreduce 750 mapreduce in bash
- muricoca/crab 712 Crab is a flexible, fast recommender engine for Python that integrates classic information filtering recommendation algorithms in the world of scientific Python packages (numpy, scipy, matplotlib).
- twitter/hadoop-lzo 399 Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20