Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop. It's similar to other MapReduce platforms like Pig and Hive, but offers a higher level of abstraction by leveraging the full power of Scala and the JVM.
Scalding is built on top of Cascading, a Java library that abstracts away much of the complexity of Hadoop (such as the need to write raw
Need a suggestion for where to start? Try the Alice in Wonderland walkthrough which shows how to use Scalding step by step to learn about the book's text.
- Scaladocs: Generated documentation for current version of Scalding.
`sbt doc` will build scaladocs under the `target/2.9.2/api/` directory, which you can then open in your browser.
- Type-safe API Reference. This API is very close to the Scala collections API.
- REPL Reference
- Automatic Orderings, Monoids and Arbitraries: using macros to automatically generate the needed Ordering, Monoid, Semigroup or Arbitrary instances for case classes and Scala collections.
- Scalding Sources
- Scalding-Commons. The README of the former scalding-commons library.
- Rosetta Code. A collection of MapReduce tasks translated (from Pig, Hive, Cascalog, MapReduce Streaming, etc.) into Scalding.
- Oscar's Scalding Talk at the Hadoop Summit. Slides from Oscar's talk at the Hadoop Summit.
- Upgrading to 0.9.0 means fixing some compile issues. These sed rules may help.
- DEPRECATED: Fields-based API Reference. This is the original Scalding API, a Cascading DSL that uses a named tuple model. We highly recommend the Type-safe API, using TypedPipe, for any new code. This page also contains many example code snippets illustrating each Scalding function. See Field Rules for more on Fields.
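To give a flavor of the Type-safe API recommended above, here is a minimal word-count sketch using TypedPipe. The class name `WordCountJob` and the `input`/`output` argument names are chosen for illustration only, and the exact source and sink classes available (for example `TypedTsv`) depend on your Scalding version, so treat this as an assumption-laden example rather than the canonical form.

```scala
import com.twitter.scalding._

// Hypothetical job name; "input" and "output" are illustrative argument keys.
class WordCountJob(args: Args) extends Job(args) {
  // Read the input file one line at a time as a TypedPipe[String].
  TypedPipe.from(TextLine(args("input")))
    // Split each line into lowercase words, dropping empty tokens.
    .flatMap { line => line.toLowerCase.split("\\s+").filter(_.nonEmpty).toList }
    // Group identical words together, much like groupBy on a Scala collection.
    .groupBy { word => word }
    // Count the number of occurrences in each group.
    .size
    // Write (word, count) pairs as tab-separated values.
    .write(TypedTsv[(String, Long)](args("output")))
}
```

The chain of `flatMap`, `groupBy` and `size` reads almost the same as it would on an in-memory Scala collection, which is the sense in which the Type-safe API stays close to the collections API.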
Third Party Modules
- Scalding-cassandra: support for reading from and writing to Cassandra
- [SpyGlass](https://github.com/ParallelAI/SpyGlass): advanced-featured HBase wrapper for Cascading and Scalding
- Scalding: Powerful & Concise MapReduce Programming
- Scalding lecture for UC Berkeley's Analyzing Big Data with Twitter class
- Scalding with CDH3U2 in a Maven project
- Running your Scalding jobs in Eclipse
- Run/Test jobs locally from IntelliJ IDEA
- Running your Scalding jobs in IntelliJ IDEA
- Running Scalding jobs on EMR
- Running Scalding with HBase support: Scalding HBase wiki
- Using the distributed cache
- Calling Scalding from inside your application
- Unit Testing Scalding Jobs
- Using counters
NOTE: all of the following tutorials use the Fields API, which is deprecated.
- Scalding for the impatient: a great set of tutorials on using Scalding, walking through simple to more complex examples (including TF-IDF).
- Movie Recommendations and more in MapReduce and Scalding
- Generating Recommendations with MapReduce and Scalding, a shorter version of the above post.
- Poker collusion detection with Mahout and Scalding
- Portfolio Management in Scalding
- Find the Fastest Growing County in US, 1969-2011, using Scalding
- Dean Wampler's Scalding Workshop. Presented by Dean at StrangeLoop 2012.
- Typesafe's Activator for Scalding. Also created by Dean Wampler.
Articles and presentations from around the web
- Hive, Pig, Scalding, Scoobi, Scrunch and Spark: A Comparison of Hadoop Frameworks
- Why Hadoop MapReduce needs Scala
- How Twitter is doing its part to democratize big data
- Meet the combo powering Hadoop at Etsy, Airbnb and Climate Corp.
- Scalding wins a Bossie award from InfoWorld
- Scalding: Hadoop Word Count in LESS than 70 lines of code
- Using Scalding with other versions of Scala
- Scala and sbt for Homebrew users
- Scala and sbt for MacPorts users
- Comparison to Scrunch and Scoobi
- Powered-By: see who is using Scalding in production.