From 82225dac839a56d3f63128a8fedde23d6c7791be Mon Sep 17 00:00:00 2001
From: Edwin Chen
Date: Sun, 8 Apr 2012 21:36:06 -0700
Subject: [PATCH 1/4] Update README.

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 36db42c9b5..532976d548 100644
--- a/README.md
+++ b/README.md
@@ -49,8 +49,9 @@ You can find this example code and more under [examples/](https://github.com/twi
 ## Getting Started
 * Check out the [Getting Started](https://github.com/twitter/scalding/wiki/Getting-Started) page on the [wiki](https://github.com/twitter/scalding/wiki).
-* Next, run through the [tutorials](https://github.com/twitter/scalding/tree/master/tutorial) provided in the source.
+* Next, go through the [runnable tutorials](https://github.com/twitter/scalding/tree/master/tutorial) provided in the source.
 * The [API Reference](https://github.com/twitter/scalding/wiki/API-Reference) contains general documentation, as well as many example Scalding snippets.
+* The [Scalding Wiki](https://github.com/twitter/scalding/wiki) contains more useful information.
 ## Building
 0. Install sbt 0.11

From 053569f2f30945e57fbdbb744737b7f951118ebc Mon Sep 17 00:00:00 2001
From: Edwin Chen
Date: Sun, 8 Apr 2012 21:38:22 -0700
Subject: [PATCH 2/4] Update README.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 532976d548..a950f3dce5 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Scalding
-Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop. Instead of forcing you to write raw `map` and `reduce` functions, Scalding allows you to write code that looks like *natural* Scala. It's similar to other MapReduce platforms like Pig, but offers a higher level of abstraction due to its built-in integration with Scala and the JVM.
+Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop.
Instead of forcing you to write raw map and reduce functions, Scalding allows you to write code that looks like *natural* Scala. It's similar to other MapReduce platforms like Pig, but offers a more powerful level of abstraction due to its built-in integration with Scala and the JVM.
 Scalding is built on top of [Cascading](http://www.cascading.org/), a Java library that abstracts away much of the complexity of Hadoop.

From 20bc66c2e2e2c119a488f1f375a9dc29a145f371 Mon Sep 17 00:00:00 2001
From: Edwin Chen
Date: Sun, 8 Apr 2012 21:45:50 -0700
Subject: [PATCH 3/4] Update README.

---
 README.md | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index a950f3dce5..c3beae0359 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@ Scalding is a Scala library that makes it easy to write MapReduce jobs in Hadoop
 Scalding is built on top of [Cascading](http://www.cascading.org/), a Java library that abstracts away much of the complexity of Hadoop.
+Current version: 0.4.1
+
 ## Word Count
 Hadoop is a distributed system for counting words. Here is how it's done in Scalding.
@@ -13,21 +15,6 @@ package com.twitter.scalding.examples
 import com.twitter.scalding._
-class WordCountJob(args : Args) extends Job(args) {
-  TextLine( args("input") )
-    .flatMap('line -> 'word) { line : String => line.split("\\s+") }
-    .groupBy('word) { _.size }
-    .write( Tsv( args("output") ) )
-}
-```
-
-Here's another example that uses a slightly more complex tokenizer.
-
-```scala
-package com.twitter.scalding.examples
-
-import com.twitter.scalding._
-
 class WordCountJob(args : Args) extends Job(args) {
   TextLine( args("input") )
     .flatMap('line -> 'word) { line : String => tokenize(line) }
@@ -44,7 +31,7 @@ class WordCountJob(args : Args) extends Job(args) {
 Notice that the `tokenize` function, which is standard Scala, integrates naturally with the rest of the MapReduce job. This is a very powerful feature of Scalding.
(Compare it to the use of UDFs in Pig.)
-You can find this example code and more under [examples/](https://github.com/twitter/scalding/tree/master/src/main/scala/com/twitter/scalding/examples).
+You can find more example code under [examples/](https://github.com/twitter/scalding/tree/master/src/main/scala/com/twitter/scalding/examples). If you're interested in comparing Scalding to other languages, see the [Rosetta Code page](https://github.com/twitter/scalding/wiki/Rosetta-Code), which contains several MapReduce tasks translated from other frameworks like Pig and Hadoop Streaming into Scalding.
 ## Getting Started

From 90a9d7f60a6000c9bd12d3405a432ac4cbfbb306 Mon Sep 17 00:00:00 2001
From: Edwin Chen
Date: Mon, 9 Apr 2012 12:44:44 -0700
Subject: [PATCH 4/4] Update README.

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c3beae0359..b4e904873a 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ class WordCountJob(args : Args) extends Job(args) {
 Notice that the `tokenize` function, which is standard Scala, integrates naturally with the rest of the MapReduce job. This is a very powerful feature of Scalding. (Compare it to the use of UDFs in Pig.)
-You can find more example code under [examples/](https://github.com/twitter/scalding/tree/master/src/main/scala/com/twitter/scalding/examples). If you're interested in comparing Scalding to other languages, see the [Rosetta Code page](https://github.com/twitter/scalding/wiki/Rosetta-Code), which contains several MapReduce tasks translated from other frameworks like Pig and Hadoop Streaming into Scalding.
+You can find more example code under [examples/](https://github.com/twitter/scalding/tree/master/src/main/scala/com/twitter/scalding/examples).
If you're interested in comparing Scalding to other languages, see the [Rosetta Code page](https://github.com/twitter/scalding/wiki/Rosetta-Code), which contains several MapReduce tasks translated from other frameworks (e.g., Pig and Hadoop Streaming) into Scalding.
 ## Getting Started
@@ -57,11 +57,11 @@ artifact="scalding_2.8.1" or artifact="scalding_2.9.1".
 Currently we are using the cascading-user mailing list for discussions:
-Follow @Scalding on Twitter for updates:
-
 In the remote possibility that there exist bugs in this code, please report them to:
+Follow [@Scalding](http://twitter.com/scalding) on Twitter for updates.
+
 ## Authors:
 * Avi Bryant
 * Oscar Boykin