Add scala syntax highlighting to CodeSnippets.md

azymnis committed on Feb 20, 2012 (1 parent 2f2bed5, commit ecd89aa3eb6fbbf986596cd982600434f664a01b)
Showing 1 changed file with 48 additions and 30 deletions.
tutorial/CodeSnippets.md: +48 −30
@@ -11,7 +11,7 @@ Filter
------
Filter out rows.
-
+```scala
val birds = animals.filter('type) { type : String => type == "Flying" }
// We can also filter on multiple fields at once.
@@ -21,11 +21,12 @@ Filter out rows.
val (speed, height) = x
(speed > 100) && height > 100
}
+```
Map
-----
Add new columns that are functions of the existing ones.
-
+```scala
// Map.
val addSpeedInKm =
birds
@@ -50,39 +51,42 @@ Add new columns that are functions of the existing ones.
val foo = bar.map(('a, 'b, 'c) -> ('a, 'b)) { ... } // This works. The new a and b columns replace the old a and b columns.
// However, if the mapped-to columns intersect, but are not a subset of the mapped-from columns, you get an error.
val foo = bar.map(('a, 'b, 'c) -> ('a, 'd)) { ... } // Error!
+```
Discard, Project
--------------------
Remove columns from your pipe.
-
+```scala
// We can remove fields we don't care about.
val forgetBirth = people.discard('birthplace, 'birthday)
-
+
// Discarding is the opposite of projecting.
val keepOnlyWorkplace = people.project('jobTitle, 'salary)
-
+```
+
Unique
------
Keep only unique rows.
-
+```scala
// Keep only the unique (firstName, lastName) pairs. All other fields are discarded.
people.unique('firstName, 'lastName)
+```
MapTo
------
MapTo is equivalent to mapping and then projecting.
-
+```scala
val savings =
items
.mapTo(('price, 'discountedPrice) -> 'savings) {
x : (Float, Float) =>
val (price, discountedPrice) = x
price - discountedPrice
}
-
+
// Equivalent to...
val savingsSame =
items
@@ -92,37 +96,40 @@ MapTo is equivalent to mapping and then projecting.
price - discountedPrice
}
.project('savings)
+```
FlatMap, FlatMapTo
----------------------
+```scala
val words =
books
.flatMap('text -> 'word) {
text : String =>
text.split("\\s+").map { word : String => word }
}
-
+
// Same as above, but keep only the word column.
val wordsOther =
books
.flatMapTo('text -> 'word) {
text : String =>
text.split("\\s+").map { word : String => word }
}
-
+```
+
Limit
-----
Make a pipe smaller.
-
+```scala
// Keep (approximately) 100 rows.
val oneHundredPeople = people.limit(100)
-
+```
GroupBy
==========
Group your pipe by the values in a specified set of columns, and then apply a grouping function to the values in each group.
-
+```scala
val wordCounts =
books
.flatMap('text -> 'word) {
@@ -136,75 +143,82 @@ Group your pipe by the values in a specified set of columns, and then apply a gr
_.size('count)
}
// We now have (word, count) columns in the pipe.
+```
Grouping functions include...
size
-----
Count the number of rows in this group
-
+```scala
wordCounts
.groupBy('word) {
// By default, if you don't pass in a new name, the new column is simply called 'size'.
// Here we call the new column 'count'.
_.size('count)
}
-
+```
+
average
-------
Take the mean of a column.
-
+```scala
// Find the mean age of boys vs. girls
people
.groupBy('sex) {
// The new column is called 'meanAge'.
_.average('age -> 'meanAge)
}
-
+```
+
mkString
-----------
Turn a column in the group into a string.
-
+```scala
wordCounts
.groupBy('count) {
// Take all the words with this count, join them with a comma, and call the new column "words".
_.mkString('word -> 'words, ",")
}
+```
toList
-------
Turn a column in the group into a list.
-
+```scala
wordCounts
.groupBy('count) {
// Take all the words with this count, join them into a list in a new column called "words".
_.toList[String]('word -> 'words)
}
+```
sum
----
Sum over a column in the group.
-
+```scala
expenses
.groupBy('shoppingLocation) {
// Sum over the 'cost' column, and rename the summed column to 'totalCost'.
_.sum('cost -> 'totalCost)
}
+```
reduce
-------
We can also reduce over grouped columns. This is equivalent to the previous sum.
The reduce function is required to be associative, so that the work can be done on the map side and not solely on the reduce side (like a combiner).
-
+```scala
expenses
.groupBy('shoppingLocation) {
_.reduce('cost -> 'totalCost) {
(costSoFar : Double, cost : Double) => costSoFar + cost
}
}
+```
foldLeft
-----------
@@ -215,44 +229,48 @@ Like reduce, but all the work happens on the reduce side (so the fold function i
count
-------
We can count the number of rows in a group that satisfy some predicate.
-
+```scala
val usersWithImpressions =
users
.groupBy('user) { _.count('numImpressions) { x : Long => x > 0 } }
+```
GroupAll
---------
There's also a groupAll function, which is useful if you want to (say) count the total number of rows in the pipe.
-
+```scala
// vocabSize is now a pipe with a single entry, containing the total number of words in the vocabulary.
val vocabSize =
wordCounts
.groupAll { _.size('vocabSize) }
-
-It's also useful if, right before outputting a pipe, you want to sort by certain columns.
+```
- val sortedPeople =
+It's also useful if, right before outputting a pipe, you want to sort by certain columns.
+```scala
+ val sortedPeople =
people.groupAll {
// Sort by lastName, then by firstName.
_.sortBy('lastName, 'firstName)
}
+```
Joins
-------------
We can do inner joins.
joinWithSmaller, joinWithLarger
-------------------------------
-
+```scala
// people is a large pipe with a birth_city_id. We join it with the smaller cities pipe on id.
val peopleWithBirthplaces = people.joinWithSmaller('birth_city_id -> 'id, cities)
-
+
// Equivalent to...
val peopleWithBirthplaces = cities.joinWithLarger('id -> 'birth_city_id, people)
-
+
// Note that the two pipes can not have a field name in common (not even the field they join on).
// For example, this throws an error:
people.joinWithSmaller('ssn -> 'ssn, teachers)
-
+
// Instead, we first rename the ssn field of one of the pipes:
people.rename('ssn -> 'ssnOther).joinWithSmaller('ssnOther -> 'ssn, teachers)
+```
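
For readers new to these pipe operations, the sketch below approximates what the per-row snippets in this diff compute: filter, mapTo, flatMapTo, unique and limit. It uses ordinary Scala `List`s in place of pipes. This is an illustrative analogue under stated assumptions, not the Scalding API. The case classes, field names and sample rows are hypothetical, and local collections ignore everything about distribution.

```scala
// Plain Scala collections standing in for Scalding pipes; this is NOT the Scalding API.
object RowOpsSketch {
  // Hypothetical row types. The tutorial's pipes use symbol-named fields instead.
  final case class Animal(name: String, kind: String, speed: Double, height: Double)
  final case class Item(price: Float, discountedPrice: Float)
  final case class Person(firstName: String, lastName: String, age: Int)

  val animals: List[Animal] = List(
    Animal("hawk", "Flying", 150.0, 300.0),
    Animal("sparrow", "Flying", 40.0, 50.0),
    Animal("horse", "Walking", 70.0, 0.0)
  )

  // filter: keep only the rows whose kind is "Flying".
  val birds: List[Animal] = animals.filter(a => a.kind == "Flying")

  // filter on two fields at once.
  val fastHighFliers: List[Animal] =
    birds.filter(a => a.speed > 100 && a.height > 100)

  // mapTo: compute a new value and keep only that value (map followed by project).
  val items: List[Item] = List(Item(10.0f, 7.5f), Item(4.0f, 4.0f))
  val savings: List[Float] = items.map(i => i.price - i.discountedPrice)

  // flatMapTo: one input row can produce zero or more output rows.
  val books: List[String] = List("the cat sat", "the dog")
  val words: List[String] = books.flatMap(text => text.split("\\s+").toList)

  // unique: keep the distinct (firstName, lastName) pairs and drop the other fields.
  val people: List[Person] = List(
    Person("Ada", "Lovelace", 36),
    Person("Ada", "Lovelace", 37),
    Person("Alan", "Turing", 41)
  )
  val uniqueNames: List[(String, String)] =
    people.map(p => (p.firstName, p.lastName)).distinct

  // limit: List.take is exact, whereas the tutorial describes limit as approximate.
  val firstTwoPeople: List[Person] = people.take(2)

  def main(args: Array[String]): Unit = {
    println(birds.map(_.name)) // List(hawk, sparrow)
    println(words)             // List(the, cat, sat, the, dog)
  }
}
```

The sketch names the animal field `kind` because `type` is a reserved word in Scala. A closure parameter literally called `type`, as in the Filter snippet, would not compile.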

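The grouping snippets (size, average, mkString, toList, sum, reduce, count, groupAll with sortBy) and the inner join can likewise be approximated with `groupBy` on a local collection. This is a hypothetical sketch with made-up data rather than Scalding code. It mirrors only the per-group results, not how the work is split between the map and reduce sides.

```scala
// Plain Scala collections standing in for groupBy/groupAll/join; NOT the Scalding API.
object GroupOpsSketch {
  // Hypothetical input data.
  val words: List[String] = List("a", "b", "a", "c", "b", "d")
  val people: List[(String, Int)] = List(("F", 30), ("F", 40), ("M", 20)) // (sex, age)
  val expenses: List[(String, Double)] =
    List(("market", 2.5), ("market", 4.0), ("online", 10.0)) // (shoppingLocation, cost)
  val impressions: List[(String, Long)] =
    List(("u1", 0L), ("u1", 3L), ("u2", 5L), ("u2", 1L)) // (user, impressions)

  // size: number of rows per word.
  val wordCounts: Map[String, Int] =
    words.groupBy(identity).map { case (w, ws) => (w, ws.size) }

  // toList / mkString: collect the words that share a count (sorted for a stable order).
  val wordsByCount: Map[Int, List[String]] =
    wordCounts.toList.groupBy(_._2).map { case (c, pairs) => (c, pairs.map(_._1).sorted) }
  val joinedByCount: Map[Int, String] =
    wordsByCount.map { case (c, ws) => (c, ws.mkString(",")) }

  // average: mean age per sex.
  val meanAge: Map[String, Double] =
    people.groupBy(_._1).map { case (sex, rows) =>
      (sex, rows.map(_._2).sum.toDouble / rows.size)
    }

  // sum and reduce give the same totals; reduce needs an associative function,
  // which is what lets a distributed runtime combine partial sums map-side.
  val totalCost: Map[String, Double] =
    expenses.groupBy(_._1).map { case (loc, rows) => (loc, rows.map(_._2).sum) }
  val totalCostReduced: Map[String, Double] =
    expenses.groupBy(_._1).map { case (loc, rows) => (loc, rows.map(_._2).reduce(_ + _)) }

  // count: rows per user that satisfy a predicate.
  val numImpressions: Map[String, Int] =
    impressions.groupBy(_._1).map { case (u, rows) => (u, rows.count(_._2 > 0)) }

  // groupAll: treat the whole collection as a single group.
  val vocabSize: Int = wordCounts.size

  // groupAll + sortBy: sort by lastName, then by firstName.
  val names: List[(String, String)] =
    List(("Alan", "Turing"), ("Ada", "Lovelace"), ("Ada", "Byron")) // (firstName, lastName)
  val sortedNames: List[(String, String)] =
    names.sortBy { case (first, last) => (last, first) }

  // Inner join on a key. Turning the smaller side into a lookup Map is a
  // local-collection choice here, not how Scalding executes joinWithSmaller.
  val cities: Map[Int, String] = Map(1 -> "Paris", 2 -> "Lima") // (id, name)
  val residents: List[(String, Int)] = List(("Ada", 1), ("Alan", 3)) // (name, birth_city_id)
  val peopleWithBirthplaces: List[(String, String)] =
    residents.flatMap { case (name, cityId) => cities.get(cityId).map(city => (name, city)) }

  def main(args: Array[String]): Unit = {
    println(totalCost == totalCostReduced) // true
    println(peopleWithBirthplaces)         // List((Ada,Paris))
  }
}
```

`sum` and `reduce(_ + _)` produce the same totals here. This matches the tutorial's remark that the reduce example is equivalent to the sum. Rows whose key has no match ("Alan" with city 3) are dropped, as in an inner join.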