# Scala Primer

[Scala](https://en.wikipedia.org/wiki/Scala_(programming_language)) is JVM-based and has both object-oriented and functional aspects.  Whereas Python is an older imperative language that has tried to graft in functional programming elements, Scala was built from the ground up as a [functional language](https://en.wikipedia.org/wiki/Functional_programming), with appropriate concessions for [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming).
- Scala is [strongly typed](https://en.wikipedia.org/wiki/Strong_and_weak_typing), and strongly-typed functional programming languages have the nice (empirical) property that if you can get the code to compile, it will almost always work as intended.
- Unlike many other typed programming languages, you do not have to annotate every variable because the compiler has strong [type inference](http://www.scala-lang.org/old/node/127).
- Scala runs on the JVM, so it can directly tap into existing [Java libraries](http://docs.scala-lang.org/tutorials/scala-for-java-programmers.html).
- The downside of Scala is that the compiler is [notoriously slow](http://www.quora.com/Why-is-the-Scala-compiler-so-slow).

In [1]:
val x: Int = 1 + 1         // you can annotate the type
val z = 1 + 1              // or not
assert(x == z)
println(s"x: $x, z: $z")   // string interpolation

x: 2, z: 2




In [3]:
// + is actually a method of the "Int" class
assert(x.+(z) == x + z)

// you can also drop the dot and parentheses and call a method in infix style (but probably shouldn't)
(1 until 5) foreach println



1
2
3
4


## Values vs. Variables

In [4]:
var x = 1
x = 2      // allowed, but don't use vars!
val y = 1
// y = 2   // does not compile: reassignment to val





1

## Chaining and anonymous functions

In [2]:
val urls = List(
    "http://www.google.com",
    "http://www.google.com?q=scala",
    "http://www.yahoo.com",
    "http://www.bing.com"
)

// chained maps
val names1 = urls
    .map(_.split("//").last)
    .map(_.split('.')(1))


// multiline anonymous function
val names2 = urls.map({x =>
    val domain = x.split("//").last
    domain.split('.')(1)
})

assert(names1 == names2)





In [3]:
names1





List(google, google, yahoo, bing)

## Basic data structures

In [5]:
val smallList = List(1, 2, 3)
val smallMap = Map('a' -> 1, 'b' -> 2)        // equiv: Map(('a', 1), ('b', 2))
val smallSet = Set('a', 'b')
val smallArray = Array(1, 2, 3)
val optSome = Some(2)                         // Option type
val optNone = None

val list = (0 until 10).toList                 // alt: (0 to 9).toList
val largerNumbers = list.map(_ + 1000)         // alt: list.map(x => x + 1000)
val evens = list.filter(_ % 2 == 0)
val set = list.toSet                           // duplicates (if any) are dropped
val alphabet = ('a' to 'z').toList
val alphabetMap = alphabet.zipWithIndex.toMap  // like python's enumerate function

// foreach is like map but when you don't expect a return value
smallList.foreach(println)
smallMap.foreach(println)

1
2
3
(a,1)
(b,2)




## Equality in Scala

*It depends on what the meaning of the word 'is' is*: [a famous ex-president.]

There are [two notions](https://en.wikipedia.org/wiki/Relational_operator#Object_identity_vs._content_equality) of equality:
- *Object identity* or *shallow equality* holds when two references point at the same object (in C, we might call this a pointer check)
- *Structural equality* or *deep equality* holds when the two objects reference equivalent structures

This can be a little confusing because `==` can mean either, depending on the type:

In [6]:
val a1 = (0 to 10).toArray
val a2 = (0 to 10).toArray
assert(a1 != a2)  // legacy java notion of comparison

val l1 = (0 to 10).toList
val l2 = (0 to 10).toList
assert(l1 == l2)
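
When arrays need to be compared by content, the standard library offers `sameElements`, and `eq` always tests object identity. The following is a small sketch of the distinction; the names `arr1`, `arr2`, `lst1` and `lst2` are made up for this illustration.

```scala
val arr1 = Array(1, 2, 3)
val arr2 = Array(1, 2, 3)

// == on arrays is reference equality, as in Java
assert(arr1 != arr2)
// eq is always reference identity
assert(arr1 eq arr1)
assert(!(arr1 eq arr2))
// sameElements compares the contents element by element
assert(arr1.sameElements(arr2))
// converting to a Seq also gives structural equality
assert(arr1.toSeq == arr2.toSeq)

// Lists are structurally equal under ==, yet still distinct objects
val lst1 = List(1, 2, 3)
val lst2 = List(1, 2, 3)
assert(lst1 == lst2)
assert(!(lst1 eq lst2))
```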





In [17]:

l1.foreach(println)

0
1
2
3
4
5
6
7
8
9
10




## List vs. Array

A Scala `List` is implemented as an immutable singly linked list.  An `Array` is a fixed-length, in-memory buffer (just like the Java array).  Both can be used through the more generic `Seq` interface (a `List` directly, an `Array` via an implicit conversion).
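
One consequence of the linked-list representation is that prepending to a `List` reuses the existing list as the tail of the new one, whereas prepending to an `Array` has to build a new buffer. A minimal sketch (the value names are hypothetical):

```scala
val rest = List(2, 3)
val longer = 1 :: rest              // allocates one new cell that points at rest
assert(longer == List(1, 2, 3))
assert(longer.tail eq rest)         // the old list is shared, not copied

val base = Array(2, 3)
val bigger = 1 +: base              // allocates a new array and copies the elements
assert(bigger.sameElements(Array(1, 2, 3)))
assert(!(bigger eq base))
assert(base.length == 2)            // the original array is unchanged
```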

In [15]:
// construct list
val list = 1 :: 2 :: 3 :: Nil // 1 -> 2 -> 3 -> end 

// and use it
def sum(list: List[Int]): Int = list match {
    case head :: tail => head + sum(tail)
    case Nil => 0
}
assert(sum(list) == 6)

// construct an array
val array = new Array[Int](3)
array(0) = 1
array(1) = 2
array(2) = 3
// array(4) = 4  // this throws an exception

assert(list(2) == array(2))  // select
assert((0 :: list) == (0 +: array).toList)  // prepend
assert((list :+ 4) == (array :+ 4).toList)  // append





In [21]:
array.foreach(println)


1
2
3




**Questions:**
- What is the running time of `list(n)` in Scala (and Python)?
- What is the running time of `array(n)`?
- What is the cost of prepending to `list` (i.e. `a :: list`)?
- What is the cost of prepending to `array` (i.e. `a +: array`)?
- What is the cost of appending to `list` (i.e. `list :+ a`)?
- What is the cost of appending to `array` (i.e. `array :+ a`)?

## Monads and map, flatMap, filter, foreach

#### In Python
It is Pythonic to operate on lists -- element-wise operations, filtering, etc. -- by using list comprehensions.  In many other languages, starting with Lisp but extending to many "functional" programming languages, a different style is preferred:

The idea is that if `f` is a function, then one thinks of the application
>          
    list   |---->   [ f(x) for x in list ]

on lists as a function of _two_ arguments: `f` and `list`.  The idea of viewing the function `f` as a parameter is typical in functional programming languages, and can be taken as a definition of the latter term.

Some common idioms in this style, with Pythonic equivalents, are:

- Apply `f` element-wise to `list`:
    ```python
    map(f, list) == [ f(x) for x in list ]
    ```
- Filter `list` using `f`:
    ```python
    filter(f, list) == [ x for x in list if f(x) ]
    ```
- Apply `f` element-wise to `list` and flatten the results (`flatMap`, sometimes also called `concatMap`).  Here `f` is a function that takes an element (of the type contained in `list`) and returns a list; `flatMap` first applies `f` to each element of `list` and then _flattens_ or _concatenates_ the resulting lists.  Python has no built-in `flatMap`, so the left-hand side below is notation only.
    ```python
    flatMap(f, list) == [ x for y in list for x in f(y) ]
    ```
- `reduce(f, list[, initial])`: Here `f` is a function of _two_ variables, and folds over the list applying `f` to the "accumulator" and the next value in the list.  That is, it performs the following recursion
    $$    a_{-1} = \mathrm{initial} $$
    $$    a_i = f(a_{i-1}, \mathrm{list}_i) $$
    with the final answer being $a_{\mathrm{len}(\mathrm{list})-1}$.  (If initial is omitted, just start with $a_0 = \mathrm{list}_0$.)  For instance,
    ```python
    from functools import reduce  # reduce is not a builtin in Python 3
    reduce(lambda x,y: x+y, [1,2,3,4]) == ((1+2)+3)+4 == 10
    ```
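
In Scala the same idioms are methods on the collection itself. The following sketch mirrors the Python expressions above on a small, made-up list `nums`; `foldLeft` plays the role of `reduce` with an initial value.

```scala
val nums = List(1, 2, 3, 4)

// map: apply a function element-wise
val doubled = nums.map(_ * 2)
assert(doubled == List(2, 4, 6, 8))

// filter: keep the elements that satisfy a predicate
val evenNums = nums.filter(_ % 2 == 0)
assert(evenNums == List(2, 4))

// flatMap: map each element to a list, then concatenate the lists
val repeated = nums.flatMap(x => List(x, x))
assert(repeated == List(1, 1, 2, 2, 3, 3, 4, 4))
assert(repeated == nums.map(x => List(x, x)).flatten)

// reduceLeft: ((1 + 2) + 3) + 4
val total = nums.reduceLeft(_ + _)
assert(total == 10)

// foldLeft: like reduce, but starting from an explicit initial accumulator
val totalFrom100 = nums.foldLeft(100)(_ + _)
assert(totalFrom100 == 110)
```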
    
    
### Remark:
This is where the name "map reduce" comes from.

### Examples:
Below are some examples of iterations.

In [22]:
// All factors of a number: what type does this return?
def factors(num: Int) = {
    (1 to num).filter(num % _ == 0).toList  // last line in function is returned
}

val factors6 = factors(6)

// for each factor, count how many of the numbers 1..num it divides
def factorSum(num: Int) = {
    val allFactors = (1 to num).flatMap(factors)
    allFactors.groupBy(x => x).map(t => (t._1, t._2.length))  // groupBy returns a Map, which can itself be mapped over
}

val factorSum10 = factorSum(10)





Map(5 -> 2, 10 -> 1, 1 -> 10, 6 -> 1, 9 -> 1, 2 -> 5, 7 -> 1, 3 -> 3, 8 -> 1, 4 -> 2)

In [23]:
// largest palindrome that is the product of two 3-digit numbers

// this is hard to read
val r1 = ((100 to 999).view
            .flatMap(i => (i to 999).map(i * _))
            .filter(n => n.toString == n.toString.reverse)
            .max)

// for comprehension is easier to read
val r2 = (for {
    i <- 100 to 999
    j <- i to 999
    n = i * j       // definitions inside a for comprehension take no 'val' keyword
    s = n.toString
    if s == s.reverse
} yield { n }).max

assert(r1 == r2)
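
A for comprehension is syntactic sugar that the compiler rewrites into calls to `flatMap`, `map` and `withFilter`. The sketch below shows a small comprehension next to a hand-written chain that produces the same result; it is roughly, not literally, what the compiler generates, and the names `viaFor` and `viaMethods` are made up.

```scala
// even products i * j with 1 <= i <= j <= 3
val viaFor = for {
  i <- 1 to 3
  j <- i to 3
  n = i * j
  if n % 2 == 0
} yield n

// the same computation written with explicit method calls
val viaMethods = (1 to 3).flatMap { i =>
  (i to 3).map(j => i * j).filter(_ % 2 == 0)
}

assert(viaFor == viaMethods)
assert(viaFor == Vector(2, 4, 6))
```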





In [24]:
r2





906609

## Options and map

`Option` is a funny collection because it contains at most a single object.  But it turns out to be very useful, and especially powerful when combined with `map` and `flatMap`.  Because a possibly missing value is part of the type, the type system makes you account for the empty case.

In [None]:
// so what are options for?
import java.util.Arrays

// binary search does not always return an index, so the return type should be optional
def binarySearch(array: Array[Char], key: Char): Option[Int] = {
    Arrays.binarySearch(array, key) match {
        case index if index < 0 => None
        case index => Some(index)
    }
}

val array = ('a' to 'z').toArray

assert(binarySearch(array, 'b') == Some(1))
assert(binarySearch(array, 'B') == None)

// character after 'b' (don't forget the null case)
def letterAfter1(array: Array[Char], key: Char): Option[Char] = {
    binarySearch(array, key) match {
        case Some(index) => if (array.isDefinedAt(index + 1)) {
            Some(array(index + 1))
        } else {
            None
        }
        case None => None
    }
}

// an even more idiomatic implementation
def letterAfter2(array: Array[Char], key: Char): Option[Char] = {
    binarySearch(array, key)
        .flatMap(index => array.drop(index+1).headOption)
}

assert(letterAfter1(array, 'b') == letterAfter2(array, 'b'))
assert(letterAfter1(array, 'z') == letterAfter2(array, 'z'))
assert(letterAfter1(array, 'B') == letterAfter2(array, 'B'))
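
Options also show up in the standard collections: `Map.get` returns an `Option` instead of throwing on a missing key. A short sketch of the most common operations, using a hypothetical `ages` map and a hypothetical helper `ageGap`:

```scala
val ages = Map("alice" -> 30, "bob" -> 25)   // made-up data

// Map.get returns an Option rather than throwing on a missing key
assert(ages.get("alice") == Some(30))
assert(ages.get("carol") == None)

// map transforms the value if present and leaves None alone
val nextYear = ages.get("alice").map(_ + 1)
assert(nextYear == Some(31))

// getOrElse supplies a default for the empty case
assert(ages.get("carol").getOrElse(0) == 0)

// a for comprehension over Options is flatMap + map: any None short-circuits
def ageGap(a: String, b: String): Option[Int] = for {
  x <- ages.get(a)
  y <- ages.get(b)
} yield math.abs(x - y)

assert(ageGap("alice", "bob") == Some(5))
assert(ageGap("alice", "carol") == None)
```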

## Classes and objects

### A simple example

In [25]:
trait PointTrait {
    def square(x: Double) = x * x // don't even need to declare return type
}

class Point(val x: Double, val y: Double) extends PointTrait {
    // val means constructor arguments are public
    def distance(p: Point): Double = {
        math.sqrt(square(p.x - x) + square(p.y - y))
    }
    
    val norm = math.sqrt(square(x) + square(y))  // as if in constructor (e.g. __init__)
}

object Point extends PointTrait

assert(new Point(0.0, 0.0).distance(new Point(1.0, 1.0)) == new Point(1.0, 1.0).norm)





### A more complex example

In [27]:
class Stack[T] {  // T is a type parameter (see usage)
    var data: List[T] = Nil
    
    def push(x: T): Unit = {  // no meaningful return value, so the result type is Unit
        data = x :: data
    }

    def pop: Option[T] = data match {  // pattern match on the shape of data
        case head :: tail => {  // matches non-empty list
            data = tail
            Some(head)
        }
        case Nil => None  // matches empty list
    }
}

val stack = new Stack[Int]
stack.push(2)
stack.push(3)
assert(stack.pop == Some(3))
assert(stack.pop == Some(2))
assert(stack.pop == None)





A good [reference](https://twitter.github.io/scala_school/type-basics.html) for Scala typing.

## Pass arguments by name or value

In [28]:
def passValue(x: Int) = {
  println("x1=" + x)
  println("x2=" + x)
}

def passAnonymousFunction(x: => Int) = {
  println("x1=" + x)
  println("x2=" + x)
}

def printfn() = {
    println("hello")
    3
}

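passValue(printfn())              // printfn() is evaluated once, before the call
println("-----")
passAnonymousFunction(printfn())  // printfn() is re-evaluated each time x is used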

hello
x1=3
x2=3
-----
hello
x1=3
hello
x2=3
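
The output above shows the difference: a by-value argument is evaluated once before the call, while a by-name argument (`x: => Int`) is re-evaluated every time it is used, and not at all if it is never used. The sketch below makes this countable; all names in it (`evaluations`, `expensive`, `byValue`, and so on) are invented for the illustration.

```scala
var evaluations = 0
def expensive(): Int = {
  evaluations += 1
  42
}

def byValue(x: Int): Int = x + x      // argument evaluated once, before the call
def byName(x: => Int): Int = x + x    // argument re-evaluated at each use
def byNameOnce(x: => Int): Int = {    // evaluate once and keep the result locally
  val cached = x
  cached + cached
}
def ignoring(x: => Int): Int = 0      // argument never used, so never evaluated

assert(byValue(expensive()) == 84)
assert(evaluations == 1)

evaluations = 0
assert(byName(expensive()) == 84)
assert(evaluations == 2)

evaluations = 0
assert(byNameOnce(expensive()) == 84)
assert(evaluations == 1)

evaluations = 0
assert(ignoring(expensive()) == 0)
assert(evaluations == 0)
```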




## Try, catch, and finally
Try and catch in scala is a lot like 

In [29]:
try {
    println("Before the exception")
    throw new IllegalArgumentException("")
    println("After the exception")
} catch {
    case e: Exception => println("got an exception: " + e)
} finally {
    println("print me no matter what")
}

Before the exception
got an exception: java.lang.IllegalArgumentException: 
print me no matter what
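
As an alternative to a `try`/`catch` block, the standard library's `scala.util.Try` wraps a computation that may throw and returns either a `Success` or a `Failure`, which can then be handled with the same `map`, `getOrElse` and pattern-matching tools used for `Option`. A minimal sketch, where `parse` is a hypothetical helper:

```scala
import scala.util.{Try, Success, Failure}

// Try(...) catches non-fatal exceptions and returns them as a Failure value
def parse(s: String): Try[Int] = Try(s.trim.toInt)

assert(parse(" 42 ") == Success(42))
assert(parse("forty-two").isFailure)

// pattern match on the outcome instead of writing a catch block
val described = parse("forty-two") match {
  case Success(n) => s"parsed $n"
  case Failure(e) => s"failed with ${e.getClass.getSimpleName}"
}
assert(described == "failed with NumberFormatException")

// the usual combinators work too
assert(parse("x").getOrElse(0) == 0)
assert(parse("7").map(_ * 2).toOption == Some(14))
```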




## Advantages and disadvantages of Scala?
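1. It reduces performance overhead: Spark itself runs on the JVM, so Scala code avoids the cost of passing data between the JVM and a separate Python process
2. Access to the latest and greatest: new Spark features have typically appeared in the Scala API first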
3. Understand the underlying philosophy of computation that Spark inherits from being developed in Scala
4. One language to rule them all:
> With Spark and Scala, the experience is different, because you’re using the same language for everything. You’re writing Scala to retrieve data from the cluster via Spark. You’re writing Scala to manipulate that data locally on your own machine. And then — and this is the really neat part — you can send Scala code into the cluster so that you can perform the exact same transformations that you performed locally on data that is still stored in the cluster. It’s difficult to express how transformative it is to do all of your data munging and analysis in a single environment, regardless of where the data itself is stored and processed. It’s the sort of thing that you have to experience for yourself to understand, and we wanted to be sure that our recipes captured some of that same magic feeling that we felt when we first started using Spark.

## Additional resources

- [Scala for Python programmers](http://bugra.github.io/work/notes/2014-10-18/scala-basics-for-python-developers/)
- [The Official Scala-Lang Learning Page](http://www.scala-lang.org/documentation/)
- [Another tutorial on Scala](http://www.scala-lang.org/docu/files/ScalaTutorial.pdf)
- [Scala For The Impatient](http://fileadmin.cs.lth.se/scala/scala-impatient.pdf)

#### Exit Tickets
1. Explain flatMap to a layman.
1. Write a fully typed function that accepts an integer as input, returns the integer if it is divisible by k, and if not returns None.

*Copyright &copy; 2015 The Data Incubator.  All rights reserved.*