# Implementing Combiners

A transformer operation creates another collection instead of a single value. Methods such as `map`, `flatMap` and `groupBy` are examples of transformer operations. By contrast, methods such as `fold`, `aggregate` and `sum` are not transformer operations, because they produce a single value.

We have also seen that sequential transformer operations can be implemented generically using builders.

## Builders
Builders are used in sequential collection methods:
```scala
trait Builder[T, Repr] {
  // add an element to the builder
  def +=(elem: T): this.type
  // obtain the collection once all elements have been added
  def result: Repr
}
```

`T` is the element type (e.g. `String`) and `Repr` is the type of the resulting collection (e.g. `Seq[String]`). Builders can only be used to implement sequential operations.
To implement transformer operations on parallel collections, we need an abstraction called a `Combiner`.
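A small sketch of how a builder lets a transformer operation be written once for any result type. `SimpleVectorBuilder` (an `ArrayBuffer`-backed builder that produces a `Vector`) and the `mapWith` helper are hypothetical names for illustration, not part of the Scala standard library; the `Builder` trait is repeated from above so the block compiles on its own.

```scala
import scala.collection.mutable.ArrayBuffer

object BuilderSketch {
  // The Builder trait as given in the notes.
  trait Builder[T, Repr] {
    def +=(elem: T): this.type
    def result: Repr
  }

  // Hypothetical builder: buffers elements, then produces an immutable Vector.
  class SimpleVectorBuilder[T] extends Builder[T, Vector[T]] {
    private val buf = ArrayBuffer.empty[T]
    def +=(elem: T): this.type = { buf += elem; this }
    def result: Vector[T] = buf.toVector
  }

  // Hypothetical generic `map`: written once against Builder, the result
  // type is decided by whichever builder the caller passes in.
  def mapWith[A, B, Repr](xs: Iterable[A], b: Builder[B, Repr])(f: A => B): Repr = {
    for (x <- xs) b += f(x)
    b.result
  }
}
```

For example, `BuilderSketch.mapWith(List(1, 2, 3), new BuilderSketch.SimpleVectorBuilder[Int])(_ * 10)` yields `Vector(10, 20, 30)`. The loop is inherently sequential: a single builder receives the elements one after another.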

## Combiners

```scala
trait Combiner[T, Repr] extends Builder[T, Repr] {
  // merge two combiners into one that contains the elements of both
  def combine(that: Combiner[T, Repr]): Combiner[T, Repr]
}
```

Calling `combine` returns a combiner that contains the elements of both inputs. After the call, the two original combiners are invalid and must not be used again.

How can we implement the combine method efficiently?

- when `Repr` is a `set` or a `map`, `combine` represents `union`
- when `Repr` is a `sequence`, `combine` represents `concatenation`
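A minimal sketch of a combiner for sets, where `combine` is a union. `HashSetCombiner` is a hypothetical name, and the traits are repeated from the notes so the block is self-contained. This union simply inserts every element of `that`, which takes time linear in its size, so it shows the semantics of `combine` rather than an efficient implementation.

```scala
import scala.collection.mutable

object CombinerSketch {
  trait Builder[T, Repr] {
    def +=(elem: T): this.type
    def result: Repr
  }

  trait Combiner[T, Repr] extends Builder[T, Repr] {
    def combine(that: Combiner[T, Repr]): Combiner[T, Repr]
  }

  // Hypothetical set combiner backed by a mutable hash set.
  class HashSetCombiner[T] extends Combiner[T, Set[T]] {
    private val elems = mutable.HashSet.empty[T]

    def +=(elem: T): this.type = { elems += elem; this }

    def result: Set[T] = elems.toSet

    // Union: insert every element of `that` into this combiner's storage.
    // This costs O(m) insertions for a `that` of size m. The result reuses
    // this combiner, and by contract neither input may be used afterwards.
    def combine(that: Combiner[T, Set[T]]): Combiner[T, Set[T]] = {
      elems ++= that.result
      this
    }
  }
}
```

In a parallel operation, each worker would fill its own combiner from one part of the input, and the partial combiners would then be merged pairwise with `combine`.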

The combine operation must be efficient, i.e. execute in $\mathcal{O}(\log n + \log m)$
time, where $n$ and $m$ are the sizes of two input combiners.
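To get a feel for how far this bound is from linear time, take two combiners of about a million elements each:

$$
n = m = 2^{20}: \qquad \log_2 n + \log_2 m = 20 + 20 = 40, \qquad n + m = 2^{21} = 2\,097\,152.
$$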

* Question: Is the following `combine` method efficient?
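```scala
def combine(xs: Array[Int], ys: Array[Int]): Array[Int] = {
  // allocate a new array large enough for both inputs
  val r = new Array[Int](xs.length + ys.length)
  // copy xs to the front of r, then ys directly after it
  Array.copy(xs, 0, r, 0, xs.length)
  Array.copy(ys, 0, r, xs.length, ys.length)
  r
}
```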
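No. Let's count the total number of steps. Allocating `r` takes $n+m$ steps on the JVM, because every new array is initialized with default values, and copying the two arrays takes another $n+m$ steps. That is $2(n+m)$ steps in total, so this `combine` runs in $\mathcal{O}(n+m)$ time rather than $\mathcal{O}(\log n + \log m)$.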

Arrays cannot be efficiently concatenated. An array occupies a contiguous block of memory, so concatenating two arrays means copying both of them to a new location. Only if the two arrays happened to be adjacent in memory could we return quickly, and in general they are not.

Typically, set data structures have efficient lookup, insertion and deletion:

* hash tables – expected $\mathcal{O}(1)$: a contiguous block of memory holding a partially filled array of entries. To find an element, we compute its hash code and use it to compute the index of its entry.
* balanced trees – $\mathcal{O}(\log n)$: trees whose nodes have optional child nodes; nodes without children are called leaves. In a balanced tree such as a red-black tree, the longest path from the root to a leaf is never more than twice as long as the shortest one, so the height of the tree is always $\mathcal{O}(\log n)$.
* linked lists – $\mathcal{O}(n)$
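For red-black trees in particular, a standard result bounds the height $h$ of a tree holding $n$ keys, which makes the $\mathcal{O}(\log n)$ claim concrete:

$$
h \le 2\log_2(n + 1), \qquad n = 10^6: \quad h \le 2\log_2(10^6 + 1) \approx 39.9, \ \text{so } h \le 39.
$$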

Most set implementations do not have an efficient union operation.
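To see where the cost comes from, here is a hypothetical `naiveUnion` that inserts every element of the smaller set into the larger one and counts the insertions. With $m$ the size of the smaller set, a hash set needs $m$ insertions at expected $\mathcal{O}(1)$ each, and a balanced tree needs $m$ insertions at $\mathcal{O}(\log(n+m))$ each; either way this is far from $\mathcal{O}(\log n + \log m)$. Library implementations may use smarter strategies, so this is only an illustration of the naive approach.

```scala
object UnionSketch {
  // Hypothetical naive union: insert each element of the smaller set
  // into the larger one. Returns the union and the number of insertions.
  def naiveUnion[T](xs: Set[T], ys: Set[T]): (Set[T], Int) = {
    val (big, small) = if (xs.size >= ys.size) (xs, ys) else (ys, xs)
    var acc = big
    var insertions = 0
    for (y <- small) {
      acc += y        // one insertion per element of the smaller set
      insertions += 1
    }
    (acc, insertions)
  }
}
```

For example, `UnionSketch.naiveUnion(Set(1, 2, 3), Set(3, 4))` returns `(Set(1, 2, 3, 4), 2)`: both elements of the smaller set are inserted, even though `3` was already present.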


## Sequences
Operation complexity for sequences can vary.
* mutable linked lists – $\mathcal{O}(1)$ prepend and append, $\mathcal{O}(n)$ insertion
* functional (cons) lists – $\mathcal{O}(1)$ prepend operations, everything else $\mathcal{O}(n)$
* array lists – amortized $\mathcal{O}(1)$ append, $\mathcal{O}(1)$ random access, otherwise $\mathcal{O}(n)$

Mutable linked lists can have $\mathcal{O}(1)$ concatenation, but for most sequences, concatenation is $\mathcal{O}(n)$.
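The $\mathcal{O}(1)$ concatenation of mutable linked lists comes from keeping a pointer to the last node: concatenating only links the last node of one list to the first node of the other. A minimal sketch with a hypothetical `SimpleLinkedList` class (not a standard library type):

```scala
object LinkedListSketch {
  // A node of a singly linked list; `next` is null for the last node.
  final class Node[T](val elem: T, var next: Node[T])

  // Hypothetical mutable linked list that tracks its first and last node.
  final class SimpleLinkedList[T] {
    private var head: Node[T] = null
    private var last: Node[T] = null
    private var count = 0

    def size: Int = count

    // O(1) append: link a new node after the current last node.
    def append(elem: T): this.type = {
      val node = new Node[T](elem, null)
      if (last == null) head = node else last.next = node
      last = node
      count += 1
      this
    }

    // O(1) concatenation: link our last node to the first node of `that`.
    // The nodes are now shared, so `that` must not be modified afterwards.
    def concat(that: SimpleLinkedList[T]): this.type = {
      if (that.head != null) {
        if (last == null) head = that.head else last.next = that.head
        last = that.last
        count += that.count
      }
      this
    }

    // O(n) traversal into an immutable List, for inspection.
    def toList: List[T] = {
      val b = List.newBuilder[T]
      var cur = head
      while (cur != null) {
        b += cur.elem
        cur = cur.next
      }
      b.result()
    }
  }
}
```

Like a combiner after `combine`, the list passed to `concat` becomes unusable, since appending to it would also change the concatenated list.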