# Data-Parallel Operations I

In Scala, most collection operations can become data-parallel.

The `.par` call converts a sequential collection to a parallel collection.

However, some operations are not parallelizable.

## Non-Parallelizable Operations

Task: implement the `sum` method using the `foldLeft` method.

```scala
def sum(xs: Array[Int]): Int = {
  xs.par.foldLeft(0)(_ + _)
}
```
Does this implementation execute in parallel? Why not?

No.

Let’s examine the `foldLeft` signature:

```scala
def foldLeft[B](z: B)(f: (B, A) => B): B
```

The accumulator value of type `B` needed to process an element only becomes available once `f` has been applied to all the elements to its left, so the elements have to be processed one after another.

Operations `foldRight`, `reduceLeft`, `reduceRight`, `scanLeft` and `scanRight` similarly must process the elements sequentially.
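The dependency chain can be made visible by recording the shape of the computation instead of performing it. The following is a minimal sketch on a sequential `List`; the element names and the string-building function are only illustrative.

```scala
// Build a textual trace of the calls to f instead of computing a value.
// Each step needs the accumulator produced by the previous step.
val trace = List("a", "b", "c").foldLeft("z")((acc, x) => s"f($acc, $x)")

println(trace) // f(f(f(z, a), b), c)
```

The calls are nested strictly to the left, so no call to `f` can start before the one inside it has finished.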

Next, let’s examine the fold signature:

```scala
def fold(z: A)(f: (A, A) => A): A
```

Because both arguments and the result of `f` have the same type `A`, the `fold` operation can combine the elements in a reduction tree, so it can execute in parallel.
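To illustrate the idea of a reduction tree, here is a rough sketch that folds a collection by splitting it in half recursively. The helper `treeFold` is hypothetical and runs sequentially; it is not how the parallel collections are implemented, but a parallel version could evaluate the two halves as independent tasks.

```scala
// Hypothetical helper: fold `xs` as a balanced tree of calls to f.
// The two recursive calls do not depend on each other.
def treeFold[A](xs: Vector[A], z: A)(f: (A, A) => A): A =
  if (xs.isEmpty) z
  else if (xs.length == 1) f(z, xs.head)
  else {
    val (left, right) = xs.splitAt(xs.length / 2)
    f(treeFold(left, z)(f), treeFold(right, z)(f))
  }

val xs = Vector(1, 2, 3, 4, 5, 6, 7, 8)
val viaTree = treeFold(xs, 0)(_ + _) // 36
val viaFold = xs.fold(0)(_ + _)      // 36
```

Both results agree here because `+` is associative and `0` is its neutral element.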

# Data-Parallel Operations II


In this lecture, we study the parallel fold operation more closely, to understand its advantages and some limitations.

## Use-cases of the `fold` Operation

Implement the `sum` method:
```scala
def sum(xs: Array[Int]): Int = {
  xs.par.fold(0)(_ + _)
}
```
Implement the `max` method:
```scala
def max(xs: Array[Int]): Int = {
  xs.par.fold(Int.MinValue)(math.max)
}
```

## Preconditions of the `fold` Operation



Given a list of "paper", "rock" and "scissors" strings, find out who won:
```scala
Array("paper", "rock", "paper", "scissors")
  .par.fold("")(play)
```


```scala
def play(a: String, b: String): String = List(a, b).sorted match {
  case List("paper", "scissors") => "scissors"
  case List("paper", "rock")     => "paper"
  case List("rock", "scissors")  => "rock"
  case List(a, b) if a == b      => a
  case List("", b)               => b
}
```

```scala
Array("paper", "rock", "paper", "scissors").par.fold("")(play)

// One possible grouping of the reduction:
play(play("paper", "rock"), play("paper", "scissors")) == "scissors"

// Another possible grouping:
play("paper", play("rock", play("paper", "scissors"))) == "paper"
```

However, the data-parallel scheduler is free to group the reduction in a different way, as in the last expression above, so the result can be unexpected and may differ from one run to the next.

Why does this happen?

The `play` operator is commutative, but not associative.
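Both properties can be checked by brute force over the three moves. The sketch below repeats the `play` definition so that it is self-contained; the names `moves`, `commutative` and `counterexamples` are introduced only for this illustration.

```scala
def play(a: String, b: String): String = List(a, b).sorted match {
  case List("paper", "scissors") => "scissors"
  case List("paper", "rock")     => "paper"
  case List("rock", "scissors")  => "rock"
  case List(a, b) if a == b      => a
  case List("", b)               => b
}

val moves = List("paper", "rock", "scissors")

// Commutativity: the order of the two arguments does not matter.
val commutative =
  moves.forall(a => moves.forall(b => play(a, b) == play(b, a)))

// Associativity: collect every triple for which the grouping matters.
val counterexamples = for {
  a <- moves
  b <- moves
  c <- moves
  if play(play(a, b), c) != play(a, play(b, c))
} yield (a, b, c)
```

Here `commutative` is `true`, while `counterexamples` is non-empty. It contains, for example, `("paper", "rock", "scissors")`: grouping to the left yields `"scissors"`, grouping to the right yields `"paper"`.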

In order for the fold operation to work correctly, the following relations
must hold:

$$f(a, f(b, c)) == f(f(a, b), c)$$

$$f(z, a) == f(a, z) == a$$

We say that the neutral element `z` and the binary operator `f` must form a monoid.

Commutativity does not matter for fold – the following relation is not
necessary:
$$f(a, b) == f(b, a)$$
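String concatenation is a simple operator that satisfies the monoid laws without being commutative. The small check below is only an illustration on sequential values; the sample words are arbitrary.

```scala
val words = Vector("data", "-", "parallel")

// Associativity: both groupings give the same string.
val leftGrouped  = (words(0) + words(1)) + words(2) // "data-parallel"
val rightGrouped = words(0) + (words(1) + words(2)) // "data-parallel"

// Neutral element: the empty string changes nothing on either side.
val neutralHolds = words.forall(w => ("" + w) == w && (w + "") == w)

// Not commutative: swapping the arguments changes the result.
val inOrder = words(0) + words(1) // "data-"
val swapped = words(1) + words(0) // "-data"
```

Since regrouping never changes the outcome, a fold with `""` and `_ + _` should give the same result for any reduction tree, provided the elements are combined in their original order.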


Given an array of characters, use fold to return the vowel count:
```scala
Array('E', 'P', 'F', 'L').par
  .fold(0)((count, c) => if (isVowel(c)) count + 1 else count)
```
The program above does not compile: `0` is an `Int`, not a `Char`.

The `fold` operation can only produce values of the same type as the elements of the collection that it is called on.

## The `aggregate` Operation

Let’s examine the `aggregate` signature:
```scala
def aggregate[B](z: B)(f: (B, A) => B, g: (B, B) => B): B
```

`aggregate` takes a sequential folding operator `f` and a parallel folding operator `g`.
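One way to picture how the two operators cooperate is the sequential model below. It is only a sketch under stated assumptions: `aggregateSketch`, its `chunkSize` parameter and the `isVowel` helper are hypothetical names, and the real parallel implementation chooses its own way of splitting the collection.

```scala
// Hypothetical sequential model of aggregate:
// 1. split the input into chunks,
// 2. fold each chunk from the left with f, starting from z,
// 3. combine the per-chunk results with g.
def aggregateSketch[A, B](xs: Vector[A], chunkSize: Int)(z: B)(
    f: (B, A) => B,
    g: (B, B) => B
): B =
  xs.grouped(chunkSize)
    .map(chunk => chunk.foldLeft(z)(f))
    .foldLeft(z)(g)

// Assumed helper, not defined in the original text.
def isVowel(c: Char): Boolean = Set('A', 'E', 'I', 'O', 'U').contains(c.toUpper)

val chars = Vector('E', 'P', 'F', 'L')
val vowels = aggregateSketch(chars, 2)(0)(
  (count, c) => if (isVowel(c)) count + 1 else count,
  _ + _
) // 1
```

In this model the chunks could be folded independently of each other, which is where the parallelism would come from; only step 3 needs the results of the other steps.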

## Using the `aggregate` Operation

Count the number of vowels in a character array:
```scala
Array('E', 'P', 'F', 'L').par.aggregate(0)(
  (count, c) => if (isVowel(c)) count + 1 else count,
  _ + _
)
```
The parallel reduction operator `g` and the neutral element `z` must form a monoid.

So far, we have seen accessor combinators, which return a single value.
Transformer combinators, such as `map`, `filter`, `flatMap` and `groupBy`, do not return a single value, but instead return new collections as results.
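The difference shows up in the result types. The short sketch below uses a sequential `Vector` for simplicity, on the assumption that the parallel counterparts return parallel collections in the same manner; the value names are illustrative.

```scala
val xs = Vector(1, 2, 3, 4)

// Accessor combinator: the result is a single value.
val total: Int = xs.fold(0)(_ + _) // 10

// Transformer combinators: each result is a new collection.
val doubled: Vector[Int] = xs.map(_ * 2)                   // Vector(2, 4, 6, 8)
val evens: Vector[Int] = xs.filter(_ % 2 == 0)             // Vector(2, 4)
val repeated: Vector[Int] = xs.flatMap(x => Vector(x, x))  // Vector(1, 1, 2, 2, 3, 3, 4, 4)
val byParity: Map[Int, Vector[Int]] = xs.groupBy(_ % 2)    // keys 0 and 1
```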