##### Copyright 2020 Google Inc.

Licensed under the Apache License, Version 2.0 (the "License").
<!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
-->


# Element-wise operations

The previous notebook showed two ways of applying operations at the element level: `ParDo` and `Map`. `ParDo` is the most general operation, and `Map` is a simplification for *one-to-one* operations. While these two suffice (in fact, `ParDo` alone can do everything), other **element-wise** operations help with readability and optimization.

Let's import what the notebook needs first:

In [None]:
import logging
import sympy

import apache_beam as beam
from apache_beam import Create, FlatMap, Map, ParDo, Filter, Flatten, Partition, MapTuple, FlatMapTuple
from apache_beam import Keys, Values
from apache_beam.transforms.util import WithKeys

from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import apache_beam.runners.interactive.interactive_beam as ib

**`Filter`** applies a function to every element and outputs the element only if the function returns `True`.

In [None]:
p = beam.Pipeline(InteractiveRunner())
N = 20

primes = (p | "CreateNumbers" >> Create(range(N))
            | "IsPrime" >> Filter(sympy.isprime))

ib.show(primes)
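To see how `Filter` relates to the more general operations, it can help to model the semantics in plain Python. The sketch below does not use Beam and is not how Beam implements `Filter`; `flat_map`, `as_flat_map_fn` and `is_prime` are hypothetical helpers written only for this illustration (the last one stands in for `sympy.isprime`).

```python
# Plain-Python model of the semantics; this is NOT Beam code.

def flat_map(fn, elements):
    """Hypothetical helper: apply fn to each element and chain the iterables it returns."""
    return [out for element in elements for out in fn(element)]


def as_flat_map_fn(predicate):
    """Turn a predicate into a function that emits [element] or nothing."""
    return lambda element: [element] if predicate(element) else []


def is_prime(n):
    """Small trial-division stand-in for sympy.isprime (an assumption of this sketch)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# Keeping elements with a predicate...
filtered = [n for n in range(20) if is_prime(n)]
# ...gives the same result as a flat map that emits zero or one element per input.
via_flat_map = flat_map(as_flat_map_fn(is_prime), range(20))
print(via_flat_map)
```

In other words, a filter can be thought of as a zero-or-one-output special case of the more general element-wise operations, which is why the dedicated transform mostly buys readability.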

**`FlatMap`** applies a function to each element and outputs zero, one, or more elements. It is a higher-level transform that simplifies `ParDo`.

In [None]:
p = beam.Pipeline(InteractiveRunner())

elements = ["Lorem ipsum dolor sit amet. Consectetur adipiscing elit. Sed eu velit nec sem vulputate loborti",
            "In lobortis augue vitae sagittis molestie. Mauris volutpat tortor non purus elementum",
            "Ut blandit massa et risus sollicitudin auctor"]

lines = (p | Create(elements)
           | FlatMap(lambda x: x.split(". ")))

ib.show(lines)

Note that a single input element can produce more than one output element (here the input has 3 elements and the output has 6). This could not be done with a `Map`, since `Map` always outputs exactly one element for every input element.

The function used needs to output an iterable.
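The iterable requirement has a common pitfall: a string is itself an iterable of characters, so a function that returns a bare string gets flattened character by character. The plain-Python sketch below illustrates this; it involves no Beam code, `flat_map` is a hypothetical stand-in for the flattening step, and the inputs are made up.

```python
def flat_map(fn, elements):
    """Hypothetical stand-in for the flattening step: chain whatever fn returns."""
    return [out for element in elements for out in fn(element)]


lines = ["one. two", "three"]  # made-up input, shorter than the notebook's

# str.split returns a list, so each piece becomes its own output element.
sentences = flat_map(lambda x: x.split(". "), lines)

# Returning a bare string: the string is iterated character by character.
per_char = flat_map(lambda x: x.upper(), ["ab", "cd"])

# Wrapping the result in a list emits it as a single element.
per_word = flat_map(lambda x: [x.upper()], ["ab", "cd"])

print(sentences, per_char, per_word)
```

If a function should emit exactly one string per input, wrapping the result in a list (or using a one-to-one operation instead) avoids the accidental split.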


<br>

Pipelines can pass `PCollections` as extra parameters to functions by using *side inputs*. A side input can be treated as a dictionary, a list, a singleton, or an iterable. More details are in the [Apache Beam documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html).

In [None]:
p = beam.Pipeline(InteractiveRunner())

values = [
    {"currency": "USD", "amount": 2.728281},
    {"currency": "EUR", "amount": 3.141592},
    {"currency": "CHF", "amount": 1729},
]

eur = {"CHF":1.0585,"EUR":1, "USD":1.0956}
usd = {"CHF":0.9661372764,"EUR":0.9127418766,"USD":1}
chf = {"EUR":0.9447331129,"CHF":1,"USD":1.0350495985}
rates = {"EUR":eur, "USD":usd, "CHF":chf}


def change_currency(value, ratios):
    current = value["currency"]
    exchanged = {"Original":current}
    for key in ratios[current]:
        exchanged[key] = value["amount"] * ratios[current][key]
    return [exchanged]

# Create turns the dict into (key, value) pairs; AsDict below rebuilds a dict from them
rates_pc = p | "Rates" >> Create(rates)

exchange = (p | Create(values)
              | ParDo(change_currency, ratios=beam.pvalue.AsDict(rates_pc)))

ib.show_graph(p)
ib.show(exchange)
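Because `change_currency` receives the side input as an ordinary dictionary, its logic can be checked on its own, without building a pipeline. The following is a minimal sketch under that assumption: the function is copied from the cell above, only the EUR row of `rates` is reproduced, and the input amount is made up.

```python
def change_currency(value, ratios):
    """Convert value["amount"] into every currency listed for its own currency."""
    current = value["currency"]
    exchanged = {"Original": current}
    for key in ratios[current]:
        exchanged[key] = value["amount"] * ratios[current][key]
    return [exchanged]  # a list, because ParDo expects an iterable of outputs


# Only one row of the rates table, passed directly as a plain dict.
rates = {"EUR": {"CHF": 1.0585, "EUR": 1, "USD": 1.0956}}

result = change_currency({"currency": "EUR", "amount": 2}, rates)
print(result)
```

Testing the function this way keeps the conversion logic separate from how the side input is materialized in the pipeline.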

`InteractiveRunner` can also export the output `PCollection` as a Pandas DataFrame, using `ib.collect`:

In [None]:
df = ib.collect(exchange)
df, type(df)

__________________________________________ 

One of the fundamentals of all ETL frameworks is the use of key-value pairs for aggregating and/or grouping data according to some logic. In Apache Beam, tuples of two values are treated as key-value pairs.

Apache Beam has some built-in operations to add/extract keys and values.

**`WithKeys`** adds a key to each element and outputs a `(key, element)` pair.

**`Keys`** outputs the key of a key-value pair.

**`Values`** outputs the value of a key-value pair.

In [None]:
p = beam.Pipeline(InteractiveRunner())

elements = [
    {"country": "China", "population": 1389, "continent": "Asia"},
    {"country": "India", "population": 1311, "continent": "Asia"},
    {"country": "USA", "population": 331, "continent": "America"},
    {"country": "Australia", "population": 25, "continent": "Oceania"},
    {"country": "Brazil", "population": 212, "continent": "America"},
]

create = (p | "Create" >> Create(elements)
            | WithKeys(lambda x: x["continent"]))

key_pc = create | Keys()

value_pc = create | Values()

ib.show(key_pc, value_pc)
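In plain Python terms, these three transforms amount to simple tuple manipulation. The sketch below is not Beam code; it only mirrors what each transform would be expected to emit for two of the elements above, with `key_fn` as an illustrative name for the key function.

```python
elements = [
    {"country": "China", "population": 1389, "continent": "Asia"},
    {"country": "USA", "population": 331, "continent": "America"},
]


def key_fn(element):
    """Key function: the same logic as the lambda passed to WithKeys above."""
    return element["continent"]


with_keys = [(key_fn(e), e) for e in elements]  # like WithKeys(key_fn)
keys = [k for k, _ in with_keys]                # like Keys()
values = [v for _, v in with_keys]              # like Values()

print(keys)
print(values)
```

Note that `Values` gives back the original elements unchanged, since `WithKeys` only wraps each element in a pair.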

You can use `MapTuple` and `FlatMapTuple` instead of `Map` and `FlatMap` if the elements are tuples.

In [None]:
p = beam.Pipeline(InteractiveRunner())

elements = [
    (2, 1),
    (3, 4),
    (7, 11),
    (18, 29),
    (47, 76),
    (123, 199)
]

create = p | "Create" >> Create(elements)

tuple_mapped = create | MapTuple(lambda x, y: (x, y / x))
ib.show(tuple_mapped)

tuple_flatten = create | FlatMapTuple(lambda *x: [*x])
ib.show(tuple_flatten)

This is the version using `Map` and `FlatMap`:

In [None]:
p = beam.Pipeline(InteractiveRunner())

elements = [
    (2, 1),
    (3, 4),
    (7, 11),
    (18, 29),
    (47, 76),
    (123, 199)
]

create = p | "Create" >> Create(elements)

tuple_map = create | Map(lambda x: (x[0], x[1] / x[0]))
ib.show(tuple_map)

tuple_flatten = create | FlatMap(lambda x: [*x])
ib.show(tuple_flatten)
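Both versions should produce the same output. Since the lambdas are ordinary Python functions, this can be checked outside a pipeline; the sketch below uses a subset of the pairs above and plain list comprehensions in place of the Beam transforms.

```python
pairs = [(2, 1), (3, 4), (7, 11)]

# Style used with MapTuple: the tuple is unpacked into separate arguments.
unpacked = lambda x, y: (x, y / x)
# Style used with Map: the tuple arrives as a single argument and is indexed.
indexed = lambda pair: (pair[0], pair[1] / pair[0])

via_map_tuple = [unpacked(*pair) for pair in pairs]
via_map = [indexed(pair) for pair in pairs]

print(via_map_tuple == via_map)
```

The unpacked form tends to read better when the tuple fields have meaningful names, while the indexed form works for elements of any shape.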

## Exercise

The next pipeline is going to create N lists of numbers. For each list, it needs to output the odd numbers as individual elements. As an example, an input element that consists of a list `[5, 24, 10, 13, 1]` will be transformed into three individual output elements `5`, `13` and `1`.

There are hints below and the solution at the end.

Since we are going to test whether the pipeline is correct, be sure to name the final `PCollection` `final`.

In [None]:
from apache_beam.testing.util import assert_that
from apache_beam.testing.util import matches_all, equal_to
from utils.solutions import solutions

In [None]:
p = beam.Pipeline(InteractiveRunner())
    
elements = [[1, 6, 29, 17],
            [4, 7, 1729, 3],
            [3.1415]]

# TODO: Finish the pipeline 
final = (p | "Create" >> Create(elements)
        )

ib.show(final)

# For testing the solution - Don't modify         
assert_that(final, equal_to(solutions[2]))

### Hints

**Process created elements**
<details><summary>Hint</summary>
<p>

Since each input element (a list) must produce several output elements, you need to use `FlatMap` or `ParDo`.
</p>
</details>


<details><summary>Code</summary>
<p>

Every element is an iterable, so you can just return the iterable from the `FlatMap`:   
```
create = (p | "Create" >> Create(elements)
            | "Flatmap" >> FlatMap(lambda x: x))
```

</p>
</details>

**Eliminate elements according to a rule**
<details><summary>Hint</summary>
<p>

You need to keep only the odd elements, so you can use `Filter` (as always, the more general `ParDo` or `FlatMap` would also work; `Map` would not, because it always outputs exactly one element per input).
</p>
</details>

<details><summary>Code</summary>
<p>
    
```   
create | Filter(lambda x: x % 2 == 1)
```

</p>
</details>

**Full code**
<details><summary>Code</summary>
<p>

```
p = beam.Pipeline(InteractiveRunner())
    
elements = [[1, 6, 29, 17],
            [4, 7, 1729, 3],
            [3.1415]]

# Flatten each list into individual numbers, then keep only the odd ones
final = (p | "Create" >> Create(elements)
           | FlatMap(lambda x: x)
           | Filter(lambda x: x % 2 == 1))

ib.show(final)

# For testing the solution - Don't modify         
assert_that(final, equal_to(solutions[2]))   
```
    

</p>
</details>
