# Functional programming in Python

### **Objective**

Many parallel and distributed data processing libraries and frameworks adopt concepts from functional programming (FP). This notebook provides examples and exercises that familiarize students with FP concepts using Python and the "toolz" library. It is designed to be self-contained and does not rely on externally installed software such as Apache Spark, which keeps the exercise session straightforward and reproducible. We encourage students to complete all exercises during the session, or at home if necessary.

### **Agenda**
* inline functions
* compose
* partial application of functions
* map / filter / reduce / fold
* aggregations
* parallel processing with Dask
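
As a preview of the first three agenda items, the following minimal sketch uses only the standard library. The `compose` helper is a hypothetical stand-in written for illustration; toolz ships its own `compose`, which is deliberately not used here so that the snippet runs without extra installs.

```python
from functools import partial, reduce

# Inline (anonymous) function: a lambda bound to a name for reuse.
square = lambda x: x * x
increment = lambda x: x + 1

def compose(*funcs):
    """Hypothetical helper: compose(f, g)(x) == f(g(x)), applied right to left."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)

# Partial application: fix the first argument of pow to get a one-argument function.
power_of_two = partial(pow, 2)  # power_of_two(n) == 2 ** n

# Composition: square first, then increment.
square_then_increment = compose(increment, square)

preview = (square(4), power_of_two(5), square_then_increment(3))
print(preview)
```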

### **Materials**
* install Python / Anaconda
* install Jupyter

### **Important**
Note that the examples here are intended as didactic material to teach FP concepts; they are not necessarily idiomatic Python, nor the best approach for every use case.

* https://github.com/jdorfman/awesome-json-datasets
* https://www.kaggle.com/usdod/world-factbook-country-profiles/data

In [43]:
import json
import toolz as tz
import dask.bag as db

## 1. Itertoolz

Operations on iterables.

In [5]:
map?

Init signature: map(self, /, *args, **kwargs)
Docstring:
map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from
each of the iterables.  Stops when the shortest iterable is exhausted.
Type:           type


In [6]:
tz.map?

Init signature: tz.map(self, /, *args, **kwargs)
Docstring:
map(func, *iterables) --> map object

Make an iterator that computes the function using arguments from
each of the iterables.  Stops when the shortest iterable is exhausted.
Type:           type


In [15]:
mapped = map(lambda x, y: x + y, [0, 1, 2], [4, 5, 6])

In [16]:
list(mapped)

[4, 6, 8]
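
`filter` and `reduce` from the agenda follow the same pattern as `map`. The sketch below uses only built-ins and `functools`; the variable names are illustrative. Note that `map` and `filter` return lazy iterators, so nothing is computed until the result is consumed, and a consumed iterator cannot be reused.

```python
from functools import reduce

numbers = [0, 1, 2, 3, 4, 5]

# filter keeps the elements for which the predicate returns True.
evens = filter(lambda x: x % 2 == 0, numbers)

# map and filter are lazy, so they chain without building intermediate lists.
doubled_evens = map(lambda x: 2 * x, evens)

# reduce folds the sequence into one value; 0 is the initial accumulator.
total = reduce(lambda acc, x: acc + x, doubled_evens, 0)
print(total)

# The iterator is exhausted after one pass, so a second pass yields nothing.
leftover = list(doubled_evens)
print(leftover)
```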

## 2. Dask Bag

* unordered 
* repeats allowed

In [19]:
b = db.from_sequence(range(0, 100))

In [23]:
sum_by_even = b.foldby(key=lambda x: x % 2 == 0,
                       binop=lambda x, y: x + y)

In [24]:
list(sum_by_even)

[(False, 2500), (True, 2450)]
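
To see what `foldby` computes, here is a plain-Python sketch of the same grouping fold. `fold_by` is a hypothetical helper written for this illustration and is not part of Dask; it processes one sequence in a single pass, whereas Dask must additionally merge the intermediate results of its partitions.

```python
def fold_by(key, binop, seq):
    """Hypothetical helper: group seq by key(x) and fold each group with binop."""
    acc = {}
    for x in seq:
        k = key(x)
        # The first element of a group seeds the accumulator; later ones are folded in.
        acc[k] = binop(acc[k], x) if k in acc else x
    return acc

# Same key and binary operator as the Dask cell above.
sums = fold_by(lambda x: x % 2 == 0, lambda x, y: x + y, range(0, 100))
print(sums)
```

The key `True` collects the even numbers and `False` the odd ones, matching the Dask result above.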

In [84]:
with open('../data/nobel_prize.json') as f:  # close the file once it is parsed
    prizes = db.from_sequence(json.load(f)['prizes'])

In [85]:
prizes.count().compute()

585

In [92]:
prizes.take(3)

({'category': 'physics',
  'laureates': [{'firstname': 'Rainer',
    'id': '941',
    'motivation': '"for decisive contributions to the LIGO detector and the observation of gravitational waves"',
    'share': '2',
    'surname': 'Weiss'},
   {'firstname': 'Barry C.',
    'id': '942',
    'motivation': '"for decisive contributions to the LIGO detector and the observation of gravitational waves"',
    'share': '4',
    'surname': 'Barish'},
   {'firstname': 'Kip S.',
    'id': '943',
    'motivation': '"for decisive contributions to the LIGO detector and the observation of gravitational waves"',
    'share': '4',
    'surname': 'Thorne'}],
  'year': '2017'},
 {'category': 'chemistry',
  'laureates': [{'firstname': 'Jacques',
    'id': '944',
    'motivation': '"for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution"',
    'share': '3',
    'surname': 'Dubochet'},
   {'firstname': 'Joachim',
    'id': '945',
    'motivation': '"

In [90]:
prizes.map(lambda x: x['category']).frequencies().compute()

[('chemistry', 109),
 ('literature', 110),
 ('peace', 98),
 ('medicine', 108),
 ('physics', 111),
 ('economics', 49)]
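
`frequencies` counts how often each distinct element occurs. A minimal standard-library sketch of the same map-then-count step follows; it uses a small made-up sample rather than the Nobel data set, so the counts are illustrative only.

```python
from collections import Counter

# Made-up records shaped like the prize entries above (illustration only).
sample = [
    {'category': 'physics', 'year': '2001'},
    {'category': 'peace', 'year': '2001'},
    {'category': 'physics', 'year': '2002'},
]

# Map each record to its category, then count the occurrences.
counts = Counter(map(lambda x: x['category'], sample))
print(sorted(counts.items()))
```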