---
title: "Functional Programming"
author: "Vahram Poghosyan"
date: "2023-01-13"
categories: ["Functional Programming", "Recursion", "Scala"]
image: "functional_programming.png"
format:
  html:
    toc: true
    toc-depth: 5
    code-fold: true
jupyter: python3
include-after-body:
  text: |
    <script type="application/javascript" src="../../javascript/light-dark.js"></script>
---

# Introduction to Functional Programming (FP)

Functional programming draws inspiration from the mathematical definition of a function, which is a well-defined operation on sets.

## Mathematical Functions

Take $f: X \rightarrow Y$, a **function** that maps elements of set $X$ to those of set $Y$ such that for each $x$ in set $X$ (also denoted "$x \in X$"), there's *one and only one* $y \in Y$ which satisfies $f(x) = y$. In ordinary language, we say that a mathematical function maps any given input to a unique output. That's not to say that $f$ can't map two different $x$'s to the same $y$, but it can't map the same $x$ to more than one $y$. Notice that the output in set $Y$ depends *only* on the input from set $X$, and that the function $f$ *only* operates on set $X$ and nothing external to it. In other words, there is no *hidden state* (i.e. some value outside $X$) that affects $f$'s output, so $f$ always produces predictable output. Also, $f$ doesn't affect any element in $X$ (or, for that matter, $Y$). The expression $f(x)$ is simply understood as the function $f$ applied to an element $x \in X$, which produces a value from the set $Y$ (but, once again, the specific element of $Y$ isn't somehow retrieved or affected in any way).

## Pure Functions and Side Effects

In functional programming, we loosely think of built-in **types** or abstract types (i.e. some chunks of computer memory) as being like sets. This lets us, somewhat awkwardly, pluck the mathematical definition of a function from the concrete world of mathematics and bring it into the practical world of software engineering. 

A **pure function** then, in the FP sense, is a function which depends only on its input (and not on any other value stored elsewhere in external computer memory or other external source) and it affects nothing outside itself. Additionally, like the mathematical functions they try to emulate, pure functions *have* to output a value and that value must be unique for a given input.

To recap, here are the above properties again.

### Properties of Pure Functions: Software Engineering Perspective

1. A pure function must return a single output for a given input 
2. A pure function's output should only depend on its input (in other words, no external hidden state should affect the output)
3. A pure function shouldn't change any external state (i.e. an external chunk of computer memory)
4. <span id="my-text-highlight">There shall be no mutation of state, in general</span>
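
As a minimal sketch of these properties in Python (the names `total`, `add_to_total`, and `add` are made up for illustration), compare a function that leans on a global variable with one that depends only on its arguments:

```python
# Impure: reads and mutates a module-level variable (violates properties 2 and 3).
total = 0

def add_to_total(amount):
    global total
    total += amount
    return total

# Pure: the result depends only on the arguments, and nothing outside is touched.
def add(current_total, amount):
    return current_total + amount

first = add_to_total(5)   # 5
second = add_to_total(5)  # 10: same input, different output
assert first != second

assert add(0, 5) == add(0, 5) == 5  # same input, same output, every time
```

The impure version can only be understood by tracking the history of `total`, while the pure version can be understood from its call site alone.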

#### Nice Consequences of Working With Pure Functions
   
Working with pure functions confers some great benefits. For instance, properties 1 and 2 make a call to a pure function interchangeable with its output value, just as $f(2)$, given $f(x)=x^2$, can stand in for the number $4$. This allows us to pass pure functions as arguments into other pure functions, as well as return pure functions from other pure functions, with entirely predictable results. We often do the same in mathematics, where we are comfortable substituting the representation of a value for the value itself (for instance when computing partial derivatives on computation graphs or, as it's more commonly known, calculating the gradient of the loss function using a back-propagation algorithm in machine learning).

By contrast, if our functions affected external state somewhere or produced other such **effects** (more on these in the section on **side effects**), such as returning random or otherwise non-deterministic output (i.e. output that doesn't strictly depend on input), it would be a lot more difficult to model our programs as pure chains of function calls, making them harder to reason about mathematically. So, functional programs are essentially those programs that come as close as possible to being a composition of pure functions. Reasoning about our software as a chain of pure functions also brings some mathematical syntactic sugar with it: a *declarative* (as opposed to an *imperative*) style of writing software. The next section explains what *declarative* and *imperative* mean in this context.

### Declarative vs Imperative Styles

At a basic level, an *imperative* style of programming can be likened to cooking at home with a cookbook. Imperative languages look more like a list of commands directed at the computer. Declarative writing, by contrast, can be compared to dining at a restaurant. We aren't issuing commands at a grueling level of granularity (e.g. iterating over an array manually, appending to a list manually, etc.). Instead, we're specifying the desired outcome without the implementation details like we would in mathematics when we, for instance, write $f(x)=x^2$ succinctly and know it to mean *"square the value of $x$ if $x$ is scalar, or square every feature of $x$ if $x$ is a vector."*

In the imperative style, we're saying "step through the list, read each item, square it and append it to a new list." In the declarative style we're simply saying "square every element of the list." Of course these differences are semantic, and in real life software contains a healthy mix of both styles. The distinction is not really black or white, and is often dependent on the implementation of the given language.

However, an example is worth a thousand words and, since Python provides a good enough playground for showcasing both of these styles, here is an example in Python.

**Imperative Style**

```python
#| code-fold: false
numbers = [1, 2, 3, 4, 5]
squared = []
for num in numbers:
    squared.append(num ** 2)

print(squared)
```

```
[1, 4, 9, 16, 25]
```


**Declarative Style**

```python
#| code-fold: false
numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x**2, numbers)

print(list(squared))
```

```
[1, 4, 9, 16, 25]
```


### Side Effects

Functions which violate any of these properties are said to produce **side effects**. The most common side effect is when a function modifies state (i.e. a chunk of computer memory) outside itself (violating property 3). Examples of side effects include:

| Effect   | Functional Programming Way |
|----------|----------------------------|
|   A function directly modifying a variable defined in the global scope.  |   The FP approach is to pass the global variable as input instead, and have the function return a modified copy of the input. |
|   A function writing to an external database. |   This is an example of an *unavoidable side effect* in practice. The FP approach is to mitigate. Specifics are language dependent, but usually the strategy involves gathering all such unavoidable side effects into one impure corner of the code, and keeping the rest of the code pure.  |
|   A function that prints to the console, retrieves the system time, or generates random numbers (or any function that calls one of these).  |   Yet more examples of unavoidable side effects. Such functions are inherently dependent on external or hidden state such as the time of day in the real world and, in general, things other than their input. |

Although some side effects are unavoidable, we should minimize their use in our code. Functional programming languages offer to do just that.
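
The "one impure corner" strategy from the table can be sketched in Python as follows. This is only an illustration under assumed names (`build_greeting` and `main` are hypothetical): the clock and the console are touched in exactly one place, and everything else receives its data as ordinary arguments.

```python
from datetime import datetime

# Pure core: everything it needs arrives as an argument.
def build_greeting(name: str, now: datetime) -> str:
    part_of_day = "morning" if now.hour < 12 else "afternoon"
    return f"Good {part_of_day}, {name}!"

# Impure shell: the only place that reads the clock and writes to the console.
def main() -> None:
    print(build_greeting("Ada", datetime.now()))

main()
```

Because `build_greeting` never looks at the real clock, it can be checked with any fixed timestamp, which is much harder to do for a function that calls `datetime.now()` internally.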

## Instructions (or Statements) vs Expressions 

In functional programming, we also distinguish between mere **instructions** to the computer (also known as **statements**) and **expressions**. This distinction is similar to the distinction between functions in the traditional programming sense and pure functions in that expressions must also return a value. Contrast this requirement with instructions like the `if`/`else` statements and `while` loops which simply direct the control flow but don't necessarily evaluate to anything.

As mentioned earlier, the use of instructions and impure functions is unavoidable at times. Different languages have different strategies of mitigating these impurities. Usually the aim is to gather the impurities together at the top in some clearly demarcated lexical block. Furthermore, some languages (such as Scala which is a blend of OOP and FP), go to great lengths to minimize side effects by enforcing the return requirement on instructions. All instructions in Scala evaluate to a value, effectively making them expressions. 

The idea is to use a clever type system to capture impurities. If side effects must exist, they should be known to Scala. To achieve this, instructions in Scala return a type known as `Unit` which can hold only `()` as its value. So, in Scala, instructions are essentially treated as expressions which return this very specific type. We will see an example illustrating the power of this design choice in the next section. 
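
Python offers a loose analogy, sketched below with a hypothetical `log` function: a function that only performs an effect and has no `return` still evaluates to a value, namely `None`. The analogy is imperfect, since Python does not enforce this through its type system the way Scala does with `Unit`.

```python
def log(message):
    print(message)  # side effect only, no explicit return

result = log("hello")  # the call is still an expression...
assert result is None  # ...and its value is None
```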

We already touched on this briefly when we discussed declarative and imperative styles of writing code, but the distinction between instruction and expression further leads functional programming to favor certain programming styles over others.

### Control Flow: Conditional Statements and Loops

In functional languages like Scala, `if` is implemented as an expression, similar to the familiar **ternary expressions** (conditional expressions) in Python. Scala's `if` can also be used the way a regular Python `if` statement is, purely for its effects, but the value-producing form is what's preferred. Here are some examples showing the FP approach to writing conditionals in both languages:

**Python:**
```python
x = 1 if condition else 0
```

**Scala:**
```scala
val x = if (condition) 1 else 0
```

In this example, `x` is bound to a value: one of two possibilities. The `if` expression may still produce a side effect, but it's not as open-ended as a normal `if` statement. In a normal `if` statement, the programmer *might* do something entirely crazy and unheard of such as accessing a database, or printing a line to the console (both considered side effects).

This brings us to an important point. It's not that `if` statements would *necessarily* result in side effects, it's that functional programming discourages the use of language constructs that lend themselves to producing side effects more easily. Syntactic choices like this are a common theme in FP. For instance, Scala's choice to treat `()` as a returnable value rather than just syntax is very deliberate. In Scala, `for`-comprehensions are favored over `for` loops.

**Scala For Comprehension:**
```scala
val myNewList = 
    for {
        element <- myOldList
    } yield (element)
```

The above snippet copies the contents of `myOldList` into `myNewList`. It's important to note that this isn't the best way to copy a list in Scala, it's just a toy example of `for`-comprehensions with the intent to demonstrate two things about them: 

* The `for` comprehension is treated as an expression which returns a value captured by `myNewList`.
* If we wished to produce no meaningful value, we could `yield ()` for each element (or omit `yield` altogether, in which case the whole `for` evaluates to `()`). Because of Scala's cleverly designed type system, `()` actually *is* a value, so even instructions, which normally wouldn't return anything, do return *something* in Scala with no additional effort.
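
For comparison, Python's list comprehensions play a similar role. The sketch below mirrors the Scala toy example (the variable names are simply translated into Python's naming style):

```python
my_old_list = [1, 2, 3]

# The comprehension is an expression: it evaluates to a new list.
my_new_list = [element for element in my_old_list]

assert my_new_list == my_old_list      # same contents
assert my_new_list is not my_old_list  # but a distinct list object
```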

### Function Composition vs Iteration: Higher-Order Functions (HOFs)

Much like `if` statements, `while` loops (and, to an extent, `for` loops) are considered bad practice in functional programming because the loop variable is modified at each iteration and <span id="my-text-highlight">functional programming frowns upon mutation of state in general</span>.

Take, for example, a while loop that runs until a key press (or any other user input). Of course, this may be an unavoidable side effect in many cases. But then the FP approach would just be to contain this impurity somewhere with the rest of its kind. 

In general, instead of **iteration**, **recursion** (or, more generally, **function composition**) is preferred. Normally, we use recursion when dealing with recursive data structures (like trees) or when the problem has some [optimal substructure](https://en.wikipedia.org/wiki/Optimal_substructure), but functional programming prefers this approach in general. This may sound tedious at first, but there are a few familiar examples of function composition (a broader category than recursion) that have been adopted by popular languages like Python and are very intuitive to work with (especially when dealing with data pipelines). Two such examples are the `map` and `filter` functions. We already saw an example of `map` in the declarative code snippet above, so we won't dive into its details here. Both `map` and `filter` are examples of **higher-order functions** (**HOF**s) -- functions that take other functions as input and/or output other functions. They show that function composition can be very readable and intuitive. Furthermore, neither `map` nor `filter` modifies its input in place; each produces a new result instead, avoiding external state mutation which, as we know, is considered good practice in FP.
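
Since `filter` has not appeared in a snippet yet, here is a small Python sketch that chains it with `map`. In Python 3 both return lazy iterators, so the result is wrapped in `list` to materialize it.

```python
numbers = [1, 2, 3, 4, 5, 6]

evens = filter(lambda x: x % 2 == 0, numbers)       # keep the even numbers
squared_evens = list(map(lambda x: x ** 2, evens))  # then square each one

print(squared_evens)  # [4, 16, 36]
print(numbers)        # [1, 2, 3, 4, 5, 6] (the input is untouched)
```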

#### Higher-Order Functions

To pipe functions into other functions (as in function composition), we need functions that take other functions as input and can output functions. When we treat functions this way, we treat them as *first-class values*, which means that, like any other value, they can be passed around and returned. There is also a mathematically inspired reason for using HOFs, beyond designing software as function composition.

When mathematicians write: 

$$
\sum_{x=a}^{b} f(x)
$$

where, say, $a,b \in \mathbb{Z}$, they understand that $f$ stands for some general function. So there's no need to write a separate expression for summing the integers, one for summing the *squares of the integers*, and one for summing the *factorials of integers* in $[a,b]$.

Let's write a function in Scala that sums the integers:

```scala
def sumInts(a: Int, b: Int): Int = 
    if a > b then 0 else a + sumInts(a + 1, b)
```

To get the sum of squares, we'd need to define another function:

```scala
def square(x: Int): Int = x * x

def sumSquares(a: Int, b: Int): Int = 
    if a > b then 0 else square(a) + sumSquares(a + 1, b)
```

But there's clearly some repetition here, so we can factor out a common pattern. What if we changed the signature of `sumInts` to take a function as argument? 

```scala
def sum(f: Int => Int, a: Int, b: Int): Int =
    if a > b then 0 else f(a) + sum(f, a + 1, b)
```

Now we can write: 

```scala
def sumInts(a: Int, b: Int) = sum(id, a, b) 
```

where `id` is the identity function:

```scala
def id(x: Int): Int = x
```

and:

```scala
def sumSquares(a: Int, b: Int) = sum(square, a, b)
```

Admittedly this doesn't look great because we're creating a lot of boilerplate functions. To get rid of this boilerplate, Scala (like many other languages) supports *anonymous functions* (in Python these are introduced with the `lambda` keyword). We can think of anonymous functions as literals. Similar to how we can do `println("hello world!")` without having to name the string literal `"hello world!"` using a variable, we can declare functions as literals. Using anonymous functions, the `id` and `square` functions above can be written respectively as:

```scala
(x: Int) => x
(x: Int) => x * x
```

This reduces our `sumInts` and `sumSquares` to:

```scala
def sumInts(a: Int, b: Int) = sum(x => x, a, b) // Types can be omitted if they can be inferred from context
def sumSquares(a: Int, b: Int) = sum(x => x * x, a, b)
```

Anonymous functions are syntactic sugar. That is, they aren't necessary but make life easier.
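
The same idea carries over to Python almost word for word. The following is a sketch rather than a translation of any official API: the HOF is named `sum_over` here (a made-up name) to avoid shadowing Python's built-in `sum`.

```python
def sum_over(f, a, b):
    """Sum f(x) for every integer x in [a, b], recursively."""
    return 0 if a > b else f(a) + sum_over(f, a + 1, b)

def sum_ints(a, b):
    return sum_over(lambda x: x, a, b)

def sum_squares(a, b):
    return sum_over(lambda x: x * x, a, b)

print(sum_ints(1, 5))     # 15
print(sum_squares(1, 3))  # 14
```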

#### Currying

So far we've only used the ability of HOFs to accept functions as input. Let's also use their ability to output functions. The pattern we are about to learn is called *currying*, and it's useful for dependency injection.

Note, again, the functions: 

```scala
def sumInts(a: Int, b: Int) = sum(x => x, a, b)
def sumSquares(a: Int, b: Int) = sum(x => x * x, a, b)
```

Both `a` and `b` are passed into each unchanged. Is there a common pattern we can extract? We can use currying, which lets us *partially apply* the function: supply `f` now, and `a` and `b` later. Here's the curried version of `sum`:

```scala
def sum(f: Int => Int): (Int, Int) => Int =
    // sumF closes over f and handles the remaining arguments
    def sumF(a: Int, b: Int): Int =
        if a > b then 0 else f(a) + sumF(a + 1, b)
    sumF
```
Our `sum` function now constructs and returns a new function which takes the rest of the input (`a` and `b`). This is called *partial application*. We've split `sum` into two parts: the first part accepts only the argument `f` as input, the second one accepts the rest of the arguments.

Now we can define `sumInts` and `sumSquares` respectively as just: 

```scala
def sumInts = sum(x => x)
def sumSquares = sum(x => x*x)
```
Note that when we call `sum`, we get back a function with signature `(Int, Int) => Int` (which is exactly the signature we want for `sumInts` and `sumSquares`). We can now do away with `sumInts` and `sumSquares` entirely and write `sum(id)(1, 5) + sum(square)(5, 10)` directly, instead of `sumInts(1, 5) + sumSquares(5, 10)`.

Since it can get quite clumsy to write curried functions, Scala provides a shorthand. The following is equivalent to the curried `sum` written above.

```scala
def sum(f: Int => Int)(a: Int, b: Int): Int = 
    if a > b then 0 else f(a) + sum(f)(a + 1, b)
```

In Python, the `functools` library supports partial application through [functools.partial](https://docs.python.org/3/library/functools.html#functools.partial), which fixes some of a function's arguments and returns a new function that accepts the rest.

Currying, in general, can be applied to a function of $n$ parameters, turning it into a chain of $n$ single-parameter functions in which each outer function returns an anonymous inner function that handles the remaining parameters. This means a language need not support functions with multiple parameters, as long as it supports anonymous functions.
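
To illustrate both ideas in Python, the sketch below curries a three-parameter function by hand with nested lambdas and then partially applies the same function with `functools.partial`. The `volume` function and the other names are invented for the example.

```python
from functools import partial

def volume(length, width, height):
    return length * width * height

# Manual currying: each function takes one argument and returns the next function.
curried_volume = lambda length: lambda width: lambda height: volume(length, width, height)

# Partial application: fix some arguments now, supply the rest later.
base_2_by_3 = partial(volume, 2, 3)

assert curried_volume(2)(3)(4) == 24
assert base_2_by_3(4) == 24
```

Note the difference in shape: the curried version always takes one argument per call, while `partial` fixes any number of leading arguments in a single step.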

### Functional Programming Benefits

#### Parallelization / Parallel Programming

In terms of *parallelization*, both iterative and recursive solutions can be *sequential processes*, which don't lend themselves well to parallelization, or *independent processes* which do. However, FP still confers some benefit in terms of parallelization -- not because it favors recursion but, instead, because:


* A common challenge in parallel programming is to avoid mutating data while another **thread** is using it. Due to state immutability principles in FP, this problem is eliminated.
* FP avoids writing functions which rely on hidden state (i.e. any state that's not a direct input), so functions can be executed in parallel without the concern of synchronizing access to some shared state.
* FP can make it easier to identify opportunities for parallelization.
* Languages which are built around FP have powerful parallelization libraries that offer parallelized versions of common operations like `map`.
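
As a rough Python illustration of the last point, the standard library's `concurrent.futures` executors expose a `map` method that applies a function to each input concurrently. The sketch below uses a thread pool for simplicity; `ProcessPoolExecutor` offers the same `map` interface for CPU-bound work. Because `square` is pure, no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x  # pure: no shared state is read or written

numbers = [1, 2, 3, 4, 5]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Results come back in input order, regardless of which worker finishes first.
    parallel_result = list(pool.map(square, numbers))

assert parallel_result == list(map(square, numbers)) == [1, 4, 9, 16, 25]
```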

### Functional Programming Hazards

#### Tail Recursion - Avoiding Stack Overflow

If we're going to favor recursion (or, in general, function composition) over the more imperative style of writing iterative algorithms, we ought to tread carefully so as not to cause a **stack overflow** (which occurs when nested calls exhaust the memory reserved for the call stack). **Tail recursion optimization** (similar to other techniques like **memoization**) helps us drastically cut the amount of stack memory used: a tail-recursive function can run in a constant amount of stack memory instead of an amount that grows linearly (or worse) with input size. Read more about tail recursion optimization [here](../general_computer_science/recursion_optimizations.ipynb).
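
The difference in shape can be sketched in Python, with one caveat: CPython does not eliminate tail calls, so the tail-recursive version below only shows the form that an optimizing compiler (such as Scala's, for self-recursive tail calls) can turn into a loop. The function names are made up for the example.

```python
def sum_ints(a, b):
    # Not tail recursive: the addition happens after the recursive call returns,
    # so every call must stay on the stack.
    return 0 if a > b else a + sum_ints(a + 1, b)

def sum_ints_tail(a, b, acc=0):
    # Tail recursive: the recursive call is the last thing evaluated,
    # and the running total travels along in the accumulator.
    return acc if a > b else sum_ints_tail(a + 1, b, acc + a)

def sum_ints_loop(a, b):
    # Roughly what tail-call optimization produces: a loop reusing one frame.
    acc = 0
    while a <= b:
        acc, a = acc + a, a + 1
    return acc

assert sum_ints(1, 100) == sum_ints_tail(1, 100) == sum_ints_loop(1, 100) == 5050
# Far deeper than CPython's default recursion limit, but fine as a loop.
assert sum_ints_loop(1, 100_000) == 5_000_050_000
```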