---
title: "A Beginner's Introduction to Concepts in Functional Programming"
author: "Vahram Poghosyan"
date: "2023-01-13"
categories: ["Functional Programming", "Recursion", "Scala"]
format:
  html:
    toc: true
    toc-depth: 5
    code-fold: true
jupyter: python3
include-after-body:
  text: |
    <script type="application/javascript" src="../../javascript/light-dark.js"></script>
---

Functional programming draws inspiration from the mathematical definition of a function, which is a well-defined operation on sets.

## Mathematical functions

Take $f: X \rightarrow Y$, a **function** that maps elements of the set $X$ to elements of the set $Y$ such that for each $x$ in $X$ (also denoted "$x \in X$") there's *one and only one* $y \in Y$ that satisfies the equation $f(x) = y$. In plain language, a mathematical function maps any given input to exactly one output, and that output depends only on the input. That's not to say that $f$ can't map two different inputs, $x_i$ and $x_j$, to the same output $y$, but it *cannot* map the same input $x_i$ to more than one output.

Notice that the output in $Y$ depends *only* on the input from set $X$, and that the function $f$ *only* operates on set $X$ and nothing external to it. In other words, there is no *hidden state* (some value outside of $X$) that affects $f$'s output, so $f$ always produces predictable output. What's more, $f$ doesn't alter any element in $X$ itself (or, for that matter, in $Y$). The expression $f(x)$ is simply understood as the function $f$ applied to an element $x \in X$, which maps it to an element in set $Y$. The specific element of $X$ is not somehow retrieved (as it sometimes is, by reference, in programming) and overwritten.

The idea behind functional programming is to bring code close to this mathematical elegance, allowing us to better reason about the systems we write.

## Pure functions and side effects

We can define some rules for the functions we write in our code to match the mathematical properties of functions, bridging the abstract world of mathematics with the practical world of software engineering.

A **pure function**, in the FP sense, is a function which depends only on its input (and not on any other value stored elsewhere in external computer memory or other external source). A pure function affects nothing outside itself. Additionally, pure functions *must* output a value and that value must be unique for a given input.

To recap:

1. A pure function must return a single output for a given input 
2. Its output should only depend on its input
3. A pure function shouldn't change any internal or external state
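As a minimal sketch of these three properties in Python (the function names below are hypothetical and chosen only for illustration), compare one pure function with two impure ones:

```python
# Illustrative names only; none of these come from a library.

def square(x: int) -> int:
    """Pure: the result depends only on x, and nothing outside is touched."""
    return x * x


total = 0  # global state living outside the functions below


def add_to_total(x: int) -> int:
    """Impure: mutates the global `total` (violates property 3)."""
    global total
    total += x
    return total


def scaled(x: int) -> int:
    """Impure: reads the hidden state `total` (violates property 2)."""
    return x * total


first = add_to_total(5)   # total is now 5
second = add_to_total(5)  # same input, different result: total is now 10
```

Calling `square(3)` gives the same answer no matter what has run before it, whereas the result of `scaled(2)` depends on how many times `add_to_total` happened to be called.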

Given that a functional program is just a composition of pure functions, and that state changes are often what affects the real-world outcome of our function calls (more on these **side effects** later), the last property effectively means that pure functions don't mutate state at all. This is also why state mutation is generally frowned upon in functional programming. This presents unique challenges, as you might expect, given that so many operations in the traditional paradigm used in countless real-world codebases *do* mutate state (in some cases they have to, for anything useful to ever happen). But some state mutation is definitely avoidable. Take, for instance, a `for` loop which increments its index on every iteration. We will see how functional programming languages attempt to bypass `for` loops, or iteration in general, in favor of function composition (syntactic sugar, in your language of choice, will become indispensable here!).

### "Nice" consequences of working with pure functions
   
Working with pure functions conveys some great benefits. For instance, properties (1) and (2) make a pure function call interchangeable with its output (just as, say, $f(2)$ given $f(x)=x^2$ can reliably be substituted for the number $4$ in math). This property is often called *referential transparency*. It allows us to pass pure functions as arguments into other pure functions (as well as return them as output) with entirely predictable results. If a function, by contrast, printed something to the console along with evaluating the square, we would consider it an **effectful** function (and therefore impure). Such a function *cannot* be reliably substituted by its output because it also affects external state, producing an effect that the output alone does not capture.

The ability to reliably substitute an expression for its value pays off in practice. In Deep Learning, for example, we are able to cheaply compute the gradient of a loss function using the **back-propagation** algorithm, which stores intermediate values during the forward pass so that the backward pass avoids redundant calculations. This simple substitution relies on the fact that a function evaluated at an input is the same as its output. If functions in math affected external state somewhere, or produced other such side effects (more of which we will see in the section dedicated to [**Side-effects**](./functional_programming.ipynb#Side-Effects)), it would be a lot more difficult to come up with this optimization.

So, when we use pure functions, we gain *mathematical* insights about our programs. In effect, we're just modeling the solution as a chain of pure function calls and storing the results as immutable state. Since such chains are often written on a single line, functional code also tends to be read horizontally, left to right, more than imperative code.
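To make the substitution idea concrete, here is a small Python sketch (the names `square`, `noisy_square`, and `log` are made up for this example). Caching with `functools.lru_cache` replaces repeated calls with their stored value, which is harmless for a pure function but silently drops the effect of an impure one:

```python
from functools import lru_cache

log = []  # stands in for the console so that the effect is observable


@lru_cache(maxsize=None)
def square(x: int) -> int:
    # Pure: caching cannot change the program's behavior.
    return x * x


@lru_cache(maxsize=None)
def noisy_square(x: int) -> int:
    # Effectful: records a message in addition to computing the square.
    log.append(f"squaring {x}")
    return x * x


a = square(2) + square(2)              # same as 4 + 4
b = noisy_square(2) + noisy_square(2)  # the second call is served from the cache

# The values agree, but the effect happened once instead of twice.
```

Both `a` and `b` hold the same number, yet `log` contains a single entry: substituting the cached value for the call changed what the effectful program did.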

In the next section we look at the differences between **declarative** and **imperative** styles of writing software and why functional programming prefers the former style.

## Declarative vs imperative styles

At a basic level, an *imperative* style of programming can be likened to cooking at home with a cookbook: imperative code reads like a list of commands directed at the computer. Declarative writing, by contrast, can be compared to dining at a restaurant. We aren't issuing commands at a grueling level of granularity (e.g. iterating over an array manually, or appending to a list). Instead, we're specifying the desired outcome without the implementation details, as we would in mathematics when we, for instance, write $f(x)=x^2$ succinctly (with the understanding that every component of the input vector $x$ is squared). We prefer declarative code to imperative in FP partly because imperative code involves a lot of state mutation and partly because writing pure functions facilitates writing declarative code.

In the imperative style, we're saying "step through the list, read each item, square it and append it to a new list." In the declarative style we're saying "just square every element of this list." These differences are mostly semantic and, in real life, software contains a mix of both styles. The distinction is also not really black or white, and is often dependent on the implementation of the given language.

An example is worth a thousand words and, since Python provides a good enough playground for showcasing these styles, here is an example in Python.

**Imperative Style**

```python
numbers = [1, 2, 3, 4, 5]
squared = []
for num in numbers:
    squared.append(num ** 2)

print(squared)
```

```
[1, 4, 9, 16, 25]
```


**Declarative Style**

```python
numbers = [1, 2, 3, 4, 5]
squared = map(lambda x: x**2, numbers)

print(list(squared))
```

```
[1, 4, 9, 16, 25]
```


Notice how in the declarative style we merely asked for each element to be squared; we didn't tell the program how to do it in grueling detail, and we avoided the use of a `for` loop (which means we avoided rebinding a loop variable and appending to a list in place). Also, more lines in the declarative style evaluate to a value, rather than just carrying out instructions (we will see the difference between mere instructions, or **statements**, and pure expressions later on). However, since there's printing to the console at the end, even the declarative program would not be considered purely functional.

### Side-effects

Functions which violate any of the three aforementioned properties are said to produce **side effects** (or simply *effects*). The most common side effect is when a function modifies state (i.e. a chunk of computer memory) outside itself, violating property (3). Examples of side effects include:

| Effect   | Functional Programming Way |
|----------|----------------------------|
|   A function directly modifying a variable defined in the global scope.  |   The FP approach is to pass the global variable as input instead, and have the function return a modified copy of the input. |
|   A function writing to an external database. |   This is an example of an *unavoidable side effect* in practice. The FP approach is to mitigate. Specifics are language dependent, but usually the strategy involves gathering all such unavoidable side effects into one impure corner of the code, and keeping the rest of the code pure.  |
|   A function that prints to the console, reads the system time, or generates a random number (or any function that calls one of these).  |   Yet more examples of unavoidable side effects. Such functions inherently depend on external or hidden state, such as the time of day in the real world, and in general on things other than their input. |

Although some side effects are unavoidable, we should minimize their use in our code. Functional programming languages offer just that ability.
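The first and third rows of the table can be sketched in Python as follows. The names (`add_item`, `greeting`) are hypothetical; the point is that the state and the clock reading are passed in as ordinary arguments, leaving one small impure call site:

```python
from datetime import datetime


def add_item(cart: tuple, item: str) -> tuple:
    """Return a new cart instead of mutating a global one."""
    return cart + (item,)


def greeting(now: datetime) -> str:
    """Pure: the time is an input rather than hidden state."""
    return "Good morning" if now.hour < 12 else "Good afternoon"


cart = ("apple",)
new_cart = add_item(cart, "pear")  # `cart` itself is left unchanged

# The impure part (reading the system clock) is confined to a single line.
message = greeting(datetime.now())
```

Because `greeting` receives the time as an argument, it can be exercised with any fixed `datetime` value, independently of when the program actually runs.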

### Instructions (statements) vs expressions 

In functional programming, we distinguish between mere **instructions** to the computer (also known as **statements**) and **expressions** (or **pure expressions**). This distinction is similar to that between functions in the programming sense and pure functions in the mathematical sense. Expressions, like pure functions, must always evaluate to a value. Contrast this with instructions like the traditional `if`/`else` statements or loops like `while`, which control the flow of execution but don't evaluate to anything.

As mentioned earlier, side effects are unavoidable at times. However, functional languages have different strategies for mitigating these impurities and writing pure code anyway. Usually they aim to gather the impurities together at the top or bottom of the code. Some languages (such as Scala, which is a blend of OOP and FP) go to great lengths to minimize side effects by making their syntactic structures evaluate to a value. Even though Scala has the traditional `for` loop as an instruction, it favors the use of `for`-comprehensions, which are essentially syntactic sugar (enabled by **monadic** types that capture effects, more on these later). Each line of a `for`-comprehension in Scala evaluates to a value.

The idea is to use a clever type system to capture effects. If side effects must exist, they should be known to Scala. To this end, Scala has a type known as `Unit` whose only value is `()`. This is its designated type for expressions that are evaluated only for their side effect. So, functional programming languages elevate instructions or statements, which normally don't return anything, to the status of expressions by having them return a dedicated side-effect type. In practice there are many types for different side effects (for example, an `IO` monadic type, as provided by effect libraries such as Cats Effect, captures side effects produced by operations like console logging). Let's see some examples of how Scala does away with traditional `for` loops and `if`/`else` statements and uses pure expressions instead.

### Control flow: conditional statements and loops

In Scala, `if` is an expression, similar to the familiar **ternary (conditional) expressions** in Python. An `if` without an `else` branch is still allowed (it is treated as if it had `else ()`), but the value-returning form is what's preferred. Here are examples in both Python and Scala:

**Python:**
```python
x = 1 if condition else 0
```

**Scala:**
```scala
val x = if (condition) 1 else 0
```

In this example, `x` necessarily evaluates to a value: one of two possible values. An `if` expression like this is used for its value, not for a side effect, whereas inside an open-ended `if` statement the programmer *might* just do something crazy and unheard of like accessing a database or printing a line to the console (both considered side effects).

This brings us to an important point. It's not that `if` statements would *necessarily* result in side effects, it's just that functional programming discourages the use of language constructs that lend themselves to producing side effects more easily. Syntactic choices like this are a common theme in FP. For instance, Scala's choice to treat `()` as a returnable value (of type `Unit`) rather than just a piece of syntax is very deliberate. Let's see why by examining a Scala `for`-comprehension.

**Scala For Comprehension:**
```scala
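import scala.util.Try

// `print` returns `Unit`, which has no `flatMap`/`map`, so each call is wrapped
// in a monadic type. `Try` (standard library) stands in here for an effect type
// such as `IO` from an effects library.
val result =
    for {
        _ <- Try(print("Hello"))
        _ <- Try(print("World!"))
    } yield ()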

It may not look like it, but the code snippet above (showing a `for`-comprehension) is one of the ways in which Scala actually chains many potentially side-effect producing operations together via function composition (which is what's going on in the background). Notice three things about it: 

* The `for`-comprehension is an expression: it evaluates to a value, which is captured by `result`.
* Each print produces a side effect; the value it returns is discarded by binding it to `_`.
* At the end we simply say `yield ()`. If we wished to return a value, we would write it after `yield` instead. Because Scala treats `()` as a value of type `Unit`, the comprehension still returns *something* even when there is nothing meaningful to return.
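For readers more comfortable with Python, the idea can be approximated with a hand-rolled `IO` class. This is only a sketch: the class, its `flat_map`, `map`, and `run` methods, and the `put` helper are invented for this example and are not part of any library. It mirrors how a `for`-comprehension is, roughly, a chain of `flatMap` calls ending in a `map`.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class IO(Generic[A]):
    """A description of an effect; nothing happens until run() is called."""

    def __init__(self, thunk: Callable[[], A]) -> None:
        self._thunk = thunk

    def run(self) -> A:
        return self._thunk()

    def map(self, f: Callable[[A], B]) -> "IO[B]":
        return IO(lambda: f(self.run()))

    def flat_map(self, f: Callable[[A], "IO[B]"]) -> "IO[B]":
        return IO(lambda: f(self.run()).run())


def put(text: str) -> IO[None]:
    # Wraps the console effect in an IO value instead of performing it.
    return IO(lambda: print(text, end=""))


# Roughly what `for { _ <- put("Hello"); _ <- put("World!") } yield ()` expands to:
program = put("Hello").flat_map(lambda _: put("World!").map(lambda _: ()))

result = program.run()  # the effects happen here, in order; result is ()
```

Building `program` prints nothing. The effects are only described at that point, and they are performed in sequence once `run()` is called at the edge of the program.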

### Function composition vs iteration and higher-order functions (HOFs)

Because FP frowns upon the use of `if`/`else` and `for`/`while` statements, it prefers function composition to iteration. In fact, the `for`-comprehension above is just cleverly disguised function composition.

Take, for example, a `while` loop that runs until a key press (or any other user input). Of course, this may be an unavoidable side effect in the real world. The FP approach would, then, just be to contain this impurity somewhere with the rest of its kind. 

In general, function composition is preferred over iteration (mathematical readers will recognize that *recursion*, which we're used to solving problems with in CS, is just a function composed with itself). This may sound tedious at first, almost like having to re-learn how to program, but syntactic sugar and other abstractions exist to make this pattern more readable (like `for`-comprehensions in Scala, or *do-notation* in Haskell). It does involve thinking about programming in new ways, and that's very much the point!
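As a small Python illustration of replacing iteration with composition and recursion (the helpers `compose` and `apply_n` are written here for the example; they are not standard-library functions):

```python
from functools import reduce
from typing import Callable


def compose(*fs: Callable) -> Callable:
    """compose(f, g, h)(x) == f(g(h(x)))"""
    return reduce(lambda f, g: lambda x: f(g(x)), fs, lambda x: x)


def increment(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


inc_then_double = compose(double, increment)  # double(increment(x))


def apply_n(f: Callable, n: int, x):
    """Apply f to x n times using recursion instead of a loop."""
    return x if n == 0 else apply_n(f, n - 1, f(x))
```

Here `apply_n(double, 3, 1)` plays the role of a three-iteration loop: it is the same as `double(double(double(1)))`.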

There are already a few familiar examples of this style that have been adopted by popular languages like Python, and are very intuitive (especially when dealing with applications in data pipelines). Some famous examples are the `map` and the `filter` functions in Python. We already saw an example of `map` in the declarative code snippet above, so we won't dive into its specifics here. Both `map` and `filter` are examples of **higher-order functions** (**HoF**s): functions which take other functions as input and/or themselves output other functions. Functions `map` and `filter` show that function composition can be very readable and intuitive. Furthermore, neither `map` nor `filter` modifies its input in place. Rather, each returns a new iterator over the results, avoiding external state mutation, which is considered poor practice in FP. Later on, we shall see that a related method called `flatMap` exists that turns out to play a key role in allowing functional programmers to write code by chaining multiple, potentially side-effect producing, functions together. This is due to `flatMap`'s unique function signature (one which flattens arrays *of* arrays back into a *one-dimensional* array), as we will see.
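A short sketch of both functions chained into a small pipeline. In Python 3, `map` and `filter` produce their results lazily, so nothing is computed until the result is consumed, and the original list is left untouched:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Keep the even numbers, then square them; each step takes a function as input.
evens = filter(lambda n: n % 2 == 0, numbers)
squares_of_evens = map(lambda n: n ** 2, evens)

result = list(squares_of_evens)  # consuming the iterator runs the pipeline
```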

#### Higher-order functions

To pipe functions into other functions (as in function composition), we need functions that take other functions as input and can also output functions. When we treat functions this way, we treat them as *first-class values*, which means that, like any other value, they can be passed around and returned. There is also a mathematically inspired reason for using HoFs, beyond designing software as function composition.

When mathematicians write: 

$$
\sum_{x=a}^{b} f(x)
$$

where, say, $a,b \in \mathbb{Z}$, they understand that $f$ stands for some general function. So there's no need to write one expression for summing the integers, another for summing the *squares of the integers*, and yet another for summing the *factorials of the integers* in $[a,b]$.

Let's write a function in Scala that sums the integers:

```scala
def sumInts(a: Int, b: Int): Int = 
    if a > b then 0 else a + sumInts(a + 1, b)
```

To get the sum of squares, we'd need to define another function:

```scala
def square(x: Int): Int = x * x

def sumSquares(a: Int, b: Int): Int = 
    if a > b then 0 else square(a) + sumSquares(a + 1, b)
```

But there's clearly some repetition here, so we can factor out a common pattern. What if we changed the signature of `sumInts` to take a function as argument? 

```scala
def sum(f: Int => Int, a: Int, b: Int): Int =
    if a > b then 0 else f(a) + sum(f, a + 1, b)
```

Now we can write: 

```scala
def sumInts(a: Int, b: Int) = sum(id, a, b) 
```

where `id` is the identity function:

```scala
def id(x: Int): Int = x
```

and:

```scala
def sumSquares(a: Int, b: Int) = sum(square, a, b)
```

Admittedly this doesn't look great because we're creating a lot of boilerplate functions. To get rid of this boilerplate, Scala (like most modern languages) supports *anonymous functions* (in Python these are written with the `lambda` keyword). We can think of anonymous functions as function literals. Similar to how we can write `println("hello world!")` without first binding the string literal `"hello world!"` to a variable, we can write functions as literals. Using anonymous functions, the `id` and `square` functions above can be written respectively as:

```scala
(x: Int) => x
(x: Int) => x * x
```

This reduces our `sumInts` and `sumSquares` to:

```scala
def sumInts(a: Int, b: Int) = sum(x => x, a, b) // Types can be omitted if they can be inferred from context
def sumSquares(a: Int, b: Int) = sum(x => x * x, a, b)
```

Anonymous functions are syntactic sugar. That is, they aren't necessary but make life easier.

### Currying

But so far we've only used the HoF's ability to accept functions as input. Let's also use their ability to output functions. The pattern we are about to learn is called *currying* (after Haskell Curry), and it's useful for, among other things, dependency injection. 

Note, again, the functions: 

```scala
def sumInts(a: Int, b: Int) = sum(x => x, a, b)
def sumSquares(a: Int, b: Int) = sum(x => x * x, a, b)
```

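Both `a` and `b` are passed into each unchanged. Is there a common pattern we can extract? We can use currying, which rewrites a function of several arguments as a chain of functions that each take part of the input, and thereby makes *partial application* possible. Here's the curried version of `sum`: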

```scala
def sum(f: Int => Int): (Int, Int) => Int =
    // The inner function closes over `f` and is returned as the result of `sum`.
    def sumF(a: Int, b: Int): Int =
        if a > b then 0 else f(a) + sumF(a + 1, b)
    sumF
```
Our `sum` function now constructs and returns a new function which takes the rest of the input (`a` and `b`). Supplying only `f` is called *partial application*. We've split `sum` into two parts: the first accepts only the argument `f` as input, the second accepts the rest of the arguments.

Now we can define `sumInts` and `sumSquares` respectively as just: 

```scala
def sumInts = sum(x => x)
def sumSquares = sum(x => x*x)
```
Note that when we call `sum`, we get back a function with signature `(Int, Int) => Int` (which is exactly the signature we want for `sumInts` and `sumSquares`). We can even skip the named helpers altogether and write `sum(x => x)(1, 5) + sum(x => x * x)(5, 10)` directly, instead of first defining `sumInts` and `sumSquares` and then writing `sumInts(1, 5) + sumSquares(5, 10)`.

Since it can get quite clumsy to write curried functions, Scala provides a shorthand. This is equivalent to the curried `sum` written above.

```scala
def sum(f: Int => Int)(a: Int, b: Int): Int = 
    if a > b then 0 else f(a) + sum(f)(a + 1, b)
```

In Python, the standard library's `functools` module offers a related tool, [functools.partial](https://docs.python.org/3/library/functools.html#functools.partial). Strictly speaking it performs *partial application* rather than currying: given a function and some of its arguments, it returns a new function that takes the remaining arguments.
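A brief sketch of the difference, using a hypothetical `sum_over` function that mirrors the Scala `sum` above (the name avoids shadowing Python's built-in `sum`). `functools.partial` fixes the first argument of an existing function, while the hand-written `curried_sum` returns a new function, as in the Scala version:

```python
from functools import partial
from typing import Callable


def sum_over(f: Callable[[int], int], a: int, b: int) -> int:
    """Sum f(x) for every integer x in [a, b], recursively."""
    return 0 if a > b else f(a) + sum_over(f, a + 1, b)


# Partial application: pre-fill `f`, leaving a function of (a, b).
sum_squares = partial(sum_over, lambda x: x * x)


# Manual currying: take `f` first, return a function awaiting (a, b).
def curried_sum(f: Callable[[int], int]) -> Callable[[int, int], int]:
    def sum_f(a: int, b: int) -> int:
        return 0 if a > b else f(a) + sum_f(a + 1, b)
    return sum_f


sum_ints = curried_sum(lambda x: x)
```

Both `sum_squares(1, 3)` and `sum_ints(1, 5)` are then called with just the range, exactly like the curried Scala definitions.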

Currying, in general, can be applied repeatedly to a function of $n$ parameters, each outer function returning an (anonymous) inner function which accepts the next parameter. This means a language need not support functions with multiple parameters, as long as it supports anonymous single-parameter functions. In fact, the most minimal programming language, the [lambda calculus](https://learnxinyminutes.com/docs/lambda-calculus/#:~:text=Lambda%20calculus%20(%CE%BB%2Dcalculus),to%20represent%20any%20Turing%20Machine!), does away with multi-parameter functions by relying on currying alone.
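For instance, a three-parameter function can be written as three nested one-parameter functions. The sketch below uses Python lambdas; the names are illustrative only:

```python
def volume(length: int, width: int, height: int) -> int:
    return length * width * height


# The curried form: each function takes exactly one argument
# and returns the function that waits for the next one.
curried_volume = lambda length: lambda width: lambda height: length * width * height

with_base_2x3 = curried_volume(2)(3)  # still waiting for `height`
```

Supplying the arguments one at a time, `curried_volume(2)(3)(4)`, computes the same product as `volume(2, 3, 4)`, and `with_base_2x3` can be reused for any height.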

### Monads

*Monad* is a term borrowed from Category Theory. Monads have a strict definition in mathematics, but for our purposes they're useful black boxes that provide the following benefits (some already mentioned in passing). Monads provide a way to compose potentially side-effect producing functions together, and in general they allow function composition to be abstracted behind syntactic sugar (like [do notation](https://en.wikibooks.org/wiki/Haskell/Simple_input_and_output) in Haskell), which makes functional programs more readable and more useful in an applied sense. As a side note for readers who are familiar with JavaScript `Promise`s: these play a role similar to an effect type such as `IO` (more so than to Scala's `Unit`). The main point is that a `Promise` *also* behaves much like a monad, which makes it possible to come up with nice syntactic sugar like `async`/`await`. In fact, do notation in Haskell can be seen as a generalized version of this `async`/`await` pattern that works with *any monad* and not just a `Promise`.

Monads, as far as we're concerned, are abstract classes that implement `flatMap` (also `map`, and an *identity map*, but the big picture is lost in the details). Yes, this is the same `flatMap` we mentioned earlier alongside the higher-order functions `map` and `filter`. It turns out that chaining two side-effect producing operations by function composition produces nested side effects (and if we're using a language that uses *types* to capture side effects, then we quickly generate some nasty, nested types). For example, chaining two IO operations that prompt the user for two strings may produce something like `IO(IO(String))`. So what's `flatMap`'s role in all of this? Our friend `flatMap`, with its unique capacity to flatten, is *exactly* the thing that's needed to get an `IO(String)` back! We will discuss this flattening property of `flatMap` in more detail later on. For now, it's important to reiterate that having an implementation of `flatMap`, which may go by other names in other languages (e.g. *bind*, written `>>=`, in Haskell), is, along with a few other key properties, what qualifies an abstract class to be a monad.

For more on monads and `flatMap`, check out this excellently visualized [blog post by Matt Gallagher](https://www.cocoawithlove.com/blog/an-aside-about-flatmap-and-monads.html).

So, it's no wonder that in a functional programming language we want our side effect-producing expressions to return a monadic type, so that we can chain two or more of such expressions together (using function composition) without generating a nested mess.
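The nesting problem, and how `flatMap` resolves it, can be seen with plain lists, which are perhaps the most familiar monad-like container. Python has no built-in `flatMap`, so the `flat_map` helper below (like `neighbors`) is written just for this example:

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def flat_map(f: Callable[[A], Iterable[B]], xs: Iterable[A]) -> List[B]:
    """Apply f to every element, then flatten one level of nesting."""
    return [y for x in xs for y in f(x)]


def neighbors(n: int) -> List[int]:
    return [n - 1, n + 1]


nested = list(map(neighbors, [10, 20]))  # [[9, 11], [19, 21]]
flat = flat_map(neighbors, [10, 20])     # [9, 11, 19, 21]

# Chaining twice with map would nest twice; flat_map keeps the result flat.
two_steps = flat_map(neighbors, flat_map(neighbors, [10]))
```

The same reasoning applies to effect types: where `map` would produce an effect inside an effect, `flatMap` collapses the two layers into one, so the chain can keep going.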


## Benefits of functional programming

### Parallelization

FP confers some benefit in terms of parallelization because:


* A common challenge in parallel programming is to avoid mutating data while another **thread** is using it. Because FP favors immutable state, this problem is largely avoided.
* FP avoids writing functions which rely on hidden state (i.e. any state that's not a direct input), so functions can be executed in parallel without the concern of synchronizing access to some shared state.
* Because pure functions make their inputs and outputs explicit, FP can make it easier to identify opportunities for parallelization.
* Languages which are built around FP often have powerful parallelization libraries that offer parallelized versions of common operations like `map`.
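As a sketch of the last point in Python, `concurrent.futures` from the standard library offers a parallel `map`. Because `square` below is pure, the results do not depend on how the work is scheduled. A thread pool is used to keep the example self-contained; for CPU-bound work a process pool is usually the better fit:

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    # Pure: safe to run on any worker, in any order.
    return x * x


numbers = [1, 2, 3, 4, 5]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map returns results in input order.
    parallel = list(pool.map(square, numbers))

sequential = list(map(square, numbers))
```

Since `square` touches no shared state, `parallel` and `sequential` hold the same values without any locking.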

## Functional programming hazards

### Tail recursion: avoiding stack overflow

If we're going to favor recursion (or, in general, function composition) over the more imperative style of coding, we ought to tread carefully so as not to cause a **stack overflow** (which happens when nested calls exhaust the memory reserved for the call stack). **Tail recursion optimization** helps us drastically cut the amount of stack memory used (**memoization** is another common optimization for recursive code, although it saves repeated computation rather than stack space). An optimized tail-recursive function takes a constant amount of memory on the stack, instead of linear or worse. Read more about tail-recursion optimization [here](../../unpublished_posts/general_computer_science/recursion_optimizations.ipynb).
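To illustrate the shape of a tail call, here is a Python sketch with illustrative function names. CPython does not perform tail-call optimization, so this is for illustration only: the loop at the end is what an optimizing compiler (for example Scala's, which can verify tail calls with the `@tailrec` annotation) effectively produces from the tail-recursive form.

```python
def sum_to(n: int) -> int:
    """Not tail-recursive: the addition happens after the recursive call returns."""
    return 0 if n == 0 else n + sum_to(n - 1)


def sum_to_tail(n: int, acc: int = 0) -> int:
    """Tail-recursive: the recursive call is the last thing the function does."""
    return acc if n == 0 else sum_to_tail(n - 1, acc + n)


def sum_to_loop(n: int) -> int:
    """The constant-stack loop that a tail call can be rewritten into."""
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + n
    return acc
```

In `sum_to`, every pending addition keeps a stack frame alive until the innermost call returns. In `sum_to_tail`, the running total travels in `acc`, so no frame has work left to do after its recursive call, which is what makes the rewrite into `sum_to_loop` possible.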