# Dive!

The analysis that follows pertains to the second day of the [Python Problem-Solving Bootcamp](https://mathspp.com/pythonbootcamp).

In the analysis that follows you may be confronted with code that you do not understand, especially as you reach the end of the explanation of each part.

If you find functions that you didn't know before, remember to [check the docs](https://docs.python.org/3/) for those functions and play around with them in the REPL.
The analysis is written in increasing order of difficulty (within each part of the problem), so it is understandable if it gets harder as you keep reading.
That's perfectly fine, you don't have to understand everything _right now_, especially because I can't know for sure what _your level_ is.

## Part 1 problem statement

(Adapted from [Advent of Code 2021, day 2](https://adventofcode.com/2021/day/2))

You will be given a series of instructions like

```txt
forward 5
down 5
forward 8
up 3
down 8
forward 2
```

These instructions will change your horizontal position and your depth, two values you need to keep track of:

 - `forward X` increases the horizontal position by X units;
 - `down X` increases the depth by X units; and
 - `up X` decreases the depth by X units.

Your horizontal position and depth both start at 0. The steps above would then modify them as follows:

 - `forward 5` adds 5 to your horizontal position, a total of 5.
 - `down 5` adds 5 to your depth, resulting in a value of 5.
 - `forward 8` adds 8 to your horizontal position, a total of 13.
 - `up 3` decreases your depth by 3, resulting in a value of 2.
 - `down 8` adds 8 to your depth, resulting in a value of 10.
 - `forward 2` adds 2 to your horizontal position, a total of 15.

After following these instructions, you would have a horizontal position of 15 and a depth of 10. (Multiplying these together produces 150.)

**Calculate the horizontal position and depth you would have after following the planned course. What do you get if you multiply your final horizontal position by your final depth?**

_Using the input file `inputs/02_dive.txt`, the result should be 1727835._

In [1]:
# IMPORTANT: Set this to the correct path for you!
# INPUT_FILE = "inputs/02_dive.txt"
INPUT_FILE = "data/input.txt"

### Baseline solution

This problem is very straightforward, in that we do not have to interpret the problem statement very much to understand what we need to do from a conceptual standpoint:

 - we have a file with a series of lines;
 - each line contains an instruction and a number; and
 - the instruction and the number modify our current state.

With this in mind, we can start by experimenting with reading the file, going through each line, and interpreting the instruction:

In [2]:
with open(INPUT_FILE, "r") as f:
    instructions = f.readlines()

horiz_pos, depth = 0, 0
for line in instructions:
    command, value = line.split()
    value = int(value)
    
    if command == "forward":
        horiz_pos += value
    elif command == "up":
        depth -= value
    elif command == "down":
        depth += value
    else:
        raise ValueError("Unknown command.")

print(horiz_pos * depth)

1727835


There is nothing too wild going on here.
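
One way to gain some confidence in this loop is to run it on the example from the problem statement, where the expected result is 150. The sketch below wraps the same logic in a function (the name `solve_part1` is just an illustrative choice, not part of the original solution) so that it can take a list of lines instead of a file:

```python
def solve_part1(lines):
    """Return horizontal position times depth after following `lines`."""
    horiz_pos, depth = 0, 0
    for line in lines:
        command, value = line.split()
        value = int(value)

        if command == "forward":
            horiz_pos += value
        elif command == "up":
            depth -= value
        elif command == "down":
            depth += value
        else:
            raise ValueError("Unknown command.")
    return horiz_pos * depth


# The example instructions from the problem statement.
sample = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
print(solve_part1(sample))  # 150
```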

Perhaps the detail that is easiest to overlook is that the line `command, value = line.split()` is already doing some input validation for us:
the fact that we are unpacking into `command, value` means we are assuming that `line.split()` returns two values.
If it returns any other number of values, we get a `ValueError`:

In [13]:
command, value = "cmd val otherthing".split()

ValueError: too many values to unpack (expected 2)

In [14]:
command, value = "cmd".split()

ValueError: not enough values to unpack (expected 2, got 1)

(You can read a bit more about unpacking [here](https://mathspp.com/blog/pydonts/deep-unpacking).)
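
If the bare unpacking error is too terse for your taste, one possible variation is to catch it and re-raise it with more context. This is only a sketch: the helper name `parse_line` and the `line_number` parameter are assumptions made for illustration.

```python
def parse_line(line, line_number):
    """Split an instruction line, reporting where a malformed line occurred."""
    try:
        command, value = line.split()
    except ValueError:
        # Unpacking failed, so the line did not contain exactly two pieces.
        raise ValueError(
            f"Line {line_number}: expected 'command value', got {line!r}"
        ) from None
    return command, int(value)


print(parse_line("forward 5\n", 1))  # ('forward', 5)

try:
    parse_line("cmd val otherthing", 2)
except ValueError as error:
    print(error)
```

Note that `int(value)` sits outside the `try` block on purpose, so a non-numeric value still produces the usual `int` conversion error instead of the "expected 'command value'" message.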

### Pattern matching

If you are using Python 3.10 or newer, you might be tempted to use [structural pattern matching](https://mathspp.com/blog/pydonts/structural-pattern-matching-tutorial) here.
We can write a solution using `match` that is remarkably similar to the solution using `if`:

In [4]:
with open(INPUT_FILE, "r") as f:
    instructions = f.readlines()

horiz_pos, depth = 0, 0
for line in instructions:
    command, value = line.split()
    value = int(value)
    
    match command:
        case "forward":
            horiz_pos += value
        case "up":
            depth -= value
        case "down":
            depth += value
        case _:
            raise ValueError("Unknown command.")

print(horiz_pos * depth)

1727835

So, is this any better?
We can argue it is _not_, because we didn't simplify our code, and yet we added an extra level of indentation.

To walk towards a scenario where pattern matching would probably be more useful, let's rewrite the `match` statement:

In [16]:
with open(INPUT_FILE, "r") as f:
    instructions = f.readlines()

horiz_pos, depth = 0, 0
for line in instructions:
    
    match line.split():
        case ["forward", value]:
            horiz_pos += int(value)
        case ["up", value]:
            depth -= int(value)
        case ["down", value]:
            depth += int(value)
        case _:
            raise ValueError("Unknown command.")

print(horiz_pos * depth)

1727835


By matching directly on the `line.split()` expression, we are making it easier for ourselves to handle instructions that have a different _structure_.
For example, imagine there was a `"reset"` instruction that reset the horizontal position and the depth to 0.
Using `match`, this is what the solution could look like:

In [22]:
with open(INPUT_FILE, "r") as f:
    instructions = f.readlines()

instructions.append("reset")  # Add a "reset" command at the end.

horiz_pos, depth = 0, 0
for line in instructions:
    
    match line.split():
        case ["reset"]:
            horiz_pos, depth = 0, 0
        case ["forward", value]:
            horiz_pos += int(value)
        case ["up", value]:
            depth -= int(value)
        case ["down", value]:
            depth += int(value)
        case _:
            raise ValueError("Unknown command.")

print(horiz_pos * depth)  # Prints 0 because the last command was "reset".

0


We only needed to add two lines of code to handle this new command, and the handling of all commands looks similar: a `case` statement and some code.
If we were to do the same thing in the original `if` statement, we would have to special-case the `"reset"` command because we would have to check for it before unpacking the line into the `command` and `value` variables:

In [26]:
with open(INPUT_FILE, "r") as f:
    instructions = f.readlines()
    
instructions.append("reset")  # Add a "reset" command to the end.

horiz_pos, depth = 0, 0
for line in instructions:
    if line.strip() == "reset":  # `strip` also handles a "reset" line read from the file.
        horiz_pos, depth = 0, 0
        continue
    
    command, value = line.split()
    value = int(value)
    
    if command == "forward":
        horiz_pos += value
    elif command == "up":
        depth -= value
    elif command == "down":
        depth += value
    else:
        raise ValueError("Unknown command.")

print(horiz_pos * depth)  # Prints 0 because the last command was "reset".

0


So, in conclusion, for such a homogeneous set of commands, the `if` statement is preferable.
If the line structure were more heterogeneous, then structural pattern matching would start to show its benefits.

### How to end the `if` block

In the solution above, our `if` block compares `command` explicitly to each of the three possible commands, and uses the `else` to raise an error in the event that we receive a command we don't know.
We could have written, just as easily, the following `if` block:

```py
if command == "forward":
    horiz_pos += value
elif command == "up":
    depth -= value
else:
    depth += value
```

This block assumes that the variable `command` _always_ contains one of the three known commands, and thus uses the `else` to handle the `down` command.

However, there is a disadvantage to writing code like this:
one cannot look at the `if` block and _read_ what the third case is.
Is it a single one?
Are there multiple commands that map to the action of doing `depth += value`?

Thus, one can argue it is preferable to be explicit about the cases we are handling.
Of course, we can still choose to write the `if` block like so:

```py
if command == "forward":
    horiz_pos += value
elif command == "up":
    depth -= value
elif command == "down":
    depth += value
```

The difference, here, is that we do not include the `else` branch with the `raise` statement.
This says explicitly the commands that we are handling, while also showing that we do not expect to have to handle anything else.

Another slight variant would be to write

```py
if command == "forward":
    horiz_pos += value
elif command == "up":
    depth -= value
elif command == "down":
    depth += value
else:
    pass
```

This variant can be understood to mean “we assume something else might come through in the variable `command`, but we don't care about it”.

These are just minor variations of one another, and _your_ interpretation might not necessarily align with mine, but I find it to be an interesting exercise to think about the different ways in which similar pieces of code are read and understood.
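
To make the trade-off a bit more concrete, here is a small sketch that feeds a misspelled command to two of the variants above. The function names and the `"upp"` typo are made up for illustration, and only the depth is tracked to keep the example short:

```python
def depth_with_implicit_else(instructions):
    """Treat anything that is not 'forward' or 'up' as 'down'."""
    depth = 0
    for command, value in instructions:
        if command == "forward":
            pass  # Horizontal movement does not affect the depth in part 1.
        elif command == "up":
            depth -= value
        else:
            depth += value
    return depth


def depth_with_explicit_cases(instructions):
    """Handle the three known commands and reject everything else."""
    depth = 0
    for command, value in instructions:
        if command == "forward":
            pass
        elif command == "up":
            depth -= value
        elif command == "down":
            depth += value
        else:
            raise ValueError(f"Unknown command: {command!r}")
    return depth


instructions = [("down", 5), ("upp", 3)]  # "upp" is a deliberate typo.

# The typo is silently treated as "down", so we get 5 + 3 instead of an error.
print(depth_with_implicit_else(instructions))  # 8

try:
    depth_with_explicit_cases(instructions)
except ValueError as error:
    print(error)
```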

---

As far as this problem is concerned, there isn't much we can do to improve our solution significantly.
The problem is straightforward enough that any attempts to be clever would do more harm than good.

Therefore, we will now cover the second part of the problem.
Then, because this is a fairly simple problem, we will use it as a toy example to introduce a couple of interesting tools that are relevant for similar tasks, even though they would be too much overhead here.

## Part 2 problem statement

(Adapted from [Advent of Code 2021, day 2](https://adventofcode.com/2021/day/2))

Part 2 is a modification of part 1.
Now, not only do we have to keep track of the horizontal position and depth, we also have to keep track of the **aim**.
On top of that, the **same commands** now have a **different meaning**:


 - `down X` increases your aim by X units;
 - `up X` decreases your aim by X units; and
 - `forward X` does two things:
    - it increases your horizontal position by X units; and
    - it increases your depth by your aim multiplied by X.
    
Recall the previous example:

```txt
forward 5
down 5
forward 8
up 3
down 8
forward 2
```

Now, the final result is different:

 - `forward 5` adds 5 to your horizontal position, a total of 5. Because your aim is 0, your depth does not change.
 - `down 5` adds 5 to your aim, resulting in a value of 5.
 - `forward 8` adds 8 to your horizontal position, a total of 13. Because your aim is 5, your depth increases by 8*5=40.
 - `up 3` decreases your aim by 3, resulting in a value of 2.
 - `down 8` adds 8 to your aim, resulting in a value of 10.
 - `forward 2` adds 2 to your horizontal position, a total of 15. Because your aim is 10, your depth increases by 2*10=20 to a total of 60.

After following these new instructions, you would have a horizontal position of 15 and a depth of 60. (Multiplying these produces 900.)

Using this new interpretation of the commands, **calculate the horizontal position and depth** you would have after following the planned course.
**What do you get if you multiply your final horizontal position by your final depth?**

_Using the input file `inputs/02_dive.txt`, the answer should be 1544000595._

### Modifying the baseline solution

In order to solve this new version of the problem, we just have to adapt the handling of each command:

In [28]:
with open(INPUT_FILE, "r") as f:
    instructions = f.readlines()

horiz_pos, depth, aim = 0, 0, 0
for line in instructions:
    command, value = line.split()
    value = int(value)
    
    if command == "forward":
        horiz_pos += value
        depth += aim * value
    elif command == "up":
        aim -= value
    elif command == "down":
        aim += value
    else:
        raise ValueError("Unknown command.")

print(horiz_pos * depth)

1544000595


### Rudimentary space-time complexity analysis

#### Time

Let us analyse the space and time complexities of our solution, as a function of the number `n` of instructions.

A rule of thumb to estimate the time complexity of an algorithm is to sum the time complexities of things that happen after each other, and to multiply the time complexities of loops with the time complexities of the code inside them.

In our particular example, we have an outer `for` loop that goes through all instructions once, so that loop by itself is linear, or $O(n)$.
Now, we need to check the time complexity of the loop body, because the loop body gets executed in _each_ iteration.

As we can see, all operations inside the loop body execute in constant time: they do not depend on the total amount of instructions.
Hence, the loop body, for each iteration, is $O(1)$.

Putting it all together (in a not-so-rigorous manner), we get that the whole algorithm is $O(n) \times O(1) = O(n)$.

This shouldn't be surprising, and it is impossible to improve: we cannot know what the final horizontal position/depth/aim is without reading all instructions, and to read all instructions we need to go through the whole set of instructions at least once, which is already $O(n)$ by itself.

#### Space

The space complexity of our current solution is also linear, because we store all the instructions in a list.
We can reduce the space complexity to be constant if we employ the strategy of lazily iterating over the input file:

In [29]:
horiz_pos, depth, aim = 0, 0, 0

with open(INPUT_FILE, "r") as f:
    for line in f:
        command, value = line.split()
        value = int(value)

        if command == "forward":
            horiz_pos += value
            depth += aim * value
        elif command == "up":
            aim -= value
        elif command == "down":
            aim += value
        else:
            raise ValueError("Unknown command.")

    print(horiz_pos * depth)

1544000595


The space complexity of the modified code is $O(1)$ because we only store three integers.
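
The lazy version works because the loop only ever needs _some iterable of lines_, one line at a time. As a sketch of that idea, the function below (the name `solve_part2` is an assumption for illustration) accepts any iterable, and `io.StringIO` stands in for an open file so that we can check the example from the problem statement:

```python
import io


def solve_part2(lines):
    """Consume `lines` one at a time, keeping only three integers of state."""
    horiz_pos, depth, aim = 0, 0, 0
    for line in lines:
        command, value = line.split()
        value = int(value)

        if command == "forward":
            horiz_pos += value
            depth += aim * value
        elif command == "up":
            aim -= value
        elif command == "down":
            aim += value
        else:
            raise ValueError("Unknown command.")
    return horiz_pos * depth


# `io.StringIO` behaves like an open text file: iterating over it yields lines.
sample_file = io.StringIO("forward 5\ndown 5\nforward 8\nup 3\ndown 8\nforward 2\n")
print(solve_part2(sample_file))  # 900
```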

## Other thoughts

As mentioned previously, let us use this toy problem as an excuse to cover a couple of other tools that you could benefit from.

### Parsing input

People have different sensibilities, so you may not relate to what I am about to say, but there is one small thing that annoys me a little bit in the solution above, and that is the parsing of each line.

We know that each line has a very nice format, but we still need to break it into pieces and do some conversions here and there.
A very reasonable thing to do would be to create an auxiliary function whose only job is to parse a line of input into its appropriate pieces.
For our challenge, we might even assume that the line _will_ have the appropriate format:

In [30]:
def parse_instruction_line(line):
    command, value = line.split()
    return command, int(value)

horiz_pos, depth, aim = 0, 0, 0

with open(INPUT_FILE, "r") as f:
    for line in f:
        command, value = parse_instruction_line(line)

        if command == "forward":
            horiz_pos += value
            depth += aim * value
        elif command == "up":
            aim -= value
        elif command == "down":
            aim += value
        else:
            raise ValueError("Unknown command.")

    print(horiz_pos * depth)

1544000595


For our little problem, it might not look very advantageous to define an auxiliary function to do that work.
However, as problems become more complex and as input formats become more complex/less structured, input parsing becomes a significant endeavour.
When that is the case, it is generally advised that you _separate concerns_:
have a function to do input parsing and then another function to do the number crunching/problem-solving.
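
As a rough sketch of what that separation buys us, suppose the input switched to a comma-separated format such as `forward,5`. That format is purely hypothetical, as are the function names below. If the solver receives the parser as an argument, only the parsing function needs to change:

```python
def parse_space_separated(line):
    """Parse lines like 'forward 5'."""
    command, value = line.split()
    return command, int(value)


def parse_comma_separated(line):
    """Parse lines in a hypothetical alternative format like 'forward,5'."""
    command, value = line.strip().split(",")
    return command, int(value)


def solve_part2(lines, parse):
    """Solve part 2, delegating all input parsing to `parse`."""
    horiz_pos, depth, aim = 0, 0, 0
    for line in lines:
        command, value = parse(line)

        if command == "forward":
            horiz_pos += value
            depth += aim * value
        elif command == "up":
            aim -= value
        elif command == "down":
            aim += value
        else:
            raise ValueError("Unknown command.")
    return horiz_pos * depth


sample = ["forward 5", "down 5", "forward 8", "up 3", "down 8", "forward 2"]
comma_sample = [line.replace(" ", ",") for line in sample]

print(solve_part2(sample, parse_space_separated))        # 900
print(solve_part2(comma_sample, parse_comma_separated))  # 900
```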

### Enumerations of constants

Another tool that is quite useful comes from the `enum` module.
`enum` is short for “enumeration”, and is useful when you have related constant variables that you would like to keep together.

In our example, those (three) constants are the string values of the three commands:

 - “forward”
 - “up”
 - “down”

Suppose that the input file was suddenly in a different language – say, Portuguese.
If that were the case, would you change your code to:

```py
# ...
if command == "frente":
    horiz_pos += value
    depth += aim * value
elif command == "cima":
    aim -= value
elif command == "baixo":
    aim += value
```

Maybe you would, or maybe you wouldn't, but one thing is clear: now, most English speakers don't know what's written within quotes.

Or, perhaps, all commands were abbreviated to save space in the file:

```py
# ...
if command == "f":
    horiz_pos += value
    depth += aim * value
elif command == "u":
    aim -= value
elif command == "d":
    aim += value
```

In our code, these changes mean we would have to change three strings.
But what if our code was longer and we made use of the command strings in more places?
Then, updating all commands would be boring and, most importantly, **error-prone**.

It is because of these reasons (and others!) that things like `enum` exist.
With enumerations, we can group variables that act as “global constants” and use them instead of the actual values:

In [37]:
from enum import Enum

# We define an Enum(eration) with the valid commands.
class Command(Enum):
    FORWARD = "forward"
    UP = "up"
    DOWN = "down"

horiz_pos, depth, aim = 0, 0, 0

with open(INPUT_FILE, "r") as f:
    for line in f:
        command, value = line.split()
        value = int(value)
        command = Command(command)  # We say that `command` is a `Command`, ...

        # ... and we compare it to each possible command:
        if command == Command.FORWARD:
            horiz_pos += value
            depth += aim * value
        elif command == Command.UP:
            aim -= value
        elif command == Command.DOWN:
            aim += value

    print(horiz_pos * depth)

1544000595


Now, if the input language changes to Portuguese, we only have to make three changes, and everything else will keep working:

```py
class Command(Enum):
    FORWARD = "frente"
    UP = "cima"
    DOWN = "baixo"
```

Or, if the commands are abbreviated, we change the enumeration to

```py
class Command(Enum):
    FORWARD = "f"
    UP = "u"
    DOWN = "d"
```

Because the commands are stored in the enumeration, we only need to change the right-hand side of the assignments.
As we can see above, `Command.FORWARD` didn't change; it's still spelled as `Command.FORWARD`.
This means that you do **not** need to change the code that makes use of the commands.

You can read about `enum` [in the docs](https://docs.python.org/3/library/enum.html).
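
One detail worth seeing in isolation is what `Command(command)` does: it looks up the enumeration member by its _value_, and raises a `ValueError` if no member has that value. The short sketch below (where `"sideways"` is just a made-up invalid command) shows both behaviours, which also means the enum-based loop rejects unknown commands even without an `else` branch:

```python
from enum import Enum


class Command(Enum):
    FORWARD = "forward"
    UP = "up"
    DOWN = "down"


# Calling the enumeration with a value returns the matching member.
print(Command("forward") is Command.FORWARD)  # True
print(Command.UP.value)  # up

# A value that no member has raises a ValueError.
try:
    Command("sideways")
except ValueError as error:
    print(error)
```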

## Don't try this at home

In the exclusive Discord server for the bootcamp participants, someone posted the following code:

In [4]:
from itertools import accumulate
print(sum(d:=[1j**'fd'.find(i[0])*int(i.split()[1]) for i in open(INPUT_FILE)]).real*sum(d).imag)
print(sum(i.real*j.imag for i,j in zip(d, accumulate(d)))*sum(d).real)

1727835.0
1544000595.0


While clearly correct, it's also hard to digest.
I challenge you to refactor this code bit by bit until it's in a more tractable format.
Then, I invite you to study this code and try to understand _why_ it works.
Sometimes, digesting these “weird” pieces of code can teach you a lot!

Good luck ;)

## Conclusion

When trying to refactor the solution to a problem, your objective is not to make it look more obscure or complex.
Likewise, importing functions from other modules just for the sake of importing them isn't your goal.

Therefore, sometimes the best solution really is the first you came up with.

If you have any questions, suggestions, remarks, recommendations, corrections, or anything else, you can reach out to me [on Twitter](https://twitter.com/mathsppblog) or via email to rodrigo at mathspp dot com.