# Giant Squid

The analysis that follows pertains to the fourth day of the [Python Problem-Solving Bootcamp](https://mathspp.com/pythonbootcamp).

In the analysis that follows you may be confronted with code that you do not understand, especially as you reach the end of the explanation of each part.

If you find functions that you didn't know before, remember to [check the docs](https://docs.python.org/3/) for those functions and play around with them in the REPL.
The analysis is written in increasing order of difficulty (within each part of the problem), so it is understandable if it gets harder as you keep reading.
That's perfectly fine, you don't have to understand everything _right now_, especially because I can't know for sure what _your level_ is.

## Part 1 problem statement

(From [Advent of Code 2021, day 4](https://adventofcode.com/2021/day/4))

You're already almost 1.5km (almost a mile) below the surface of the ocean, already so deep that you can't see any sunlight. What you _can_ see, however, is a giant squid that has attached itself to the outside of your submarine.

Maybe it wants to play [bingo](https://en.wikipedia.org/wiki/Bingo_(American_version))?

Bingo is played on a set of boards each consisting of a 5x5 grid of numbers. Numbers are chosen at random, and the chosen number is _marked_ on all boards on which it appears. (Numbers may not appear on all boards.) If all numbers in any row or any column of a board are marked, that board _wins_. (Diagonals don't count.)

The submarine has a _bingo subsystem_ to help passengers (currently, you and the giant squid) pass the time. It automatically generates a random order in which to draw numbers and a random set of boards (your puzzle input). For example:

```
7,4,9,5,11,17,23,2,0,14,21,24,10,16,13,6,15,25,12,22,18,20,8,19,3,26,1

22 13 17 11  0
 8  2 23  4 24
21  9 14 16  7
 6 10  3 18  5
 1 12 20 15 19

 3 15  0  2 22
 9 18 13 17  5
19  8  7 25 23
20 11 10 24  4
14 21 16 12  6

14 21 17 24  4
10 16 15  9 19
18  8 23 26 20
22 11 13  6  5
 2  0 12  3  7
```

After the first five numbers are drawn (`7`, `4`, `9`, `5`, and `11`), there are no winners, but the boards are marked as follows (shown here adjacent to each other to save space):

```
22 13 17 11  0         3 15  0  2 22        14 21 17 24  4
 8  2 23  4 24         9 18 13 17  5        10 16 15  9 19
21  9 14 16  7        19  8  7 25 23        18  8 23 26 20
 6 10  3 18  5        20 11 10 24  4        22 11 13  6  5
 1 12 20 15 19        14 21 16 12  6         2  0 12  3  7
```

After the next six numbers are drawn (`17`, `23`, `2`, `0`, `14`, and `21`), there are still no winners:

```
22 13 17 11  0         3 15  0  2 22        14 21 17 24  4
 8  2 23  4 24         9 18 13 17  5        10 16 15  9 19
21  9 14 16  7        19  8  7 25 23        18  8 23 26 20
 6 10  3 18  5        20 11 10 24  4        22 11 13  6  5
 1 12 20 15 19        14 21 16 12  6         2  0 12  3  7
```

Finally, `24` is drawn:

```
22 13 17 11  0         3 15  0  2 22        14 21 17 24  4
 8  2 23  4 24         9 18 13 17  5        10 16 15  9 19
21  9 14 16  7        19  8  7 25 23        18  8 23 26 20
 6 10  3 18  5        20 11 10 24  4        22 11 13  6  5
 1 12 20 15 19        14 21 16 12  6         2  0 12  3  7
```

At this point, the third board _wins_ because it has at least one complete row or column of marked numbers (in this case, the entire top row is marked: `_14 21 17 24 4_`).

The _score_ of the winning board can now be calculated. Start by finding the _sum of all unmarked numbers_ on that board; in this case, the sum is `188`. Then, multiply that sum by _the number that was just called_ when the board won, `24`, to get the final score, `188 * 24 = _4512_`.

To guarantee victory against the giant squid, figure out which board will win first. _What will your final score be if you choose that board?_

_Using the input file `04_giant_squid.txt`, the result should be 27027._

In [1]:
# IMPORTANT: Set this to the correct path for you!
INPUT_FILE = "data/input.txt"

## Decomposing the problem

For this problem, we have a lot of moving parts:

 1. we have to parse the boards from the input file;
 2. we have to find what board will be complete first; and
 3. we have to compute the score of that board.

This is a high-level view of what we have to do.
For this problem, and for problems of a similar complexity, it may be a good idea to code each part separately in a little function.

Then, we can mix & match the different implementations for each subproblem.

Let's start from the beginning.

## Board as a list of lists

In order to parse the boards, we first have to determine how we want to represent the boards.
Let's start with the “obvious” representation, where each board is a list of lists.

We start off by reading the whole file and splitting it on `"\n\n"`, because the input format shows that consecutive boards are separated by a blank line.
Then, we use [unpacking with starred assignment](https://mathspp.com/blog/pydonts/unpacking-with-starred-assignments) to separate the first line from the rest of the data:

In [2]:
with open(INPUT_FILE) as f:
    contents = f.read().strip().split("\n\n")

numbers, *board_data = contents

In [3]:
numbers

'17,25,31,22,79,72,58,47,62,50,30,91,11,63,66,83,33,75,44,18,56,81,32,46,93,13,41,65,14,95,19,38,8,35,52,7,12,70,84,23,4,42,90,60,6,40,97,16,27,86,5,48,54,64,29,67,26,89,99,53,34,0,57,3,92,37,59,9,21,78,51,80,73,82,76,28,88,96,45,69,98,1,2,71,68,49,36,15,55,39,87,77,74,94,61,85,10,43,20,24'

In [4]:
board_data[:5]

['36 11 70 77 80\n63  3 56 75 28\n89 91 27 33 82\n53 79 52 96 32\n58 14 78 65 38',
 '26 15 50 56  2\n20 27 42 11 16\n93 44 38 28 68\n66 88 78 81 77\n91 46 55 86  6',
 '46 53 14 17 75\n71  4 70 99 48\n65 96 68 80 72\n 3 97 62 37 88\n82 35 36 23 39',
 '17  1 61 77  5\n74 60 12 24 48\n34 19 68 65 86\n44 59 38 40 95\n67 64  9 52 27',
 '44 60  8 81  3\n30 71 85 23 99\n68 88 38 97 48\n27 70 63 28 12\n67 57 34 13 93']

Now, we write a function to parse each element of the list `board_data`:

In [5]:
def parse_board_matrix(board_string):
    string_rows = board_string.splitlines()
    board_rows = []
    for string_row in string_rows:
        row = []
        for num in string_row.split():
            row.append(int(num))
        board_rows.append(row)
    return board_rows

parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

Our code can be improved if we make use of [list comprehensions](https://mathspp.com/blog/pydonts/list-comprehensions-101).

Some people like list comprehensions more than others.
In general, I am a fan because list comprehensions show more clearly what we are doing to the data, whereas `for` loops tend to have too much boilerplate, which makes it harder to find the important code.

In our case, we have the same pattern twice:

```py
lst = []
for elem in iterable:
    lst.append(func(elem))
```

We initialise an empty list, we iterate over an iterable, and we append each element (after some processing) to the list.
This can be rewritten as the following list comprehension:

```py
lst = [func(elem) for elem in iterable]
```

Notice how the list comprehension puts the data transformation in front of everything else.

For our code, we can start from inside:

In [6]:
def parse_board_matrix(board_string):
    string_rows = board_string.splitlines()
    board_rows = []
    for string_row in string_rows:
        # row = []
        # for num in string_row.split():
        #     row.append(int(num))
        row = [int(num) for num in string_row.split()]
        board_rows.append(row)
    return board_rows

parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

And then, on to the outer loop:

In [7]:
def parse_board_matrix(board_string):
    string_rows = board_string.splitlines()
    # board_rows = []
    # for string_row in string_rows:
    #     row = [int(num) for num in string_row.split()]
    #     board_rows.append(row)
    board_rows = [[int(num) for num in string_row.split()] for string_row in string_rows]
    return board_rows

parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

In general, you shouldn't let your list comprehensions get too long or too crazy.
If needed, you can split them over multiple lines:

In [8]:
def parse_board_matrix(board_string):
    string_rows = board_string.splitlines()
    board_rows = [
        [int(num) for num in string_row.split()]
        for string_row in string_rows
    ]
    return board_rows

parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

When you do, it's good to break the line right before each `for`.
Also, notice the indentation pattern used: the opening and closing brackets are by themselves, and the two other lines are aligned and indented.

Now, the name `board_rows` isn't being put to good use, so we can get rid of it:

In [9]:
def parse_board_matrix(board_string):
    string_rows = board_string.splitlines()
    return [
        [int(num) for num in string_row.split()]
        for string_row in string_rows
    ]

parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

Also, if we want to be more extreme, we can inline the call to `.splitlines`:

In [10]:
def parse_board_matrix(board_string):
    return [
        [int(num) for num in string_row.split()]
        for string_row in board_string.splitlines()
    ]

parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

## Boolean status matrix

Now that we have our boards in a decent format, we need to figure out how to keep track of the numbers that have been drawn, and the completion status of each board.

Maybe we can represent the numbers that have been drawn as a Boolean matrix.
Then, as we draw each number, we have to mark it off on each board, and then check whether a board is done or not.

Therefore, we need to implement a function to check if a Boolean matrix has a bingo:

In [11]:
def boolean_matrix_has_bingo(matrix):
    for i in range(len(matrix)):
        row, col = True, True  # Check a row and a column at the same time.
        for j in range(len(matrix)):
            row = row and matrix[i][j]
            col = col and matrix[j][i]
        if row or col:
            return True
    return False

boolean_matrix_has_bingo([
    [True, False, False],
    [True, False, True],
    [True, True, False],
])

True

Can we rewrite the function `boolean_matrix_has_bingo`?
To check if a row has bingo, we could just use the `all` function:

In [12]:
all([True, True, False])

False

In [13]:
all([True, True, True])

True

Similarly, if we use the transpose trick with `zip(*matrix)`, we could use `all` to check the columns as well!
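
As a quick illustration of that transpose trick (a minimal sketch with a made-up `matrix`, not part of the puzzle input), unpacking the rows into `zip` groups together the elements that share a column:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# zip(*matrix) is the same as zip([1, 2, 3], [4, 5, 6]),
# so each tuple produced holds one column of the matrix.
columns = list(zip(*matrix))
print(columns)  # [(1, 4), (2, 5), (3, 6)]
```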

Because our boards are fairly small, the `zip(*matrix)` trick isn't that bad:

In [18]:
def boolean_matrix_has_bingo(matrix):
    for row in matrix:
        if all(row):
            return True
    for col in zip(*matrix):
        if all(col):
            return True
    return False  # No full row or column was found.

boolean_matrix_has_bingo([
    [True, False, False],
    [True, False, True],
    [True, True, False],
])

True

On a similar note, we can use `any` to check if _any_ of the rows or _any_ of the columns are full:

In [19]:
def boolean_matrix_has_bingo(matrix):
    return (
        any(all(row) for row in matrix) or
        any(all(col) for col in zip(*matrix))
    )

boolean_matrix_has_bingo([
    [True, False, False],
    [True, False, True],
    [True, True, False],
])

True

And then, might as well create a function that takes a board, its status, and a new number, and updates the status matrix in case the number drawn appeared on that board:

In [16]:
def update_status(board, status, number):
    for row_s, row_b in zip(status, board):
        for idx, num in enumerate(row_b):
            row_s[idx] |= num == number

board = [[1, 2], [3, 4]]
status = [[False, False], [False, False]]
update_status(board, status, 3)
status  # Notice how the third Boolean value is now `True`.

[[False, False], [True, False]]

Notice that the `|=` used above is a modified assignment.
Using `var |= value` unfolds to `var = var | value`, which is the same as `var = var or value` for Boolean values.
In our case, this piece of code means “`row_s[idx]` will be `True` if it was already `True` **or** if `num == number`”.
In other words, if you think of `row_s[idx]` as a lightbulb, `row_s[idx] |= value` means that the lightbulb stays ON if it was ON, and it turns ON when `value` is `True`.
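
The lightbulb behaviour can be checked directly in the REPL. This is only a small sketch, and the name `light` is made up for the illustration:

```python
light = False

light |= False  # It was OFF and nothing happened, so it stays OFF.
print(light)  # False

light |= True  # A `True` value turns it ON.
print(light)  # True

light |= False  # Once ON, it stays ON.
print(light)  # True
```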

Now, we just need a big simulation that draws one number at a time and checks the status of each board:

In [17]:
def empty_boolean_matrix(size):
    return [[False] * size for _ in range(size)]

def bingo_simulation(nums_drawn, boards, statuses):
    # Go over each number drawn.
    for num_drawn in nums_drawn:
        # Iterate over all boards ...
        for board, status in zip(boards, statuses):
            update_status(board, status, num_drawn)
            # ... and if there is a bingo, ...
            if boolean_matrix_has_bingo(status):
                return num_drawn, board, status  # ... return all information.

We can get this to run with the code we have so far, to see if we can identify the board that wins.

We put all the utility functions together, parse the boards, and then crack on with the simulation:

In [20]:
def parse_board_matrix(board_string):
    return [
        [int(num) for num in string_row.split()]
        for string_row in board_string.splitlines()
    ]

def boolean_matrix_has_bingo(matrix):
    return (
        any(all(row) for row in matrix) or
        any(all(col) for col in zip(*matrix))
    )

def update_status(board, status, number):
    for row_s, row_b in zip(status, board):
        for idx, num in enumerate(row_b):
            row_s[idx] |= num == number
            
def empty_boolean_matrix(size):
    return [[False] * size for _ in range(size)]

def bingo_simulation(nums_drawn, boards, statuses):
    # Go over each number drawn.
    for num_drawn in nums_drawn:
        # Iterate over all boards ...
        for board, status in zip(boards, statuses):
            update_status(board, status, num_drawn)
            # ... and if there is a bingo, ...
            if boolean_matrix_has_bingo(status):
                return num_drawn, board, status  # ... return all information.
            
with open(INPUT_FILE) as f:
    contents = f.read().strip().split("\n\n")

numbers, *board_data = contents
numbers = [int(num) for num in numbers.split(",")]  # Make sure the numbers are integers.
boards = [parse_board_matrix(data) for data in board_data]

bingo_simulation(numbers, boards, [empty_boolean_matrix(len(boards[0])) for _ in boards])

(14,
 [[18, 30, 28, 50, 81],
  [67, 47, 41, 45, 59],
  [51, 14, 92, 6, 68],
  [8, 46, 69, 84, 13],
  [93, 25, 58, 26, 75]],
 [[True, True, False, True, True],
  [False, True, True, False, False],
  [False, True, False, False, False],
  [False, True, False, False, True],
  [True, True, True, False, True]])

This is looking promising; now we need to score the board!

## Scoring a board based on a status matrix

In order to score a board, we need to find out what numbers are still unmarked.
The unmarked numbers are associated with a `False` value on the status matrix, so those are the ones we care about.
A nested loop can be used to traverse the two matrices at the same time to sum the unmarked values:

In [21]:
def score_board_by_status_matrix(last_drawn, board, status):
    acc = 0
    for row_b, row_s in zip(board, status):
        for num, marked in zip(row_b, row_s):
            if not marked:
                acc += num
    return acc * last_drawn

score_board_by_status_matrix(*bingo_simulation(
    numbers,
    boards,
    [empty_boolean_matrix(len(boards[0])) for _ in boards],
))

8442

From a personal point of view, I like to avoid `if` statements whenever I can.
I prefer to use the data in a computation to make it behave the way I like, instead of using an `if` to check whether or not I should make a computation.

In the function above, we want to accumulate `num` whenever `not marked` is `True`.
You might be familiar with the fact that `True` and `False` can behave as `1` and `0`, respectively, when forced to:

In [22]:
True + 1

2

In [23]:
False + 1

1

Thus, `True` and `False` can be used to mask out a number in a sum if you multiply with the Boolean value first:

In [24]:
def score_board_by_status_matrix(last_drawn, board, status):
    acc = 0
    for row_b, row_s in zip(board, status):
        for num, marked in zip(row_b, row_s):
            acc += num * (not marked)
    return acc * last_drawn

score_board_by_status_matrix(*bingo_simulation(
    numbers,
    boards,
    [empty_boolean_matrix(len(boards[0])) for _ in boards],
))

8442

This still works because when a number is marked, `not marked` evaluates to `False`, and `num * (not marked)` will thus evaluate to `0`.
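
Going one step further, the masking idea also fits inside a single call to `sum`. The sketch below uses a hypothetical helper `sum_unmarked` and a made-up 2 by 2 board; it is not one of the functions used in the rest of this analysis:

```python
def sum_unmarked(board, status):
    # Multiplying by `not marked` zeroes out the marked numbers.
    return sum(
        num * (not marked)
        for row_b, row_s in zip(board, status)
        for num, marked in zip(row_b, row_s)
    )

board = [[1, 2], [3, 4]]
status = [[True, False], [False, True]]  # 1 and 4 are marked.
print(sum_unmarked(board, status))  # 5
```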

## Board status as a list of bitmasks

Inspired by a solution sent to me, we could represent each row of the board as a binary number, instead of having the explicit `True` and `False` values.
We call this a bitmask, because each bit in that number represents the `True`/`False` state of the corresponding position.

If we did that, then the initial status of a board would be just a list of zeroes:

In [25]:
def empty_bit_rows(size):
    return [0 for _ in range(size)]

The rationale is that if a board has a row of, say, `[34, 27, 18]`, and the numbers `27` and `18` have been drawn out, then the row is represented as `0b011`, or as `3` in decimal.
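
To make that encoding concrete, here is a small sketch that builds the bitmask of that single row from scratch. The set `drawn` and the loop are assumptions made for the illustration; they are not how the status is updated later on:

```python
row = [34, 27, 18]
drawn = {27, 18}  # Numbers that have been drawn so far.

bitmask = 0
for num in row:
    # Make room for one more bit on the right, then set it if `num` was drawn.
    bitmask = (bitmask << 1) | (num in drawn)

print(bin(bitmask), bitmask)  # 0b11 3
```

Note that the leftmost number of the row ends up in the most significant bit.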

The next two functions we need to modify would be the function that checks whether there was a bingo and the function that updates the board.

Checking if there is a bingo across a row is the same as checking whether or not all positions in that row contributed a `1` to the binary expansion.
For example, if the rows have length `3`, we'd want to see if the row was the number `0b111`, or `7`.

Checking if there is a bingo across a column amounts to doing a bitwise AND down the rows:

![](bitwise_and_downwards.png)

After we do that bitwise AND, we check if the result is greater than 0.
If it is, then at least one bit survived the AND, which means that the corresponding column is marked in every row.
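
Here is a small worked example with made-up rows of length 3, applying the `&` operator by hand:

```python
# The middle column is marked in all three rows.
rows = [0b110, 0b010, 0b011]
combined = rows[0] & rows[1] & rows[2]
print(bin(combined))  # 0b10

# No column is marked in all three rows.
other_rows = [0b100, 0b010, 0b001]
other_combined = other_rows[0] & other_rows[1] & other_rows[2]
print(other_combined)  # 0
```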

In order to implement this bitwise AND “down the rows”, we make use of `functools.reduce`, a severely underappreciated function.
You can read up on `reduce` [in this article](https://mathspp.com/blog/pydonts/the-power-of-reduce).

Here is the implementation:

In [26]:
from functools import reduce
from operator import and_

def bit_rows_has_bingo(matrix):
    # The number with the appropriate number of 1s in the binary expansion:
    target = 2 ** len(matrix) - 1
    return (
        any(target == row for row in matrix) or
        reduce(and_, matrix) > 0
    )

Finally, to implement a function that updates the status of a given position, we need to use the left shifting operator:

In [27]:
def update_status(board, status, number):
    board_size = len(board[0])
    for idx, row in enumerate(board):
        try:
            pos = row.index(number)  # Is the number in this row?
            status[idx] |= 1 << (board_size - pos - 1)  # Update the bitmask of row `idx`.
        except ValueError:
            pass  # If `number` isn't in `row`, just skip this row.
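
To see what the left shift is doing, the sketch below (assuming rows of length 5, like the puzzle boards) lists the mask produced for each position `pos`. The name `masks` is made up for the illustration:

```python
board_size = 5

# Position 0 is the leftmost number of a row, so it maps to the leftmost bit.
masks = [1 << (board_size - pos - 1) for pos in range(board_size)]

for pos, mask in enumerate(masks):
    print(pos, format(mask, "05b"))
# 0 10000
# 1 01000
# 2 00100
# 3 00010
# 4 00001
```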

To check if all of this is working, we can plug our new functions into the previous simulation, and see if our code is able to determine the board that has a bingo first:

In [28]:
from functools import reduce
from operator import and_

def parse_board_matrix(board_string):
    return [
        [int(num) for num in string_row.split()]
        for string_row in board_string.splitlines()
    ]

def bit_rows_has_bingo(matrix):
    # The number with the appropriate number of 1s in the binary expansion:
    target = 2 ** len(matrix) - 1
    return (
        any(target == row for row in matrix) or
        reduce(and_, matrix) > 0
    )

def update_status(board, status, number):
    board_size = len(board[0])
    for idx, row in enumerate(board):
        try:
            pos = row.index(number)
            status[idx] |= 1 << (board_size - pos - 1)
        except ValueError:
            pass
            
def empty_bit_rows(size):
    return [0 for _ in range(size)]

def bingo_simulation(nums_drawn, boards, statuses):
    # Go over each number drawn.
    for num_drawn in nums_drawn:
        # Iterate over all boards ...
        for board, status in zip(boards, statuses):
            update_status(board, status, num_drawn)
            # ... and if there is a bingo, ...
            if bit_rows_has_bingo(status):
                return num_drawn, board, status  # ... return all information.
            
with open(INPUT_FILE) as f:
    contents = f.read().strip().split("\n\n")

numbers, *board_data = contents
numbers = [int(num) for num in numbers.split(",")]  # Make sure the numbers are integers.
boards = [parse_board_matrix(data) for data in board_data]

bingo_simulation(numbers, boards, [empty_bit_rows(len(boards[0])) for _ in boards])

(14,
 [[18, 30, 28, 50, 81],
  [67, 47, 41, 45, 59],
  [51, 14, 92, 6, 68],
  [8, 46, 69, 84, 13],
  [93, 25, 58, 26, 75]],
 [27, 12, 8, 9, 29])

If we scroll up, we will see that this is the same board that was returned previously, so we are on the right track.

Now, we just need to score the board.

## Scoring a board from a list of bitmasks

Because we have a list of bitmasks, we have to adapt our scoring function ever so slightly.
For instance, we can use `n & 1` to check if the rightmost bit of `n` is active:

In [29]:
0b1010 & 1

0

In [30]:
0b1011 & 1

1

And we can use `n >> 1` (bitwise right shift) to get rid of the rightmost bit of `n`:

In [31]:
bin(0b1011 >> 1)

'0b101'

Thus, we can use these two to go over the rows of a board, and then over each row.
When we go over each row, we do it backwards, so that we can work with the rightmost bits of the status bitmask:

In [32]:
def score_board_by_status_matrix(last_drawn, board, status):
    acc = 0
    for row, bitmask in zip(board, status):
        while row:
            acc += row[-1] * (not bitmask & 1)
            row = row[:-1]  # Drop the last element of the row,
            bitmask >>= 1   # and the rightmost bit of the bitmask.
    return acc * last_drawn

score_board_by_status_matrix(*bingo_simulation(
    numbers,
    boards,
    [empty_bit_rows(len(boards[0])) for _ in boards],
))

8442

## Board as a list of numbers

We have been playing around with the way we represent the marked numbers, but those representations were all built on top of the fact that we represented a board as a matrix.
However, flat lists are easier to traverse, so what if we represented a board as a list of all of its numbers?

In [33]:
def parse_board_flat(board_string):
    return [int(num) for num in board_string.split()]

flat_example = parse_board_flat(board_data[0])
print(flat_example)

[36, 11, 70, 77, 80, 63, 3, 56, 75, 28, 89, 91, 27, 33, 82, 53, 79, 52, 96, 32, 58, 14, 78, 65, 38]


With this representation, it is very useful to know how to convert from a linear index to a `(row, col)` index that matches the appropriate matrix.

For example, here is where `52` is on the flat representation:

In [34]:
flat_example.index(52)

17

How do we relate that to the row/column index of the matrix representation?

In the matrix representation, `52` lives at row `3`, column `2`:

In [35]:
parse_board_matrix(board_data[0])

[[36, 11, 70, 77, 80],
 [63, 3, 56, 75, 28],
 [89, 91, 27, 33, 82],
 [53, 79, 52, 96, 32],
 [58, 14, 78, 65, 38]]

To convert the linear index into those coordinates, we use the built-in function `divmod`:

In [36]:
divmod(17, 5)

(3, 2)

Notice how the result of using `divmod` on the linear index, with the size of the matrix, gives the row/column coordinates into the matrix.

This is a very useful relationship.

In case you are wondering, `divmod` is essentially this:

In [37]:
def divmod(n, m):
    return n // m, n % m

divmod(17, 5)

(3, 2)

Now that we have changed how we represent a board, we will experiment with a couple of other ways of representing the status of the board.

Note that the representations of the status we are going to play with right now **are not** incompatible with other board representations!
In particular, the status representations we are going to see next could have been used with the matrix board representation.

## Board status as a list of coordinates

The first thing we will do is store the status of the board as a list of `(row, col)` pairs that represent the positions that have been marked off.

Then, checking for a bingo means we need to look at those pairs and see if there is any row or column index that shows up enough times.

For example, if our boards have size `3`, then the following list represents a bingo:

In [38]:
[(0, 1), (1, 2), (2, 2), (1, 1), (0, 0), (2, 1)]

[(0, 1), (1, 2), (2, 2), (1, 1), (0, 0), (2, 1)]

Why?
Because the column `1` shows up `3` times.
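
One way to double-check that claim is to count how often each row index and each column index appears. The sketch below uses `collections.Counter`, which is a different technique from the function defined next, and the name `marked` is made up for the illustration:

```python
from collections import Counter

marked = [(0, 1), (1, 2), (2, 2), (1, 1), (0, 0), (2, 1)]

row_counts = Counter(r for r, _ in marked)
col_counts = Counter(c for _, c in marked)

print(max(row_counts.values()))  # 2, so no row is complete.
print(col_counts[1])  # 3, so column 1 is complete on a board of size 3.
```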

Here is how we can compute if such a list has a bingo:

In [39]:
def flat_board_has_bingo(board_lst, n):
    return (
        any(sum(r == i for r, _ in board_lst) == n for i in range(n)) or
        any(sum(c == i for _, c in board_lst) == n for i in range(n))
    )

Then, updating the status is just a matter of appending the appropriate pair of coordinates (from `divmod`) **if** the number is in the board:

In [40]:
def update_status(board, status, number):
    board_size = int(len(board) ** 0.5)
    try:
        status.append(divmod(board.index(number), board_size))
    except ValueError:
        pass

Finally, we can run our simulation:

In [41]:
def bingo_simulation(nums_drawn, boards, statuses):
    board_size = int(len(boards[0]) ** 0.5)  # Square root of length gives square board size.
    # Go over each number drawn.
    for num_drawn in nums_drawn:
        # Iterate over all boards ...
        for board, status in zip(boards, statuses):
            update_status(board, status, num_drawn)
            # ... and if there is a bingo, ...
            if flat_board_has_bingo(status, board_size):
                return num_drawn, board, status  # ... return all information.
            
with open(INPUT_FILE) as f:
    contents = f.read().strip().split("\n\n")

numbers, *board_data = contents
numbers = [int(num) for num in numbers.split(",")]  # Make sure the numbers are integers.
boards = [parse_board_flat(data) for data in board_data]

print(bingo_simulation(numbers, boards, [[] for _ in boards]))

(14, [18, 30, 28, 50, 81, 67, 47, 41, 45, 59, 51, 14, 92, 6, 68, 8, 46, 69, 84, 13, 93, 25, 58, 26, 75], [(4, 1), (4, 2), (1, 1), (0, 3), (0, 1), (4, 4), (0, 0), (0, 4), (3, 1), (4, 0), (3, 4), (1, 2), (2, 1)])


## Scoring with flat list of coordinates

You know the drill!
We go over each possible pair of coordinates, and add the corresponding number to an accumulator **if** those coordinates haven't been marked.

_Or_, we could try something different.
Because the board is a flat list, we can `sum` it directly, and then subtract the numbers that sit at the marked coordinates:

In [42]:
def score_board_by_list_of_coordinates(num_drawn, board, status):
    board_size = int(len(board) ** 0.5)
    acc = sum(board)
    for r, c in status:
        acc -= board[r * board_size + c]
    return num_drawn * acc

score_board_by_list_of_coordinates(*bingo_simulation(
    numbers,
    boards,
    [[] for _ in boards],
))

8442

In order to implement this, we had to make use of the “inverse transformation” of `divmod`.
With `divmod`, we can go from a single index into the pair of row/column indices.

With a multiplication and addition, we can revert this.

So, if `r, c = divmod(n, m)`, then `n == r * m + c`.
That is the identity that was used above.
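
A quick sanity check of this identity, assuming a board of size 5 like the ones in the puzzle:

```python
size = 5

# Every linear index of a 5x5 board survives the round trip.
for n in range(size * size):
    r, c = divmod(n, size)
    assert n == r * size + c

print(divmod(17, size))  # (3, 2)
print(3 * size + 2)  # 17
```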

## Part 2 problem statement

(From [Advent of Code 2021, day 4](https://adventofcode.com/2021/day/4))

On the other hand, it might be wise to try a different strategy: let the giant squid win.

You aren't sure how many bingo boards a giant squid could play at once, so rather than waste time counting its arms, the safe thing to do is to _figure out which board will win last_ and choose that one. That way, no matter which boards it picks, it will win for sure.

In the above example, the second board is the last to win, which happens after `13` is eventually called and its middle column is completely marked. If you were to keep playing until this point, the second board would have a sum of unmarked numbers equal to `148` for a final score of `148 * 13 = _1924_`.

Figure out which board will win last. _Once it wins, what would its final score be?_

_Using the input file `04_giant_squid.txt`, the result should be 36975._

## Tweaking the simulation

The second part of the problem statement is awfully similar to the first part, except now we want the board that is finished last.

To compute this, we can start off with a list of all boards, and successively get rid of them as they get completed.
When we figure out what the last board will be, we keep playing on it until we find out what the winning number for that board is.
This will work regardless of the way you choose to represent the board/its completion status:

In [43]:
def bingo_simulation_last(nums_drawn, boards, statuses):
    board_size = int(len(boards[0]) ** 0.5)

    while len(boards) > 1:
        num_drawn, *nums_drawn = nums_drawn
        next_boards, next_statuses = [], []
        
        # Update all boards, ...
        for board, status in zip(boards, statuses):
            update_status(board, status, num_drawn)
            
            # ... then keep those that don't have a bingo:
            if not flat_board_has_bingo(status, board_size):
                next_boards.append(board)
                next_statuses.append(status)
                
        boards, statuses = next_boards, next_statuses

    # Play the final board to completion.
    board, status = boards[0], statuses[0]
    while not flat_board_has_bingo(status, board_size):
        num_drawn, *nums_drawn = nums_drawn
        update_status(board, status, num_drawn)
    
    return num_drawn, board, status

score_board_by_list_of_coordinates(*bingo_simulation_last(
    numbers,
    boards,
    [[] for _ in boards],
))

4590

This works, but is a bit clumsy.
After all, the filtering that is taking place forced us to keep track of two helper variables.

We can make this cleaner if we `zip` the boards and statuses together in the beginning, and keep them together throughout the simulation:

In [44]:
def bingo_simulation_last(nums_drawn, boards, statuses):
    board_size = int(len(boards[0]) ** 0.5)
    data = list(zip(boards, statuses))

    while len(data) > 1:
        num_drawn, *nums_drawn = nums_drawn
        
        # Update all boards, ...
        for board, status in data:
            update_status(board, status, num_drawn)
        # ... then keep those that don't have a bingo.
        data = [
            (board, status) for board, status in data
            if not flat_board_has_bingo(status, board_size)
        ]

    board, status = data[0]
    while not flat_board_has_bingo(status, board_size):
        num_drawn, *nums_drawn = nums_drawn
        update_status(board, status, num_drawn)
    
    return num_drawn, board, status

score_board_by_list_of_coordinates(*bingo_simulation_last(
    numbers,
    boards,
    [[] for _ in boards],
))

4590

This is a good example of a place where a `zip` makes all the difference.

When two sequences of data go hand-in-hand like this, where the elements from one sequence are tightly coupled to another sequence, you often find yourself iterating over them in a `for` loop with `zip`.
In this case, because of the processing we were doing, it turned out that it was very useful to actually keep them `zip`ped all the time.
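
As a minimal sketch of the same idea outside of the bingo problem, the snippet below filters two parallel lists by keeping them `zip`ped together. The names `names` and `scores` and their values are made up for illustration only.

```python
# Hypothetical data: two parallel lists whose elements go together.
names = ["ann", "bob", "cat", "dan"]
scores = [3, 8, 5, 9]

# Zip once, up front, so each name travels with its score.
pairs = list(zip(names, scores))

# A single filter now keeps both pieces of information in sync.
pairs = [(name, score) for name, score in pairs if score >= 5]
print(pairs)  # [('bob', 8), ('cat', 5), ('dan', 9)]

# If separate sequences are needed again, zip(*...) undoes the pairing.
kept_names, kept_scores = zip(*pairs)
print(kept_names)   # ('bob', 'cat', 'dan')
print(kept_scores)  # (8, 5, 9)
```

Without the initial `zip`, the filter would have to be written twice (once per list), which is exactly the duplication the first version of the simulation suffered from.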

However, we have seen that the two parts of the problem look remarkably similar, and yet the simulation function had to change quite a bit.
When there is a very clear symmetry in problem statements, and the code that solves them is not symmetric, that's often a good hint that you might be missing a better solution.

Let's try to find it.

## Processing each board in turn

The asymmetry we are faced with is in the code that simulates the game of bingo, so it is fair to assume that the issue isn't specifically related to the representations we are using.
So, a change of methodology is needed.

Instead of simulating the game of bingo, by successively drawing numbers, let's process each board to completion.
In other words, our simulations consisted (roughly) of a double loop of the form

```py
for number in numbers_to_draw:
    for board in boards:
        ...
```

and now we are looking to write some code that looks like

```py
for board in boards:
    for number in numbers_to_draw:
        ...
```

By doing this, we play each board to completion, and then we get a series of results that indicate when each board was finished.
From there, we can pick whatever board we like: the board that was finished first, last, second, tenth, whatever we want!

Let's rewrite our simulation function:

In [45]:
def bingo_simulation_by_board(nums_drawn, boards, statuses):
    board_size = int(len(boards[0]) ** 0.5)

    # Index of the draw that finishes each board (one less than the number of moves):
    finishing_moves = []
    for board, status in zip(boards, statuses):
        # Create a copy of the list of numbers to draw from:
        draw_from, moves = nums_drawn[::], -1
        # Play board to completion:
        while not flat_board_has_bingo(status, board_size):
            num, *draw_from = draw_from
            update_status(board, status, num)
            moves += 1
        finishing_moves.append(moves)

    return finishing_moves, boards, statuses
            
finishing_moves, boards, statuses = bingo_simulation_by_board(
    numbers,
    boards,
    [[] for _ in boards],
)

With the code written so far, we have the final statuses of all the boards, and we have an extra piece of information:
the list `finishing_moves` holds, for each board, the zero-based index of the draw that completed it (one less than the number of draws, because `moves` starts at `-1`):

In [46]:
finishing_moves[:5]

[63, 75, 59, 51, 62]

This tells us that

 - the first board was completed after 64 moves, when `numbers[63]` was drawn;
 - the second board was completed after 76 moves, when `numbers[75]` was drawn;
 - etc.

Now, we can use the functions `min` and `max` to figure out which board needed the fewest/most draws to be completed:

In [47]:
for picker in [min, max]:
    last_drawn_idx = picker(finishing_moves)
    board_idx = finishing_moves.index(last_drawn_idx)
    print(score_board_by_list_of_coordinates(
        numbers[last_drawn_idx],
        boards[board_idx],
        statuses[board_idx],
    ))

8442
4590


The assignment `board_idx = ...` finds the index of the board that finished at the selected draw.
That's because `picker(...)` only returns the index of the finishing draw, so we still need to find which board that draw belongs to.
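
As an alternative sketch (not the approach used above), `enumerate` can pair each finishing index with its board index, so that `min`/`max` return both at once and the separate `.index` lookup is not needed. The helper name `pick` is hypothetical, and the list below contains only the five values printed earlier, so the result is not the answer for the full input.

```python
# The first five values of `finishing_moves` printed above; the real list is longer.
finishing_moves = [63, 75, 59, 51, 62]

def pick(picker, moves):
    """Return (board_idx, last_drawn_idx) for the board chosen by `picker`."""
    # enumerate pairs each value with its position; the key compares the values only.
    return picker(enumerate(moves), key=lambda pair: pair[1])

print(pick(min, finishing_moves))  # (3, 51)
print(pick(max, finishing_moves))  # (1, 75)
```

In case of ties, `min` and `max` return the first matching pair, which is the same board that `finishing_moves.index(...)` would find.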

## Direct computation of board completion

So far, we have been drawing numbers and checking if/how they affect each board.

We can reverse this train of thought as well: instead of drawing numbers and checking whether they change the status of a board, we can take a board and look up the index at which each of its numbers appears in the list of numbers to draw.

By doing this, we get a second representation of the board, one that shows how long we need to wait for each number on the board to be drawn.
Then, we can figure out how many moves each row/column needs to give a bingo, and the earliest of those row/column bingos is the moment when the board gives a bingo.

For example, for this board:

In [48]:
print(boards[0])

[36, 11, 70, 77, 80, 63, 3, 56, 75, 28, 89, 91, 27, 33, 82, 53, 79, 52, 96, 32, 58, 14, 78, 65, 38]


this would be the related board of indices:

In [49]:
print([numbers.index(num) for num in boards[0]])

[86, 12, 37, 91, 71, 13, 63, 20, 17, 75, 57, 11, 48, 16, 73, 59, 4, 34, 77, 22, 6, 28, 69, 27, 31]


How do we read these results?
From left to right, and pairing up the two lists:

 - the number `36` of the board will be drawn in the move with index `86`;
 - the number `11` of the board will be drawn in the move with index `12`;
 - the number `70` of the board will be drawn in the move with index `37`;
 - ...
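
The pairing can be sketched with `zip`, using only the first row of the two lists printed above. The variable names `board_row` and `row_indices` are introduced here for illustration.

```python
# First row of the board printed above and the matching draw indices.
board_row = [36, 11, 70, 77, 80]
row_indices = [86, 12, 37, 91, 71]

for number, draw_idx in zip(board_row, row_indices):
    print(f"{number} is drawn at index {draw_idx}")

# The row is only complete once its slowest number has been drawn.
row_complete_at = max(row_indices)
print(row_complete_at)  # 91
```

So this particular row would only give a bingo at the draw with index `91`, when the number `77` comes out.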

After figuring out this representation of indices, we need to figure out how early we can get a bingo.
For that, we check when each row and each column gives a bingo, and we pick the fastest time of all of them:

In [50]:
def bingo_at(board, numbers):
    indices = [numbers.index(num) for num in board]
    size = int(len(board) ** 0.5)
    return min(
        # How early can a row give a bingo?
        min(max(indices[r * size + c] for c in range(size)) for r in range(size)),
        # How early can a column give a bingo?
        min(max(indices[r * size + c] for r in range(size)) for c in range(size)),
    )

[bingo_at(board, numbers) for board in boards] == finishing_moves

True

We can see that our function `bingo_at` is able to reproduce the results we had gotten above, regarding the moment when each board gives a bingo.
So, this function can almost replace the simulation function we had.
The only reason the replacement can't happen directly is that the simulation produces a status variable that we do not have here.
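
To see where the first entry of `finishing_moves` comes from, the draw indices printed above for `boards[0]` can be worked through step by step. This is only a sketch that spells out the row and column maxima; the intermediate names (`rows`, `cols`, `row_done`, `col_done`) are not part of the original code.

```python
# Draw indices printed above for boards[0], copied verbatim.
indices = [86, 12, 37, 91, 71, 13, 63, 20, 17, 75, 57, 11, 48, 16, 73,
           59, 4, 34, 77, 22, 6, 28, 69, 27, 31]
size = 5

# Slice the flat list into rows; transpose the rows to get the columns.
rows = [indices[r * size:(r + 1) * size] for r in range(size)]
cols = [list(col) for col in zip(*rows)]

# Each row/column is complete when its last number is drawn.
row_done = [max(row) for row in rows]
col_done = [max(col) for col in cols]
print(row_done)  # [91, 75, 73, 77, 69]
print(col_done)  # [86, 63, 69, 91, 75]

# The board gives a bingo as soon as the fastest row/column is complete.
print(min(row_done + col_done))  # 63
```

The result, `63`, is the first value of `finishing_moves` shown earlier: the second column is the first line of the board to be completed.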

For us to use the function `bingo_at`, we need a new way of scoring the board.

## Scoring a board from draw indices

In order to score a board, we need to access all the unmarked numbers on the board: those will be all the numbers whose index in the list `numbers` is greater than the index at which a bingo happens.

Thus, we can score a board like this:

In [51]:
def score_board_from_draw_indices(board, numbers, bingo_idx):
    # `bingo_idx` is the index in `numbers` of the draw that gave the bingo.
    acc = 0
    # Numbers drawn after the bingo are still unmarked on the board.
    for number in numbers[bingo_idx + 1:]:
        if number in board:
            acc += number
    return acc * numbers[bingo_idx]

bingos = [bingo_at(board, numbers) for board in boards]

for picker in [min, max]:
    last_drawn_idx = picker(bingos)
    board_idx = bingos.index(last_drawn_idx)
    print(score_board_from_draw_indices(boards[board_idx], numbers, last_drawn_idx))

8442
4590
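
As a small sanity check, the sketch below runs the same idea on the draw order and the first two boards from the example in the problem statement. It restates `bingo_at` so that the block runs on its own, and it scores with a hypothetical helper, `score_with_sets`, that uses set difference instead of a loop; this assumes that the numbers on a board are unique.

```python
# Draw order and the first two boards from the example in the problem statement,
# flattened row by row to match the representation used in this section.
numbers = [7, 4, 9, 5, 11, 17, 23, 2, 0, 14, 21, 24, 10, 16, 13, 6, 15, 25,
           12, 22, 18, 20, 8, 19, 3, 26, 1]
boards = [
    [22, 13, 17, 11, 0, 8, 2, 23, 4, 24, 21, 9, 14, 16, 7,
     6, 10, 3, 18, 5, 1, 12, 20, 15, 19],
    [3, 15, 0, 2, 22, 9, 18, 13, 17, 5, 19, 8, 7, 25, 23,
     20, 11, 10, 24, 4, 14, 21, 16, 12, 6],
]

def bingo_at(board, numbers):
    # Same logic as the function defined above, restated so the block runs alone.
    indices = [numbers.index(num) for num in board]
    size = int(len(board) ** 0.5)
    rows = [indices[r * size:(r + 1) * size] for r in range(size)]
    # zip(*rows) yields the columns; the earliest complete line wins.
    return min(max(line) for line in rows + list(zip(*rows)))

def score_with_sets(board, numbers, bingo_idx):
    # Hypothetical helper: unmarked numbers are those not drawn up to the bingo.
    unmarked = set(board) - set(numbers[:bingo_idx + 1])
    return sum(unmarked) * numbers[bingo_idx]

bingos = [bingo_at(board, numbers) for board in boards]
scores = [score_with_sets(b, numbers, i) for b, i in zip(boards, bingos)]
print(bingos)  # [13, 14]
print(scores)  # [2192, 1924]
```

The first board gives a bingo when `numbers[13]` (the number `16`) is drawn, and the second one draw later, when `numbers[14]` (the number `13`) is drawn.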


## Conclusion

When you have a “large” problem, try decomposing it into smaller pieces.
You may have a hard time figuring out how to completely decouple the several pieces, but it already helps if you have smaller subproblems that you can tackle little by little.

If you have any questions, suggestions, remarks, recommendations, corrections, or anything else, you can reach out to me [on Twitter](https://twitter.com/mathsppblog) or via email to rodrigo at mathspp dot com.