# Seven Segment Search

The analysis that follows pertains to the eighth day of the [Python Problem-Solving Bootcamp](https://mathspp.com/pythonbootcamp).

In the analysis that follows you may be confronted with code that you do not understand, especially as you reach the end of the explanation of each part.

If you find functions that you didn't know before, remember to [check the docs](https://docs.python.org/3/) for those functions and play around with them in the REPL.
This is written to be increasing in difficulty (within each part of the problem), so it is understandable if it gets harder as you keep reading.
That's perfectly fine, you don't have to understand everything _right now_, especially because I can't know for sure what _your level_ is.

## Part 1 problem statement

(From [Advent of Code 2021, day 8](https://adventofcode.com/2021/day/8))

You barely reach the safety of the cave when the whale smashes into the cave mouth, collapsing it. Sensors indicate another exit to this cave at a much greater depth, so you have no choice but to press on.

As your submarine slowly makes its way through the cave system, you notice that the four-digit [seven-segment displays](https://en.wikipedia.org/wiki/Seven-segment_display) in your submarine are malfunctioning; they must have been damaged during the escape. You'll be in a lot of trouble without them, so you'd better figure out what's wrong.

Each digit of a seven-segment display is rendered by turning on or off any of seven segments named `a` through `g`:

```
  0:      1:      2:      3:      4:
 aaaa    ....    aaaa    aaaa    ....
b    c  .    c  .    c  .    c  b    c
b    c  .    c  .    c  .    c  b    c
 ....    ....    dddd    dddd    dddd
e    f  .    f  e    .  .    f  .    f
e    f  .    f  e    .  .    f  .    f
 gggg    ....    gggg    gggg    ....

  5:      6:      7:      8:      9:
 aaaa    aaaa    aaaa    aaaa    aaaa
b    .  b    .  .    c  b    c  b    c
b    .  b    .  .    c  b    c  b    c
 dddd    dddd    ....    dddd    dddd
.    f  e    f  .    f  e    f  .    f
.    f  e    f  .    f  e    f  .    f
 gggg    gggg    ....    gggg    gggg
```

So, to render a `1`, only segments `c` and `f` would be turned on; the rest would be off. To render a `7`, only segments `a`, `c`, and `f` would be turned on.

The problem is that the signals which control the segments have been mixed up on each display. The submarine is still trying to display numbers by producing output on signal wires `a` through `g`, but those wires are connected to segments _randomly_. Worse, the wire/segment connections are mixed up separately for each four-digit display! (All of the digits _within_ a display use the same connections, though.)

So, you might know that only signal wires `b` and `g` are turned on, but that doesn't mean _segments_ `b` and `g` are turned on: the only digit that uses two segments is `1`, so it must mean segments `c` and `f` are meant to be on. With just that information, you still can't tell which wire (`b`/`g`) goes to which segment (`c`/`f`). For that, you'll need to collect more information.

For each display, you watch the changing signals for a while, make a note of _all ten unique signal patterns_ you see, and then write down a single _four digit output value_ (your puzzle input). Using the signal patterns, you should be able to work out which pattern corresponds to which digit.

For example, here is what you might see in a single entry in your notes:

```
acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab |
cdfeb fcadb cdfeb cdbaf
```

(The entry is wrapped here to two lines so it fits; in your notes, it will all be on a single line.)

Each entry consists of ten _unique signal patterns_, a `|` delimiter, and finally the _four digit output value_. Within an entry, the same wire/segment connections are used (but you don't know what the connections actually are). The unique signal patterns correspond to the ten different ways the submarine tries to render a digit using the current wire/segment connections. Because `7` is the only digit that uses three segments, `dab` in the above example means that to render a `7`, signal lines `d`, `a`, and `b` are on. Because `4` is the only digit that uses four segments, `eafb` means that to render a `4`, signal lines `e`, `a`, `f`, and `b` are on.

Using this information, you should be able to work out which combination of signal wires corresponds to each of the ten digits. Then, you can decode the four digit output value. Unfortunately, in the above example, all of the digits in the output value (`cdfeb fcadb cdfeb cdbaf`) use five segments and are more difficult to deduce.

For now, _focus on the easy digits_. Consider this larger example:

```
be cfbegad cbdgef fgaecd cgeb fdcge agebfd fecdb fabcd edb |
fdgacbe cefdb cefbgd gcbe
edbfga begcd cbg gc gcadebf fbgde acbgfd abcde gfcbed gfec |
fcgedb cgb dgebacf gc
fgaebd cg bdaec gdafb agbcfd gdcbef bgcad gfac gcb cdgabef |
cg cg fdcagb cbg
fbegcd cbd adcefb dageb afcb bc aefdc ecdab fgdeca fcdbega |
efabcd cedba gadfec cb
aecbfdg fbg gf bafeg dbefa fcge gcbea fcaegb dgceab fcbdga |
gecf egdcabf bgf bfgea
fgeab ca afcebg bdacfeg cfaedg gcfdb baec bfadeg bafgc acf |
gebdcfa ecba ca fadegcb
dbcfg fgd bdegcaf fgec aegbdf ecdfab fbedc dacgb gdcebf gf |
cefg dcbef fcge gbcadfe
bdfegc cbegaf gecbf dfcage bdacg ed bedf ced adcbefg gebcd |
ed bcgafe cdgba cbgef
egadfb cdbfeg cegd fecab cgb gbdefca cg fgcdab egfdb bfceg |
gbdfcae bgc cg cgb
gcafb gcf dcaebfg ecagb gf abcdeg gaef cafbge fdbac fegbdc |
fgae cfgab fg bagce
```

Because the digits `1`, `4`, `7`, and `8` each use a unique number of segments, you should be able to tell which combinations of signals correspond to those digits. Counting _only digits in the output values_ (the part after `|` on each line), in the above example, there are `26` instances of digits that use a unique number of segments (highlighted above).

_In the output values, how many times do digits `1`, `4`, `7`, or `8` appear?_

_Using the input file `input.txt`, the result should be `264`._

In [1]:
# IMPORTANT: Set this to the correct path for you!
INPUT_FILE = "input.txt"

## Baseline solution

Our baseline solution is going to be pretty simple, and we are going to be using much of what we learned so far, so there will not be much to talk about:

In [2]:
from collections import Counter

# Digit i uses SEGMENTS[i] segments.
SEGMENTS = [6, 2, 5, 5, 4, 5, 6, 3, 7, 6]

with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

c = Counter()
for line in lines:
    c.update(len(hint) for hint in line.split(" | ")[1].split())
print(sum(c[SEGMENTS[i]] for i in [1, 4, 7, 8]))

264


Assuming you have taken a look at the previous analyses, this solution should be fairly straightforward to you.
Here are some of the key points:

 - we use a `collections.Counter` object to keep track of all the length counts we have found;
 - the method `.update` was used to update the `collections.Counter` object with the new counts;
 - we use the `.readlines` method inside the context manager to read all lines and only do the processing outside of the context manager – because the contents of the file are small enough that we can hold everything in memory, we can close the file as soon as possible;
 - we use the method `.split` and indexing to build a list of the segment strings to the right of `" | "` directly; and
 - we encode the number of segments each digit needs in a list, where the index of the value represents implicitly the digit it refers to.
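
To make the role of `.update` concrete, here is a small sketch that runs the same counting on two entries copied from the larger example above, instead of on `input.txt`. The names `sample_lines` and `easy` are only used for this illustration.

```python
from collections import Counter

# Two entries copied from the larger example in the problem statement.
sample_lines = [
    "be cfbegad cbdgef fgaecd cgeb fdcge agebfd fecdb fabcd edb | fdgacbe cefdb cefbgd gcbe",
    "edbfga begcd cbg gc gcadebf fbgde acbgfd abcde gfcbed gfec | fcgedb cgb dgebacf gc",
]

# Digit i uses SEGMENTS[i] segments.
SEGMENTS = [6, 2, 5, 5, 4, 5, 6, 3, 7, 6]

c = Counter()
for line in sample_lines:
    # .update adds the counts of the new lengths to the counts we already have.
    c.update(len(hint) for hint in line.split(" | ")[1].split())

easy = sum(c[SEGMENTS[i]] for i in [1, 4, 7, 8])
print(c[7], c[6], c[5])  # 2 2 1
print(easy)  # 5
```

Notice that the counter also keeps track of the lengths `5` and `6`, which we never use to compute the final answer.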

## TMI

TMI is an acronym that stands for “too much information”.
In this particular case, it refers to the fact that, even though our solution was simple, it contains “too much” information that we could easily do without.

This is something that I find myself asking quite often: am I doing what was asked, or am I doing much more than needed?
Over-delivering is a nice touch, but you have to make sure you _also_ do what was asked of you.
For this alternative solution, I just want to show you a solution that _only_ counts whether each combination of signals is relevant or not:

In [3]:
LENGTHS = {2, 3, 4, 7}

with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

acc = 0
for line in lines:
    for digit_signals in line.split(" | ")[1].split():
        acc += len(digit_signals) in LENGTHS
print(acc)

264


In this version we only keep track of what we need and we use our trick that converts a conditional increment into an increment with a Boolean value:
the line `acc += len(digit_signals) in LENGTHS` is functionally equivalent to

```py
if len(digit_signals) in LENGTHS:
    acc += 1
```

but lets you write your code with less flow control.
I am a personal fan of flatter code, so I leave this here for your consideration.
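
The trick works because `bool` is a subclass of `int` in Python, so `True` behaves like `1` and `False` like `0` in arithmetic. A minimal sketch comparing both versions on a small, hand-picked list of signals (the names `acc_flat` and `acc_branch` are made up for this comparison):

```python
LENGTHS = {2, 3, 4, 7}
signals = ["fdgacbe", "cefdb", "cefbgd", "gcbe"]  # Lengths 7, 5, 6, and 4.

acc_flat = 0
acc_branch = 0
for digit_signals in signals:
    # Flat version: add the Boolean directly.
    acc_flat += len(digit_signals) in LENGTHS
    # Branching version: increment only when the condition holds.
    if len(digit_signals) in LENGTHS:
        acc_branch += 1

print(True + True)  # 2
print(acc_flat, acc_branch)  # 2 2
```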

However, in writing this algorithm, we created a double `for` loop and an accumulator variable.
I have nothing against double `for` loops, but I am not a fan of accumulator variables.
Let's try to make `acc` go away by slowly reworking the `for` loops.

We start by rewriting the inner `for` loop:

In [4]:
acc = 0
for line in lines:
    acc += sum(len(signals) in LENGTHS for signals in line.split(" | ")[1].split())
print(acc)

264


And now we can remove the outer one, by having it inside the generator expression as well:

In [5]:
acc = sum(len(signals) in LENGTHS for line in lines for signals in line.split(" | ")[1].split())
print(acc)

264


The issue here is that we have a _very_ long line and a generator expression with two `for` loops.
We can make the line shorter by breaking it up:

In [6]:
acc = sum(
    len(signals) in LENGTHS
    for line in lines for signals in line.split(" | ")[1].split()
)
print(acc)

264


But this is stretching the limits of decency for a generator expression.
Let's try another approach.

## Make it (more) functional

Instead of nesting `for` loops inside a generator expression, let's use an explicit `map` (a staple of functional programming) and another tool from the module `itertools` to rewrite our code without accumulator variables:

In [7]:
from itertools import chain

parsed = map(lambda line: line.split(" | ")[1].split(), lines)
acc = sum(len(signals) in LENGTHS for signals in chain.from_iterable(parsed))
print(acc)

264


The idea behind the variable `parsed` is that we can use `map` to apply the basic parsing (or processing) to each line: all we want to do is split the line in two (along the separator `" | "`), take the right-hand side, and split that along the spaces.

Then, the usage of the function `chain.from_iterable` from the module `itertools` builds on a simple, yet important, fact: all signal combinations from all lines are to be treated exactly the same.
So, `parsed` will build an iterable containing iterables:

In [8]:
parsed = map(lambda line: line.split(" | ")[1].split(), lines[:5])
list(parsed)

[['eg', 'eg', 'dfecag', 'ge'],
 ['deg', 'baced', 'fegac', 'aefbcg'],
 ['gcadef', 'eacgbf', 'egdfb', 'bgdfe'],
 ['cd', 'cd', 'agfbd', 'bdac'],
 ['faec', 'ceaf', 'bcedgaf', 'cfbga']]

But we want to iterate over all those strings, regardless of what line they were on, so we can use `chain.from_iterable` to flatten them:

In [9]:
parsed = map(lambda line: line.split(" | ")[1].split(), lines[:5])
list(chain.from_iterable(parsed))

['eg',
 'eg',
 'dfecag',
 'ge',
 'deg',
 'baced',
 'fegac',
 'aefbcg',
 'gcadef',
 'eacgbf',
 'egdfb',
 'bgdfe',
 'cd',
 'cd',
 'agfbd',
 'bdac',
 'faec',
 'ceaf',
 'bcedgaf',
 'cfbga']

After flattening the iterable `parsed`, we can go over all the different signals and check their lengths.

A nice nuance, though, is that both `map` and `chain.from_iterable` produce _lazy_ iterators, thus the data is computed on the fly, and you don't need to spend huge amounts of memory to hold everything in lists or similar data structures.
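
One way to observe this laziness is to record when the parsing function actually runs. The sketch below uses two made-up lines and a hypothetical helper `parse` that logs each call in the list `calls`; neither is part of the solution itself.

```python
from itertools import chain

lines = ["a b | cd efg\n", "c d | ab abcd\n"]  # Made-up lines for illustration.
calls = []

def parse(line):
    calls.append(line)  # Record that this line was parsed.
    return line.split(" | ")[1].split()

parsed = map(parse, lines)
flat = chain.from_iterable(parsed)
calls_before = len(calls)       # Nothing has been parsed yet.

first = next(flat)              # Asking for one value parses only the first line.
calls_after_first = len(calls)

rest = list(flat)               # Exhausting the iterator parses the rest.

print(calls_before, calls_after_first, len(calls))  # 0 1 2
print(first, rest)  # cd ['efg', 'ab', 'abcd']
```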

## Part 2 problem statement

Through a little deduction, you should now be able to determine the remaining digits. Consider again the first example above:

```
acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab |
cdfeb fcadb cdfeb cdbaf
```

After some careful analysis, the mapping between signal wires and segments only make sense in the following configuration:

```
 dddd
e    a
e    a
 ffff
g    b
g    b
 cccc
```

So, the unique signal patterns would correspond to the following digits:

-   `acedgfb`: `8`
-   `cdfbe`: `5`
-   `gcdfa`: `2`
-   `fbcad`: `3`
-   `dab`: `7`
-   `cefabd`: `9`
-   `cdfgeb`: `6`
-   `eafb`: `4`
-   `cagedb`: `0`
-   `ab`: `1`

Then, the four digits of the output value can be decoded:

-   `cdfeb`: `5`
-   `fcadb`: `3`
-   `cdfeb`: `5`
-   `cdbaf`: `3`

Therefore, the output value for this entry is `5353`.

Following this same process for each entry in the second, larger example above, the output value of each entry can be determined:

-   `fdgacbe cefdb cefbgd gcbe`: `8394`
-   `fcgedb cgb dgebacf gc`: `9781`
-   `cg cg fdcagb cbg`: `1197`
-   `efabcd cedba gadfec cb`: `9361`
-   `gecf egdcabf bgf bfgea`: `4873`
-   `gebdcfa ecba ca fadegcb`: `8418`
-   `cefg dcbef fcge gbcadfe`: `4548`
-   `ed bcgafe cdgba cbgef`: `1625`
-   `gbdfcae bgc cg cgb`: `8717`
-   `fgae cfgab fg bagce`: `4315`

Adding all of the output values in this larger example produces `61229`.

For each entry, determine all of the wire/segment connections and decode the four-digit output values. _What do you get if you add up all of the output values?_

_Using the input file `input.txt`, the result should be `1063760`._

## Generating permutations

Our baseline solution to this problem follows a simple principle: if the signal letters are all mixed up, we will sort them out by trying every single ordering of the segments, and then check if it works.

For us to try every single ordering of the segments, we need to be able to generate those orderings.
A possible solution entails generating all strings of length `7`, and then only keeping those that make use of each of the letters in `"abcdefg"`:

In [10]:
def all_orderings(letters):
    def aux(length):
        if not length:
            return [""]

        orderings = aux(length - 1)
        return [ordering + char for ordering in orderings for char in letters]
    
    orderings = aux(7)
    letters = set(letters)
    return [ordering for ordering in orderings if letters == set(ordering)]

orderings = all_orderings("abcdefg")
print(orderings[:7])

['abcdefg', 'abcdegf', 'abcdfeg', 'abcdfge', 'abcdgef', 'abcdgfe', 'abcedfg']


This works well, in the sense that it produces the correct result, but it is _really_ wasteful!

In fact, there are $5040$ possible orderings of the signals:

In [11]:
len(orderings)

5040

And yet, our function `all_orderings` starts by generating $7^7 = 823{,}543$ strings, only to filter out _most_ of them later.
In fact, we use less than $1\%$ of the work done, which is to say we waste more than $99\%$ of the things we computed.
Not to mention we have to hold those $823{,}543$ strings in memory.
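
These numbers are easy to double-check: there are $7!$ orderings of seven distinct letters and $7^7$ strings of length seven over a seven-letter alphabet. A quick sketch (the names `useful` and `generated` are only for this check):

```python
from math import factorial

useful = factorial(7)  # Orderings that use each letter exactly once.
generated = 7 ** 7     # All strings of length 7 over 7 letters.

print(useful, generated)             # 5040 823543
print(f"{useful / generated:.2%}")   # 0.61%
```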

Let's try to do better, by only making use of the characters that we have _not_ used so far:

In [12]:
def all_orderings(letters):
    if not letters:
        return [""]

    results = []
    for idx, letter in enumerate(letters):
        for ordering in all_orderings(letters[:idx] + letters[idx + 1:]):
            results.append(letter + ordering)
    return results

orderings = all_orderings("abcdefg")
print(orderings[:7])
print(len(orderings))

['abcdefg', 'abcdegf', 'abcdfeg', 'abcdfge', 'abcdgef', 'abcdgfe', 'abcedfg']
5040


Once more, we have an accumulator variable that we could do away with:

In [13]:
def all_orderings(letters):
    if not letters:
        return [""]

    return [
        letter + ordering for idx, letter in enumerate(letters)
        for ordering in all_orderings(letters[:idx] + letters[idx + 1:])
    ]

orderings = all_orderings("abcdefg")
print(orderings[:7])
print(len(orderings))

['abcdefg', 'abcdegf', 'abcdfeg', 'abcdfge', 'abcdgef', 'abcdgfe', 'abcedfg']
5040


We produced a function `all_orderings` that accepts a string and returns all possible orderings of its letters.
In other words, we are producing all [permutations](https://en.wikipedia.org/wiki/Permutation) of a single string.
For our particular implementation, we went with a recursive function.

Can you write a function that generates all permutations of a string without recursion?
(I am of the personal opinion that combinatorics-related functions are great exercises.)
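
If you want something to compare your attempt against, here is one possible sketch, assuming all letters are distinct. It builds the permutations by inserting each new letter in every position of the orderings found so far; the name `all_orderings_iterative` is made up, and the orderings come out in a different order than in the recursive versions.

```python
def all_orderings_iterative(letters):
    orderings = [""]
    for letter in letters:
        # Insert the new letter in every possible position of every ordering so far.
        orderings = [
            ordering[:i] + letter + ordering[i:]
            for ordering in orderings
            for i in range(len(ordering) + 1)
        ]
    return orderings

print(sorted(all_orderings_iterative("abc")))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
print(len(all_orderings_iterative("abcdefg")))  # 5040
```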

However, all of our work is moot if we just get to know the standard library a bit better, because the module `itertools` contains the right tool for the job:

In [14]:
from itertools import permutations

orderings = list(permutations("abcdefg"))
print(orderings[:7])
print(len(orderings))

[('a', 'b', 'c', 'd', 'e', 'f', 'g'), ('a', 'b', 'c', 'd', 'e', 'g', 'f'), ('a', 'b', 'c', 'd', 'f', 'e', 'g'), ('a', 'b', 'c', 'd', 'f', 'g', 'e'), ('a', 'b', 'c', 'd', 'g', 'e', 'f'), ('a', 'b', 'c', 'd', 'g', 'f', 'e'), ('a', 'b', 'c', 'e', 'd', 'f', 'g')]
5040


The result, as we can see, is slightly different from before, because `permutations` produces tuples of the individual characters instead of strings.
On top of that, `permutations` is lazy like `zip` or `enumerate` (or many others): it only generates the data as we ask for it.
If we just call `permutations`, nothing useful gets printed because no actual permutations were computed yet:

In [15]:
permutations("abcdefg")

<itertools.permutations at 0x1ae3d000db0>
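
To get actual values out of the lazy object, we have to iterate over it. As a small sketch, `itertools.islice` takes just the first few permutations and `"".join` turns each tuple back into a string (the names `perms`, `first_three`, and `fourth` are only for illustration):

```python
from itertools import islice, permutations

perms = permutations("abcdefg")

# Only the first three permutations are computed here.
first_three = ["".join(perm) for perm in islice(perms, 3)]
print(first_three)  # ['abcdefg', 'abcdegf', 'abcdfeg']

# The iterator remembers where it stopped, so next() gives the fourth one.
fourth = "".join(next(perms))
print(fourth)  # abcdfge
```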

## Brute-force solution

Now that we have a way of generating all the different permutations of the letters `"abcdefg"`, we can implement our brute-force solution.
It is going to work this way:

 - we pre-compute all possible ways of reordering the signals based off of the permutations of `"abcdefg"` (pre-computing them means we save time across lines of the input);
 - we are going to read the input line by line and split it into the left and right-hand sides of the `" | "`;
 - we are going to look for the correct ordering of the signals based off of the left-hand side of the input line; and
 - when the correct ordering is found, we use it to transform the right-hand side signals and we compute the code.

How do we know if a given ordering is the correct one?
Let's revisit the example above:

The input line

```
acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab |
cdfeb fcadb cdfeb cdbaf
```

matches this reordering of the signals:

```
 dddd
e    a
e    a
 ffff
g    b
g    b
 cccc
```

This would be represented by the string `"deafgbc"`, which maps to `"abcdefg"` to produce the original ordering of the segments:

```
 aaaa
b    c
b    c
 dddd
e    f
e    f
 gggg
```

So, for each ordering, like `"deafgbc"`, we map it to `"abcdefg"` and figure out what digits are being represented.
For example, the original input line contained the signals `"cdfbe"`, which would represent the following digit:

```
 dddd
e    .
e    .
 ffff
.    b
.    b
 cccc
```

This is a 5, so this is a hint that the ordering `"deafgbc"` could be the correct ordering.
For an ordering to be correct, we must be able to build all _ten_ digits out of the original input line.
An ordering is incorrect when it maps to something that is not a valid digit.

For example, for the same input line, the ordering `"decfgba"` would map to this ordering of the segments:

```
 dddd
e    c
e    c
 ffff
g    b
g    b
 aaaa
```

which means that the signals `"cdfbe"` now represent the following:

```
 dddd
e    c
e    c
 ffff
.    b
.    b
 ....
```

which is not a valid digit.
So, this is what we have to do.

For example, we can use a helper function `translate` to take an ordering and the input signals, and return the set of translated segments:

In [16]:
def translate(ordering, digit):
    return {"abcdefg"[ordering.index(char)] for char in digit}

How does this work?

Well, first we need the segment information of all digits:

In [17]:
SEGMENTS = [
    set("abcefg"),
    set("cf"),
    set("acdeg"),
    set("acdfg"),
    set("bcdf"),
    set("abdfg"),
    set("abdefg"),
    set("acf"),
    set("abcdefg"),
    set("abcdfg"),
]
SEGMENTS

[{'a', 'b', 'c', 'e', 'f', 'g'},
 {'c', 'f'},
 {'a', 'c', 'd', 'e', 'g'},
 {'a', 'c', 'd', 'f', 'g'},
 {'b', 'c', 'd', 'f'},
 {'a', 'b', 'd', 'f', 'g'},
 {'a', 'b', 'd', 'e', 'f', 'g'},
 {'a', 'c', 'f'},
 {'a', 'b', 'c', 'd', 'e', 'f', 'g'},
 {'a', 'b', 'c', 'd', 'f', 'g'}]

We use sets here because we have seen that the input signals/segments aren't necessarily sorted; sets ignore order, so we don't have to keep sorting them.

Now, we can check that the ordering `"deafgbc"` does turn `"cdfbe"` into a five:

In [18]:
SEGMENTS[5] == translate("deafgbc", "cdfbe")

True

We can also check that the ordering `"decfgba"` turns `"cdfbe"` into an illegal digit:

In [19]:
translate("decfgba", "cdfbe") in SEGMENTS

False
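
To see what `translate` does character by character, the sketch below spells out the lookup for the ordering `"deafgbc"` and the signals `"cdfbe"`: each wire is located in the ordering, and the letter of `"abcdefg"` in that same position is the segment it lights up. The list `steps` only exists for this walk-through.

```python
ordering = "deafgbc"
steps = []
for char in "cdfbe":
    position = ordering.index(char)  # Where does this wire sit in the ordering?
    segment = "abcdefg"[position]    # The segment in that same position.
    steps.append((char, position, segment))
    print(char, "->", position, "->", segment)

# c -> 6 -> g
# d -> 0 -> a
# f -> 3 -> d
# b -> 5 -> f
# e -> 1 -> b

translated = {segment for _, _, segment in steps}
print(translated == set("abdfg"))  # True: these are the segments of a 5.
```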

If we put all our ideas together, this is what we end up with:

In [20]:
from itertools import permutations

BASE = "abcdefg"
SEGMENTS = [
    set("abcefg"),
    set("cf"),
    set("acdeg"),
    set("acdfg"),
    set("bcdf"),
    set("abdfg"),
    set("abdefg"),
    set("acf"),
    set("abcdefg"),
    set("abcdfg"),
]
ALL_ORDERINGS = list(permutations(BASE))

def translate(ordering, digit):
    return {BASE[ordering.index(char)] for char in digit}

with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

acc = 0
for line in lines:
    hints, values = line.split(" | ")
    hints = hints.split()
    values = values.split()
    for ordering in ALL_ORDERINGS:
        found_all = True
        for hint in hints:
            if translate(ordering, hint) not in SEGMENTS:
                found_all = False
                break
        if found_all:
            # If we got here, this is the correct ordering.
            digit_ints = [SEGMENTS.index(translate(ordering, value)) for value in values]
            acc += int("".join(str(digit) for digit in digit_ints))
            break
print(acc)

1063760


This is one way of implementing what we discussed, but there are other ways of doing this and there are things we can do to improve the style of our brute-force solution.

## Refactoring the brute-force solution

The first thing we can do is look at the beginning of the outer `for` loop:

In [21]:
for line in lines:
    hints, values = line.split(" | ")
    hints = hints.split()
    values = values.split()
    ...

Notice how we want to `split` both variables we just got out of the `line.split(" | ")` expression.
We could put the two assignments together:

In [22]:
for line in lines:
    hints, values = line.split(" | ")
    hints, values = hints.split(), values.split()
    ...

But this just makes it even more clear that there is something we can do to simplify our parsing step.
In fact, after we do the initial split, we want to apply the `.split` method once more to each of the two results produced by `.split(" | ")`, so we can use `map` for just that.

What's really nice is that we can just keep the structural assignment in place:

In [23]:
for line in lines:
    hints, values = map(str.split, line.split(" | "))
    print(hints)
    print(values)
    break

['cdafg', 'dage', 'fgdaec', 'cdbfgae', 'cge', 'gcbdfa', 'fdceb', 'gfceab', 'ge', 'ecfgd']
['eg', 'eg', 'dfecag', 'ge']


It is worthwhile noting that we are using `str.split` to refer to the object that is the string method `.split`.
This way, we can use it in functional code.
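
As a minimal sketch of that equivalence, on a made-up line: calling `str.split(s)` is the same as calling `s.split()`, which is why `map` can apply it to each half of the line.

```python
line = "ab cd ef | gh ij\n"  # Made-up line for illustration.

# Calling the method through the class is the same as calling it on the string.
print(str.split("ab cd ef") == "ab cd ef".split())  # True

hints, values = map(str.split, line.split(" | "))
print(hints)   # ['ab', 'cd', 'ef']
print(values)  # ['gh', 'ij']
```

Splitting on whitespace without arguments also discards the trailing newline, so there is no need to strip the line first.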

Now, we can focus our attention on the inner `for` loop:

```py
for line in lines:
    hints, values = map(str.split, line.split(" | "))
    for ordering in ALL_ORDERINGS:
        found_all = True
        for hint in hints:
            if translate(ordering, hint) not in SEGMENTS:
                found_all = False
                break
        if found_all:
            # If we got here, this is the correct ordering.
            digit_ints = [SEGMENTS.index(translate(ordering, value)) for value in values]
            acc += int("".join(str(digit) for digit in digit_ints))
            break
```

Notice how the Boolean variable `found_all` is acting as an accumulator, although a Boolean one.
This time, instead of summing up integers, we are AND-ing a series of Booleans.
As soon as we find a `False` value, we break out of the loop and we never do the translation of the right-hand values.

What we are missing here is that, much like there is a `sum` function for summing up values, there are two functions that accumulate Boolean values: `any` and `all`.
`any` checks if there is any `True` value and `all` checks if all values are `True`.
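
A short sketch of both built-ins on hand-picked values. It also shows that they stop as soon as the answer is known, just like the `break` in the explicit loop, and that "all are true" is the same as "none is false". The helper `check` and the list `seen` are only there to record which values were inspected.

```python
checks = [True, True, False, True]
print(all(checks), any(checks))  # False True

seen = []

def check(x):
    seen.append(x)  # Record which values were actually inspected.
    return x < 3

result = all(check(x) for x in [1, 2, 3, 4, 5])
print(result, seen)  # False [1, 2, 3]

# "All values are true" is equivalent to "there is no false value".
print(all(checks) == (not any(not value for value in checks)))  # True
```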

We can use either one of them to check if the current `ordering` is the correct one, and convert the digit in that case.
We can use `all` to check if all the hints map to a legal digit:

In [24]:
acc = 0
for line in lines:
    hints, values = map(str.split, line.split(" | "))
    for ordering in ALL_ORDERINGS:
        if all(translate(ordering, hint) in SEGMENTS for hint in hints):
            digit_ints = [SEGMENTS.index(translate(ordering, value)) for value in values]
            acc += int("".join(str(digit) for digit in digit_ints))
            break
print(acc)

1063760


Or we can use `any` to check if any hint maps to an illegal digit:

In [25]:
acc = 0
for line in lines:
    hints, values = map(str.split, line.split(" | "))
    for ordering in ALL_ORDERINGS:
        if any(translate(ordering, hint) not in SEGMENTS for hint in hints):
            continue
        digit_ints = [SEGMENTS.index(translate(ordering, value)) for value in values]
        acc += int("".join(str(digit) for digit in digit_ints))
        break
print(acc)

1063760


I prefer using the built-in `all` because it is more direct here.

Another thing we can, and should, rework is how we turn a list of digits into an integer.
In the code above, we just convert all digits into strings, join them together, and then use the built-in `int` to do the job.
This works, but is a rather roundabout way of doing it.

Consider this exercise:
how would you implement a function that accepts a list of integer digits in any base and returns the corresponding integer?
Suppose your function is called `from_base`; this is what it should do:

```py
from_base([1, 2, 3, 4], 10)  # 1234
from_base([1, 1, 0, 1],  2)  # 13
from_base([15, 15, 10], 16)  # 4090 (This is 0xffa in hexadecimal.)
```

A nice, idiomatic Python expression follows:

In [26]:
from functools import reduce

def from_base(digits, base=10):
    return reduce(lambda l, r: l * base + r, digits, 0)

print(from_base([1, 2, 3, 4], 10))  # 1234
print(from_base([1, 1, 0, 1],  2))  # 13
print(from_base([15, 15, 10], 16))  # 4090 (This is 0xffa in hexadecimal.)

1234
13
4090


Notice the use of `reduce`, because we are reducing a list of digits down to a single number.
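
If the reduction feels opaque, `itertools.accumulate` can show the intermediate values that `reduce` goes through. This is just a tracing sketch (it needs Python 3.8+ for the `initial` argument); `combine` is a name made up here for the same lambda used in `from_base`.

```python
from functools import reduce
from itertools import accumulate

digits = [1, 2, 3, 4]
combine = lambda l, r: l * 10 + r  # Same step as in from_base, with base 10.

# accumulate yields every intermediate result, starting with the initial value.
steps = list(accumulate(digits, combine, initial=0))
print(steps)  # [0, 1, 12, 123, 1234]

# reduce only returns the last of those values.
print(reduce(combine, digits, 0))  # 1234
```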

With this proper definition of `from_base`, we can finalise our brute-force solution:

In [27]:
from functools import reduce
from itertools import permutations

BASE = "abcdefg"
SEGMENTS = [
    set("abcefg"),
    set("cf"),
    set("acdeg"),
    set("acdfg"),
    set("bcdf"),
    set("abdfg"),
    set("abdefg"),
    set("acf"),
    set("abcdefg"),
    set("abcdfg"),
]
ALL_ORDERINGS = list(permutations(BASE))

def translate(ordering, digit):
    return {BASE[ordering.index(char)] for char in digit}

def from_base(digits, base=10):
    return reduce(lambda l, r: l * base + r, digits, 0)

with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

acc = 0
for line in lines:
    hints, values = map(str.split, line.split(" | "))
    for ordering in ALL_ORDERINGS:
        if all(translate(ordering, hint) in SEGMENTS for hint in hints):
            digit_ints = [SEGMENTS.index(translate(ordering, value)) for value in values]
            acc += from_base(digit_ints, 10)
            break
print(acc)

1063760


Finally, we can make use of [two of the most underappreciated string methods](https://mathspp.com/blog/pydonts/string-translate-and-maketrans-methods) to rewrite the `translate` function.
After all, what we are doing is literally employing the string method `.translate`:

In [28]:
from functools import reduce
from itertools import permutations

BASE = "abcdefg"
SEGMENTS = [
    set("abcefg"),
    set("cf"),
    set("acdeg"),
    set("acdfg"),
    set("bcdf"),
    set("abdfg"),
    set("abdefg"),
    set("acf"),
    set("abcdefg"),
    set("abcdfg"),
]
ALL_ORDERINGS = [str.maketrans("".join(perm), BASE) for perm in permutations(BASE)]

def from_base(digits, base=10):
    return reduce(lambda l, r: l * base + r, digits, 0)

with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

acc = 0
for line in lines:
    hints, values = map(str.split, line.split(" | "))
    for ordering in ALL_ORDERINGS:
        if all(set(hint.translate(ordering)) in SEGMENTS for hint in hints):
            digit_ints = [SEGMENTS.index(set(value.translate(ordering))) for value in values]
            acc += from_base(digit_ints, 10)
            break
print(acc)

1063760
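
To see `str.maketrans` and `.translate` in isolation, here is a small sketch that reuses the ordering `"deafgbc"` and the signals `"cdfbe"` from the example discussed earlier (the names `table` and `translated` are only for illustration):

```python
BASE = "abcdefg"

# Build a translation table that sends each wire to the segment it controls.
table = str.maketrans("deafgbc", BASE)

translated = "cdfbe".translate(table)
print(translated)                       # gadfb
print(set(translated) == set("abdfg"))  # True: the segments of a 5.
```

The table is built once per permutation and then reused for every hint and value, which is why `ALL_ORDERINGS` stores the tables instead of the permutations.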


## Logically deducing the correct digits

A different approach to this problem entails using logical reasoning beforehand to figure out how to deduce which hints pertain to which digits, starting off with the “easy” digits `1`, `4`, `7`, and `8`.
Then, we just have to write some code to check for that.

And again, recall that we only need to identify digits.
We don't even need to figure out the correct ordering of the segments!
So, can we just use the information about the segments of some digits to figure out what sets of segments pertain to other digits?

That is, starting with the information about the segments of the top row, how can we deduce the bottom row?

```
   1:      4:      7:      8:  
  ....    ....    XXXX    XXXX 
 .    X  X    X  .    X  X    X
 .    X  X    X  .    X  X    X
  ....    XXXX    ....    XXXX 
 .    X  .    X  .    X  X    X
 .    X  .    X  .    X  X    X
  ....    ....    ....    XXXX 
 
   0:      2:      3:      5:      6:      9:   
  XXXX    XXXX    XXXX    XXXX    XXXX    XXXX  
 X    X  .    X  .    X  X    .  X    .  X    X 
 X    X  .    X  .    X  X    .  X    .  X    X 
  ....    XXXX    XXXX    XXXX    XXXX    XXXX  
 X    X  X    .  .    X  .    X  X    X  .    X 
 X    X  X    .  .    X  .    X  X    X  .    X 
  XXXX    XXXX    XXXX    XXXX    XXXX    XXXX  
```

One thing we can do first is realise that the digits `1`, `4`, `7`, and `8`, are easy to deduce because they are the only digits that use a specific number of segments.
For example, a digit using two segments can only ever be digit `1`, never a `4` or a `9`.
Similarly, digits using five segments can only be `2`, `3`, or `5`; and digits using six segments can only be `0`, `6`, or `9`.
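
This grouping can be read directly off the segment sets we defined earlier. A minimal sketch (the dictionary `by_count` is a name made up for this check):

```python
from collections import defaultdict

SEGMENTS = [
    set("abcefg"), set("cf"), set("acdeg"), set("acdfg"), set("bcdf"),
    set("abdfg"), set("abdefg"), set("acf"), set("abcdefg"), set("abcdfg"),
]

# Group the digits by the number of segments they use.
by_count = defaultdict(list)
for digit, segments in enumerate(SEGMENTS):
    by_count[len(segments)].append(digit)

print(dict(by_count))
# {6: [0, 6, 9], 2: [1], 5: [2, 3, 5], 4: [4], 3: [7], 7: [8]}
```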

Thus, we need to use the top row to distinguish the numbers of the two rows that follow:

```
   1:      4:      7:      8:  
  ....    ....    XXXX    XXXX 
 .    X  X    X  .    X  X    X
 .    X  X    X  .    X  X    X
  ....    XXXX    ....    XXXX 
 .    X  .    X  .    X  X    X
 .    X  .    X  .    X  X    X
  ....    ....    ....    XXXX 
 
# Using 5 segments:
   2:      3:      5:   
  XXXX    XXXX    XXXX  
 .    X  .    X  X    . 
 .    X  .    X  X    . 
  XXXX    XXXX    XXXX  
 X    .  .    X  .    X 
 X    .  .    X  .    X 
  XXXX    XXXX    XXXX

# Using 6 segments:
   0:      6:      9:   
  XXXX    XXXX    XXXX  
 X    X  X    .  X    X 
 X    X  X    .  X    X 
  ....    XXXX    XXXX  
 X    X  X    X  .    X 
 X    X  X    X  .    X 
  XXXX    XXXX    XXXX
```

For example, among the digits that use five segments, `3` is the only one that contains the segments that `1` contains:

```
   3:  
  XXXX 
 .    O
 .    O
  XXXX 
 .    O
 .    O
  XXXX 
```

Then, if we look at the segments that `4` does **not** use (the complement of `4`), among the digits that use five segments those are contained only inside `2`:

```
   4:        not 4:        2:   
  ....        XXXX        OOOO  
 X    X      .    .      .    X 
 X    X      .    .      .    X 
  XXXX   ->   ....   ->   XXXX  
 .    X      X    .      O    . 
 .    X      X    .      O    . 
  ....        XXXX        OOOO 
```

Thus, `5` is the only digit left that uses five segments.
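These two containment claims can be verified on the standard (unscrambled) wiring. The following is a small sketch under that assumption; the names `FIVE_SEGMENT`, `contain_one`, and `contain_not_four` are hypothetical and only used here.

```python
# Standard wiring (an assumption for illustration; real puzzle input is scrambled).
ALL = frozenset("abcdefg")
ONE, FOUR = frozenset("cf"), frozenset("bcdf")
FIVE_SEGMENT = {
    2: frozenset("acdeg"),
    3: frozenset("acdfg"),
    5: frozenset("abdfg"),
}

# Five-segment digits that contain every segment of 1.
contain_one = [d for d, segs in FIVE_SEGMENT.items() if ONE <= segs]
# Five-segment digits that contain every segment that 4 does NOT use.
contain_not_four = [d for d, segs in FIVE_SEGMENT.items() if (ALL - FOUR) <= segs]

print(contain_one)       # [3]
print(contain_not_four)  # [2]
```

Because the checks only ask whether one set is contained in another, they keep working when the wires are relabelled.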

Now that we know the digits `1`, `2`, `3`, `4`, `5`, `7`, and `8`, we only have to figure out `0`, `6`, and `9`:

```
   1:      2:      3:      4:      5:      7:      8:   
  ....    XXXX    XXXX    ....    XXXX    XXXX    XXXX  
 .    X  .    X  .    X  X    X  X    .  .    X  X    X 
 .    X  .    X  .    X  X    X  X    .  .    X  X    X 
  ....    XXXX    XXXX    XXXX    XXXX    ....    XXXX  
 .    X  X    .  .    X  .    X  .    X  .    X  X    X 
 .    X  X    .  .    X  .    X  .    X  .    X  X    X 
  ....    XXXX    XXXX    ....    XXXX    ....    XXXX  
  
   0:      6:      9:   
  XXXX    XXXX    XXXX  
 X    X  X    .  X    X 
 X    X  X    .  X    X 
  ....    XXXX    XXXX  
 X    X  X    X  .    X 
 X    X  X    X  .    X 
  XXXX    XXXX    XXXX  
```

For example, among these three digits, `6` is the only one that contains all the segments that `1` does **not** use (the complement of `1`):

```
   1:        not 1:        6:
  ....        XXXX        OOOO  
 .    X      X    .      O    . 
 .    X      X    .      O    . 
  ....   ->   XXXX   ->   OOOO  
 .    X      X    .      O    X 
 .    X      X    .      O    X 
  ....        XXXX        OOOO  
```

Also, and perhaps more clearly, `9` is the only six-segment digit that contains a `3` inside it:

```
   3:          9:   
  XXXX        OOOO  
 .    X      X    O 
 .    X      X    O 
  XXXX   ->   OOOO  
 .    X      .    O 
 .    X      .    O 
  XXXX        OOOO   
```

Finally, the `0` is the digit that is left.
These are not the only logical deductions that you can make, nor do they have to be in this order.
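As a sanity check, the six-segment deductions can be tested on the standard wiring, together with one alternative deduction (finding `9` as the only six-segment digit that contains `4`). This is a sketch under the assumption of unscrambled segments; `SIX_SEGMENT` and `containing` are names invented for this example.

```python
# Standard wiring (an assumption for illustration; real puzzle input is scrambled).
ALL = frozenset("abcdefg")
ONE, THREE, FOUR = frozenset("cf"), frozenset("acdfg"), frozenset("bcdf")
SIX_SEGMENT = {
    0: frozenset("abcefg"),
    6: frozenset("abdefg"),
    9: frozenset("abcdfg"),
}

def containing(pattern):
    """Return the six-segment digits whose segments include all of `pattern`."""
    return [d for d, segs in SIX_SEGMENT.items() if pattern <= segs]

print(containing(ALL - ONE))  # [6]  the complement of 1 only fits inside 6
print(containing(THREE))      # [9]  3 only fits inside 9
print(containing(FOUR))       # [9]  alternative: 4 also only fits inside 9
```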

All the considerations above were based on properties such as “the segments are contained in” or its negation, which shows that we care about doing membership testing.
When we want to do membership testing a lot, and we do not care about the order of things, the built-in type `set` is very useful, so that is what we will use.

```python
from functools import reduce

BASE = frozenset("abcdefg")

def from_base(digits, base=10):
    return reduce(lambda l, r: l * base + r, digits, 0)

def comp(a_set):
    """Returns the segments not used by a_set."""
    return BASE - a_set

def with_length(sets, length):
    """Returns all sets with the given length."""
    return {s for s in sets if len(s) == length}

def deduce(hints):
    """Deduce which set is which digit from a set containing all digits."""
    # Start by finding the obvious digits
    one = with_length(hints, 2).pop()
    four = with_length(hints, 4).pop()
    seven = with_length(hints, 3).pop()
    eight = with_length(hints, 7).pop()

    # Distinguish between 2, 3, and 5
    two_three_five = with_length(hints, 5)
    three = {s for s in two_three_five if one < s}.pop()
    two_five = two_three_five - {three}
    two = {s for s in two_five if comp(four) < s}.pop()
    five = (two_five - {two}).pop()

    # Distinguish between 0, 6, and 9
    zero_six_nine = with_length(hints, 6)
    nine = {s for s in zero_six_nine if three < s}.pop()
    zero_six = zero_six_nine - {nine}
    six = {s for s in zero_six if comp(one) < s}.pop()
    zero = (zero_six - {six}).pop()

    return [zero, one, two, three, four, five, six, seven, eight, nine]


with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

acc = 0
for line in lines:
    hints, values = [
        [frozenset(part) for part in half.split()]
        for half in line.split(" | ")
    ]
    correct_digits = deduce(hints)
    digits = [correct_digits.index(digit) for digit in values]
    acc += from_base(digits, 10)
print(acc)
```

```
1063760
```
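The helper `from_base` in the solution turns a list of digits into a single integer by multiplying the running total by the base before adding each new digit. Here is a short standalone illustration; the `int("".join(...))` line is an alternative of my own for base 10, not something the solution uses.

```python
from functools import reduce

def from_base(digits, base=10):
    # Same definition as in the solution.
    return reduce(lambda l, r: l * base + r, digits, 0)

# Step by step: (((0 * 10 + 5) * 10 + 3) * 10 + 5) * 10 + 3 == 5353
print(from_base([5, 3, 5, 3]))   # 5353
print(from_base([1, 0, 1], 2))   # 5, because 101 in binary is 5

# A possible base-10-only alternative: join the digits as text and parse them.
print(int("".join(map(str, [5, 3, 5, 3]))))  # 5353
```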


The code above is the Python equivalent of the deductions we just described, although there might be some `set` operations that need explaining.

For a more in-depth overview of Python sets, you can read [this article](https://mathspp.com/blog/pydonts/set-and-frozenset).
Here is a quick overview of the `set`-related concepts and operations we are using:

 1. set difference with `-`: `set1 - set2` is equivalent to `{elem for elem in set1 if elem not in set2}`;
 2. `frozenset`: a frozen set is an immutable (and hashable) set, and it's the only type of set that can be contained inside another set;
 3. `set.pop`: the method `.pop` of sets removes an arbitrary element from the set and returns it. When used on sets with a single element (like above) it returns the only element in the set; and
 4. set inclusion with `<`: `set1 < set2` checks if `set1` is strictly contained within `set2` and is equivalent to `all(elem in set2 for elem in set1) and len(set2) > len(set1)`.

One small thing to note is the pattern `set - {elem}` that was used a couple of times in the code above.
This is one way of removing an element from a set, but there are two methods that also remove an element:

 - `a_set.remove(x)`, which removes the element `x` from `a_set` and raises a `KeyError` if `x` isn't in the set; and
 - `a_set.discard(x)`, which tries to remove the element `x` from `a_set` but doesn't complain if the element isn't there.

However, both these methods mutate the set in place and return `None`, whereas we were trying to compute new sets, so we opted to go for the expression `set - {elem}`.
The short session below shows `discard` returning `None` and changing `s` itself:

```python
s = {1, 2, 3}
print(s.discard(3))
s
```

```
None
{1, 2}
```
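The set operations listed above are easy to try out in the REPL. The following sketch uses a few made-up sets (`one`, `three`, `five`, and `candidates` are illustrative names and values, not puzzle input):

```python
one = frozenset("cf")
three = frozenset("acdfg")
five = frozenset("abdfg")

# Set difference builds a new set and leaves both operands untouched.
print(sorted(three - one))   # ['a', 'd', 'g']

# Strict inclusion: every element must be contained AND the sets must differ.
print(one < three)           # True
print(three < three)         # False
print(one < five)            # False, because 'c' is not in five

# frozensets are hashable, so they can be elements of another set.
candidates = {three, five}
remaining = candidates - {three}   # a new set; candidates still has 2 elements
print(len(candidates), remaining == {five})  # 2 True

# .pop on a one-element set returns that single element.
print(remaining.pop() == five)     # True

# discard ignores a missing element, remove raises KeyError.
s = {1, 2}
s.discard(3)
try:
    s.remove(3)
except KeyError:
    print("remove raised KeyError")
```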

Other than modifying the order of the logical deductions, there isn't much to be changed in this specific implementation of our solution, but we can look at another way of identifying the digits that is even more straightforward.

## Unique identifier for the digits

As we have seen above, we can identify the digits without having to identify which segment is which.
What we will see now is that we can identify each digit without even having to compare the digits with each other!

In order to see how that is going to be done, consider the original segments:

```
  0:      1:      2:      3:      4:
 aaaa    ....    aaaa    aaaa    ....
b    c  .    c  .    c  .    c  b    c
b    c  .    c  .    c  .    c  b    c
 ....    ....    dddd    dddd    dddd
e    f  .    f  e    .  .    f  .    f
e    f  .    f  e    .  .    f  .    f
 gggg    ....    gggg    gggg    ....

  5:      6:      7:      8:      9:
 aaaa    aaaa    aaaa    aaaa    aaaa
b    .  b    .  .    c  b    c  b    c
b    .  b    .  .    c  b    c  b    c
 dddd    dddd    ....    dddd    dddd
.    f  e    f  .    f  e    f  .    f
.    f  e    f  .    f  e    f  .    f
 gggg    gggg    ....    gggg    gggg
```

And here is a corresponding representation in Python, where the set at index `i` holds the segments used by the digit `i`:

```python
SEGMENTS = [
    {'a', 'b', 'c', 'e', 'f', 'g'},
    {'c', 'f'},
    {'a', 'c', 'd', 'e', 'g'},
    {'a', 'c', 'd', 'f', 'g'},
    {'b', 'c', 'd', 'f'},
    {'a', 'b', 'd', 'f', 'g'},
    {'a', 'b', 'd', 'e', 'f', 'g'},
    {'a', 'c', 'f'},
    {'a', 'b', 'c', 'd', 'e', 'f', 'g'},
    {'a', 'b', 'c', 'd', 'f', 'g'},
]
```

Now, let's compute some statistics about each of the segments.
In particular, let's see how often each segment is used:

```python
from collections import Counter
from itertools import chain
counts = Counter(chain.from_iterable(SEGMENTS))
counts
```

```
Counter({'c': 8, 'g': 7, 'b': 6, 'a': 8, 'f': 9, 'e': 4, 'd': 7})
```

Now we know how often each segment is used.
The next step is computing a summary statistic for each of the digits.
How?
For each digit, we attribute a score: the sum, over the segments of that digit, of the number of times each segment is used across the board.
For example, `1` uses the segments `c` and `f`, so its score is `8 + 9 = 17`:

```python
scores = [sum(counts[seg] for seg in digit) for digit in SEGMENTS]
scores
```

```
[42, 17, 34, 39, 30, 37, 41, 25, 49, 45]
```

As we can see, these “scores” are unique for each digit and, _what is more_, they don't depend on the actual segment letters used!
In fact, they just depend on the frequency with which each letter is used!
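One way to convince ourselves of this is to relabel the segments at random and check that the scores do not change. The sketch below writes the standard wiring as strings for brevity; `PATTERNS`, `score_all`, and `relabel` are hypothetical names introduced only for this check.

```python
import random
from collections import Counter
from itertools import chain

# Standard wiring; the string at index i lists the segments of digit i.
PATTERNS = ["abcefg", "cf", "acdeg", "acdfg", "bcdf",
            "abdfg", "abdefg", "acf", "abcdefg", "abcdfg"]

def score_all(patterns):
    """Score each pattern by summing how often its segments occur overall."""
    counts = Counter(chain.from_iterable(patterns))
    return [sum(counts[seg] for seg in pattern) for pattern in patterns]

# Build a random relabelling of the seven wires and apply it to every digit.
letters = "abcdefg"
shuffled = "".join(random.sample(letters, k=len(letters)))
relabel = str.maketrans(letters, shuffled)
scrambled = [pattern.translate(relabel) for pattern in PATTERNS]

print(score_all(PATTERNS) == score_all(scrambled))  # True for any relabelling
print(len(set(score_all(PATTERNS))))                # 10, so no two digits share a score
```

Relabelling a wire renames a key in the counter but keeps its count, which is why each digit's sum stays the same.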

So, how can we use this to identify the values based on the hints?

Let's revisit a previous example:

```
acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab |
cdfeb fcadb cdfeb cdbaf
```

The values here are `5`, `3`, `5`, and `3`.
We compute summary statistics for the hints on the left and then score each digit, even though the segments are jumbled up:

```python
hints = "acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab".split()
these_counts = Counter(chain.from_iterable(hints))
these_scores = [sum(these_counts[seg] for seg in hint) for hint in hints]
these_scores
```

```
[49, 37, 34, 39, 25, 45, 41, 30, 42, 17]
```

Notice how the values inside `these_scores` match the values in `scores` above, although the order is different.
We can index into `scores` to figure out which hint matches which digit:

```python
[scores.index(score) for score in these_scores]
```

```
[8, 5, 2, 3, 7, 9, 6, 4, 0, 1]
```

And if we use a dictionary to map the original hints to the actual digits, we can map the values to the true digits.
We just have to be careful because the letters within each string can also appear in a different order (compare `cdfbe` in the hints with `cdfeb` in the values), but we can use (frozen)sets to fix that:

```python
mapping = {frozenset(hint): scores.index(score) for hint, score in zip(hints, these_scores)}
values = "cdfeb fcadb cdfeb cdbaf".split()
[mapping[frozenset(value)] for value in values]
```

```
[5, 3, 5, 3]
```

Putting all of this together yields the following code:

```python
from collections import Counter
from functools import reduce
from itertools import chain

def from_base(digits, base=10):
    return reduce(lambda l, r: l * base + r, digits, 0)

SEGMENTS = [
    {'a', 'b', 'c', 'e', 'f', 'g'},
    {'c', 'f'},
    {'a', 'c', 'd', 'e', 'g'},
    {'a', 'c', 'd', 'f', 'g'},
    {'b', 'c', 'd', 'f'},
    {'a', 'b', 'd', 'f', 'g'},
    {'a', 'b', 'd', 'e', 'f', 'g'},
    {'a', 'c', 'f'},
    {'a', 'b', 'c', 'd', 'e', 'f', 'g'},
    {'a', 'b', 'c', 'd', 'f', 'g'},
]
COUNTS = Counter(chain.from_iterable(SEGMENTS))
SCORES = [sum(COUNTS[seg] for seg in digit) for digit in SEGMENTS]

with open(INPUT_FILE, "r") as f:
    lines = f.readlines()

acc = 0
for line in lines:
    hints, values = [
        [frozenset(part) for part in half.split()]
        for half in line.split(" | ")
    ]
    counts = Counter(chain.from_iterable(hints))
    scores = [sum(counts[seg] for seg in digit) for digit in hints]
    mapping = dict(zip(hints, [SCORES.index(score) for score in scores]))
    digits = [mapping[value] for value in values]
    acc += from_base(digits, 10)
print(acc)
```

```
1063760
```
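To try the scoring idea without an input file, the same logic can be wrapped in a function and run on the example line from before. This is a sketch rather than the solution itself: `decode_line`, `PATTERNS`, and `SCORE_TO_DIGIT` are names I am assuming for illustration, and the function is a variant that skips the `mapping` dictionary, since the score of a value can be computed directly from its own letters.

```python
from collections import Counter
from itertools import chain

# Standard wiring, written as strings for brevity (index i is digit i).
PATTERNS = ["abcefg", "cf", "acdeg", "acdfg", "bcdf",
            "abdfg", "abdefg", "acf", "abcdefg", "abcdfg"]
COUNTS = Counter(chain.from_iterable(PATTERNS))
# Each score is unique, so it can serve as a dictionary key for its digit.
SCORE_TO_DIGIT = {sum(COUNTS[seg] for seg in p): d for d, p in enumerate(PATTERNS)}

def decode_line(line):
    """Decode one 'hints | values' line into the integer shown on the display."""
    hints, values = (half.split() for half in line.split("|"))
    # Segment frequencies come from the ten hints only.
    counts = Counter(chain.from_iterable(hints))
    result = 0
    for value in values:
        score = sum(counts[seg] for seg in value)
        result = result * 10 + SCORE_TO_DIGIT[score]
    return result

example = ("acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab | "
           "cdfeb fcadb cdfeb cdbaf")
print(decode_line(example))  # 5353
```

Using a dictionary keyed by score replaces the repeated `SCORES.index(...)` lookups, which is a small design choice and not a requirement of the approach.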


## Conclusion

Although it would be interesting, in this problem we did not implement a full logic deduction system that abstracts away the context of the problem and takes into account only the hints that it is given.
Practicality beats purity, and this problem was best solved on paper first, and coded after.
This also shows that it is worthwhile to look at the data both within each item (the segments of a single digit) and across all items (how often each segment is used overall).

Finally, don't focus too much on the micro: maybe some of us felt that we _needed_ to figure out which segment was which.
Turns out we could get away without having to worry about that at all!

If you have any questions, suggestions, remarks, recommendations, corrections, or anything else, you can reach out to me [on Twitter](https://twitter.com/mathsppblog) or via email to rodrigo at mathspp dot com.