# Day 19: Monster Messages

You land in an airport surrounded by dense forest. As you walk to your high-speed train, the Elves at the Mythical Information Bureau contact you again. They think their satellite has collected an image of a sea monster! Unfortunately, the connection to the satellite is having problems, and many of the messages sent back from the satellite have been corrupted.

They sent you a list of the rules valid messages should obey and a list of received messages they've collected so far (your puzzle input).

The rules for valid messages (the top part of your puzzle input) are numbered and build upon each other. For example:

```text
0: 1 2
1: "a"
2: 1 3 | 3 1
3: "b"
```

Some rules, like 3: "b", simply match a single character (in this case, b).

The remaining rules list the sub-rules that must be followed; for example, the rule 0: 1 2 means that to match rule 0, the text being checked must match rule 1, and the text after the part that matched rule 1 must then match rule 2.

Some of the rules have multiple lists of sub-rules separated by a pipe (|). This means that at least one list of sub-rules must match. (The ones that match might be different each time the rule is encountered.) For example, the rule 2: 1 3 | 3 1 means that to match rule 2, the text being checked must match rule 1 followed by rule 3 or it must match rule 3 followed by rule 1.

Fortunately, there are no loops in the rules, so the list of possible matches will be finite. Since rule 1 matches a and rule 3 matches b, rule 2 matches either ab or ba. Therefore, rule 0 matches aab or aba.
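Because the set of matches is finite, it can be enumerated directly. The following is a minimal sketch, not part of the puzzle text: the `RULES` layout (a literal string, or a list of alternatives that are each a list of rule numbers) and the `expand()` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical in-memory form of the four example rules: a rule is either a
# literal character or a list of alternatives, each a list of rule numbers.
RULES = {
    0: [[1, 2]],
    1: "a",
    2: [[1, 3], [3, 1]],
    3: "b",
}

def expand(rules, key):
    """Return the set of all strings that rule `key` can match (no loops)."""
    rule = rules[key]
    if isinstance(rule, str):
        return {rule}
    matches = set()
    for alternative in rule:
        # Expand each sub-rule, then combine one choice from each, in order.
        parts = [expand(rules, sub) for sub in alternative]
        matches.update("".join(combo) for combo in product(*parts))
    return matches

print(sorted(expand(RULES, 2)))  # ['ab', 'ba']
print(sorted(expand(RULES, 0)))  # ['aab', 'aba']
```

Enumeration like this only works while the rules are loop-free; the number of strings can also grow quickly for larger rule sets.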

Here's a more interesting example:

```text
0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"
```

Here, because rule 4 matches a and rule 5 matches b, rule 2 matches two letters that are the same (aa or bb), and rule 3 matches two letters that are different (ab or ba).

Since rule 1 matches rules 2 and 3 once each in either order, it must match two pairs of letters, one pair with matching letters and one pair with different letters. This leaves eight possibilities: aaab, aaba, bbab, bbba, abaa, abbb, baaa, or babb.

Rule 0, therefore, matches a (rule 4), then any of the eight options from rule 1, then b (rule 5): aaaabb, aaabab, abbabb, abbbab, aabaab, aabbbb, abaaab, or ababbb.
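The same reasoning can be checked by writing the example rules out by hand as a regular expression. This is an illustrative sketch only; the variable names `same`, `different`, `rule1` and `rule0` are assumptions made for readability.

```python
import re
from itertools import product

same = "(?:aa|bb)"       # rule 2: two letters that are the same
different = "(?:ab|ba)"  # rule 3: two letters that are different
rule1 = f"(?:{same}{different}|{different}{same})"  # rule 1: either order
rule0 = re.compile(f"a{rule1}b")                    # rule 0: a, rule 1, b

expected = ["aaaabb", "aaabab", "abbabb", "abbbab",
            "aabaab", "aabbbb", "abaaab", "ababbb"]
print(all(rule0.fullmatch(m) for m in expected))  # True

# Try every six-letter string over {a, b} to confirm there are no others.
candidates = ("".join(p) for p in product("ab", repeat=6))
print(sum(1 for c in candidates if rule0.fullmatch(c)))  # 8
```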

The received messages (the bottom part of your puzzle input) need to be checked against the rules so you can determine which are valid and which are corrupted. Including the rules and the messages together, this might look like:

```text
0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"

ababbb
bababa
abbbab
aaabbb
aaaabbb
```

Your goal is to determine the number of messages that completely match rule 0. In the above example, ababbb and abbbab match, but bababa, aaabbb, and aaaabbb do not, producing the answer 2. The whole message must match all of rule 0; there can't be extra unmatched characters in the message. (For example, aaaabbb might appear to match rule 0 above, but it has an extra unmatched b on the end.)
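The difference between matching a prefix and matching the whole message is easy to see with Python's `re` module. In this sketch the pattern is a hand-written translation of the example rules (an assumption for illustration, not generated from the input): `re.match()` accepts `aaaabbb` because its first six characters match, while `re.fullmatch()` rejects it.

```python
import re

# Hand-written regex for rule 0 of the example above
pattern = re.compile("a(?:(?:aa|bb)(?:ab|ba)|(?:ab|ba)(?:aa|bb))b")
messages = ["ababbb", "bababa", "abbbab", "aaabbb", "aaaabbb"]

# match() only anchors at the start, so trailing characters are ignored
prefix_hits = [m for m in messages if pattern.match(m)]
# fullmatch() requires the entire message to be consumed
full_hits = [m for m in messages if pattern.fullmatch(m)]

print(prefix_hits)  # ['ababbb', 'abbbab', 'aaaabbb']
print(full_hits)    # ['ababbb', 'abbbab']
```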

How many messages completely match rule 0?

In [1]:
# Python imports
import re

from pathlib import Path

We split reading the input file across two functions: `load_data()` reads the file and collects the messages, and hands the trickier rule parsing to a second function, `parse_line()`:

In [2]:
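def parse_line(line):
    """Convert the tokens of a rule body to ints (rule numbers) or strings."""
    rule = []
    for elem in line:
        try:
            rule.append(int(elem))
        except ValueError:
            # Not a number: a quoted letter such as "a", or the "|" separator
            rule.append(elem.replace('"', ""))
    return rule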

In [3]:
def load_data(fpath):
    """Return (rules, messages) parsed from the puzzle input at fpath."""
    rules = {}
    messages = []
    state = "rules"
    with Path(fpath).open("r") as ifh:
        for line in [_.strip().split() for _ in ifh]:
            if len(line) == 0:  # the blank line separates rules from messages
                state = "messages"
                continue
            if state == "rules":
                # "0:" -> key 0; the remaining tokens form the rule body
                rules[int(line[0][:-1])] = parse_line(line[1:])
            elif state == "messages":
                messages.append(line[0])
    return rules, messages

Let's check that the rules and messages are being processed correctly.

In [4]:
rules, messages = load_data("day19_test.txt")
print(rules)
print(messages)

{0: [4, 1, 5], 1: [2, 3, '|', 3, 2], 2: [4, 4, '|', 5, 5], 3: [4, 5, '|', 5, 4], 4: ['a'], 5: ['b']}
['ababbb', 'bababa', 'abbbab', 'aaabbb', 'aaaabbb']


To solve the first puzzle, we compile each rule into a corresponding *regular expression*. Rule 0 is the "final" compiled expression, and the other rules are the partial expressions it is built from. Two rules are literal letters, `a` and `b`; every other rule refers to further rules by number. So long as any rule still contains a rule number, compilation is not complete, so we have a function `has_digit()` to test for this.

To compile rule 0, we use the function `compile_rules()`. This first goes through all of the rules, wrapping each non-letter rule in parentheses as its own group. Then, so long as any rule still contains a rule number, we keep looping over the rules, replacing each number with the current contents of the rule it refers to, whether or not that rule has been fully expanded yet. Because the rules contain no loops, repeated passes eventually eliminate every number.

That eventually completes the compilation of rule 0 (and all other rules), and we then join the compiled rules into a string that can be understood as a regular expression by Python's `re` module.

The `count_valid_messages()` function tests each message in turn against rule 0, and returns the count of those that match.

In [5]:
def has_digit(rules):
    """Return True if any rule still contains an unexpanded rule number."""
    return any(isinstance(elem, int) for rule in rules.values() for elem in rule)

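def compile_rules(rules):
    """Expand every rule into a regex string (modifies rules in place)."""
    # Wrap each non-letter rule in parentheses so it acts as a single group
    for key, rule in rules.items():
        if rule not in (["a"], ["b"]):
            rules[key] = ["("] + rule + [")"]

    # Keep substituting until no rule numbers remain anywhere
    while has_digit(rules):
        for key, rule in rules.items():
            new_rule = []
            for idx in rule:
                if str(idx) in "ab()|":
                    new_rule += [idx]  # literal letter or regex syntax
                else:
                    new_rule += rules[idx]  # splice in the referenced rule
            rules[key] = new_rule

    # Join each token list into a single regex string
    for key, rule in rules.items():
        rules[key] = "".join(rule)
    return rules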

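def count_valid_messages(rules, messages):
    """Count the messages that match rule 0 from start to end."""
    count = 0
    for message in messages:
        # fullmatch() requires the whole message to match, with nothing left over
        if re.fullmatch(rules[0], message) is not None:
            count += 1
    return count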
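As an alternative to repeated substitution passes, the same expression can be built recursively. The following is a minimal sketch rather than the approach used in this notebook: `to_regex()` and its `cache` argument are hypothetical names, and the `RULES` dictionary simply repeats the parsed form of the test rules. It assumes the rules contain no loops.

```python
import re

# The test rules in the token-list form produced by load_data()
RULES = {
    0: [4, 1, 5],
    1: [2, 3, "|", 3, 2],
    2: [4, 4, "|", 5, 5],
    3: [4, 5, "|", 5, 4],
    4: ["a"],
    5: ["b"],
}

def to_regex(rules, key, cache=None):
    """Recursively build a regex string for rule `key` (assumes no loops)."""
    if cache is None:
        cache = {}
    if key not in cache:
        parts = []
        for token in rules[key]:
            if isinstance(token, int):
                # A rule number: expand the referenced rule first
                parts.append(to_regex(rules, token, cache))
            else:
                # A literal letter or the "|" separator
                parts.append(token)
        # Non-capturing group keeps alternations from leaking outwards
        cache[key] = "(?:" + "".join(parts) + ")"
    return cache[key]

pattern = re.compile(to_regex(RULES, 0))
messages = ["ababbb", "bababa", "abbbab", "aaabbb", "aaaabbb"]
print(sum(1 for m in messages if pattern.fullmatch(m)))  # 2
```

Caching each expanded rule means a rule referenced many times is only expanded once, and the input dictionary is left unmodified.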

We test these functions first on the test data:

In [6]:
rules, messages = load_data("day19_test.txt")
rules = compile_rules(rules)
count_valid_messages(rules, messages)

2

And then solve the puzzle.

In [7]:
rules, messages = load_data("day19_data.txt")
rules = compile_rules(rules)
count_valid_messages(rules, messages)

208

## Part Two

As you look over the list of messages, you realize your matching rules aren't quite right. To fix them, completely replace rules 8: 42 and 11: 42 31 with the following:

```text
8: 42 | 42 8
11: 42 31 | 42 11 31
```

This small change has a big impact: now, the rules do contain loops, and the list of messages they could hypothetically match is infinite. You'll need to determine how these changes affect which messages are valid.

Fortunately, many of the rules are unaffected by this change; it might help to start by looking at which rules always match the same set of values and how those rules (especially rules 42 and 31) are used by the new versions of rules 8 and 11.

(Remember, you only need to handle the rules you have; building a solution that could handle any hypothetical combination of rules would be significantly more difficult.)

For example:

```text
42: 9 14 | 10 1
9: 14 27 | 1 26
10: 23 14 | 28 1
1: "a"
11: 42 31
5: 1 14 | 15 1
19: 14 1 | 14 14
12: 24 14 | 19 1
16: 15 1 | 14 14
31: 14 17 | 1 13
6: 14 14 | 1 14
2: 1 24 | 14 4
0: 8 11
13: 14 3 | 1 12
15: 1 | 14
17: 14 2 | 1 7
23: 25 1 | 22 14
28: 16 1
4: 1 1
20: 14 14 | 1 15
3: 5 14 | 16 1
27: 1 6 | 14 18
14: "b"
21: 14 1 | 1 14
25: 1 1 | 1 14
22: 14 14
8: 42
26: 14 22 | 1 20
18: 15 15
7: 14 5 | 1 21
24: 14 1

abbbbbabbbaaaababbaabbbbabababbbabbbbbbabaaaa
bbabbbbaabaabba
babbbbaabbbbbabbbbbbaabaaabaaa
aaabbbbbbaaaabaababaabababbabaaabbababababaaa
bbbbbbbaaaabbbbaaabbabaaa
bbbababbbbaaaaaaaabbababaaababaabab
ababaaaaaabaaab
ababaaaaabbbaba
baabbaaaabbaaaababbaababb
abbbbabbbbaaaababbbbbbaaaababb
aaaaabbaabaaaaababaa
aaaabbaaaabbaaa
aaaabbaabbaaaaaaabbbabbbaaabbaabaaa
babaaabbbaaabaababbaabababaaab
aabbbbbaabbbaaaaaabbbbbababaaaaabbaaabba
```

Without updating rules 8 and 11, these rules only match three messages: bbabbbbaabaabba, ababaaaaaabaaab, and ababaaaaabbbaba.

However, after updating rules 8 and 11, a total of 12 messages match:

    bbabbbbaabaabba
    babbbbaabbbbbabbbbbbaabaaabaaa
    aaabbbbbbaaaabaababaabababbabaaabbababababaaa
    bbbbbbbaaaabbbbaaabbabaaa
    bbbababbbbaaaaaaaabbababaaababaabab
    ababaaaaaabaaab
    ababaaaaabbbaba
    baabbaaaabbaaaababbaababb
    abbbbabbbbaaaababbbbbbaaaababb
    aaaaabbaabaaaaababaa
    aaaabbaabbaaaaaaabbbabbbaaabbaabaaa
    aabbbbbaabbbaaaaaabbbbbababaaaaabbaaabba

After updating rules 8 and 11, how many messages completely match rule 0?

There's now a second set of test data. We try the original solution on this:

In [8]:
rules, messages = load_data("day19_test2.txt")
rules = compile_rules(rules)
count_valid_messages(rules, messages)

3

Adding the new rules introduces loops. The loop in rule 8 *can* be represented in a regular expression, since it is just one or more repeats of rule 42. However, I could not find a way to specify matching an equal number of pattern repeats for rule 11 (essentially `a{k}b{k}` for any `k`, but not `a{k}b{m}` where `k` != `m`). Python's `re` module does not support recursive patterns, and balanced repeats of this kind are beyond what a true regular expression can express. The naive representation of this rule works for the test data but not the real puzzle, so I "cheated" by specifying rule 11 manually.
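The gap between the naive and the manual versions of rule 11 can be illustrated on a toy case. In this sketch the letters `a` and `b` stand in for rules 42 and 31 (an assumption for illustration), and the names `naive` and `balanced` are hypothetical.

```python
import re

# Naive version: one or more a, then one or more b, counts not tied together
naive = re.compile("(?:a)+?(?:b)+?")

# Manual version: an explicit alternative for each equal repeat count, 1 to 3
balanced = re.compile(
    "|".join(f"(?:a){{{k}}}(?:b){{{k}}}" for k in range(1, 4))
)

print(bool(naive.fullmatch("aab")))          # True: unequal counts slip through
print(bool(balanced.fullmatch("aab")))       # False
print(bool(balanced.fullmatch("aabb")))      # True
print(bool(balanced.fullmatch("aaaabbbb")))  # False: beyond the chosen bound
```

The last line shows the cost of the manual approach: it only covers repeat counts up to whatever bound is written out, so the bound has to be large enough for the longest message.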

First though, we have to rewrite `compile_rules()` so that the regex quantifier tokens used in the hand-written rules (`+`, `?` and `{n}`) pass through unchanged instead of being looked up as rule numbers:

In [9]:
def compile_rules(rules):
    for key, rule in rules.items():
        if rule not in (["a"], ["b"]):
            rules[key] = ["("] + rule + [")"]
    
    while has_digit(rules):
        for key, rule in rules.items():
            new_rule = []
            for idx in rule:
                if str(idx) in "ab()|+?":
                    new_rule += [idx]
                elif re.match(r"\{\d*\}", str(idx)):
                    new_rule += [idx]
                else:
                    new_rule += rules[idx]
            rules[key] = new_rule
            
    for key, rule in rules.items():
        rules[key] = "".join(rule)
    return rules

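For the test data, I specified rule 8 as a repeat of rule 42, and the naive version of rule 11 (repeats of rule 42 followed by repeats of rule 31, with the two counts not tied together), which worked: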

In [10]:
rules, messages = load_data("day19_test2.txt")
rules[8] = ["(", 42, ")", "+", "?"]
rules[11] = ["(", 42, ")", "+", "?", "(", 31, ")", "+", "?"]
rules = compile_rules(rules)
count_valid_messages(rules, messages)

12

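But the naive version doesn't work for the real data, because it also accepts messages with unequal numbers of repeats. Happily, manually specifying each equal number of repeats, from one up to seven, solves the problem.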

In [11]:
rules, messages = load_data("day19_data.txt")
rules[8] = ["(", 42, ")", "+", "?"]
rules[11] = ["(", 42, ")", "{1}", "(", 31, ")", "{1}", "|",
             "(", 42, ")", "{2}", "(", 31, ")", "{2}", "|",
             "(", 42, ")", "{3}", "(", 31, ")", "{3}", "|",
             "(", 42, ")", "{4}", "(", 31, ")", "{4}", "|",
             "(", 42, ")", "{5}", "(", 31, ")", "{5}", "|",
             "(", 42, ")", "{6}", "(", 31, ")", "{6}", "|",
             "(", 42, ")", "{7}", "(", 31, ")", "{7}",]
rules = compile_rules(rules)
count_valid_messages(rules, messages)

316
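The hand-written token list for rule 11 can also be generated. This is a minimal sketch under stated assumptions: `bounded_rule_11()` is a hypothetical helper, and its defaults of 42 and 31 are the rule numbers from the puzzle. The bound of 7 mirrors the cell above; whether that is enough depends on the longest message in a given input.

```python
def bounded_rule_11(max_repeats, left=42, right=31):
    """Build rule 11 tokens with alternatives for 1..max_repeats equal repeats."""
    tokens = []
    for k in range(1, max_repeats + 1):
        if tokens:
            tokens.append("|")  # separate alternatives, no trailing pipe
        quantifier = f"{{{k}}}"  # e.g. "{3}"
        tokens += ["(", left, ")", quantifier, "(", right, ")", quantifier]
    return tokens

print(bounded_rule_11(2))
# ['(', 42, ')', '{1}', '(', 31, ')', '{1}', '|', '(', 42, ')', '{2}', '(', 31, ')', '{2}']
```

The result has the same token layout as the list assigned to `rules[11]` above, so `rules[11] = bounded_rule_11(7)` could replace the hand-written version before calling `compile_rules()`.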