# Day 6: Custom Customs

As your flight approaches the regional airport where you'll switch to a much larger plane, customs declaration forms are distributed to the passengers.

The form asks a series of 26 yes-or-no questions marked a through z. All you need to do is identify the questions for which anyone in your group answers "yes". Since your group is just you, this doesn't take very long.

However, the person sitting next to you seems to be experiencing a language barrier and asks if you can help. For each of the people in their group, you write down the questions for which they answer "yes", one per line. For example:

```text
abcx
abcy
abcz
```

In this group, there are 6 questions to which anyone answered "yes": a, b, c, x, y, and z. (Duplicate answers to the same question don't count extra; each question counts at most once.)

Another group asks for your help, then another, and eventually you've collected answers from every group on the plane (your puzzle input). Each group's answers are separated by a blank line, and within each group, each person's answers are on a single line. For example:

```text
abc

a
b
c

ab
ac

a
a
a
a

b
```

This list represents answers from five groups:

- The first group contains one person who answered "yes" to 3 questions: a, b, and c.
- The second group contains three people; combined, they answered "yes" to 3 questions: a, b, and c.
- The third group contains two people; combined, they answered "yes" to 3 questions: a, b, and c.
- The fourth group contains four people; combined, they answered "yes" to only 1 question, a.
- The last group contains one person who answered "yes" to only 1 question, b.

In this example, the sum of these counts is 3 + 3 + 3 + 1 + 1 = 11.

## Puzzle 1

For each group, count the number of questions to which anyone answered "yes". What is the sum of those counts?

```python
# Python imports
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Set, Tuple
```

We approach the problem by loading data from a file. As we're loading the data here, we take the opportunity to process the answers into *unique* answers by using the `set` collection. In a `set`, every element occurs exactly once, no matter how many times it is added, and we use one set per group.

This means that we can add every answer for each group, but end up only with a single copy of each unique answer.
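To illustrate, here is the first example group from the puzzle text. Adding each person's answers to one set leaves a single copy of each question letter.

```python
# Each person's answers for the first example group in the puzzle text
people = ["abcx", "abcy", "abcz"]

group_set = set()
for answers in people:
    # set.update() adds every character of the string; repeats are ignored
    group_set.update(answers)

print(sorted(group_set))  # ['a', 'b', 'c', 'x', 'y', 'z']
print(len(group_set))     # 6
```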

The parser below follows a fairly common pattern. We loop over every line in the file and:

- if the line is blank, we know it's the end of a group, so we process the group data and prepare for the next group
- if the line is not blank, it's an entry for the current group, and we process it as such
- if we reach the end of the file while still holding data, the file did not end with a blank line, so we process what we have as the final group's data.

Before we start parsing, we have to set up two containers:

1. a container to hold all of the groups (a `list` here)
2. a container to hold the current group data (a `set` here)

Both start empty. As we read through the first group, the `set` accumulates answers. When we reach the end of the group (a blank line), the current `set` is appended to the `list`, and a new *empty* `set` is created for the next group.

In this way, we guarantee that we capture all of the group data into separate containers.
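The same three-case pattern can be pulled out into a reusable generator that accepts any iterable of lines, such as an open file or the result of `str.splitlines()`. This is a minimal sketch; the name `iter_groups` is hypothetical and not part of this notebook. Unlike the parser below, it also skips empty groups produced by repeated blank lines.

```python
from typing import Iterable, Iterator, List


def iter_groups(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each blank-line-delimited group as a list of stripped lines.

    Hypothetical helper illustrating the group-parsing pattern.
    """
    group: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:  # blank line: end of the current group
            if group:  # ignore empty groups from repeated blank lines
                yield group
            group = []
        else:  # non-blank line: one person's answers
            group.append(line)
    if group:  # input did not end with a blank line
        yield group


example = "abc\n\na\nb\nc\n\nab\nac\n\na\na\na\na\n\nb"
groups = list(iter_groups(example.splitlines()))
print(len(groups))                                 # 5
print(sum(len(set("".join(g))) for g in groups))   # 11
```

Because a file object iterates line by line, `iter_groups(ifh)` would work on an open file in the same way.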

```python
def load_answers(fpath: str) -> List[Set[str]]:
    """Return a list of sets containing the answers given by each group.

    :param fpath:  path to file containing answers
    """
    group_answers = []  # one set of unique answers per group
    group_set = set()   # unique answers for the current group

    with Path(fpath).open("r") as ifh:
        for line in (_.strip() for _ in ifh):
            if len(line) == 0:  # a blank line marks the end of a group
                group_answers.append(group_set)
                group_set = set()
            else:  # add this person's answers to the current group's set
                group_set.update(line)
    if group_set:  # catch cases with no empty line at the end of the file
        group_answers.append(group_set)

    return group_answers
```

Solve the test puzzle.

```python
answers = load_answers("day06_test.txt")
print(answers)
sum(len(ans) for ans in answers)
```

```text
[{'a', 'b', 'c'}, {'a', 'b', 'c'}, {'a', 'b', 'c'}, {'a'}, {'b'}]
11
```

Solve the real puzzle.

```python
answers = load_answers("day06_data.txt")
sum(len(ans) for ans in answers)
```

```text
6161
```

## Puzzle 2

As you finish the last group's customs declaration, you notice that you misread one word in the instructions:

You don't need to identify the questions to which anyone answered "yes"; you need to identify the questions to which everyone answered "yes"!

Using the same example as above:

```text
abc

a
b
c

ab
ac

a
a
a
a

b
```

This list represents answers from five groups:

- In the first group, everyone (all 1 person) answered "yes" to 3 questions: a, b, and c.
- In the second group, there is no question to which everyone answered "yes".
- In the third group, everyone answered "yes" to only 1 question, a. Since some people did not answer "yes" to b or c, they don't count.
- In the fourth group, everyone answered "yes" to only 1 question, a.
- In the fifth group, everyone (all 1 person) answered "yes" to 1 question, b.

In this example, the sum of these counts is 3 + 0 + 1 + 1 + 1 = 6.

For each group, count the number of questions to which everyone answered "yes". What is the sum of those counts?

Our approach here has to be slightly different to the one above.

To keep the different steps separate, we write distinct `load_answers()` and `count_common_answers()` functions. The new `load_answers()` works like the original parser, but also keeps track of the group size and of how many times each answer was given in the group.

The parser uses a `defaultdict`, a dictionary subclass from the `collections` module that supplies a default value for any key that is not yet present. Here, `defaultdict(int)` calls `int()`, which returns `0`, the first time a missing key is looked up. This lets us write `group_dict[ans] += 1` without first checking whether `ans` is already a key: the first time an answer is seen, its count goes from `0` to `1`.
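A short demonstration using the third example group (`ab`, `ac`). A plain `dict` raises `KeyError` on the same `+= 1`, while `defaultdict(int)` starts the count at `0`. For comparison only, `collections.Counter` (not used in this notebook) produces the same counts from the concatenated answers.

```python
from collections import Counter, defaultdict

group_dict = defaultdict(int)  # missing keys get int(), i.e. 0
for person in ["ab", "ac"]:    # the third example group
    for ans in person:
        group_dict[ans] += 1   # no KeyError: the count starts from 0

print(dict(group_dict))  # {'a': 2, 'b': 1, 'c': 1}

plain = {}
try:
    plain["a"] += 1  # a plain dict has no default value
except KeyError:
    print("KeyError")

# Counter counts each character of the concatenated answers
print(dict(Counter("abac")) == dict(group_dict))  # True
```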

```python
def load_answers(fpath: str) -> List[Tuple[int, Dict[str, int]]]:
    """Return a list of tuples of group size, and dictionary of answer counts.

    :param fpath:  path to the answer data file

    Returns a list of (group_size, group_answers) tuples, where group_answers
    is a dictionary keyed by answer with values the number of times that
    answer was given.
    """
    group_answers = []
    group_dict = defaultdict(int)  # answer -> number of people who gave it
    group_size = 0                 # number of people in the current group

    with Path(fpath).open("r") as ifh:
        for line in (_.strip() for _ in ifh):
            if len(line) == 0:  # a blank line marks the end of a group
                group_answers.append((group_size, group_dict))
                group_dict = defaultdict(int)
                group_size = 0
            else:  # one line per person: count each of their answers
                for ans in line:
                    group_dict[ans] += 1
                group_size += 1
    if group_size:  # catch cases with no empty line at the end of the file
        group_answers.append((group_size, group_dict))

    return group_answers
```

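The second function takes the parser's output, loops over the `(group_size, group_answers)` pairs and, for each group, counts the answers whose count equals the group size. Assuming each person lists a question at most once (as in the examples), such an answer must have been given by every member of the group.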
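An alternative, not the approach used in this notebook, is to keep one set per person and intersect them, since a set intersection keeps only the elements present in every operand. This minimal sketch uses a hypothetical `count_everyone` helper on the second-puzzle example; unlike the count-equals-group-size test, it does not rely on each person listing a question only once.

```python
from typing import List


def count_everyone(group: List[str]) -> int:
    """Count questions answered "yes" by every person in a non-empty group.

    Hypothetical helper: `group` holds one string of answers per person.
    """
    # set.intersection() accepts any iterables, so strings work directly;
    # with no further arguments (a one-person group) it returns a copy.
    common = set(group[0]).intersection(*group[1:])
    return len(common)


groups = [["abc"], ["a", "b", "c"], ["ab", "ac"], ["a", "a", "a", "a"], ["b"]]
counts = [count_everyone(g) for g in groups]
print(counts)       # [3, 0, 1, 1, 1]
print(sum(counts))  # 6
```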

```python
def count_common_answers(answers: List[Tuple[int, Dict[str, int]]]) -> List[int]:
    """Return count of answers given by all members in each group

    :param answers:  output from load_answers(); a list of (group_size, group_answers)
                     tuples, where group_answers is a dictionary keyed by answer with
                     values the number of times that answer was given.
    """
    common_answers = []
    for gsize, ganswers in answers:
        # an answer was given by everyone if its count equals the group size
        common_answers.append(sum(1 for v in ganswers.values() if v == gsize))

    return common_answers
```

Solve the test puzzle.

```python
answers = load_answers("day06_test.txt")
canswers = count_common_answers(answers)
print(canswers)
sum(canswers)
```

```text
[3, 0, 1, 1, 1]
6
```

Solve the real puzzle.

```python
answers = load_answers("day06_data.txt")
canswers = count_common_answers(answers)
sum(canswers)
```

```text
2971
```

Note that we can recover our original puzzle solution with a function that counts the number of keys in each group dictionary:

```python
def count_unique_answers(answers: List[Tuple[int, Dict[str, int]]]) -> List[int]:
    """Return count of unique answers given by members in each group

    :param answers:  output from load_answers(); a list of (group_size, group_answers)
                     tuples, where group_answers is a dictionary keyed by answer with
                     values the number of times that answer was given.
    """
    unique_answers = []
    for _gsize, ganswers in answers:
        # each key is a distinct answer given by at least one member
        unique_answers.append(len(ganswers))

    return unique_answers
```

```python
answers = load_answers("day06_test.txt")
canswers = count_unique_answers(answers)
print(canswers)
sum(canswers)
```

```text
[3, 3, 3, 1, 1]
11
```

```python
answers = load_answers("day06_data.txt")
canswers = count_unique_answers(answers)
sum(canswers)
```

```text
6161
```

This pattern, where parsing the file is kept separate from the different ways we process the parsed output, is more flexible than combining parsing and analysis in a single function: the same `load_answers()` output feeds both `count_common_answers()` and `count_unique_answers()`. Each function does one well-defined thing.
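One way to push the separation a step further is to decouple the parser from file I/O too, so that it accepts any iterable of lines. This is a sketch under that assumption: `parse_counts` is a hypothetical variant of `load_answers()` that uses `collections.Counter` in place of `defaultdict(int)`, and it is run on the puzzle example held in a string.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def parse_counts(lines: Iterable[str]) -> List[Tuple[int, Dict[str, int]]]:
    """Hypothetical variant of load_answers() that takes lines, not a path."""
    groups: List[Tuple[int, Dict[str, int]]] = []
    size, counts = 0, Counter()
    for raw in lines:
        line = raw.strip()
        if line:  # one person's answers
            counts.update(line)  # count each character (question letter)
            size += 1
        elif size:  # a blank line ends a non-empty group
            groups.append((size, counts))
            size, counts = 0, Counter()
    if size:  # input did not end with a blank line
        groups.append((size, counts))
    return groups


example = "abc\n\na\nb\nc\n\nab\nac\n\na\na\na\na\n\nb\n"
parsed = parse_counts(example.splitlines())

# Two different analyses of the same parsed data
anyone = sum(len(group_counts) for _, group_counts in parsed)
everyone = sum(
    sum(1 for v in group_counts.values() if v == size)
    for size, group_counts in parsed
)
print(anyone, everyone)  # 11 6
```

Since a file object yields lines when iterated, the same function could be called on a file opened with `Path(fpath).open("r")`.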