# Day 16: Ticket Translation

As you're walking to yet another connecting flight, you realize that one of the legs of your re-routed trip coming up is on a high-speed train. However, the train ticket you were given is in a language you don't understand. You should probably figure out what it says before you get to the train station after the next flight.

Unfortunately, you can't actually read the words on the ticket. You can, however, read the numbers, and so you figure out the fields these tickets must have and the valid ranges for values in those fields.

You collect the rules for ticket fields, the numbers on your ticket, and the numbers on other nearby tickets for the same train service (via the airport security cameras) together into a single document you can reference (your puzzle input).

The rules for ticket fields specify a list of fields that exist somewhere on the ticket and the valid ranges of values for each field. For example, a rule like class: 1-3 or 5-7 means that one of the fields in every ticket is named class and can be any value in the ranges 1-3 or 5-7 (inclusive, such that 3 and 5 are both valid in this field, but 4 is not).
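
As a minimal sketch of the inclusive range check described above, the rule `class: 1-3 or 5-7` could be tested like this. The helper name `in_ranges` is hypothetical and is not part of the puzzle or of the solution below.

```python
from typing import List, Tuple


def in_ranges(value: int, ranges: List[Tuple[int, int]]) -> bool:
    """Return True if value lies in any inclusive (start, end) range."""
    return any(start <= value <= end for start, end in ranges)


# The rule "class: 1-3 or 5-7", written as (start, end) pairs
class_ranges = [(1, 3), (5, 7)]

# 3 and 5 are valid for this field, but 4 is not
print([value for value in range(1, 8) if in_ranges(value, class_ranges)])
```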

Each ticket is represented by a single line of comma-separated values. The values are the numbers on the ticket in the order they appear; every ticket has the same format. For example, consider this ticket:

```text
.--------------------------------------------------------.
| ????: 101    ?????: 102   ??????????: 103     ???: 104 |
|                                                        |
| ??: 301  ??: 302             ???????: 303      ??????? |
| ??: 401  ??: 402           ???? ????: 403    ????????? |
'--------------------------------------------------------'
```

Here, ? represents text in a language you don't understand. This ticket might be represented as 101,102,103,104,301,302,303,401,402,403; of course, the actual train tickets you're looking at are much more complicated. In any case, you've extracted just the numbers in such a way that the first number is always the same specific field, the second number is always a different specific field, and so on - you just don't know what each position actually means!

Start by determining which tickets are completely invalid; these are tickets that contain values which aren't valid for any field. Ignore your ticket for now.

For example, suppose you have the following notes:

```text
class: 1-3 or 5-7
row: 6-11 or 33-44
seat: 13-40 or 45-50

your ticket:
7,1,14

nearby tickets:
7,3,47
40,4,50
55,2,20
38,6,12
```

It doesn't matter which position corresponds to which field; you can identify invalid nearby tickets by considering only whether tickets contain values that are not valid for any field. In this example, the values on the first nearby ticket are all valid for at least one field. This is not true of the other three nearby tickets: the values 4, 55, and 12 are not valid for any field. Adding together all of the invalid values produces your ticket scanning error rate: 4 + 55 + 12 = 71.
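
The arithmetic in this example can be checked with a short sketch. The rules and nearby tickets are hard-coded from the notes above; the names `rules`, `nearby` and `valid_somewhere` are illustrative assumptions only.

```python
# Rules and nearby tickets copied from the example notes
rules = {
    "class": [(1, 3), (5, 7)],
    "row": [(6, 11), (33, 44)],
    "seat": [(13, 40), (45, 50)],
}
nearby = [(7, 3, 47), (40, 4, 50), (55, 2, 20), (38, 6, 12)]


def valid_somewhere(value, rules):
    """Return True if value falls in at least one range of at least one field."""
    return any(
        start <= value <= end
        for ranges in rules.values()
        for start, end in ranges
    )


# Collect every value that is not valid for any field, then sum them
invalid_values = [v for ticket in nearby for v in ticket if not valid_somewhere(v, rules)]
print(invalid_values, sum(invalid_values))  # [4, 55, 12] 71
```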

Consider the validity of the nearby tickets you scanned. What is your ticket scanning error rate?

In [1]:
# Python imports
from math import prod
from pathlib import Path
from typing import Dict, List, Tuple

The puzzle has a more complex input file structure than usual, so we break out parsing field names/ranges into its own function for convenience:

In [2]:
def parse_field(line: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Return (name, [ranges]) for a single definition line in puzzle input
    
    :param line:  definition of field ranges from puzzle input
    """
    vals = []
    name, valstr = line.strip().split(": ")
    valstr = valstr.split(" or ")
    for val in valstr:
        start, end = val.split("-")
        vals.append((int(start), int(end)))
    return name, vals

To load the data we have to switch parsing modes as we move through the file, so that we end up with three separate pieces: the field names and ranges, my ticket, and the nearby tickets.

We could wrap these in a `namedtuple` or a `dataclass`, but for now we return them as a plain tuple.

In [3]:
def load_data(fpath: str) -> Tuple[Dict[str, List[Tuple[int, int]]], Tuple[int, ...], List[Tuple[int, ...]]]:
    """Return field range definitions, my ticket, and list of nearby tickets
    
    :param fpath:  path to the puzzle input file
    """
    fields = {}  # valid start, end ranges keyed by field name
    my_ticket = None  # tuple of field values on my ticket
    nearby_tickets = []  # tuples of field values on nearby tickets
    
    mode = "field"  # current parsing mode
    with Path(fpath).open("r") as ifh:
        for line in [_ for _ in ifh.readlines() if len(_.strip())]:
            if line.startswith("your"):  # switch parsing mode to "my ticket"
                mode = "your_ticket"
                continue
            if line.startswith("nearby"):  # switch parsing mode to "nearby tickets"
                mode = "nearby_ticket"
                continue
                
            if mode == "field":  # parse field ranges
                name, vals = parse_field(line)
                fields[name] = vals
            elif mode == "your_ticket":  # parse my ticket
                my_ticket = tuple([int(_) for _ in line.strip().split(",")])
            elif mode == "nearby_ticket":  # parse nearby ticket
                nearby_tickets.append(tuple([int(_) for _ in line.strip().split(",")]))
            
    return fields, my_ticket, nearby_tickets
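
For comparison, the `dataclass` alternative mentioned above might look something like the following sketch. The class name `PuzzleInput` and its attribute names are assumptions made for illustration; the rest of this notebook continues to use the plain tuple.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PuzzleInput:
    """Hypothetical container for the three parts of the puzzle input."""

    fields: Dict[str, List[Tuple[int, int]]]  # (start, end) ranges by field name
    my_ticket: Tuple[int, ...]  # values on my ticket
    nearby_tickets: List[Tuple[int, ...]]  # values on each nearby ticket


# Built by hand here from the first example; a loader could return this instead
notes = PuzzleInput(
    fields={"class": [(1, 3), (5, 7)], "row": [(6, 11), (33, 44)], "seat": [(13, 40), (45, 50)]},
    my_ticket=(7, 1, 14),
    nearby_tickets=[(7, 3, 47), (40, 4, 50), (55, 2, 20), (38, 6, 12)],
)
print(notes.my_ticket, len(notes.nearby_tickets))
```

Named attributes avoid having to remember the order of the returned tuple, at the cost of a little extra code.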

To solve the first part of the puzzle, we need to identify all values on each ticket that are not valid in any of the field ranges, then sum them.

This could be a useful function for later in the puzzle, so we create `get_invalid_ticket_values()` to return a set of values - from a single ticket - that do not correspond to any of the valid ranges.

In [4]:
def get_invalid_ticket_values(inticket: Tuple[int, ...], fields: Dict[str, List[Tuple[int, int]]]):
    """Return invalid values on the ticket, according to fields
    
    :param inticket:  input ticket
    :param fields:  field ranges, keyed by field name
    """
    valid = set()  # valid numbers on a ticket
    ticket = set(inticket)  # convert ticket to set for easy processing
    
    # Check each value on the ticket against the ranges of each field and,
    # if the value is in any valid range, add it to the valid set
    for val in ticket:
        for ranges in fields.values():
            if any(start <= val <= end for start, end in ranges):
                valid.add(val)
                break  # valid for at least one field; skip on to next ticket value
                    
    # Return values on the ticket that are not in the valid set
    return ticket.difference(valid)
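
One caveat of working with sets: a value that appears more than once on a ticket is kept only once. The puzzle asks for the sum of all invalid values, so if a ticket repeated an invalid value, a set-based result would count it only once. This notebook does not establish whether that happens in the real input. The sketch below shows a list-based variant that keeps repeats; the function name `invalid_values_with_repeats` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def invalid_values_with_repeats(
    ticket: Tuple[int, ...], fields: Dict[str, List[Tuple[int, int]]]
) -> List[int]:
    """Return invalid values in ticket order, keeping any repeated values."""
    return [
        val
        for val in ticket
        if not any(
            start <= val <= end
            for ranges in fields.values()
            for start, end in ranges
        )
    ]


# Toy data (not from the puzzle): 4 is invalid and appears twice
toy_fields = {"class": [(1, 3), (5, 7)]}
toy_ticket = (4, 4, 7)
print(invalid_values_with_repeats(toy_ticket, toy_fields))  # [4, 4]
print(set(toy_ticket) - {7})  # {4}: the set keeps a single copy
```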

We check our own ticket for invalid values.

In [5]:
fields, my_ticket, nearby_tickets = load_data("day16_test.txt")
get_invalid_ticket_values(my_ticket, fields)

set()

Then we create a function to scan all nearby tickets for invalid values, and return their sum:

In [6]:
def scan_tickets(tickets: List[Tuple[int, ...]], fields: Dict[str, List[Tuple[int, int]]]) -> int:
    """Return sum of invalid numbers on input tickets
    
    :param tickets:  input tickets
    :param fields:  field ranges, keyed by name
    """
    invalid = []  # invalid numbers from tickets
    
    # Check each ticket for invalid numbers; if number is invalid
    # add to the invalid list
    for ticket in tickets:
        vals = get_invalid_ticket_values(ticket, fields)
        if len(vals):
            invalid.extend(list(vals))
            
    # Return sum of invalid numbers
    return sum(invalid)

Solving the test data:

In [7]:
fields, my_ticket, nearby_tickets = load_data("day16_test.txt")
scan_tickets(nearby_tickets, fields)

71

And then the real data:

In [8]:
fields, my_ticket, nearby_tickets = load_data("day16_data.txt")
scan_tickets(nearby_tickets, fields)

19093

## Part Two

Now that you've identified which tickets contain invalid values, discard those tickets entirely. Use the remaining valid tickets to determine which field is which.

Using the valid ranges for each field, determine what order the fields appear on the tickets. The order is consistent between all tickets: if seat is the third field, it is the third field on every ticket, including your ticket.

For example, suppose you have the following notes:

```text
class: 0-1 or 4-19
row: 0-5 or 8-19
seat: 0-13 or 16-19

your ticket:
11,12,13

nearby tickets:
3,9,18
15,1,5
5,14,9
```

Based on the nearby tickets in the above example, the first position must be row, the second position must be class, and the third position must be seat; you can conclude that in your ticket, class is 12, row is 11, and seat is 13.
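
Once the position of each field is known, reading your ticket is a simple lookup. The sketch below hard-codes the position-to-field mapping stated in the example; the variable names are illustrative only.

```python
# Position -> field name, as concluded in the example above
positions = {0: "row", 1: "class", 2: "seat"}
example_ticket = (11, 12, 13)  # "your ticket" from the example notes

# Pair each field name with the value at its position on the ticket
named_values = {name: example_ticket[idx] for idx, name in positions.items()}
print(named_values)  # {'row': 11, 'class': 12, 'seat': 13}
```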

Once you work out which field is which, look for the six fields on your ticket that start with the word departure. What do you get if you multiply those six values together?

We've got a new set of test data, so we sanity-check the input.

In [9]:
fields, my_ticket, nearby_tickets = load_data("day16_test2.txt")
fields, my_ticket, nearby_tickets

({'class': [(0, 1), (4, 19)],
  'row': [(0, 5), (8, 19)],
  'seat': [(0, 13), (16, 19)]},
 (11, 12, 13),
 [(3, 9, 18), (15, 1, 5), (5, 14, 9)])

To identify fields, we use a two-stage approach.

We set up the algorithm with a dictionary keyed by ticket column indices - initially, these could correspond to any of the fields, but we'll eliminate options as they are invalidated by each ticket.

First, we take each ticket in turn, discarding invalid tickets. We check the number in each column of the (valid) ticket against each field's ranges and, if the number is not valid for that field, eliminate the field from that column's entry in the dictionary.

The second stage assumes that the first has left the dictionary in a solvable state, with at least one column narrowed down to a single field. Any such column is "solved": the field belongs to that column, so we remove it from every other column. That may in turn leave other columns with a single field, so we keep looping over the dictionary, removing each solved column's field from all other columns, until every column is solved. Then we return the solution.

In [10]:
def identify_fields(fields: Dict[str, List[Tuple[int, int]]], tickets: List[Tuple[int, ...]]) -> Dict[int, str]:
    """Identify the name associated with each field index on a ticket
    
    :param fields:  field ranges keyed by field name
    :param tickets:  input ticket values
    """
    # set up dictionary, keyed by ticket index
    # add all known fields to each index
    # we'll remove these fields one-by-one as they are invalidated
    fieldcols = {}
    for idx in range(len(tickets[0])):
        fieldcols[idx] = list(fields.keys())
    
    # exclude field from columns when an invalid value is found in that column
    for ticket in tickets:
        # Skip any tickets that are just invalid
        if len(get_invalid_ticket_values(ticket, fields)):
            continue

        # Check each valid ticket column
        for idx, val in enumerate(ticket):
            for key, ranges in fields.items():
                # If this column is valid for the field key, set valid_field to True
                valid_field = False
                for (start, end) in ranges:
                    if end >= val >= start:
                        valid_field = True
                # If the column is not a valid value for the field, remove that field
                # from the fieldcols dictionary
                if not valid_field:
                    try:
                        fieldcols[idx].remove(key)
                    except ValueError:  # field already eliminated from this column
                        pass
    
    # identify any columns where there is only one possible field
    # this field cannot correspond to any other column, so remove it from all
    # other columns
    # repeat until all columns are assigned a field name
    # This is a hacky solution - better to avoid the rechecks of the same
    # field name! Could hold a list of "solved" columns, and remove fields
    # and columns from the dictionary when solved, for instance
    while any([len(_) > 1 for _ in fieldcols.values()]):  # False when solved
        for key, vals in fieldcols.items():
            if len(vals) == 1:  # Current field solved
                val = vals[0]
                # Remove this field name from all other columns
                for k, v in fieldcols.items():
                    if k != key:
                        try:
                            v.remove(val)
                        except ValueError:  # field already removed from this column
                            pass
    
    # Return the solution
    return {k:v[0] for k, v in fieldcols.items()}

We check the test puzzle.

In [11]:
fields, my_ticket, nearby_tickets = load_data("day16_test2.txt")
identify_fields(fields, nearby_tickets)

{0: 'row', 1: 'class', 2: 'seat'}
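
The comment in `identify_fields()` suggests tracking solved columns rather than rechecking them. One possible way to do that is sketched below, using sets and removing each column from the working dictionary once it is solved. The function name `assign_fields` is hypothetical, and the starting candidates are worked out by hand from the example's nearby tickets; this is an alternative sketch, not the code used for the answers in this notebook.

```python
from typing import Dict, Set


def assign_fields(candidates: Dict[int, Set[str]]) -> Dict[int, str]:
    """Resolve column -> field name by repeatedly fixing single-candidate columns."""
    remaining = {col: set(names) for col, names in candidates.items()}  # work on copies
    solved: Dict[int, str] = {}
    while remaining:
        singles = [col for col, names in remaining.items() if len(names) == 1]
        if not singles:
            # Unlike an unconditional loop, stop if elimination cannot make progress
            raise ValueError("no column has exactly one candidate field")
        for col in singles:
            names = remaining.pop(col)
            if len(names) != 1:
                raise ValueError(f"column {col} has no candidate field left")
            name = names.pop()
            solved[col] = name
            # A solved field cannot belong to any other column
            for other in remaining.values():
                other.discard(name)
    return solved


# Candidates left for the Part Two example after checking the nearby tickets
example_candidates = {0: {"row"}, 1: {"class", "row"}, 2: {"class", "row", "seat"}}
print(assign_fields(example_candidates))  # {0: 'row', 1: 'class', 2: 'seat'}
```

Raising an error when no column has a single candidate means the loop cannot run forever on an input that elimination alone cannot solve.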

Now we solve the real puzzle.

In [12]:
fields, my_ticket, nearby_tickets = load_data("day16_data.txt")
fieldnames = identify_fields(fields, nearby_tickets)
fieldnames

{0: 'arrival location',
 1: 'arrival track',
 2: 'row',
 3: 'seat',
 4: 'price',
 5: 'departure platform',
 6: 'departure date',
 7: 'arrival platform',
 8: 'duration',
 9: 'type',
 10: 'departure station',
 11: 'departure track',
 12: 'train',
 13: 'class',
 14: 'arrival station',
 15: 'wagon',
 16: 'departure time',
 17: 'route',
 18: 'zone',
 19: 'departure location'}

The value required is the product of the values on *my ticket* for the columns starting with the word "departure", and we can calculate this directly:

In [13]:
fieldids = [key for key, val in fieldnames.items() if val.startswith("departure")]
prod([my_ticket[idx] for idx in fieldids])

5311123569883