# **Day 3: Gear Ratios**

# Setup
The cells below will set up the rest of the notebook. 

I'll start by configuring my kernel:

```python
# Changing the current working directory
%cd ..

# Enabling the autoreload extension
%load_ext autoreload
%autoreload 2
```

```
d:\data\programming\advent-of-code-2023
```


Now, I'm going to import some libraries:

```python
# Import statements
import pandas as pd
import re
```

Finally, I'll load in the data for this puzzle. 

```python
# Load in the data for the puzzle
with open("data/input-files/day-03-input.txt", "r") as txt_file:
    input_data = [x.strip() for x in txt_file.readlines()]
```

# Finding Part Numbers
It shouldn't be too hard to actually locate the "part numbers" in the `input_data`. According to the instructions, a **part number** is: 

```
...any number adjacent to a symbol, even diagonally, is a "part number" and should be included in your sum. (Periods (.) do not count as a symbol.)
```

So, I can just parse the input into a coordinate grid of sorts, and then check to see which values are adjacent to symbols.
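As a small illustration of the adjacency rule, here's a minimal sketch run on a made-up three-line schematic (`find_part_numbers` and `toy_schematic` are hypothetical names, not part of this notebook). Rather than checking the eight neighbours of every digit, it scans the rectangle that extends one cell past the number on every side, clamped to the grid edges, which covers exactly the same cells.

```python
import re

def find_part_numbers(lines):
    """Return the numbers in `lines` that touch a symbol (hypothetical helper)."""
    part_numbers = []
    for row_idx, line in enumerate(lines):
        for match in re.finditer(r"[0-9]+", line):
            # span() is half-open: columns start..end-1 hold the digits
            start, end = match.span()
            found_symbol = False
            # Scan the box one cell larger than the number on every side,
            # clamped to the edges of the grid
            for r in range(max(row_idx - 1, 0), min(row_idx + 2, len(lines))):
                for c in range(max(start - 1, 0), min(end + 1, len(lines[r]))):
                    char = lines[r][c]
                    # A symbol is anything that isn't a digit or a period
                    if not char.isdigit() and char != ".":
                        found_symbol = True
            if found_symbol:
                part_numbers.append(int(match.group()))
    return part_numbers

toy_schematic = [
    "12...7",
    "..*...",
    "...45.",
]
print(find_part_numbers(toy_schematic))  # [12, 45]
```

Here `12` and `45` both touch the `*`, while the `7` in the top-right corner only has periods around it.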

```python
def parse_input_into_coordinate_grid(engine_schematic_lines):
    """
    This method will parse some `engine_schematic_lines` (i.e., the puzzle's input data, where
    each line of the string is a member of an array) into a coordinate grid. The method will 
    return two different things: 
    
    - `coordinate_grid` - a 2D array representing the engine schematic as a coordinate grid
    - `number_coordinate_spans` - a list of the coordinate spans that represent numbers
    """
    
    # Parse the engine_schematic_lines into a coordinate grid
    coordinate_grid = [[char for char in line] for line in engine_schematic_lines]

    # Compile a regex pattern that can be used for finding numbers 
    regex_pattern = re.compile('[0-9]+')

    # Iterate through each line from the engine_schematic_lines and determine the 
    # coordinate spans for the different numbers
    number_coordinate_spans = []
    for row_idx, line in enumerate(engine_schematic_lines):
        
        # Determine all of the coordinate spans within this line that contain numbers
        for match in regex_pattern.finditer(line):
            
            # Parse the span from the match
            match_span = match.span()
            
            # Store some information about this number's coordinate span 
            number_coordinate_spans.append(
                {
                    "number": int(match.group()),
                    "coordinate_span": [(row_idx, col_idx) for col_idx in range(match_span[0], match_span[1])]
                }
            )
    
    # Return the coordinate grid and the coordinate spans 
    return coordinate_grid, number_coordinate_spans
```

With this method in hand, we can parse the input into a grid: 

```python
# Parse the engine_schematic_lines into a coordinate grid
coordinate_grid, number_coordinate_spans = parse_input_into_coordinate_grid(input_data)

# Turn the number coordinate spans into a DataFrame
number_coordinate_spans_df = pd.DataFrame.from_records(number_coordinate_spans)

# Show off the first couple of rows of the number_coordinate_spans_df
number_coordinate_spans_df.head(3)
```

|   | number | coordinate_span |
|---|---|---|
| 0 | 380 | [(0, 26), (0, 27), (0, 28)] |
| 1 | 143 | [(0, 52), (0, 53), (0, 54)] |
| 2 | 108 | [(0, 83), (0, 84), (0, 85)] |

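The `coordinate_span` column relies on `match.span()` returning a half-open `(start, end)` pair: `end` is one past the last digit, so `range(start, end)` yields exactly the digit columns. A quick check on a made-up line (the string is illustrative only):

```python
import re

line = "..380...7"
for match in re.finditer(r"[0-9]+", line):
    start, end = match.span()
    # end is exclusive, so range(start, end) covers every digit and nothing else
    print(match.group(), (start, end), list(range(start, end)))
# 380 (2, 5) [2, 3, 4]
# 7 (8, 9) [8]
```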

Now: we need to iterate through each of the number coordinate spans and determine whether or not they're actually **part numbers**. I'll define a method below that'll check whether or not any of the indices in each of the coordinate spans are next to a symbol:

```python
def determine_if_number_is_part_number(coordinate_grid, coordinate_span):
    """
    This method will determine whether or not the number found at a particular `coordinate_span`
    is adjacent to any symbol in the `coordinate_grid`.
    """

    # Determine how many rows are in this coordinate grid
    n_rows = len(coordinate_grid)

    # Indicate all of the transformations that can be made to check the adjacent tiles
    adjacent_tile_transformations = [
        [0, 1],
        [1, 1],
        [1, 0],
        [1, -1],
        [0, -1],
        [-1, -1],
        [-1, 0],
        [-1, 1],
    ]

    # Iterate through each of the coordinates in the coordinate span
    for cur_coordinate in coordinate_span:
        # Iterate through each of the adjacent tile transformations and check if the coordinate
        # has a symbol at it
        for transformation in adjacent_tile_transformations:
            # Determine what (row, column) coordinate we ought to be checking for a symbol
            row_to_check = cur_coordinate[0] + transformation[0]
            col_to_check = cur_coordinate[1] + transformation[1]

            # If the row is outside of the coordinate grid, we'll skip it
            if row_to_check < 0 or row_to_check >= n_rows:
                continue

            # If the column is outside of the row being checked, we'll skip it
            if col_to_check < 0 or col_to_check >= len(coordinate_grid[row_to_check]):
                continue

            # Otherwise, we'll check if the coordinate holds a symbol
            # (anything that is neither alphanumeric nor a period)
            char_at_coordinate_to_check = coordinate_grid[row_to_check][col_to_check]
            if (
                not char_at_coordinate_to_check.isalnum()
                and char_at_coordinate_to_check != "."
            ):
                return True

    # If we've made it to the end of this method without returning True, then the coordinate span
    # is NOT a part number (and we should return False)
    return False
```

With this method in hand, we'll add a column to the `number_coordinate_spans_df` indicating whether or not each number is a part number:

```python
# Add a column indicating whether a number is a part number or not
number_coordinate_spans_df["is_part_number"] = number_coordinate_spans_df.apply(
    lambda row: determine_if_number_is_part_number(
        coordinate_grid, row.coordinate_span
    ),
    axis=1,
)

# Show off the value counts
number_coordinate_spans_df["is_part_number"].value_counts()
```

```
is_part_number
True     1047
False     146
Name: count, dtype: int64
```
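The check above re-examines the neighbours of every digit. An alternative, sketched here on a made-up grid, is to collect every cell adjacent to a symbol once into a set and then test each number's digit cells against it. `symbol_adjacent_cells` and the example spans are hypothetical, not part of this notebook.

```python
def symbol_adjacent_cells(grid):
    """Return the set of (row, col) cells that touch a symbol, diagonals included."""
    adjacent = set()
    for r, row in enumerate(grid):
        for c, char in enumerate(row):
            # A symbol is anything that is neither a digit nor a period
            if char.isdigit() or char == ".":
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Out-of-grid coordinates may be added too; they never
                    # match a digit cell, so they're harmless
                    adjacent.add((r + dr, c + dc))
    return adjacent

grid = ["12...7", "..*...", "...45."]
cells = symbol_adjacent_cells(grid)

# A number is a part number if any of its digit cells is in the set
span_of_12 = [(0, 0), (0, 1)]
span_of_7 = [(0, 5)]
print(any(cell in cells for cell in span_of_12))  # True
print(any(cell in cells for cell in span_of_7))  # False
```

Applied to this notebook's data, that would look something like `cells = symbol_adjacent_cells(coordinate_grid)` followed by `number_coordinate_spans_df["coordinate_span"].apply(lambda span: any(cell in cells for cell in span))`. Each symbol is visited once, and each membership test is a set lookup.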

Now that I've got this column, I can determine the sum of the part numbers!

```python
# Determine the sum of all of the part numbers
part_number_sum = number_coordinate_spans_df[number_coordinate_spans_df["is_part_number"]]["number"].sum()

# Print this information
print(f"The sum of the part numbers I've identified is '{part_number_sum}'")
```

```
The sum of the part numbers I've identified is '526404'
```

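The filter-then-sum step can also be written with a single `.loc` call, which applies the boolean mask to the rows and picks the `number` column in one step instead of chaining two indexing operations. A sketch on a made-up three-row DataFrame (`toy_df` is hypothetical):

```python
import pandas as pd

toy_df = pd.DataFrame(
    {
        "number": [12, 7, 45],
        "is_part_number": [True, False, True],
    }
)

# Boolean mask selects the rows, the column label selects "number", then sum
toy_sum = toy_df.loc[toy_df["is_part_number"], "number"].sum()
print(toy_sum)  # 57
```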

# Part 2: ___