# **Day 5: If You Give A Seed A Fertilizer**

This one seems a little convoluted, but it's ultimately not *too* hard: it just takes some careful parsing. 

# Setup
The cells below will set up the rest of the notebook. 

I'll start by configuring my kernel:

```python
# Changing the current working directory
%cd ..

# Enabling the autoreload extension
%load_ext autoreload
%autoreload 2
```

```
/Users/thubbard/Documents/Personal/Programming/advent-of-code-2023
```


Now, I'm going to import some libraries:

```python
# Import statements
import math

import pandas as pd
```

Finally, I'll load in the data for this puzzle. 

```python
# Load in the data for the puzzle
with open("data/input-files/day-05-input.txt", "r") as txt_file:
    input_data = txt_file.readlines()
```

# Parsing the Input Data
First things first: I need to iterate through each of the lines in the `input_data` and parse out the maps. 

```python
# Parse the seeds from the first line of input_data
seed_numbers = [int(x) for x in input_data[0].split(":")[1].split()]

# Make a list of only the lines that contain maps (skipping the seeds line and the blank line after it)
input_data_map_lines = input_data[2:]

# Iterate through each of the lines, collecting the maps
maps = {}
cur_map = input_data_map_lines[0].split(" ")[0]
cur_map_range_rules = []
for line in input_data_map_lines[1:]:
    # Strip the line of newlines
    stripped_line = line.strip()

    # If this line is a blank string, skip it
    if stripped_line == "":
        continue

    # If we come across a new map, we're going to finish parsing the previous map and
    # set up the parsing of the next map
    if ":" in stripped_line:
        maps[cur_map] = cur_map_range_rules
        cur_map = stripped_line.split(" ")[0]
        cur_map_range_rules = []
        continue

    # Parse this line into the map range information
    dest_range_start, source_range_start, range_length = [
        int(x) for x in stripped_line.split()
    ]

    # Add the range rules to the cur_map_range_rules (the *_end values are exclusive)
    cur_map_range_rules.append(
        {
            "source_range_start": source_range_start,
            "source_range_end": source_range_start + range_length,
            "dest_range_start": dest_range_start,
            "dest_range_end": dest_range_start + range_length,
            "range_length": range_length,
        }
    )

# Add the last ruleset
maps[cur_map] = cur_map_range_rules

# Now, we're going to create a DataFrame of the range rules for each map
map_range_rules_df_list = []
for map_name, range_rules in maps.items():
    cur_map_range_rules_df = pd.DataFrame.from_records(range_rules)
    cur_map_range_rules_df["map_type"] = map_name
    cur_map_range_rules_df["map_source"] = map_name.split("-")[0]
    cur_map_range_rules_df["map_dest"] = map_name.split("-")[-1]
    map_range_rules_df_list.append(cur_map_range_rules_df)
map_range_rules_df = pd.concat(map_range_rules_df_list)

# Show off a couple of random rows from the map_range_rules_df
map_range_rules_df.sample(5)
```

| | source_range_start | source_range_end | dest_range_start | dest_range_end | range_length | map_type | map_source | map_dest |
|---|---|---|---|---|---|---|---|---|
| 8 | 4104587615 | 4210612885 | 1588644556 | 1694669826 | 106025270 | humidity-to-location | humidity | location |
| 3 | 3536983174 | 3836977175 | 3502815281 | 3802809282 | 299994001 | soil-to-fertilizer | soil | fertilizer |
| 22 | 2790861223 | 2873230322 | 1847367009 | 1929736108 | 82369099 | soil-to-fertilizer | soil | fertilizer |
| 13 | 834540076 | 891714422 | 741140998 | 798315344 | 57174346 | fertilizer-to-water | fertilizer | water |
| 11 | 2415899186 | 2461900019 | 1571991394 | 1617992227 | 46000833 | temperature-to-humidity | temperature | humidity |
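As an aside, each map in the almanac is its own blank-line-separated block, so the same parsing can also be done without tracking a "current map" by splitting the whole text on blank lines. The sketch below is one way to do that, assuming `\n` line endings; `parse_almanac` is a hypothetical helper that returns plain Python lists and tuples (in input order) rather than a DataFrame.

```python
def parse_almanac(text):
    """
    Hypothetical helper: split the almanac into blank-line-separated sections.
    The first section holds the seeds; each later section is a
    "<a>-to-<b> map:" header followed by "dest source length" lines.
    Assumes "\n" line endings.
    """
    sections = text.strip().split("\n\n")
    seeds = [int(x) for x in sections[0].split(":")[1].split()]

    maps = {}
    for section in sections[1:]:
        header, *rule_lines = section.strip().splitlines()
        map_name = header.split()[0]  # e.g. "seed-to-soil"
        # Each rule is kept as a (dest_start, source_start, length) tuple
        maps[map_name] = [tuple(int(x) for x in line.split()) for line in rule_lines]
    return seeds, maps


# A tiny almanac in the same format, just to show the shape of the output
example = """seeds: 79 14

seed-to-soil map:
50 98 2
52 50 48
"""
seeds, maps = parse_almanac(example)
print(seeds)  # [79, 14]
print(maps)   # {'seed-to-soil': [(50, 98, 2), (52, 50, 48)]}
```

Because Python dicts keep insertion order, the maps come back in the order they appear in the file.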


Now that we've parsed the maps, we can write a function that traces a seed through each map in turn, recording the number it maps to at every stage (soil, fertilizer, and so on, up to location).

```python
def determine_seed_path(seed_number, map_range_rules_df, starting_source_type="seed"):
    """
    This function will pass the `seed_number` through the various maps in the map_range_rules_df,
    determining which number it maps to at each stage.
    """

    # We're going to create a dictionary of the different numbers
    numbers_dict = {}

    # Starting on the starting_source_type, we'll iterate through the different maps
    cur_source_type = starting_source_type
    cur_source_number = seed_number
    while cur_source_type is not None:
        # Determine the rules associated with the cur_source_type
        cur_source_rules_df = map_range_rules_df.query("map_source==@cur_source_type")

        # Figure out the next source type
        try:
            next_source_type = cur_source_rules_df["map_dest"].unique()[0]
        # If there are no rules for this source type, we're at the end of the "path"
        except IndexError:
            next_source_type = None

        # Determine whether there's a rule that applies to the current number
        cur_num_rule_df = cur_source_rules_df.query(
            "source_range_start <= @cur_source_number & source_range_end > @cur_source_number"
        )

        # If there's no rule, then the cur_dest_number will just be the same as the cur_source_number
        if len(cur_num_rule_df) == 0:
            cur_dest_number = cur_source_number

        # Otherwise, we'll determine what the cur_dest_number is by reading the range rule
        else:
            cur_num_rule_dict = cur_num_rule_df.iloc[0].to_dict()
            idx_in_range = cur_source_number - cur_num_rule_dict.get(
                "source_range_start"
            )
            cur_dest_number = cur_num_rule_dict.get("dest_range_start") + idx_in_range

        # Store the number for the current source type
        numbers_dict[cur_source_type] = cur_source_number

        # Now that we're done parsing the current source type, we'll iterate to the next one
        cur_source_type = next_source_type
        cur_source_number = cur_dest_number

    # Return the numbers_dict
    return numbers_dict
```

With this function in hand, we can trace the path for each of the seed numbers:

```python
# Create a DataFrame mapping each of the seed numbers to their corresponding numbers
seed_numbers_to_other_source_numbers_df = pd.DataFrame.from_records(
    [
        determine_seed_path(seed_number, map_range_rules_df)
        for seed_number in seed_numbers
    ]
)

# Show off some of the rows of the DataFrame
seed_numbers_to_other_source_numbers_df.head(5)
```

| | seed | soil | fertilizer | water | light | temperature | humidity | location |
|---|---|---|---|---|---|---|---|---|
| 0 | 5844012 | 2735666402 | 3285684146 | 1494404667 | 3316329223 | 4047860789 | 4047860789 | 4217167942 |
| 1 | 110899473 | 2840721863 | 1897227649 | 3517164697 | 2090766680 | 4254909717 | 4135623259 | 1619680200 |
| 2 | 1132285750 | 3426553621 | 3361682440 | 1570402961 | 3392327517 | 3879033928 | 1153467296 | 1230040740 |
| 3 | 58870036 | 2788692426 | 3338710170 | 1547430691 | 3369355247 | 3856061658 | 1130495026 | 1207068470 |
| 4 | 986162929 | 3280430800 | 2849630923 | 2430993569 | 2333026939 | 3334793292 | 3935687364 | 4104994517 |
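Stripped of pandas, the lookup that `determine_seed_path` performs at each stage is a single offset: if `source_start <= x < source_start + length`, then `x` maps to `dest_start + (x - source_start)`, and otherwise `x` maps to itself. Here is a minimal pure-Python sketch of that rule; the helper names `apply_map` and `trace_location` are hypothetical, and it assumes each map is a list of `(dest_start, source_start, length)` tuples and that the maps are supplied in chain order (seed-to-soil first, humidity-to-location last), which is how they appear in the puzzle input.

```python
def apply_map(value, rules):
    """
    Map a single number through one map. Each rule is a
    (dest_start, source_start, length) tuple, as in the input lines.
    """
    for dest_start, source_start, length in rules:
        # The rule covers the half-open interval [source_start, source_start + length)
        if source_start <= value < source_start + length:
            return dest_start + (value - source_start)
    # Numbers not covered by any rule map to themselves
    return value


def trace_location(seed, ordered_maps):
    """Push a seed through every map in chain order and return its location."""
    value = seed
    for rules in ordered_maps:
        value = apply_map(value, rules)
    return value


seed_to_soil = [(50, 98, 2), (52, 50, 48)]
print(apply_map(79, seed_to_soil))  # 81
print(apply_map(98, seed_to_soil))  # 50
print(apply_map(10, seed_to_soil))  # 10
```

With all seven maps in a list, the part 1 answer would then be `min(trace_location(s, ordered_maps) for s in seed_numbers)`.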


Now to the main question: 

```
What is the lowest location number that corresponds to any of the initial seed numbers?
```

```python
# Determine the lowest location number and print it
lowest_location_number = seed_numbers_to_other_source_numbers_df["location"].min()
print(f"The lowest location number is '{lowest_location_number}'")
```

```
The lowest location number is '825516882'
```


# Part 2: Parsing Seed Ranges
The "wrinkle" in part 2 is that the initial "seeds" line actually describes ranges of numbers: each pair of values is a `start_seed_number` followed by a `range_length`. I figured I'd be able to just run each of those numbers through my function, but that may not be practical: there are *quite* a few seed numbers. 

```python
# Determine how many seed numbers there will be in total (every second value is a range length)
range_lens = seed_numbers[1::2]
print(f"There are {sum(range_lens):,} total seed numbers in the ranges.")
```

```
There are 1,624,044,411 total seed numbers in the ranges.
```
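Slicing with a step of 2 is a compact way to pair up the flat seeds list. The sketch below uses a hypothetical helper, `seed_ranges_from_numbers`, that also converts each `(start, length)` pair into a half-open interval `[start, start + length)`, which is often the easier form to compare and split.

```python
def seed_ranges_from_numbers(seed_numbers):
    """
    Hypothetical helper: pair up the flat seeds list as (start, length) and
    convert each pair into a half-open interval [start, start + length).
    """
    starts = seed_numbers[0::2]
    lengths = seed_numbers[1::2]
    return [(start, start + length) for start, length in zip(starts, lengths)]


ranges = seed_ranges_from_numbers([79, 14, 55, 13])
print(ranges)                                     # [(79, 93), (55, 68)]
print(sum(end - start for start, end in ranges))  # 27
```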


I *think* this will be solvable via some sort of math trick, since brute-forcing over 1.6 billion seed numbers through these maps would take quite some time. 
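For reference, one common version of that trick (a different route from the one taken below) is to push whole intervals through each map instead of individual seeds, splitting an interval whenever it straddles a rule boundary. The work then grows with the number of intervals and rules rather than with the number of seeds. This is a pure-Python sketch under some assumptions: the function names are hypothetical, seed ranges are given as non-empty half-open `(start, end)` intervals, each map is a list of `(dest_start, source_start, length)` tuples, the maps are supplied in chain order, and no two rules in a map have overlapping source ranges.

```python
def map_intervals(intervals, rules):
    """
    Push half-open intervals [start, end) through one map. Rules are
    (dest_start, source_start, length) tuples; an interval that straddles a
    rule boundary is split, and any part no rule covers maps to itself.
    """
    result = []
    pending = list(intervals)
    while pending:
        start, end = pending.pop()
        for dest, source, length in rules:
            # Overlap between [start, end) and this rule's source range
            lo = max(start, source)
            hi = min(end, source + length)
            if lo < hi:
                # Shift the overlapping piece into the destination range
                offset = dest - source
                result.append((lo + offset, hi + offset))
                # Re-queue the leftover pieces so other rules can claim them
                if start < lo:
                    pending.append((start, lo))
                if hi < end:
                    pending.append((hi, end))
                break
        else:
            # No rule overlaps this interval, so it passes through unchanged
            result.append((start, end))
    return result


def lowest_location_for_ranges(seed_intervals, ordered_maps):
    """Push seed intervals through every map in chain order; return the lowest location."""
    intervals = list(seed_intervals)
    for rules in ordered_maps:
        intervals = map_intervals(intervals, rules)
    # Every piece is non-empty (given non-empty seed intervals), so its start is its minimum
    return min(start for start, _ in intervals)


# Hypothetical one-map example: seeds 79..92 through the seed-to-soil rules
print(lowest_location_for_ranges([(79, 93)], [[(50, 98, 2), (52, 50, 48)]]))  # 81
```

Passing every seed as a length-1 interval turns this back into the part 1 computation, which is a handy sanity check.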

What if I iterate *backward* through the maps, trying to identify which location range contains the smallest number that traces back to a valid seed? That seems promising. Let's reverse the rules:

```python
# Reverse the map_range_rules_df
reverse_map_range_rules_df_records = []
for row in map_range_rules_df.itertuples():
    reverse_map_range_rules_df_records.append(
        {
            "source_range_start": row.dest_range_start,
            "source_range_end": row.dest_range_end,
            "dest_range_start": row.source_range_start,
            "dest_range_end": row.source_range_end,
            "range_length": row.range_length,
            "map_type": f"{row.map_dest}-to-{row.map_source}",
            "map_source": row.map_dest,
            "map_dest": row.map_source,
        }
    )
reverse_map_range_rules_df = pd.DataFrame.from_records(
    reverse_map_range_rules_df_records
)
```
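In tuple form, reversing a map is just a matter of swapping each rule's source and destination starts, as the DataFrame code does column by column. The sketch below (hypothetical helper `invert_rules`, with its own forward `apply_map`) shows this and checks that the reversed map undoes the forward one. It assumes each map is a one-to-one relabelling of numbers, the same assumption the reverse-tracing approach relies on; if an unmapped number collided with a mapped destination range, the reverse lookup would be ambiguous.

```python
def invert_rules(rules):
    """
    Hypothetical helper: reverse one map by swapping each rule's source and
    destination starts. Rules are (dest_start, source_start, length) tuples.
    """
    return [(source, dest, length) for dest, source, length in rules]


def apply_map(value, rules):
    # Forward lookup: first rule whose source range contains value, else identity
    for dest, source, length in rules:
        if source <= value < source + length:
            return dest + (value - source)
    return value


seed_to_soil = [(50, 98, 2), (52, 50, 48)]
soil_to_seed = invert_rules(seed_to_soil)
print(apply_map(81, soil_to_seed))  # 79

# Round trip: the reversed map undoes the forward map for these numbers
print(all(apply_map(apply_map(x, seed_to_soil), soil_to_seed) == x for x in range(200)))  # True
```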

With the rules reversed, I'll trace the first and last number of each location range back to a seed number, to see which ranges might contain a valid one.

```python
# Create a list of the location numbers
test_location_numbers = []
for row in (
    reverse_map_range_rules_df.query("map_source=='location'")
    .sort_values("source_range_start", ascending=True)
    .itertuples()
):
    test_location_numbers.append(row.source_range_start)
    test_location_numbers.append(row.source_range_end - 1)

# Create a DataFrame from the reverse traces
location_num_bounds_df = pd.DataFrame.from_records(
    [
        determine_seed_path(
            location_num, reverse_map_range_rules_df, starting_source_type="location"
        )
        for location_num in test_location_numbers
    ]
)
```

Now we need to determine which of these location numbers trace back to seed numbers that fall within one of the seed ranges. 

```python
# Determine all of the ranges as (start, length) pairs
seed_ranges = []
for range_idx in range(len(seed_numbers) // 2):
    seed_ranges.append((seed_numbers[range_idx * 2], seed_numbers[(range_idx * 2) + 1]))


def determine_if_seed_number_in_range(seed_number, seed_ranges):
    """
    This helper function will determine whether or not a seed number is in any of the seed ranges
    """
    for range_start, range_length in seed_ranges:
        if range_start <= seed_number < range_start + range_length:
            return True
    return False


# Add the "valid_seed_num" column using the function above
location_num_bounds_df["valid_seed_num"] = location_num_bounds_df["seed"].apply(
    lambda seed_num: determine_if_seed_number_in_range(seed_num, seed_ranges)
)

# Find the smallest location number whose seed falls inside one of the ranges
smallest_location_val = location_num_bounds_df[
    location_num_bounds_df["valid_seed_num"]
]["location"].min()
print(f"The smallest location value I found is '{smallest_location_val}'")
```

```
The smallest location value I found is '222429953'
```
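Since the membership check runs once per traced location, it may be worth making it cheaper. One option, sketched here with Python's standard-library `bisect` module, is to sort the `(start, length)` ranges once and binary-search for the only range that could contain a given number. The helper name `make_seed_range_checker` is hypothetical, and the sketch assumes the seed ranges don't overlap. Next to the pandas `query` calls in `determine_seed_path` this is probably a small win, so treat it as an illustration rather than the bottleneck fix.

```python
import bisect


def make_seed_range_checker(seed_ranges):
    """
    Hypothetical helper: build a fast membership test for (start, length) seed
    ranges. Assumes the ranges don't overlap, so after sorting by start only the
    range with the largest start <= n can contain n.
    """
    sorted_ranges = sorted(seed_ranges)
    starts = [start for start, _ in sorted_ranges]

    def contains(n):
        # Index of the last range whose start is <= n (or -1 if there is none)
        idx = bisect.bisect_right(starts, n) - 1
        if idx < 0:
            return False
        start, length = sorted_ranges[idx]
        return n < start + length

    return contains


in_seed_range = make_seed_range_checker([(79, 14), (55, 13)])
print(in_seed_range(82), in_seed_range(93))  # True False
```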


Since 222429953 is a valid location number, the smallest valid location number must be somewhere between 0 and 222429953. I could probably use some sort of binary search to narrow that down. 

```python
def determine_if_location_num_is_valid(location_num):
    """
    Trace a location number back through the reversed maps and check whether
    the resulting seed number falls inside one of the seed ranges.
    """
    seed_num = determine_seed_path(
        location_num, reverse_map_range_rules_df, starting_source_type="location"
    ).get("seed")
    return determine_if_seed_number_in_range(seed_num, seed_ranges)


def identify_lowest_valid_location_number(cur_left, cur_right):
    """
    Recursively bisect [cur_left, cur_right], homing in on a boundary between an
    invalid location number and a valid one. Expects cur_right to be valid.
    """

    # Determine the midpoint and the width of the current range
    cur_mid = cur_left + math.floor((cur_right - cur_left) / 2)
    cur_range = cur_right - cur_left

    # Check validity at the left edge, the midpoint, and the right edge
    left_possible = determine_if_location_num_is_valid(cur_left)
    mid_possible = determine_if_location_num_is_valid(cur_mid)
    right_possible = determine_if_location_num_is_valid(cur_right)

    # If the left edge is valid, nothing lower exists in this range
    if left_possible:
        return cur_left
    # Once the range is down to adjacent numbers, the right edge is the boundary
    if cur_range <= 1 and right_possible:
        return cur_right
    # If the midpoint is valid, keep searching the lower half
    if mid_possible:
        return identify_lowest_valid_location_number(cur_left, cur_mid)
    # Otherwise, the boundary must lie in the upper half
    if right_possible:
        return identify_lowest_valid_location_number(cur_mid, cur_right)
    # No valid location number found in this range
    return None
```

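```python
# Start from the smallest valid location found above (222429953) and re-run the
# bisection with a tighter upper bound until a pass stops finding anything lower
cur_left = 0
cur_right = int(smallest_location_val)
lowest_in_range = identify_lowest_valid_location_number(cur_left, cur_right)
while lowest_in_range != cur_right:
    cur_right = lowest_in_range
    lowest_in_range = identify_lowest_valid_location_number(cur_left, cur_right)
lowest_in_range
```

```
136096660
```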
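For contrast, a textbook lower-bound binary search is sketched below (hypothetical helper `lowest_true`). It is only guaranteed to return the smallest valid number when the predicate is monotonic: False for every number below some threshold and True from the threshold onward. Location validity isn't guaranteed to behave that way, since a valid location can sit above a stretch of invalid ones, which is why the search above is repeated with a tighter upper bound and is best read as a heuristic rather than a proof of minimality.

```python
def lowest_true(predicate, lo, hi):
    """
    Hypothetical helper: return the smallest n in [lo, hi] with predicate(n)
    True, or None if there is none. Only correct for a monotonic predicate.
    """
    answer = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if predicate(mid):
            # mid works, so remember it and look for something smaller
            answer = mid
            hi = mid - 1
        else:
            # mid fails, so (by monotonicity) everything below it fails too
            lo = mid + 1
    return answer


print(lowest_true(lambda n: n >= 37, 0, 100))  # 37
```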

My solution to this bit was pretty messy, but I did get it done! 