# **Day 5: If You Give A Seed A Fertilizer**

This one seems a little convoluted, but it's ultimately not *too* hard: it just needs some careful parsing.

# Setup
The cells below will set up the rest of the notebook. 

I'll start by configuring my kernel:

In [1]:
# Changing the current working directory
%cd ..

# Enabling the autoreload extension
%load_ext autoreload
%autoreload 2

/Users/thubbard/Documents/Personal/Programming/advent-of-code-2023


Now, I'm going to import some libraries:

In [2]:
# Import statements
import pandas as pd
import re

Finally, I'll load in the data for this puzzle. 

In [3]:
# Load in the data for the puzzle
with open("data/input-files/day-05-input.txt", "r") as txt_file:
    input_data = txt_file.readlines()

# Parsing the Input Data
First things first: I need to iterate through each of the lines in `input_data` and parse out the maps.

In [125]:
# Parse the seeds from the input_data
seed_numbers = [int(x.strip()) for x in input_data[0].split(":")[1].strip().split(" ")]

# Make a list of only the lines that contain maps
input_data_map_lines = input_data[2:]

# Iterate through each of the lines, collecting the maps
maps = {}
cur_map = input_data_map_lines[0].split(" ")[0]
cur_map_range_rules = []
for line in input_data_map_lines[1:]:
    # Strip the line of newlines
    stripped_line = line.strip()

    # If this line is a blank string, skip it
    if stripped_line == "":
        continue

    # If we come across a new map, we're going to finish parsing the previous map and
    # set up the parsing of the next map
    if ":" in stripped_line:
        maps[cur_map] = cur_map_range_rules
        cur_map = stripped_line.split(" ")[0]
        cur_map_range_rules = []
        continue

    # Parse this line into the map range information
    dest_range_start, source_range_start, range_length = [
        int(x.strip()) for x in stripped_line.split(" ")
    ]

    # Add the range rules to the cur_map_range_rules
    cur_map_range_rules.append(
        {
            "dest_range_start": dest_range_start,
            "source_range_start": source_range_start,
            "source_range_end": source_range_start + range_length,
            "range_length": range_length,
        }
    )

# Add the last ruleset
maps[cur_map] = cur_map_range_rules

# Now, we're going to create a DataFrame of the range rules for each map
map_range_rules_df_list = []
for map_name, range_rules in maps.items():
    cur_map_range_rules_df = pd.DataFrame.from_records(range_rules)
    cur_map_range_rules_df["map_type"] = map_name
    cur_map_range_rules_df["map_source"] = map_name.split("-")[0]
    cur_map_range_rules_df["map_dest"] = map_name.split("-")[-1]
    map_range_rules_df_list.append(cur_map_range_rules_df)
map_range_rules_df = pd.concat(map_range_rules_df_list)

# Show off a couple of random rows from the map_range_rules_df
map_range_rules_df.sample(5)

| | dest_range_start | source_range_start | source_range_end | range_length | map_type | map_source | map_dest |
|---|---|---|---|---|---|---|---|
| 20 | 1929736108 | 2563409394 | 2623931701 | 60522307 | soil-to-fertilizer | soil | fertilizer |
| 28 | 3117493441 | 2792037356 | 2925825589 | 133788233 | temperature-to-humidity | temperature | humidity |
| 15 | 4096970661 | 1633036395 | 1691981480 | 58945085 | light-to-temperature | light | temperature |
| 11 | 2765973467 | 292784384 | 376201353 | 83416969 | light-to-temperature | light | temperature |
| 9 | 646363825 | 441617961 | 536395134 | 94777173 | fertilizer-to-water | fertilizer | water |
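
For comparison, here's a minimal sketch of a block-based way to parse the same format: split the text on blank lines, then treat the first line of each block as the map name. The sample text is an abbreviated copy of the puzzle's worked example (only two of its maps), and `parse_almanac`, `sample_seeds`, and `sample_maps` are names I made up for illustration, not part of the notebook above.

```python
# Abbreviated sample in the puzzle's input format (only two of the maps)
sample_text = """seeds: 79 14 55 13

seed-to-soil map:
50 98 2
52 50 48

soil-to-fertilizer map:
0 15 37
37 52 2
39 0 15
"""


def parse_almanac(text):
    """Hypothetical helper: split the input on blank lines and parse each block."""
    blocks = text.strip().split("\n\n")

    # The first block is the "seeds: ..." line
    seeds = [int(x) for x in blocks[0].split(":")[1].split()]

    # Every other block is a header line followed by rule lines
    maps = {}
    for block in blocks[1:]:
        header, *rule_lines = block.splitlines()
        map_name = header.split(" ")[0]
        # Each rule is (dest_range_start, source_range_start, range_length)
        maps[map_name] = [tuple(int(x) for x in line.split()) for line in rule_lines]
    return seeds, maps


sample_seeds, sample_maps = parse_almanac(sample_text)
print(sample_seeds)                 # [79, 14, 55, 13]
print(sample_maps["seed-to-soil"])  # [(50, 98, 2), (52, 50, 48)]
```

This assumes the whole file is read as one string (for example with `txt_file.read()`) rather than as a list of lines.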


Now that we've parsed this, we can create a function that determines the different numbers associated with each of the seeds.

In [126]:
def determine_seed_path(seed_number, map_range_rules_df):
    """
    This function passes the `seed_number` through each of the maps in the
    map_range_rules_df, recording the number it maps to at every stage.
    """
    
    # We're going to create a dictionary of the different numbers
    numbers_dict = {}

    # Starting on seed, we'll iterate through the different maps 
    cur_source_type = "seed"
    cur_source_number = seed_number
    while cur_source_type is not None:
        
        # Determine the rules associated with the cur_source_type
        cur_source_rules_df = map_range_rules_df.query("map_source==@cur_source_type")
        
        # Figure out the next source type
        try:
            next_source_type = cur_source_rules_df["map_dest"].unique()[0]
        # If we can't parse the next source type, we'll assume we're at the end of the "path"
        except IndexError:
            next_source_type = None
        
        # Determine whether there's a rule that applies to the current number
        # (source_range_end is exclusive, since it's source_range_start + range_length)
        cur_num_rule_df = cur_source_rules_df.query("source_range_start <= @cur_source_number & source_range_end > @cur_source_number")
        
        # If there's no rule, then the cur_dest_number will just be the same as the cur_source_number
        if len(cur_num_rule_df) == 0:
            cur_dest_number = cur_source_number
        
        # Otherwise, we'll determine what the cur_dest_number is by reading the range rule
        else:
            cur_num_rule_dict = cur_num_rule_df.iloc[0].to_dict()
            idx_in_range = cur_source_number - cur_num_rule_dict.get("source_range_start")
            cur_dest_number = cur_num_rule_dict.get("dest_range_start") + idx_in_range
        
        # Store all of the numbers for the current source type
        numbers_dict[cur_source_type] = cur_source_number
        
        # Now that we're done parsing the current source type, we'll iterate to the next one
        cur_source_type = next_source_type
        cur_source_number = cur_dest_number

    # Return the numbers_dict
    return numbers_dict
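
To make the range rule itself concrete, here's a small pandas-free sketch of a single map lookup, run against the seed-to-soil map from the puzzle's worked example. `apply_rules` and `seed_to_soil_rules` are hypothetical names for illustration only. One thing the sketch makes visible is that a rule's source range is half-open: the rule `50 98 2` covers 98 and 99, so 100 falls through and maps to itself.

```python
def apply_rules(number, rules):
    """Hypothetical helper: map `number` through (dest_start, source_start, length) rules."""
    for dest_start, source_start, length in rules:
        # Half-open interval: source_start <= number < source_start + length
        if source_start <= number < source_start + length:
            return dest_start + (number - source_start)
    # No rule covers the number, so it maps to itself
    return number


# The seed-to-soil map from the puzzle's worked example
seed_to_soil_rules = [(50, 98, 2), (52, 50, 48)]

for seed in [79, 14, 55, 13]:
    print(seed, "->", apply_rules(seed, seed_to_soil_rules))
# 79 -> 81
# 14 -> 14
# 55 -> 57
# 13 -> 13

# Boundary check: 98 and 99 are covered by the first rule, 100 is not
print([apply_rules(n, seed_to_soil_rules) for n in (98, 99, 100)])  # [50, 51, 100]
```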

With this function in hand, we can parse the paths for each of the seed numbers:

In [127]:
# Create a DataFrame mapping each of the seed numbers to their corresponding numbers
seed_numbers_to_other_source_numbers_df = pd.DataFrame.from_records(
    [
        determine_seed_path(seed_number, map_range_rules_df)
        for seed_number in seed_numbers
    ]
)

# Show off some of the rows of the DataFrame
seed_numbers_to_other_source_numbers_df.head(5)

| | seed | soil | fertilizer | water | light | temperature | humidity | location |
|---|---|---|---|---|---|---|---|---|
| 0 | 5844012 | 2735666402 | 3285684146 | 1494404667 | 3316329223 | 4047860789 | 4047860789 | 4217167942 |
| 1 | 110899473 | 2840721863 | 1897227649 | 3517164697 | 2090766680 | 4254909717 | 4135623259 | 1619680200 |
| 2 | 1132285750 | 3426553621 | 3361682440 | 1570402961 | 3392327517 | 3879033928 | 1153467296 | 1230040740 |
| 3 | 58870036 | 2788692426 | 3338710170 | 1547430691 | 3369355247 | 3856061658 | 1130495026 | 1207068470 |
| 4 | 986162929 | 3280430800 | 2849630923 | 2430993569 | 2333026939 | 3334793292 | 3935687364 | 4104994517 |


Now to the main question: 

```
What is the lowest location number that corresponds to any of the initial seed numbers?
```

In [129]:
# Determine the lowest location number and print it
lowest_location_number = seed_numbers_to_other_source_numbers_df["location"].min()
print(f"The lowest location number is '{lowest_location_number}'")

The lowest location number is '825516882'
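
As a sanity check on the overall approach, the same chain of lookups can be run end to end on the puzzle's worked example, where the expected lowest location is 35. This is only a sketch: the maps are typed in by hand from the example, and `seed_to_location`, `example_maps`, and `example_seeds` are hypothetical names rather than anything defined in the cells above.

```python
# The puzzle's published example maps, as (dest_start, source_start, length) tuples
example_maps = [
    [(50, 98, 2), (52, 50, 48)],                         # seed-to-soil
    [(0, 15, 37), (37, 52, 2), (39, 0, 15)],             # soil-to-fertilizer
    [(49, 53, 8), (0, 11, 42), (42, 0, 7), (57, 7, 4)],  # fertilizer-to-water
    [(88, 18, 7), (18, 25, 70)],                         # water-to-light
    [(45, 77, 23), (81, 45, 19), (68, 64, 13)],          # light-to-temperature
    [(0, 69, 1), (1, 0, 69)],                            # temperature-to-humidity
    [(60, 56, 37), (56, 93, 4)],                         # humidity-to-location
]
example_seeds = [79, 14, 55, 13]


def seed_to_location(seed, maps_in_order):
    """Hypothetical helper: push one seed through every map, in order."""
    number = seed
    for rules in maps_in_order:
        for dest_start, source_start, length in rules:
            if source_start <= number < source_start + length:
                number = dest_start + (number - source_start)
                # Stop at the first matching rule so the mapped value
                # isn't remapped by a later rule in the same map
                break
    return number


example_locations = [seed_to_location(s, example_maps) for s in example_seeds]
print(example_locations)       # [82, 43, 86, 35]
print(min(example_locations))  # 35
```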
