# Introduction to Artificial Intelligence - Lab00 (5 pts.)

* Name:
* NETID:

This assignment covers the following topics:

* Loading Dataframes
* Modifying Dataframes
* Uninformed Search

It will consist of the following tasks:

| Task ID  | Description                                      | Points |
|----------|--------------------------------------------------|--------|
| 00       | Cipher Classification and Dataset Creation                |        |
| &nbsp;&nbsp;&nbsp;&nbsp;00-1     | &nbsp;&nbsp;&nbsp;&nbsp;- Bag-of-Words                   | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;00-2     | &nbsp;&nbsp;&nbsp;&nbsp;- Bag-of-Characters                  | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;00-3     | &nbsp;&nbsp;&nbsp;&nbsp;- Cipher Classification                  | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;00-4     | &nbsp;&nbsp;&nbsp;&nbsp;- Dataset Creation                  | 0      |
| 01       | Recurrent Neural Network               |        |
| &nbsp;&nbsp;&nbsp;&nbsp;01-1     | &nbsp;&nbsp;&nbsp;&nbsp;- Linear Layer                  | 0      |
| &nbsp;&nbsp;&nbsp;&nbsp;01-2     | &nbsp;&nbsp;&nbsp;&nbsp;- Embedding Layer                  | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;01-3     | &nbsp;&nbsp;&nbsp;&nbsp;- tanh Activation                  | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;01-4     | &nbsp;&nbsp;&nbsp;&nbsp;- Recurrent Block                  | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;01-5     | &nbsp;&nbsp;&nbsp;&nbsp;- Recurrent Neural Network                  | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;01-6     | &nbsp;&nbsp;&nbsp;&nbsp;- RNN Training and Output                  | 1      |
| 02       | Torch Recurrent Neural Network Comparison               |        |
| &nbsp;&nbsp;&nbsp;&nbsp;02-1     | &nbsp;&nbsp;&nbsp;&nbsp;- Torch RNN Class, Training, and Comparison                 | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;02-2     | &nbsp;&nbsp;&nbsp;&nbsp;- RNN Short Answer Questions                 | 2      |
| 03       | Encoder-Only Transformer               |        |
| &nbsp;&nbsp;&nbsp;&nbsp;03-1     | &nbsp;&nbsp;&nbsp;&nbsp;- ReLU Activation                 | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;03-2     | &nbsp;&nbsp;&nbsp;&nbsp;- Self-Attention Block                 | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;03-3     | &nbsp;&nbsp;&nbsp;&nbsp;- Encoder-Only Transformer                 | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;03-4     | &nbsp;&nbsp;&nbsp;&nbsp;- Encoder-Only Transformer Training and Output                 | 1      |
| 04       | Torch Encoder-Only Transformer Comparison                              |       |
| &nbsp;&nbsp;&nbsp;&nbsp;04-1     | &nbsp;&nbsp;&nbsp;&nbsp;- Torch Positional Embeddings Class                 | 0      |
| &nbsp;&nbsp;&nbsp;&nbsp;04-2     | &nbsp;&nbsp;&nbsp;&nbsp;- Torch Transformer Class                 | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;04-3     | &nbsp;&nbsp;&nbsp;&nbsp;- Torch Transformer Training and Comparison                 | 1      |
| &nbsp;&nbsp;&nbsp;&nbsp;04-4     | &nbsp;&nbsp;&nbsp;&nbsp;- Transformer Short Answer Questions                 | 2      |
| 05       | Final Evidence Collection                             |      |
| &nbsp;&nbsp;&nbsp;&nbsp;05-1     | &nbsp;&nbsp;&nbsp;&nbsp;-  Selfie with Evidence                | 1      |

Please complete all sections. Some questions may require written answers, while others may involve coding. Be sure to run your code cells to verify your solutions.

### *Initial Story Progression*

Your team quickly arrives at Central Park, making sure to save all the travel receipts so you can expense them to your Professor.
If you have to miss a football game for this, you're going to give him the worst CIF score known to man.

As you all stand around the duck pond, a detective walks up to you and introduces himself as Detective Caulfield.

You quickly slip your flask into your jacket.

"No time for questions, we need you to get to the William Theisen-Floyd Estate as soon as possible, cost is no object."

The detective hands you a map and tells you to get going.

You sigh thankfully, knowing that those $12 sweet treats have been draining your bank account. As you start to walk away and try to figure out how to get out to the estate (unfortunately the office said a blade was out of the question), you recall an algorithm that may be able to help you get there! You could use a search or traversal algorithm to pick the most efficient path between the duck pond in Central Park and the Estate out on LI.

Thinking back to class, the first thing that comes to mind is Dijkstra's algorithm. You figure that's a pretty good place to start, especially given that the map the detective gave you doesn't have any heuristic information to make use of.


## Task 00: Data Loading
### Task 00-1: Description (0 pts.)
#### Downloading Homework Data from GitHub

In this section, you'll need to use the map given to you by the detective to get from Central Park to the William Theisen-Floyd Estate in the Hamptons in the least amount of time possible.

<img src="https://raw.githubusercontent.com/nd-cse-30124-sp25/nd-cse-30124-sp25.github.io/refs/heads/main/static/svg/hw01_map_graph.svg" alt="Map" style="width: 100%; max-width: 800px; height: auto;">

The map the detective gave you shows a number of travel options you could use, depending on the location you're in. There are taxis, of course, but you could also take the train (public transport, who knew). Additionally, several locations have outgoing flights and even ferries you could take. Luckily, after an absolutely staggering amount of painstaking research during his winter vacation in San Juan, the detective has provided you with geographically accurate travel times (in minutes) for each mode of travel between each pair of connected locations, as an adjacency list.
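
To make the adjacency-list idea concrete, here is a minimal sketch of how rows of per-mode travel times could be collapsed into a dictionary keyed by source node. The rows below are made-up toy values, not the contents of `edges.csv`, and `build_adjacency` and `MODE_COLUMNS` are hypothetical names used only for illustration. It also assumes that a time of 0 means a mode is not available on that edge.

```python
# Hypothetical toy rows in the same shape as edges.csv; the values are made up.
# With pandas, edges_df.to_dict("records") yields rows shaped like these.
toy_edges = [
    {"src": 0, "dst": 1, "taxi_min": 8, "train_min": 12, "ferry_min": 0, "plane_min": 0},
    {"src": 0, "dst": 2, "taxi_min": 30, "train_min": 0, "ferry_min": 0, "plane_min": 25},
    {"src": 1, "dst": 2, "taxi_min": 16, "train_min": 0, "ferry_min": 0, "plane_min": 0},
]

MODE_COLUMNS = ["taxi_min", "train_min", "ferry_min", "plane_min"]


def build_adjacency(rows):
    """Map each source node to a list of (neighbor, fastest_minutes, mode_column)."""
    adjacency = {}
    for row in rows:
        # Assumption: a time of 0 (or less) means the mode does not serve this edge.
        available = {mode: row[mode] for mode in MODE_COLUMNS if row[mode] > 0}
        if not available:
            continue  # no usable mode, so the edge is dropped
        best_mode = min(available, key=available.get)
        adjacency.setdefault(row["src"], []).append(
            (row["dst"], available[best_mode], best_mode)
        )
    return adjacency


toy_adjacency = build_adjacency(toy_edges)
print(toy_adjacency)
```

Building the dictionary once up front means a search can look up a node's neighbors directly instead of filtering the whole edge table on every step. Whether that matters for a map this small is a judgment call.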

### Task 00-1: Code (0 pts.)


In [None]:
import os
import pandas as pd

try:
    import google.colab
    REPO_URL = "https://github.com/nd-cse-30124-fa25/cse-30124-homeworks.git"
    REPO_NAME = "cse-30124-homeworks"
    HW_FOLDER = "labs/lab00" 

    # Clone repo if not already present
    if not os.path.exists(REPO_NAME):
        !git clone {REPO_URL}

    # cd into the homework folder
    %cd {REPO_NAME}/{HW_FOLDER}

except ImportError:
    pass

nodes_df = pd.read_csv("nodes.csv")
edges_df = pd.read_csv("edges.csv")

nodes_df["node_id"] = nodes_df["node_id"].astype(int)
edges_df[["src", "dst"]] = edges_df[["src", "dst"]].astype(int)
edges_df[["taxi_min", "train_min", "ferry_min", "plane_min"]] = edges_df[
    ["taxi_min", "train_min", "ferry_min", "plane_min"]
].astype(int)

nodes_lookup = nodes_df.set_index("node_id")["name"]

In [None]:
import heapq

##### WRITE DIJKSTRAS_MULTI_MODE AND PRINT_BEST_PATH_TO_ESTATE FUNCTIONS #####
def dijkstra_multi_mode(nodes_df, edges_df, start):
    """
    Dijkstra's algorithm with support for multiple transport modes.

    Args:
    - nodes_df: DataFrame with columns [node_id, name].
    - edges_df: DataFrame with columns [src, dst, taxi_min, train_min, ferry_min, plane_min].
    - start: Starting node.

    Returns:
    - distances: Dictionary with the shortest distance to each node from the start.
    - paths: Dictionary with the best path to each node.
    - nodes_visited: Count of how many nodes were visited.
    - edges_evaluated: Count of how many edges were evaluated.
    """

    # Priority queue holds (distance, vertex)
    priority_queue = []
    heapq.heappush(priority_queue, (0, start))

    distances = {int(node_id): float('infinity') for node_id in nodes_df["node_id"]}
    distances[start] = 0

    paths = {int(node_id): [] for node_id in nodes_df["node_id"]}
    paths[start] = [start]

    visited = set()
    nodes_visited = 0
    edges_evaluated = 0

    while priority_queue:
        current_distance, current_vertex = heapq.heappop(priority_queue)

        # Skip stale queue entries for vertices that were already finalized
        if current_vertex in visited:
            continue

        visited.add(current_vertex)
        nodes_visited += 1

        # Outgoing edges of the current vertex
        neighbors = edges_df[edges_df["src"] == current_vertex]
        for edge in neighbors.itertuples(index=False):
            edges_evaluated += 1
            # Times <= 0 are treated as "mode not available on this edge"
            costs = [edge.taxi_min, edge.train_min, edge.ferry_min, edge.plane_min]
            valid_costs = [cost for cost in costs if cost > 0]
            if not valid_costs:
                continue
            # Always travel by the fastest available mode
            min_cost = min(valid_costs)

            distance = current_distance + min_cost
            neighbor = int(edge.dst)

            # Relax the edge if it gives a shorter route to the neighbor
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                paths[neighbor] = paths[current_vertex] + [neighbor]
                heapq.heappush(priority_queue, (distance, neighbor))

    return distances, paths, nodes_visited, edges_evaluated


def print_best_path_to_estate(paths, edges_df, nodes_lookup, start, destination):
    """
    Prints the best path from the start node to the destination node with transport modes and times,
    without repeating node names.

    Args:
    - paths: Dictionary of shortest paths to each node.
    - edges_df: DataFrame with columns [src, dst, taxi_min, train_min, ferry_min, plane_min].
    - nodes_lookup: Series mapping node_id -> name.
    - start: The start node number.
    - destination: The destination node number.
    """
    mode_names = ["TAXI", "TRAIN", "FERRY", "FLIGHT"]

    path = paths.get(destination, None)
    if not path or path[0] != start:
        print("No valid path found.")
        return

    path_segments = [nodes_lookup.loc[start]]
    total_time = 0

    for i in range(len(path) - 1):
        current = path[i]
        next_node = path[i + 1]
        edge = edges_df[(edges_df["src"] == current) & (edges_df["dst"] == next_node)].iloc[0]
        costs = [edge.taxi_min, edge.train_min, edge.ferry_min, edge.plane_min]
        valid_costs = [cost for cost in costs if cost > 0]
        if not valid_costs:
            continue
        min_cost = min(valid_costs)
        mode_index = costs.index(min_cost)
        mode = mode_names[mode_index]
        time = costs[mode_index]

        path_segments.append(f" ---[{mode}, {time} min]--> {nodes_lookup.loc[next_node]}")
        total_time += time

    print(f"Best path from {nodes_lookup.loc[start]} to {nodes_lookup.loc[destination]}:")
    print("".join(path_segments))
    print(f"Total travel time: {total_time} minutes")


In [18]:
##### RUN FUNCTIONS TO FIND PATH #####
# Using the previously defined `nodes_df` and `edges_df`
start_node = 0     # Central Park

# Find the shortest path
distances, paths, nodes_visited, edges_evaluated = dijkstra_multi_mode(nodes_df, edges_df, start_node)
print(f"Number of nodes visited: {nodes_visited}")
print(f"Number of edges evaluated: {edges_evaluated}")

# Print the best path (node 13 is the William Theisen-Floyd Estate)
print_best_path_to_estate(paths, edges_df, nodes_lookup, start=start_node, destination=13)


Number of nodes visited: 14
Number of edges evaluated: 25
Best path from Central Park to William Theisen-Floyd Estate:
Central Park ---[TAXI, 8 min]--> Grand Central Station ---[TAXI, 16 min]--> LGA ---[FLIGHT, 35 min]--> KISP ---[TAXI, 13 min]--> Patchogue ---[TAXI, 23 min]--> William Theisen-Floyd Estate
Total travel time: 95 minutes


#### *Expected Output*
> Number of nodes visited: 14
>
> Number of edges evaluated: 25
>
> Best path from Central Park to William Theisen-Floyd Estate:
>
> Central Park ---[TAXI, 8 min]--> Grand Central Station ---[TAXI, 16 min]--> LGA ---[FLIGHT, 35 min]--> KISP ---[TAXI, 13 min]--> Patchogue ---[TAXI, 23 min]--> William Theisen-Floyd Estate
>
> Total travel time: 95 minutes
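
One quick way to sanity-check your own output against the expected output is to confirm that the per-leg times add up to the reported total. The snippet below is only a sketch: it assumes your path line uses the exact `[MODE, N min]` format shown above, and the regular expression and variable names are illustrative rather than part of the assignment.

```python
import re

# The path line copied from the expected output above.
path_line = (
    "Central Park ---[TAXI, 8 min]--> Grand Central Station "
    "---[TAXI, 16 min]--> LGA ---[FLIGHT, 35 min]--> KISP "
    "---[TAXI, 13 min]--> Patchogue ---[TAXI, 23 min]--> William Theisen-Floyd Estate"
)

# Pull out every (mode, minutes) pair from the "[MODE, N min]" markers.
segments = [
    (mode, int(minutes))
    for mode, minutes in re.findall(r"\[(\w+), (\d+) min\]", path_line)
]
total_minutes = sum(minutes for _, minutes in segments)

print(segments)
print(f"Total travel time: {total_minutes} minutes")
```

If the total printed here differs from the one your function reports, the mode or time chosen for at least one leg is inconsistent with the search.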