<h1>Introduction</h1>

This notebook prepares the raw data for analysis. The raw data is in JSON format and describes courses, classrooms, and professors. The notebook has two parts: data cleaning, which removes unnecessary fields and invalid records, and data processing, which extracts the useful information and structures it in a form suitable for analysis.
Run all the cells in this notebook in order. Functions that are not needed are already commented out.

In [1]:
input_file = '../data/details.json'
output_file = '../data/details_cleaned.json'
classroom_json = '../data/classroom_data.json'
import numpy as np
import json
from datetime import datetime

<h1>Data Cleaning</h1>

This part removes data that is unnecessary for, or even harmful to, the analysis. If you already have a clean dataset, you can skip this part.

<h2>Remove unnecessary keys</h2>

In [None]:
def clean_large_json(input_file, output_file, keys_to_remove):
    with open(input_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    def remove_keys(obj, keys):
        if isinstance(obj, dict):
            for key in keys:
                obj.pop(key, None)
            for value in obj.values():
                remove_keys(value, keys)
        elif isinstance(obj, list):
            for item in obj:
                remove_keys(item, keys)
    remove_keys(data, keys_to_remove)
    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
keys_to_remove = ["additionalLinks", "bookstore", "cfg", "catalog_descr", "materials", "enrollment_information", "reserve_caps", "messages", "notes"]
clean_large_json(input_file, output_file, keys_to_remove)
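
The recursive key removal used above can be tried on a small in-memory structure before running it on the full file. This is a minimal sketch: the `sample` record is invented for illustration and does not come from `details.json`.

```python
def remove_keys(obj, keys):
    """Recursively drop every key in `keys` from nested dicts and lists, in place."""
    if isinstance(obj, dict):
        for key in keys:
            obj.pop(key, None)          # ignore keys that are absent
        for value in obj.values():
            remove_keys(value, keys)    # descend into nested values
    elif isinstance(obj, list):
        for item in obj:
            remove_keys(item, keys)

# Hypothetical record, invented for illustration only.
sample = [{
    "title": "Intro to Widgets",
    "notes": "internal",
    "section_info": {
        "messages": ["hi"],
        "meetings": [{"room": "ABC 101", "notes": "x"}],
    },
}]

remove_keys(sample, ["notes", "messages"])
print(sample)
```

The keys are removed at every nesting level, so `notes` disappears both from the top-level record and from the nested meeting.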


<h2>Remove class_capacity 999</h2>

In [None]:
def remove_class_capacity_999():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    def remove_capacity_999(obj):
        if isinstance(obj, dict):
            if obj.get("class_capacity") == "999" or obj.get("class_capacity") == 999:
                return None
            new_obj = {}
            for key, value in obj.items():
                result = remove_capacity_999(value)
                if result is not None:
                    new_obj[key] = result
            return new_obj
        elif isinstance(obj, list):
            # Clean each item once, then drop the pruned (None) results
            return [result for result in (remove_capacity_999(item) for item in obj) if result is not None]
        else:
            return obj
    cleaned_data = remove_capacity_999(data)

    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(cleaned_data, outfile, indent=4, ensure_ascii=False)
remove_class_capacity_999()
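
The pruning pattern in this cell recurs in several later cells: return `None` for a dict that should disappear, and let the parent skip `None` results. The sketch below shows that pattern on toy data. The helper names `prune` and `has_capacity_999` and the `toy` records are hypothetical, not part of the notebook.

```python
def prune(obj, should_drop):
    """Return a copy of obj without any dict for which should_drop(dict) is true."""
    if isinstance(obj, dict):
        if should_drop(obj):
            return None
        pruned = {}
        for key, value in obj.items():
            result = prune(value, should_drop)
            if result is not None:      # skip pruned children (and null values)
                pruned[key] = result
        return pruned
    if isinstance(obj, list):
        kept = []
        for item in obj:
            result = prune(item, should_drop)   # each item is visited once
            if result is not None:
                kept.append(result)
        return kept
    return obj

def has_capacity_999(d):
    return d.get("class_capacity") in ("999", 999)

# Hypothetical records, invented for illustration only.
toy = [
    {"id": 1, "class_availability": {"class_capacity": "999"}},
    {"id": 2, "class_availability": {"class_capacity": "40"}},
    {"id": 3, "class_capacity": 999},
]
pruned_toy = prune(toy, has_capacity_999)
print(pruned_toy)
```

Only the dict that directly holds `class_capacity` is dropped. In record 1 that is the nested `class_availability` dict, so the record itself survives without it, while record 3 is removed entirely. Because `None` marks a pruned value, keys whose value is JSON `null` are dropped as a side effect.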


<h2>Remove online instruction mode</h2>

In [None]:
def remove_online_instruction_mode():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    def remove_online_mode(obj):
        if isinstance(obj, dict):
            if obj.get("instruction_mode") == 'Online':
                return None
            new_obj = {}
            for key, value in obj.items():
                result = remove_online_mode(value)
                if result is not None:
                    new_obj[key] = result
            return new_obj
        elif isinstance(obj, list):
            # Clean each item once, then drop the pruned (None) results
            return [result for result in (remove_online_mode(item) for item in obj) if result is not None]
        else:
            return obj
    cleaned_data = remove_online_mode(data)

    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(cleaned_data, outfile, indent=4, ensure_ascii=False)
remove_online_instruction_mode()


<h2>Remove TBA instructors</h2>

In [None]:
def remove_tba_instructors():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    def clean_instructors(obj):
        if isinstance(obj, dict):
            if "meetings" in obj:
                for meeting in obj["meetings"]:
                    if "instructors" in meeting:
                        meeting["instructors"] = [
                            instructor for instructor in meeting["instructors"]
                            if instructor.get("name") != "To Be Announced"
                        ]
            for value in obj.values():
                clean_instructors(value)
        elif isinstance(obj, list):
            for item in obj:
                clean_instructors(item)
    clean_instructors(data)
    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
remove_tba_instructors()


<h2>Remove dash instructors</h2>

In [11]:
def remove_dash_instructors():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    def clean_instructors(obj):
        if isinstance(obj, dict):
            if "meetings" in obj:
                for meeting in obj["meetings"]:
                    if "instructors" in meeting:
                        meeting["instructors"] = [
                            instructor for instructor in meeting["instructors"]
                            if instructor.get("name") != "-"
                        ]
            for value in obj.values():
                clean_instructors(value)
        elif isinstance(obj, list):
            for item in obj:
                clean_instructors(item)

    clean_instructors(data)

    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
remove_dash_instructors()

<h2>Remove empty instructor lists, TBA meets, and empty meeting times</h2>

In [None]:
def clean_empty_instructors_tba_meets_and_empty_times():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    def clean_data(obj):
        if isinstance(obj, dict):
            if "instructors" in obj and isinstance(obj["instructors"], list) and not obj["instructors"]:
                return None
            if obj.get("meets") == "TBA":
                return None
            if "meeting_time_start" in obj and obj["meeting_time_start"] == "":
                return None
            if "meeting_time_end" in obj and obj["meeting_time_end"] == "":
                return None
            new_obj = {}
            for key, value in obj.items():
                result = clean_data(value)
                if result is not None:
                    new_obj[key] = result
            return new_obj
        elif isinstance(obj, list):
            # Clean each item once, then drop the pruned (None) results
            return [result for result in (clean_data(item) for item in obj) if result is not None]
        else:
            return obj
    cleaned_data = clean_data(data)
    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(cleaned_data, outfile, indent=4, ensure_ascii=False)
clean_empty_instructors_tba_meets_and_empty_times()


<h2>Remove invalid classrooms</h2>

In [None]:
def clean_classroom_names():
    with open(classroom_json, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    def remove_invalid_classrooms(obj):
        if isinstance(obj, dict):
            if "Name" in obj and '/' in obj["Name"]:
                return None
            new_obj = {}
            for key, value in obj.items():
                result = remove_invalid_classrooms(value)
                if result is not None:
                    new_obj[key] = result
            return new_obj
        elif isinstance(obj, list):
            # Clean each item once, then drop the pruned (None) results
            return [result for result in (remove_invalid_classrooms(item) for item in obj) if result is not None]
        else:
            return obj
    cleaned_data = remove_invalid_classrooms(data)
    with open(classroom_json, 'w', encoding='utf-8') as outfile:
        json.dump(cleaned_data, outfile, indent=4, ensure_ascii=False)
clean_classroom_names()


<h2>Normalize classroom names</h2>

In [None]:
import re
def normalize_classroom_names():
    with open(classroom_json, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    def process_classroom_names(obj):
        if isinstance(obj, dict):
            if "Name" in obj:
                # Replace dashes with spaces, then drop everything from the first "(" onward
                cleaned_name = obj["Name"].replace('-', ' ')
                cleaned_name = re.sub(r'\(.*$', '', cleaned_name).strip()
                obj["Name"] = cleaned_name
            for value in obj.values():
                process_classroom_names(value)
        elif isinstance(obj, list):
            for item in obj:
                process_classroom_names(item)
    process_classroom_names(data)
    with open(classroom_json, 'w', encoding='utf-8') as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
normalize_classroom_names()
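
To see what the name cleanup in the cell above does to a single string, the two transformations can be applied in isolation. This is a small sketch: the helper name `normalize_name` and the sample names are made up for illustration.

```python
import re

def normalize_name(name):
    """Replace dashes with spaces and drop everything from the first '(' onward."""
    cleaned = name.replace('-', ' ')
    cleaned = re.sub(r'\(.*$', '', cleaned).strip()
    return cleaned

# Hypothetical classroom names, invented for illustration only.
for raw in ["ABC-101 (Lecture Hall)", "XYZ B12", "ABC-20-A(old)"]:
    print(repr(raw), "->", repr(normalize_name(raw)))
```

A name without dashes or parentheses passes through unchanged, so running the cleanup more than once is harmless.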


<h2>Remove courses without a room</h2>

In [None]:
def clean_none_and_no_room_classrooms():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    cleaned_data = []
    for course in data:
        section_info = course.get("section_info", {})
        meetings = section_info.get("meetings", [])
        if any(meeting.get("room") in [None, "NO ROOM"] for meeting in meetings):
            continue
        else:
            cleaned_data.append(course)
    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(cleaned_data, outfile, indent=4, ensure_ascii=False)
clean_none_and_no_room_classrooms()


<h2>Remove empty meetings</h2>

In [None]:
def clean_empty_meetings():
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)
    cleaned_data = []
    for course in data:
        section_info = course.get("section_info", {})
        meetings = section_info.get("meetings", [])
        if meetings:
            cleaned_data.append(course)
    with open(output_file, 'w', encoding='utf-8') as outfile:
        json.dump(cleaned_data, outfile, indent=4, ensure_ascii=False)
clean_empty_meetings()


<h2>Remove invalid buildings (Fenway and Medical campuses)</h2>

In [74]:
import json
# Read and write as UTF-8 with ensure_ascii=False, matching the cleaning cells above
with open(output_file, 'r', encoding='utf-8') as file:
    data = json.load(file)
# Building codes to exclude from the analysis
remove_bldg_cd = ["ALB", "CTC", "INS", "SPH", "XBG", "MED", "FAB", "FCB", "FCC", "HAW", "GDS", "FPH", "EVN"]
for item in data:
    # Drop meetings held in excluded buildings
    if 'section_info' in item and 'meetings' in item['section_info']:
        item['section_info']['meetings'] = [
            meeting for meeting in item['section_info']['meetings']
            if meeting.get('bldg_cd') not in remove_bldg_cd
        ]
    # Apply the same filter to the meeting patterns of similar classes
    if 'similar_classes' in item:
        for similar_class in item['similar_classes']:
            if 'meeting_patterns' in similar_class:
                similar_class['meeting_patterns'] = [
                    pattern for pattern in similar_class['meeting_patterns']
                    if pattern.get('bldg_cd') not in remove_bldg_cd
                ]
with open(output_file, 'w', encoding='utf-8') as file:
    json.dump(data, file, indent=4, ensure_ascii=False)


In [75]:
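import json
# Relies on remove_bldg_cd defined in the previous cell
with open(classroom_json, 'r', encoding='utf-8') as file:
    classrooms = json.load(file)
# Keep only classrooms whose name does not start with an excluded building code
cleaned_classrooms = [
    classroom for classroom in classrooms
    if not any(classroom.get('Name', '').startswith(prefix) for prefix in remove_bldg_cd)
]
with open(classroom_json, 'w', encoding='utf-8') as file:
    json.dump(cleaned_classrooms, file, indent=4, ensure_ascii=False)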
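
The prefix filter in the cell above can also be written with a tuple, since `str.startswith` accepts a tuple of prefixes. The sketch below uses a shortened, hypothetical code list and invented classroom names.

```python
# Hypothetical subset of building codes and classroom names, for illustration only.
excluded_codes = ("MED", "FAB")
names = ["MED 101", "ABC 12", "FAB 3", "ABC MED 4"]

# startswith with a tuple is true if the string starts with any of the prefixes
kept = [name for name in names if not name.startswith(excluded_codes)]
print(kept)
```

Only the start of the name is tested, so `"ABC MED 4"` is kept. Note that the match is a plain string prefix: any classroom whose name merely begins with the same three letters as an excluded code would be removed as well.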


In [2]:
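def clean_schedule(professor_schedule):
    """Merge entries that share the same (start, end) slot by summing their third element."""
    cleaned_schedule = {}
    for key, schedule in professor_schedule.items():
        # Maps (start, end) -> summed third element (the class capacity)
        merged_dict = {}
        for item in schedule:
            start_end = (item[0], item[1])
            if start_end in merged_dict:
                merged_dict[start_end] += item[2]
            else:
                merged_dict[start_end] = item[2]
        # Convert back to a list of (start, end, count) tuples
        cleaned_schedule[key] = [(start, end, count) for (start, end), count in merged_dict.items()]

    return cleaned_schedule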
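
As an illustration of the merging step, the sketch below applies the same idea to one professor's list of `(start, end, count)` tuples. The helper name `merge_slots` and the numbers are invented, and the sketch assumes every count is an integer.

```python
def merge_slots(schedule):
    """Sum the counts of entries that share the same (start, end) pair."""
    merged = {}
    for start, end, count in schedule:
        merged[(start, end)] = merged.get((start, end), 0) + count
    # dicts preserve insertion order, so slots come out in first-seen order
    return [(start, end, count) for (start, end), count in merged.items()]

# Hypothetical schedule: two sections share the slot 120-135.
merged_example = merge_slots([(120, 135, 30), (120, 135, 25), (150, 165, 40)])
print(merged_example)
```

The two entries for slot 120 to 135 collapse into one entry with a count of 55. A count of `None` would raise a `TypeError` during the addition, both here and in `clean_schedule`.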

In [105]:
import json

with open(classroom_json, 'r', encoding='utf-8') as file:
    classrooms = json.load(file)

# Drop classrooms tagged as belonging to the Medical Campus
cleaned_classrooms = [
    classroom for classroom in classrooms
    if not ("Details" in classroom and
            "Classroom Tag" in classroom["Details"] and
            "Medical Campus" in classroom["Details"]["Classroom Tag"])
]

with open(classroom_json, 'w', encoding='utf-8') as file:
    json.dump(cleaned_classrooms, file, indent=4, ensure_ascii=False)


In [107]:
import json

with open(classroom_json, 'r', encoding='utf-8') as file:
    classrooms = json.load(file)

# Drop classrooms with any tag that mentions the Fenway Campus
cleaned_classrooms = [
    classroom for classroom in classrooms
    if not ("Details" in classroom and
            "Classroom Tag" in classroom["Details"] and
            any("Fenway Campus" in tag for tag in classroom["Details"]["Classroom Tag"]))
]

with open(classroom_json, 'w', encoding='utf-8') as file:
    json.dump(cleaned_classrooms, file, indent=4, ensure_ascii=False)


<h1>Debugging helpers (ignore this part)</h1>

In [None]:
# def count_and_print_top_ten_class_capacities():
#     with open(output_file, 'r', encoding='utf-8') as infile:
#         data = json.load(infile)
#
#     capacities = []
#     capacity_999_count = 0
#
#     def find_class_capacities(obj):
#         nonlocal capacity_999_count
#         if isinstance(obj, dict):
#
#             if "class_capacity" in obj:
#                 try:
#                     capacity = int(obj["class_capacity"])
#                     capacities.append(capacity)
#                     if capacity == 999:
#                         capacity_999_count += 1
#                 except ValueError:
#                     pass
#             for value in obj.values():
#                 find_class_capacities(value)
#         elif isinstance(obj, list):
#             for item in obj:
#                 find_class_capacities(item)
#
#     find_class_capacities(data)
#
#     print("class_capacity of the first ten classes:", capacities[:10])
#     print("Number of entries with class_capacity 999:", capacity_999_count)
# count_and_print_top_ten_class_capacities()


In [None]:
# def count_unique_instruction_modes():
#     with open(output_file, 'r', encoding='utf-8') as outfile:
#         data = json.load(outfile)
#     instruction_modes = set()
#     def find_instruction_modes(obj):
#         if isinstance(obj, dict):
#             if "instruction_mode" in obj:
#                 instruction_modes.add(obj["instruction_mode"])
#             for value in obj.values():
#                 find_instruction_modes(value)
#         elif isinstance(obj, list):
#             for item in obj:
#                 find_instruction_modes(item)
#
#     find_instruction_modes(data)
#
#     print("Number of distinct instruction_mode values:", len(instruction_modes))
#     print("Distinct instruction_mode values:", instruction_modes)
# count_unique_instruction_modes()


In [None]:
# def display_sample_professor_schedule(professor_schedule, sample_size=10):
#     print(f"Displaying schedule for the first {sample_size} professors:")
#     for professor_id in range(sample_size):
#         if professor_id in professor_schedule:
#             print(f"\nProfessor ID {professor_id}:")
#             for schedule in professor_schedule[professor_id]:
#                 start_time, end_time, capacity = schedule
#                 print(f"  Start Time (in 5-min units): {start_time}, "
#                       f"End Time (in 5-min units): {end_time}, "
#                       f"Capacity: {capacity}")
#         else:
#             print(f"\nProfessor ID {professor_id}: No schedule available")
# display_sample_professor_schedule(professor_schedule)


In [None]:
# def decode_time(value):
#     slots_per_day = 24 * 60 // 5  # number of 5-minute slots in a day
#     day = value // slots_per_day
#     time_in_day = value % slots_per_day
#
#     hours = (time_in_day * 5) // 60
#     minutes = (time_in_day * 5) % 60
#     day_mapping = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
#     day_name = day_mapping[day]
#
#     return f"{day_name} {hours:02}:{minutes:02}"
# print(decode_time(978))

In [None]:
# def find_courses_for_instructor(instructor_name):
#     with open(output_file, 'r', encoding='utf-8') as infile:
#         data = json.load(infile)
#
#     instructor_courses = []
#
#     def search_for_instructor(obj):
#         if isinstance(obj, dict):
#             if "meetings" in obj:
#                 for meeting in obj["meetings"]:
#                     if "instructors" in meeting:
#                         for instructor in meeting["instructors"]:
#                             if instructor.get("name") == instructor_name:
#                                 instructor_courses.append(obj)
#                                 return
#
#             for value in obj.values():
#                 search_for_instructor(value)
#         elif isinstance(obj, list):
#             for item in obj:
#                 search_for_instructor(item)
#
#
#     search_for_instructor(data)
#
#
#     print(f"Courses taught by '{instructor_name}':")
#     for course in instructor_courses:
#         print(json.dumps(course, indent=4, ensure_ascii=False))
#
#     return instructor_courses
# find_courses_for_instructor("Min Ye")


In [None]:
# def find_room(keyword):
#     with open(output_file, 'r', encoding='utf-8') as infile:
#         data = json.load(infile)
#
#     rooms_with_keyword = []
#
#     def search_meetings(obj):
#         if isinstance(obj, dict):
#             if "meetings" in obj:
#                 for meeting in obj["meetings"]:
#                     if "room" in meeting and keyword in meeting["room"]:
#                         rooms_with_keyword.append(meeting)
#             for value in obj.values():
#                 search_meetings(value)
#         elif isinstance(obj, list):
#             for item in obj:
#                 search_meetings(item)
#     search_meetings(data)
#     print(f"Room information containing '{keyword}':", rooms_with_keyword)
#     return rooms_with_keyword
# rooms = find_room("CFA 154")


In [101]:
# import numpy as np
#
# indices = np.argwhere(walking_cost > 4000.0)
#
# index_to_classroom = {v: k for k, v in classroom_mapping.items()}
#
# classroom_pairs = [(index_to_classroom[i], index_to_classroom[j]) for i, j in indices]
#
# unique_pairs = set((a[:3], b[:3]) for a, b in classroom_pairs)
#
# unique_pairs = list(unique_pairs)
# print("Unique classroom prefix pairs:", unique_pairs)
#
# from collections import Counter
#
# prefixes = [a[:3] for a, b in classroom_pairs] + [b[:3] for a, b in classroom_pairs]
#
# prefix_counts = Counter(prefixes)
#
# sorted_prefix_counts = prefix_counts.most_common()
#
# # Print the results
# print("Prefix frequencies sorted from high to low:", sorted_prefix_counts)


Unique classroom prefix pairs: []
Prefix frequencies sorted from high to low: []


<h1>Data Processing</h1>

<h2>capacities: int[]</h2>

In [28]:
def extract_capacity_from_additional_info():
    # Open the JSON file for reading and load its content into the 'data' variable
    with open(classroom_json, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    # Initialize an empty list to store the capacities found
    capacities = []

    # Define a helper function to find capacities within the JSON structure
    def find_capacity(obj):
        # Check if the current object is a dictionary
        if isinstance(obj, dict):
            # If 'AdditionalInfo' and 'Capacity' are keys in the object, attempt to extract the capacity
            if "AdditionalInfo" in obj and "Capacity" in obj["AdditionalInfo"]:
                try:
                    # Convert the capacity to an integer and add it to the capacities list
                    capacities.append(int(obj["AdditionalInfo"]["Capacity"]))
                except (ValueError, TypeError):
                    # If the capacity is null or not numeric, ignore it
                    pass

            # Recursively search for capacities in each value of the dictionary
            for value in obj.values():
                find_capacity(value)
        # If the current object is a list, iterate over its items and apply the function on each
        elif isinstance(obj, list):
            for item in obj:
                find_capacity(item)

    # Start the capacity finding process on the main data structure loaded from JSON
    find_capacity(data)

    # Return the list of capacities extracted from the JSON data
    return capacities

# Call the function to extract capacities from the JSON file
capacities = extract_capacity_from_additional_info()

<h2>name_capacity_dict: dict</h2>

In [29]:
def extract_name_capacity_dict():
    # This function reads a JSON file containing classroom data and extracts a dictionary mapping classroom names to their capacities.

    with open(classroom_json, 'r', encoding='utf-8') as infile:
        # Open the JSON file and load its content into a Python dictionary.
        data = json.load(infile)

    # Initialize a dictionary to store name-capacity pairs.
    name_capacity_dict = {}

    # Define a recursive function to traverse through JSON objects and find name-capacity pairs.
    def find_name_capacity(obj):
        if isinstance(obj, dict):
            # If the object is a dictionary, check for the presence of "Name" and "Capacity" in the "AdditionalInfo" field.
            if "Name" in obj and "AdditionalInfo" in obj and "Capacity" in obj["AdditionalInfo"]:
                try:
                    name_capacity_dict[obj["Name"]] = int(obj["AdditionalInfo"]["Capacity"])
                except (ValueError, TypeError):
                    # If the capacity is null or not numeric, ignore this entry.
                    pass
            # Recursively process all values in the dictionary.
            for value in obj.values():
                find_name_capacity(value)
        elif isinstance(obj, list):
            # If the object is a list, iterate over each item and apply the function recursively.
            for item in obj:
                find_name_capacity(item)

    # Start the recursive process to populate the name_capacity_dict with data from the JSON.
    find_name_capacity(data)

    # Return the dictionary containing classroom names and their corresponding capacities.
    return name_capacity_dict

# Call the function to extract name-capacity pairs from the JSON file.
name_capacity_dict = extract_name_capacity_dict()

<h2>professor_mapping: dict</h2>

In [30]:
# The purpose of this function is to extract and map professor names to unique IDs from a JSON file.
def extract_professor_mapping():
    # Open the JSON file specified by output_file and load its contents into a Python dictionary.
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    # Initialize a dictionary to store the mapping of professor names to unique IDs.
    professor_mapping = {}
    # A counter to assign a unique ID to each professor.
    professor_id_counter = 0

    # Define a recursive function to find instructors within the data.
    # nonlocal lets the nested function rebind professor_id_counter from the enclosing function's scope.
    def find_instructors(obj):
        nonlocal professor_id_counter
        if isinstance(obj, dict):
            # If the object is a dictionary, look for "meetings".
            if "meetings" in obj:
                # If "meetings" exists, iterate over each meeting.
                for meeting in obj["meetings"]:
                    # Check if the meeting has "instructors".
                    if "instructors" in meeting:
                        # For each instructor, retrieve their "name".
                        for instructor in meeting["instructors"]:
                            name = instructor.get("name")
                            # If the name is valid and not already in the mapping,
                            # assign a new unique ID and update the counter.
                            if name and name not in professor_mapping:
                                professor_mapping[name] = professor_id_counter
                                professor_id_counter += 1
            # Recursively process all values in the dictionary, traversing deeper.
            for value in obj.values():
                find_instructors(value)
        elif isinstance(obj, list):
            # If the object is a list, apply the function recursively to each item.
            for item in obj:
                find_instructors(item)

    # Call the recursive function on the main data object loaded from JSON.
    find_instructors(data)

    # Return the dictionary which contains the mapping of professor names to unique IDs.
    return professor_mapping

# Execute the function to populate the professor_mapping with data from the JSON.
professor_mapping = extract_professor_mapping()
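
The ID assignment used above gives each new name the next unused integer, in first-seen order. The sketch below shows the same idea on a flat list. It uses `len(mapping)` in place of the explicit counter, and both the helper name `assign_ids` and the names in the list are hypothetical.

```python
def assign_ids(names):
    """Map each distinct, non-empty name to a consecutive integer ID in first-seen order."""
    mapping = {}
    for name in names:
        if name and name not in mapping:
            mapping[name] = len(mapping)   # next unused ID
    return mapping

# Hypothetical instructor names, invented for illustration only.
example_mapping = assign_ids(["A. Example", "B. Sample", "A. Example", None, ""])
print(example_mapping)
```

Repeated names keep their first ID, and missing or empty names are skipped. The IDs depend on the order of the input data, so they are only stable as long as the cleaned file does not change.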

<h2>professor_schedule: dict</h2>

In [31]:
def build_professor_schedule():
    # Open the JSON file that contains the cleaned data and load it into a variable
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    # Initialize an empty dictionary to store the schedule of each professor
    professor_schedule = {}

    # Map days of the week to corresponding indices
    day_mapping = {
        "Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6
    }

    # Helper function to convert a time string into a time slot index
    def parse_time(time_str, day):
        # Parse the time string into a datetime object
        time_obj = datetime.strptime(time_str, "%I:%M%p")
        # Convert hours and minutes to total minutes
        minutes = time_obj.hour * 60 + time_obj.minute
        # Calculate a unique time slot index considering both time and day
        return (minutes // 5) + day * (24 * 60 // 5)

    # Recursive function to find meetings in JSON data and build the schedule
    def find_meetings(obj):
        if isinstance(obj, dict):
            # Check if the object contains meeting information necessary for building the schedule
            if "meetings" in obj and "class_availability" in obj:
                # Retrieve and convert the class capacity to integer
                capacity = obj["class_availability"].get("class_capacity")
                if capacity is not None:
                    capacity = int(capacity)

                # Iterate over each meeting to extract details
                for meeting in obj["meetings"]:
                    if "instructors" in meeting and "days" in meeting and "meeting_time_start" in meeting and "meeting_time_end" in meeting:
                        # Convert the days string to a list of corresponding indices using day_mapping
                        days_str = meeting["days"]
                        days = [day_mapping[days_str[i:i+2]] for i in range(0, len(days_str), 2) if days_str[i:i+2] in day_mapping]

                        # Iterate over each instructor and update their schedule
                        for instructor in meeting["instructors"]:
                            professor_name = instructor.get("name")
                            professor_id = professor_mapping.get(professor_name)

                            if professor_id is not None:
                                # Ensure there's a list for storing the professor's meetings
                                if professor_id not in professor_schedule:
                                    professor_schedule[professor_id] = []

                                # Add the meeting details to the professor's schedule for each day
                                for day in days:
                                    start_time = parse_time(meeting["meeting_time_start"], day)
                                    end_time = parse_time(meeting["meeting_time_end"], day)
                                    professor_schedule[professor_id].append((start_time, end_time, capacity))

            # Recursively process all values in the dictionary
            for value in obj.values():
                find_meetings(value)
        elif isinstance(obj, list):
            # Recursively apply the function to each item in a list
            for item in obj:
                find_meetings(item)

    # Begin the process of finding meetings
    find_meetings(data)

    # Return the completed schedule for each professor
    return professor_schedule

# Call the build_professor_schedule function and then clean the schedule
professor_schedule = build_professor_schedule()
professor_schedule = clean_schedule(professor_schedule)
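
The schedule stores each time as a single integer: the number of 5-minute slots since Monday 00:00. The sketch below reproduces that encoding together with a matching decoder so the arithmetic can be checked by hand. The helper names `split_days`, `encode_slot`, and `decode_slot` are hypothetical, and the time format is assumed to look like `"9:30AM"`, as implied by the `%I:%M%p` pattern used above.

```python
from datetime import datetime

SLOTS_PER_DAY = 24 * 60 // 5   # 288 five-minute slots per day
DAY_MAPPING = {"Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6}

def split_days(days_str):
    """Turn a string such as 'TuTh' into a list of day indices."""
    return [DAY_MAPPING[days_str[i:i + 2]]
            for i in range(0, len(days_str), 2)
            if days_str[i:i + 2] in DAY_MAPPING]

def encode_slot(time_str, day):
    """Encode a 12-hour time string and a day index as a weekly slot index."""
    t = datetime.strptime(time_str, "%I:%M%p")
    return (t.hour * 60 + t.minute) // 5 + day * SLOTS_PER_DAY

def decode_slot(slot):
    """Inverse of encode_slot: return (day index, hour, minute)."""
    day, slot_in_day = divmod(slot, SLOTS_PER_DAY)
    hour, minute = divmod(slot_in_day * 5, 60)
    return day, hour, minute

print(split_days("TuTh"))          # day indices for Tuesday and Thursday
print(encode_slot("9:30AM", 3))    # 114 slots into Thursday: 114 + 3 * 288
print(decode_slot(978))            # back to (day, hour, minute)
```

For Thursday 9:30 AM the slot within the day is 570 // 5 = 114, and three full days add 3 × 288 = 864, giving 978. Times that are not a multiple of five minutes are rounded down to the start of their slot.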

In [16]:
def build_monday_professor_schedule():
    # Load data from the specified JSON file. The purpose of this step is to read the structured class data
    # to extract relevant meeting information.
    with open(output_file, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    # Initialize an empty dictionary to store professors' schedules.
    # Keys will be unique professor IDs and values will be lists of their scheduled meetings.
    professor_schedule = {}

    # Mapping of day abbreviations to indices, helping translate string representations of days into numeric indices.
    day_mapping = {
        "Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6
    }

    # Convert a time string into the number of 5-minute units since midnight.
    # The day argument is unused because only Monday meetings are collected here;
    # it is kept so the signature matches parse_time in build_professor_schedule.
    def parse_time(time_str, day):
        time_obj = datetime.strptime(time_str, "%I:%M%p")
        minutes = time_obj.hour * 60 + time_obj.minute
        return minutes // 5

    # Recursive function to traverse the JSON data structure and find meetings scheduled on Monday.
    # The focus is on extracting meeting times and associating them with instructors, updating their schedules.
    def find_monday_meetings(obj):
        if isinstance(obj, dict):
            # Check if the current dict contains meeting and capacity information.
            if "meetings" in obj and "class_availability" in obj:
                capacity = obj["class_availability"].get("class_capacity")
                if capacity is not None:
                    capacity = int(capacity)

                for meeting in obj["meetings"]:
                    # Look for meetings with defined days, start time, end time, and instructors.
                    if (
                        "instructors" in meeting
                        and "days" in meeting
                        and "meeting_time_start" in meeting
                        and "meeting_time_end" in meeting
                    ):
                        # Determine which days the meeting occurs, converting them to indices using day_mapping.
                        days_str = meeting["days"]
                        days = [
                            day_mapping[days_str[i:i+2]]
                            for i in range(0, len(days_str), 2)
                            if days_str[i:i+2] in day_mapping
                        ]

                        # If the meeting takes place on Monday (index 0), process the meeting details.
                        if 0 in days:
                            for instructor in meeting["instructors"]:
                                professor_name = instructor.get("name")
                                professor_id = professor_mapping.get(professor_name)

                                if professor_id is not None:
                                    # Initialize the schedule list for the professor if it does not exist.
                                    if professor_id not in professor_schedule:
                                        professor_schedule[professor_id] = []

                                    # Parse the start and end time of the meeting and add them to the schedule.
                                    start_time = parse_time(meeting["meeting_time_start"], 0)
                                    end_time = parse_time(meeting["meeting_time_end"], 0)
                                    professor_schedule[professor_id].append((start_time, end_time, capacity))

            # Continue traversing through other elements of the JSON structure.
            for value in obj.values():
                find_monday_meetings(value)
        elif isinstance(obj, list):
            # Apply the search function to each item if the current object is a list.
            for item in obj:
                find_monday_meetings(item)

    # Start the recursive search for Monday meetings in the loaded data.
    find_monday_meetings(data)

    # Return the fully constructed dictionary of Monday schedules for each professor.
    return professor_schedule

# Execute the function and optionally clean the schedule if needed.
monday_professor_schedule = build_monday_professor_schedule()
monday_professor_schedule = clean_schedule(monday_professor_schedule)
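
The day-string parsing used in the cell above can be tried in isolation. This is a minimal sketch: `parse_days` is a hypothetical helper name, the sample strings are made up, and the mapping is assumed to match the notebook's `day_mapping`.

```python
# Assumed to mirror the notebook's day_mapping; parse_days is a hypothetical helper.
DAY_MAPPING = {"Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6}

def parse_days(days_str):
    """Split a string such as 'MoWeFr' into two-character codes and map them to day indices."""
    codes = [days_str[i:i + 2] for i in range(0, len(days_str), 2)]
    # Unrecognized codes are skipped, as in the list comprehension above.
    return [DAY_MAPPING[code] for code in codes if code in DAY_MAPPING]

print(parse_days("MoWeFr"))      # [0, 2, 4]
print(0 in parse_days("TuTh"))   # False, so this meeting would not count as a Monday meeting
```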



<h2>classroom_mapping: dict</h2>

In [32]:
# Function to create a mapping of classroom names to unique IDs
def create_classroom_mapping():
    # Load JSON data from the specified file into the 'data' variable
    with open(classroom_json, 'r', encoding='utf-8') as infile:
        data = json.load(infile)

    # Dictionary to hold the mapping of classroom names to unique identifiers
    classroom_mapping = {}

    # Counter to generate unique identifiers for each classroom
    classroom_id_counter = 0

    # Recursive function to traverse the JSON structure to find classrooms
    def find_classrooms(obj):
        nonlocal classroom_id_counter  # Allow the nested function to modify this variable
        if isinstance(obj, dict):  # Check if the object is a dictionary
            if "Name" in obj:  # If the classroom name is found in the object
                classroom_name = obj["Name"]

                # If the classroom name is not already in the mapping, add it
                if classroom_name not in classroom_mapping:
                    classroom_mapping[classroom_name] = classroom_id_counter
                    classroom_id_counter += 1  # Increment the counter for the next unique ID

            # Iterate over dictionary values for further processing
            for value in obj.values():
                find_classrooms(value)  # Recursive call for nested dictionaries
        elif isinstance(obj, list):  # Check if the object is a list
            # Iterate over list items to look for classrooms
            for item in obj:
                find_classrooms(item)  # Recursive call for list items

    find_classrooms(data)  # Initial call to the recursive function with loaded data

    # Return the final mapping of classroom names to IDs
    return classroom_mapping

# Create the classroom mapping and store it in a variable
classroom_mapping = create_classroom_mapping()
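
To see what the recursive traversal produces, here is a small sketch run on a made-up nested structure. `collect_names` and `sample` are hypothetical, and the real layout of `classroom_data.json` may differ. IDs are assigned in first-seen order, so `len(mapping)` plays the role of the counter.

```python
def collect_names(obj, mapping=None):
    """Walk nested dicts and lists, assigning a sequential ID to each new "Name" value."""
    if mapping is None:
        mapping = {}
    if isinstance(obj, dict):
        if "Name" in obj and obj["Name"] not in mapping:
            # The next free ID equals the number of names seen so far.
            mapping[obj["Name"]] = len(mapping)
        for value in obj.values():
            collect_names(value, mapping)
    elif isinstance(obj, list):
        for item in obj:
            collect_names(item, mapping)
    return mapping

# Hypothetical sample; a repeated name keeps its first ID.
sample = {"buildings": [{"rooms": [{"Name": "CAS 116"}, {"Name": "CAS 201"}]},
                        {"rooms": [{"Name": "CAS 116"}]}]}
print(collect_names(sample))  # {'CAS 116': 0, 'CAS 201': 1}
```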

<h2>professor_courses: int[][][]</h2>

In [34]:
# Determine the number of unique professors using the professor_mapping dictionary length
N = len(professor_mapping)

# Determine the number of unique rooms using the classroom_mapping dictionary length
M = len(classroom_mapping)

# Calculate the total number of 5-minute intervals in a week (7 days, 24 hours each, divided by 5 minutes)
T = 7 * 24 * 60 // 5

# Initialize a 3D NumPy array to hold Professor-Room-Time allocations, defaulted to zero
# Dimensions: N x M x T, where:
# N = number of professors, M = number of rooms, T = time slots in a week
professor_courses = np.zeros((N, M, T), dtype=int)

# Function to convert a time string in "%I:%M%p" format to the number of 5-minute units past midnight
def parse_time_to_5_min_units(time_str):
    # Parse the time string into a datetime object
    time_obj = datetime.strptime(time_str, "%I:%M%p")

    # Calculate the total number of minutes since midnight
    minutes = time_obj.hour * 60 + time_obj.minute

    # Return the number of 5-minute intervals since midnight
    return minutes // 5

# Open and read the output JSON file, loading the data into a variable
with open(output_file, 'r', encoding='utf-8') as infile:
    data = json.load(infile)

    # Iterate over each course in the JSON data
    for course in data:
        # Extract section_info dictionary from the course
        section_info = course.get("section_info", {})

        # Get meetings information from section_info, defaulting to an empty list
        meetings = section_info.get("meetings", [])

        # Iterate over each meeting object in the meetings list
        for obj in meetings:
            # Extract room name from the meeting's room field
            room_field = obj.get("room", "")
            room_parts = room_field.split()
            room_name = room_parts[-2] + " " + room_parts[-1] if len(room_parts) >= 2 else None

            # Skip processing if the room name is None or "NO ROOM"
            if room_name in [None, "NO ROOM"]:
                continue

            # Extract instructors list from the meeting object
            instructors = obj.get("instructors", [])

            # Iterate over each instructor in the instructors list
            for instructor in instructors:
                # Get the professor's name from the instructor entry
                professor_name = instructor.get("name")

                # Retrieve the professor ID using the professor's name
                professor_id = professor_mapping.get(professor_name)

                # Retrieve the room ID using the room name
                room_id = classroom_mapping.get(room_name)

                # Print a warning if the professor ID was not found
                if professor_id is None:
                    print(f"Professor '{professor_name}' not found")

                # Print a warning if the room ID was not found
                if room_id is None:
                    print(f"Room '{room_name}' can't find ID")

                # If both professor ID and room ID are found, proceed to allocate the schedule
                if professor_id is not None and room_id is not None:
                    # Extract the days string for the meeting
                    days_str = obj.get("days", "")

                    # Convert meeting start and end times to 5-minute intervals
                    start_time = parse_time_to_5_min_units(obj["meeting_time_start"])
                    end_time = parse_time_to_5_min_units(obj["meeting_time_end"])

                    # Mapping of day abbreviations to indices
                    day_mapping = {"Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6}

                    # Iterate over day abbreviations extracted from the days string
                    for day_abbr in [days_str[i:i+2] for i in range(0, len(days_str), 2)]:
                        day = day_mapping.get(day_abbr)

                        # If the day is recognized, calculate start and end indices within the weekly intervals
                        if day is not None:
                            start_k = start_time + day * (24 * 60 // 5)
                            end_k = end_time + day * (24 * 60 // 5)

                            # Mark the professor-room-time slots as occupied in the 3D array
                            for k in range(start_k, end_k):
                                professor_courses[professor_id][room_id][k] = 1

Room 'REL 404' can't find ID
Room 'CGS 427' can't find ID
Room 'WED 411' can't find ID
Room 'EVN 201' can't find ID
Room 'CFA 354' can't find ID
Room 'Health Ctr/Underserved' can't find ID
Room 'Health Ctr/Underserved' can't find ID
Room 'LAW 508' can't find ID
Room 'LAW 508' can't find ID
Room 'SAR 236' can't find ID
Room 'CGS 427' can't find ID
Room 'YAW 419' can't find ID
Room 'HAR 658' can't find ID
Room 'REL 404' can't find ID
Room 'PHO 207' can't find ID
Room 'Medical Center' can't find ID
Room 'Medical Center' can't find ID
Room 'CFA 352' can't find ID
Room 'LAW 513' can't find ID
Room 'LSE 904' can't find ID
Room 'MCH 102' can't find ID
Room 'CGS 417' can't find ID
Room 'HAR 419' can't find ID
Room 'PLS 512' can't find ID
Room 'LAW 203' can't find ID
Room 'LAW 420' can't find ID
Room 'LAW 420' can't find ID
Room 'STH 541' can't find ID
Room 'PHO 207' can't find ID
Room 'Auburn Hospital' can't find ID
Room 'Auburn Hospital' can't find ID
Room 'Shore Hospital' can't find ID
Room 
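
The weekly slot index used above is the day offset (288 five-minute slots per day) plus the slot within the day. The sketch below works one example through; `to_slot` and `weekly_range` are hypothetical names and the sample times are made up. It also uses a slice assignment where the cell uses an inner `for k` loop.

```python
import numpy as np
from datetime import datetime

SLOTS_PER_DAY = 24 * 60 // 5  # 288 five-minute slots per day

def to_slot(time_str):
    """Convert a time such as '09:05AM' to the number of 5-minute slots since midnight."""
    t = datetime.strptime(time_str, "%I:%M%p")
    return (t.hour * 60 + t.minute) // 5

def weekly_range(day, start_str, end_str):
    """Return (start_k, end_k) within a week-long timeline for the given day index."""
    offset = day * SLOTS_PER_DAY
    return offset + to_slot(start_str), offset + to_slot(end_str)

# Wednesday (index 2), 09:05AM to 09:55AM: 2 * 288 + 109 = 685 and 2 * 288 + 119 = 695.
start_k, end_k = weekly_range(2, "09:05AM", "09:55AM")

timeline = np.zeros(7 * SLOTS_PER_DAY, dtype=int)
timeline[start_k:end_k] = 1  # the end slot is exclusive, matching range(start_k, end_k)
print(start_k, end_k, timeline.sum())  # 685 695 10
```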

In [19]:
N = len(professor_mapping)
M = len(classroom_mapping)
T_monday = 24 * 60 // 5
professor_courses_monday = np.zeros((N, M, T_monday), dtype=int)

def parse_time_to_5_min_units(time_str):
    time_obj = datetime.strptime(time_str, "%I:%M%p")
    minutes = time_obj.hour * 60 + time_obj.minute
    return minutes // 5

with open(output_file, 'r', encoding='utf-8') as infile:
    data = json.load(infile)
    for course in data:
        section_info = course.get("section_info", {})
        meetings = section_info.get("meetings", [])
        for obj in meetings:
            room_field = obj.get("room", "")
            room_parts = room_field.split()
            room_name = room_parts[-2] + " " + room_parts[-1] if len(room_parts) >= 2 else None
            if room_name in [None, "NO ROOM"]:
                continue
            instructors = obj.get("instructors", [])
            for instructor in instructors:
                professor_name = instructor.get("name")
                professor_id = professor_mapping.get(professor_name)
                room_id = classroom_mapping.get(room_name)
                if professor_id is None:
                    print(f"Professor '{professor_name}' not found")
                if room_id is None:
                    print(f"Room '{room_name}' can't find ID")
                if professor_id is not None and room_id is not None:
                    days_str = obj.get("days", "")
                    if "Mo" not in days_str:  # keep only meetings that include "Mo"
                        continue
                    start_time = parse_time_to_5_min_units(obj["meeting_time_start"])
                    end_time = parse_time_to_5_min_units(obj["meeting_time_end"])
                    for day_abbr in [days_str[i:i+2] for i in range(0, len(days_str), 2)]:
                        day_mapping = {"Mo": 0, "Tu": 1, "We": 2, "Th": 3, "Fr": 4, "Sa": 5, "Su": 6}
                        day = day_mapping.get(day_abbr)
                        if day == 0:
                            for k in range(start_time, end_time):
                                professor_courses_monday[professor_id][room_id][k] = 1


Room 'REL 404' can't find ID
Room 'CGS 427' can't find ID
Room 'WED 411' can't find ID
Room 'EVN 201' can't find ID
Room 'CFA 354' can't find ID
Room 'Health Ctr/Underserved' can't find ID
Room 'Health Ctr/Underserved' can't find ID
Room 'LAW 508' can't find ID
Room 'LAW 508' can't find ID
Room 'SAR 236' can't find ID
Room 'CGS 427' can't find ID
Room 'YAW 419' can't find ID
Room 'HAR 658' can't find ID
Room 'REL 404' can't find ID
Room 'PHO 207' can't find ID
Room 'Medical Center' can't find ID
Room 'Medical Center' can't find ID
Room 'CFA 352' can't find ID
Room 'LAW 513' can't find ID
Room 'LSE 904' can't find ID
Room 'MCH 102' can't find ID
Room 'CGS 417' can't find ID
Room 'HAR 419' can't find ID
Room 'PLS 512' can't find ID
Room 'LAW 203' can't find ID
Room 'LAW 420' can't find ID
Room 'LAW 420' can't find ID
Room 'STH 541' can't find ID
Room 'PHO 207' can't find ID
Room 'Auburn Hospital' can't find ID
Room 'Auburn Hospital' can't find ID
Room 'Shore Hospital' can't find ID
Room 

In [20]:
professor_courses_monday.shape

(2656, 505, 288)

<h2>walking_cost: float[][], time cost matrix</h2>

In [35]:
import pandas as pd

# Step 1: Read the 'b2b_walking_distance.csv' file into a pandas DataFrame.
b2b_distance = pd.read_csv("../data/b2b_walking_distance.csv")

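# Step 2: Map each building prefix (the first token of a classroom name) to a classroom index.
# Several classrooms share a prefix, so each building keeps the index of its last classroom.
# Note: this dictionary is not used in the steps below.
buildings = {name.split()[0]: idx for name, idx in classroom_mapping.items()}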

# Step 3: Define the number of classrooms using the size of the classroom mapping.
num_classrooms = len(classroom_mapping)

# Initialize the walking_cost matrix with 'inf', denoting initially unknown walking distances.
walking_cost = np.full((num_classrooms, num_classrooms), np.inf)

# Iterate over each possible classroom pairing to set the initial walking costs.
# Build the name list once; classroom IDs were assigned in insertion order, so position i is classroom i.
classroom_names = list(classroom_mapping.keys())
for i in range(num_classrooms):
    for j in range(num_classrooms):
        # Set the walking cost to 0 for the same room (i.e., no walking required).
        if i == j:
            walking_cost[i][j] = 0
        # Assign a generic cost of 10 for moving within the same building.
        elif classroom_names[i].split()[0] == classroom_names[j].split()[0]:
            walking_cost[i][j] = 10

# Step 4: Iterate over each row in the 'b2b_distance' DataFrame to update the walking_cost matrix.
for _, row in b2b_distance.iterrows():
    building_a, building_b, distance = row['abbreviationA'], row['abbreviationB'], row['distance']
    # Update the matrix for pairs of classrooms, between each pair of buildings.
    for classroom_a, idx_a in classroom_mapping.items():
        for classroom_b, idx_b in classroom_mapping.items():
            if classroom_a.split()[0] == building_a and classroom_b.split()[0] == building_b:
                # Set the walking distance for the respective indices in the walking_cost matrix.
                walking_cost[idx_a][idx_b] = distance
                walking_cost[idx_b][idx_a] = distance

# Step 5: Identify and print positions where walking_cost remains 'inf' (meaning unreachable). If none, confirm.
inf_positions = np.where(walking_cost == np.inf)
if inf_positions[0].size > 0:
    for i, j in zip(inf_positions[0], inf_positions[1]):
        print(f"Inf found at walking_cost[{i}][{j}]")
else:
    print("No Inf values found in walking_cost matrix.")

KeyboardInterrupt: 
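
Step 4 loops over every classroom pair for every CSV row, which is slow on the full data set. One possible alternative, sketched below on made-up data, groups classroom indices by building prefix and fills whole blocks with NumPy indexing. `classroom_mapping_demo` and `b2b_demo` are hypothetical stand-ins. The column names are taken from the cell above, and classroom IDs are assumed to run from 0 to n-1.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature inputs standing in for classroom_mapping and the CSV file.
classroom_mapping_demo = {"CAS 116": 0, "CAS 201": 1, "BRB 113": 2, "LAW 212": 3}
b2b_demo = pd.DataFrame(
    {"abbreviationA": ["CAS"], "abbreviationB": ["BRB"], "distance": [250.0]}
)

# Building prefix of each classroom, ordered by classroom ID.
names = sorted(classroom_mapping_demo, key=classroom_mapping_demo.get)
room_buildings = np.array([name.split()[0] for name in names])

cost = np.full((len(names), len(names)), np.inf)
cost[room_buildings[:, None] == room_buildings[None, :]] = 10  # same building
np.fill_diagonal(cost, 0)                                      # same room

columns = ["abbreviationA", "abbreviationB", "distance"]
for a, b, d in b2b_demo[columns].itertuples(index=False):
    idx_a = np.flatnonzero(room_buildings == a)
    idx_b = np.flatnonzero(room_buildings == b)
    # Fill both directions for every classroom pair across the two buildings.
    cost[np.ix_(idx_a, idx_b)] = d
    cost[np.ix_(idx_b, idx_a)] = d

print(cost)
```

In this toy example the LAW classroom keeps `inf` against the other buildings because no CSV row mentions it, which is the case Step 5 reports.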

<h1>Visualization</h1>

In [22]:
classroom_mapping

{'AGG C205': 0,
 'AGG G105': 1,
 'AGG G171': 2,
 'BAB 121': 3,
 'BAB 140': 4,
 'BAB 141': 5,
 'BAB 148': 6,
 'BCN 115': 7,
 'BCN 208': 8,
 'BRB 113': 9,
 'BRB 121': 10,
 'BRB 122': 11,
 'BRB B25': 12,
 'CAS 114A': 13,
 'CAS 114B': 14,
 'CAS 116': 15,
 'CAS 201': 16,
 'CAS 203': 17,
 'CAS 204A': 18,
 'CAS 204B': 19,
 'CAS 208': 20,
 'CAS 211': 21,
 'CAS 212': 22,
 'CAS 213': 23,
 'CAS 214': 24,
 'CAS 216': 25,
 'CAS 218': 26,
 'CAS 220': 27,
 'CAS 222': 28,
 'CAS 223': 29,
 'CAS 224': 30,
 'CAS 225': 31,
 'CAS 226': 32,
 'CAS 227': 33,
 'CAS 228': 34,
 'CAS 229': 35,
 'CAS 233': 36,
 'CAS 235': 37,
 'CAS 237': 38,
 'CAS 303A': 39,
 'CAS 306': 40,
 'CAS 310': 41,
 'CAS 312': 42,
 'CAS 313': 43,
 'CAS 314': 44,
 'CAS 315': 45,
 'CAS 316': 46,
 'CAS 318': 47,
 'CAS 320': 48,
 'CAS 322': 49,
 'CAS 323A': 50,
 'CAS 323B': 51,
 'CAS 324': 52,
 'CAS 325': 53,
 'CAS 326': 54,
 'CAS 327': 55,
 'CAS 330': 56,
 'CAS 335': 57,
 'CAS 415': 58,
 'CAS 424': 59,
 'CAS 425': 60,
 'CAS 426': 61,
 'CAS 42

In [23]:
name_capacity_dict

{'AGG G171': 105,
 'BAB 121': 50,
 'BAB 140': 90,
 'BAB 141': 63,
 'BAB 148': 60,
 'BCN 115': 120,
 'BCN 208': 110,
 'BRB 113': 48,
 'BRB 121': 30,
 'BRB 122': 35,
 'BRB B25': 24,
 'CAS 114A': 17,
 'CAS 114B': 17,
 'CAS 116': 40,
 'CAS 201': 40,
 'CAS 203': 50,
 'CAS 204A': 38,
 'CAS 204B': 34,
 'CAS 208': 35,
 'CAS 211': 112,
 'CAS 212': 22,
 'CAS 213': 51,
 'CAS 214': 35,
 'CAS 216': 63,
 'CAS 218': 35,
 'CAS 220': 32,
 'CAS 222': 36,
 'CAS 224': 133,
 'CAS 226': 58,
 'CAS 303A': 20,
 'CAS 306': 45,
 'CAS 310': 20,
 'CAS 312': 20,
 'CAS 313': 112,
 'CAS 314': 20,
 'CAS 315': 54,
 'CAS 316': 20,
 'CAS 318': 20,
 'CAS 320': 28,
 'CAS 322': 24,
 'CAS 323A': 20,
 'CAS 323B': 18,
 'CAS 324': 39,
 'CAS 325': 24,
 'CAS 326': 57,
 'CAS 327': 38,
 'CAS 330': 24,
 'CAS 335': 30,
 'CAS 415': 16,
 'CAS 424': 18,
 'CAS 425': 20,
 'CAS 426': 48,
 'CAS 427': 18,
 'CAS 430': 25,
 'CAS 502': 40,
 'CAS 521': 16,
 'CAS 522': 208,
 'CAS 530': 20,
 'CAS 534': 18,
 'CAS 538': 18,
 'CAS B06A': 38,
 'CAS B0

In [24]:
professor_mapping

{'Min Ye': 0,
 'Sorcha Martin': 1,
 'Max Anzede': 2,
 'Roberto Tron': 3,
 'Libang Wang': 4,
 'Kathleen Corriveau': 5,
 'Ava Greene': 6,
 'Aiman Abilova': 7,
 'Assaf Kfoury': 8,
 'William Letizia': 9,
 'Diana Lobel': 10,
 'Nilay Kafali': 11,
 'Joshua Benton': 12,
 'John McGinnis': 13,
 'Mark Stanley': 14,
 'Manuel Ramirez': 15,
 'Alina Ene': 16,
 'Tiago Januario': 17,
 'Patrice Oppliger': 18,
 'Katelyn Bird': 19,
 'Joseph Russo': 20,
 'Sandra Buerger': 21,
 'Bjorn Persson': 22,
 'Rebecca Gebert': 23,
 'Christine Hamel': 24,
 'Pipier Smith-Mumford': 25,
 'Tanima Chatterjee': 26,
 'Christine Papadakis-Kanaris': 27,
 'Hongwei Xi': 28,
 'Yi Grace Ji': 29,
 'Ara Sarkissian': 30,
 'Edward Kearns': 31,
 'Rachel Mesch': 32,
 'Jerome Mertz': 33,
 'Sree Kumar Valath Bhuan Das': 34,
 'Sally Sedgwick': 35,
 'Ronald Czik': 36,
 'Ken Chung': 37,
 'Gregg Jaeger': 38,
 'Brian Kellum': 39,
 'Jeffrey Leonard': 40,
 'Rebecca Roesler': 41,
 'Lorenzo Sanchez-Gatt': 42,
 'Lance Galletti': 43,
 'Peter Garik':

In [25]:
professor_schedule

{0: [(1014, 1047, 23), (402, 417, 46), (978, 993, 46)],
 1: [(150, 168, 31),
  (96, 117, 90),
  (672, 693, 90),
  (122, 143, 31),
  (186, 204, 30),
  (168, 186, 29),
  (978, 993, 32)],
 2: [(737, 747, 25), (750, 760, 25)],
 3: [(474, 495, 50), (1050, 1071, 50)],
 4: [(187, 197, 10), (763, 773, 10)],
 5: [(384, 417, 30)],
 6: [(1274, 1284, 25), (1248, 1258, 25), (1261, 1271, 25)],
 7: [(148, 158, 26), (161, 171, 26), (135, 145, 26)],
 8: [(402, 417, 35), (978, 993, 35), (750, 760, 35)],
 9: [(402, 417, 35), (978, 993, 35)],
 10: [(161, 171, 14),
  (737, 747, 14),
  (1313, 1323, 14),
  (456, 471, 14),
  (1032, 1047, 14)],
 11: [(161, 171, 75),
  (737, 747, 75),
  (1313, 1323, 76),
  (122, 132, 40),
  (698, 708, 40),
  (1274, 1284, 40),
  (135, 145, 76),
  (711, 721, 76),
  (1287, 1297, 76)],
 12: [(222, 255, 15)],
 13: [(1265, 1298, 20),
  (198, 213, 22),
  (774, 789, 22),
  (113, 146, 24),
  (798, 831, 22),
  (222, 255, 22)],
 14: [(698, 719, 9),
  (438, 471, 30),
  (750, 783, 20),
  (6

In [38]:
professor_courses

array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       ...,

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 

In [26]:
walking_cost

array([[   0.  ,   10.  ,   10.  , ..., 1805.07, 1805.07, 1805.07],
       [  10.  ,    0.  ,   10.  , ..., 1805.07, 1805.07, 1805.07],
       [  10.  ,   10.  ,    0.  , ..., 1805.07, 1805.07, 1805.07],
       ...,
       [1805.07, 1805.07, 1805.07, ...,    0.  ,   10.  ,   10.  ],
       [1805.07, 1805.07, 1805.07, ...,   10.  ,    0.  ,   10.  ],
       [1805.07, 1805.07, 1805.07, ...,   10.  ,   10.  ,    0.  ]])

<h1>Export</h1>

In [27]:
import pickle
data_to_export = {
    "monday_professor_schedule": monday_professor_schedule,
    "professor_mapping": professor_mapping,
    "classroom_mapping": classroom_mapping,
    "professor_courses_monday": professor_courses_monday,
    "capacities": capacities,
    "walking_cost": walking_cost
}

with open("data_export.pkl", "wb") as file:
    pickle.dump(data_to_export, file)


<h1>Solution</h1>

In [39]:
import pickle

with open("../data/exported_data.pkl", "rb") as file:
    data = pickle.load(file)

print(data)


{'capacities_top10': [269, 12, 12, 12, 12, 16, 12, 8, 105, 320], 'professor_courses_top5': {1: [(150, 168, 31), (96, 117, 90), (672, 693, 90), (122, 143, 31), (186, 204, 30), (168, 186, 29), (978, 993, 32)], 2: [(180, 202, 21), (156, 178, 31), (1272, 1294, 55)], 4: [(474, 495, 50), (1050, 1071, 50)], 5: [(187, 197, 10), (763, 773, 10)], 6: [(384, 417, 30)]}, 'walking_cost_df_top10':           ABG 101  ABG 301A  ABG 406  ABG 408  ABG 409  ABG 409A  ABG 410  \
ABG 101      0.00      0.00     0.00     0.00     0.00      0.00     0.00   
ABG 301A     0.00      0.00     0.00     0.00     0.00      0.00     0.00   
ABG 406      0.00      0.00     0.00     0.00     0.00      0.00     0.00   
ABG 408      0.00      0.00     0.00     0.00     0.00      0.00     0.00   
ABG 409      0.00      0.00     0.00     0.00     0.00      0.00     0.00   
ABG 409A     0.00      0.00     0.00     0.00     0.00      0.00     0.00   
ABG 410      0.00      0.00     0.00     0.00     0.00      0.00     0.00  

In [39]:
total_walking_cost = 0.0

num_professors = len(professor_courses_monday)
num_classrooms = len(professor_courses_monday[0])
num_time_slots = len(professor_courses_monday[0][0])

# for prof_index in range(num_professors):
#
#     prev_classroom = None
#     for time_slot in range(num_time_slots):
#         current_classroom = None
#         for classroom_index in range(num_classrooms):
#             if professor_courses_monday[prof_index][classroom_index][time_slot] == 1:
#                 current_classroom = classroom_index
#                 break
#         if current_classroom is not None:
#             if prev_classroom is not None:
#                 cost = walking_cost[prev_classroom][current_classroom]
#                 total_walking_cost += cost
#             prev_classroom = current_classroom

# print(f"Total walking cost for Monday (excluding professor 38): {total_walking_cost}")


Total walking cost for Monday (excluding professor 38): 108286.71600000006


In [40]:
professor_walking_costs = [0.0] * num_professors

for prof_index in range(num_professors):
    prev_classroom = None
    for time_slot in range(num_time_slots):
        current_classroom = None
        for classroom_index in range(num_classrooms):
            if professor_courses_monday[prof_index][classroom_index][time_slot] == 1:
                current_classroom = classroom_index
                break
        if current_classroom is not None:
            if prev_classroom is not None:
                cost = walking_cost[prev_classroom][current_classroom]
                professor_walking_costs[prof_index] += cost
            prev_classroom = current_classroom
        else:
            prev_classroom = None

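# professor_mapping maps names to IDs, so invert it to look up a name by index.
id_to_professor = {v: k for k, v in professor_mapping.items()}

for prof_index, cost in enumerate(professor_walking_costs):
    prof_name = id_to_professor.get(prof_index, f"Unknown Professor {prof_index}")
    print(f"Total walking cost for professor {prof_name}: {cost}")

total_walking_cost = sum(professor_walking_costs)
print(f"Total walking cost for Monday: {total_walking_cost}")
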
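
The triple loop in this cell can also be expressed with array operations. The sketch below uses toy data (2 professors, 3 rooms, 6 slots, all values invented) and a hypothetical function name. Like the loop, it counts a transition only between back-to-back occupied slots and picks the lowest-index occupied room in each slot.

```python
import numpy as np

# Toy occupancy tensor with shape (professors, rooms, time slots).
occupancy = np.zeros((2, 3, 6), dtype=int)
occupancy[0, 0, 0:2] = 1   # professor 0: room 0 during slots 0-1
occupancy[0, 2, 2:4] = 1   # then room 2 during slots 2-3 (back to back)
occupancy[1, 1, 0:1] = 1   # professor 1: room 1 during slot 0
occupancy[1, 0, 3:5] = 1   # then room 0 during slots 3-4, after a gap

cost = np.array([[0.0, 10.0, 300.0],
                 [10.0, 0.0, 300.0],
                 [300.0, 300.0, 0.0]])

def walking_cost_per_professor(occupancy, cost):
    busy = occupancy.any(axis=1)        # (N, T): is the professor teaching in this slot?
    room = occupancy.argmax(axis=1)     # (N, T): lowest-index occupied room (0 if idle)
    consecutive = busy[:, :-1] & busy[:, 1:]     # both neighbouring slots occupied
    step_cost = cost[room[:, :-1], room[:, 1:]]  # cost of each slot-to-slot move
    return np.where(consecutive, step_cost, 0.0).sum(axis=1)

print(walking_cost_per_professor(occupancy, cost))  # [300.   0.]
```

Professor 1 contributes nothing here because a gap resets the previous room, which matches the `else: prev_classroom = None` branch of the loop.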

In [12]:
import pickle

file_path = '../data/best_solution.pkl'

with open(file_path, 'rb') as file:
    data = pickle.load(file)
solution = data[0]


In [None]:
total_walking_cost = 0.0

for prof_id, classrooms in solution.items():
    if len(classrooms) <= 1:
        continue
    for i in range(len(classrooms) - 1):
        from_room = classrooms[i]
        to_room = classrooms[i + 1]
        cost = walking_cost[from_room][to_room]
        total_walking_cost += cost

print(f"Total walking cost of the solution: {total_walking_cost}")
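
The same sum over consecutive room pairs can be written with `zip`. This is a small sketch on invented data: `solution_demo`, `cost_demo`, and `route_cost` are hypothetical. The layout of `solution` (professor ID mapped to an ordered list of room IDs) is taken from the cell above.

```python
# Hypothetical stand-ins for `solution` and `walking_cost`.
solution_demo = {0: [0, 2, 1], 1: [1]}
cost_demo = [[0.0, 10.0, 300.0],
             [10.0, 0.0, 300.0],
             [300.0, 300.0, 0.0]]

def route_cost(rooms, cost):
    """Sum the cost of moving between each pair of consecutive rooms."""
    return sum(cost[a][b] for a, b in zip(rooms, rooms[1:]))

total = sum(route_cost(rooms, cost_demo) for rooms in solution_demo.values())
print(total)  # 600.0
```

A single-room route yields an empty `zip`, so it contributes 0 without an explicit length check.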

In [14]:

id_to_professor = {v: k for k, v in professor_mapping.items()}
id_to_classroom = {v: k for k, v in classroom_mapping.items()}

formatted_solution = {}

for prof_id, room_ids in solution.items():
    prof_name = id_to_professor.get(prof_id, f"Unknown Professor {prof_id}")
    formatted_solution[prof_name] = [id_to_classroom.get(room_id, f"Unknown Room {room_id}") for room_id in room_ids]

for prof_name, rooms in formatted_solution.items():
    print(f"{prof_name}: {', '.join(rooms)}")

Sorcha Martin: FLR 404L, FLR 404D, FLR 254, FLR 303, FLR 254
Libang Wang: MCS B33
Aiman Abilova: CAS 315, CAS 502, CAS B06A
Diana Lobel: PTH 114
Nilay Kafali: IEC B13, IEC B13, IEC B01
Joshua Benton: PHO 201
Patrice Oppliger: SOC B63
Sandra Buerger: SCI 153, SCI 128, SCI 113
Bjorn Persson: CDS 265, CDS 265
Tanima Chatterjee: FLR 404B, FLR 404D, FLR 404L, FLR 121, FLR 206
Yi Grace Ji: STH B22
Ara Sarkissian: FLR 404D
Edward Kearns: CAS B25A, CAS 218
Jerome Mertz: HAR 105
Sally Sedgwick: CAS B12, CAS 212
Gregg Jaeger: CDS 463, CDS 262, CDS B64B
Doug Gould: CAS 214
Weijia Huang: SAR B08, SAR 101, SAR 101
-: FLR 207, FLR 121, CFA 557, CFA 557, CFA 102, CFA 102, STH 115, LAW 212, STH 115, LAW 212, LAW 212, LAW 212, STH B23, CFA 557, FRC L137, CGS 121, CGS 323, CGS 121, FLR 404L, FLR 124, FLR 404L, FLR 405, FLR 404L, FLR 409, FLR 409, CFA 102, CGS 323
Masanao Yajima: CAS 213
Lisa Wobbes: IEC B10, EOP 274, IEC B11, IEC B06, IEC B06, IEC B09A
Ata Turk: AGG G171
Jaemin Roh: SCI 268A, SCI B19, S