# **Merging BOSS Results with Class Timings**

<div style="background-color:#FFF8DC; padding:12px; border-radius:5px; border: 1px solid #DAA520;">
    
  <h2 style="color:#8B8000;">✨ Looking for the Latest Model? Consider V3! ✨</h2>
  <p>👉 <a href="V3_example_prediction.ipynb"><strong>Check out V3 Here</strong></a></p>

</div>

### **Objective**
This script merges SMU's **Overall BOSS Results** data with **Class Timings** data so that each BOSS result row also carries its class schedule, venue, and exam details. The merged files are saved to a dedicated output folder. The process includes:
1. Defining folder paths and mapping each term to its files.
2. Selecting the relevant columns from the class timings data.
3. Left-joining the class timings onto the BOSS results on "Section" and "Course Code".
4. Saving the merged files to the output directory.

### **Script Structure**
1. **Setup**: Define folders and build a term-to-file mapping for each folder.
2. **Filtering and Processing**:
    - Extract relevant columns from the `classTimings` files.
    - Merge each `overallBossResults` file with its corresponding `classTimings` file.
    - Write each merged result to a CSV file.
3. **Save Outputs**: Confirm that the merged files were written to the output folder.


---
### 1. Setup

In [1]:
import os
import pandas as pd

In [16]:
# Define folder paths
class_timings_folder = "classTimings"
overall_boss_results_folder = "overallBossResults"
output_folder = "overallBossResultsWTimings"

In [17]:
# Create output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Map each term key (e.g. "2021-22_T1") to the path of its file in each folder
class_timings_files = {file.replace("AddedInfo", "").replace(".csv", ""): os.path.join(class_timings_folder, file)
                       for file in os.listdir(class_timings_folder) if file.endswith(".csv")}

overall_boss_results_files = {file.replace(".xls", ""): os.path.join(overall_boss_results_folder, file)
                              for file in os.listdir(overall_boss_results_folder) if file.endswith(".xls")}

In [18]:
class_timings_files

{'2021-22_T1': 'classTimings\\2021-22_T1AddedInfo.csv',
 '2021-22_T2': 'classTimings\\2021-22_T2AddedInfo.csv',
 '2021-22_T3A': 'classTimings\\2021-22_T3AAddedInfo.csv',
 '2021-22_T3B': 'classTimings\\2021-22_T3BAddedInfo.csv',
 '2022-23_T1': 'classTimings\\2022-23_T1AddedInfo.csv',
 '2022-23_T2': 'classTimings\\2022-23_T2AddedInfo.csv',
 '2022-23_T3A': 'classTimings\\2022-23_T3AAddedInfo.csv',
 '2022-23_T3B': 'classTimings\\2022-23_T3BAddedInfo.csv',
 '2023-24_T1': 'classTimings\\2023-24_T1AddedInfo.csv',
 '2023-24_T2': 'classTimings\\2023-24_T2AddedInfo.csv',
 '2023-24_T3A': 'classTimings\\2023-24_T3AAddedInfo.csv',
 '2023-24_T3B': 'classTimings\\2023-24_T3BAddedInfo.csv',
 '2024-25_T1': 'classTimings\\2024-25_T1AddedInfo.csv',
 '2024-25_T2': 'classTimings\\2024-25_T2AddedInfo.csv',
 '2024-25_T3A': 'classTimings\\2024-25_T3AAddedInfo.csv',
 '2024-25_T3B': 'classTimings\\2024-25_T3BAddedInfo.csv'}

In [19]:
overall_boss_results_files

{'2021-22_T2': 'overallBossResults\\2021-22_T2.xls',
 '2021-22_T3B': 'overallBossResults\\2021-22_T3B.xls',
 '2022-23_T1': 'overallBossResults\\2022-23_T1.xls',
 '2022-23_T2': 'overallBossResults\\2022-23_T2.xls',
 '2022-23_T3A': 'overallBossResults\\2022-23_T3A.xls',
 '2022-23_T3B': 'overallBossResults\\2022-23_T3B.xls',
 '2023-24_T1': 'overallBossResults\\2023-24_T1.xls',
 '2023-24_T2': 'overallBossResults\\2023-24_T2.xls',
 '2023-24_T3A': 'overallBossResults\\2023-24_T3A.xls',
 '2023-24_T3B': 'overallBossResults\\2023-24_T3B.xls',
 '2024-25_T1': 'overallBossResults\\2024-25_T1.xls',
 '2024-25_T2': 'overallBossResults\\2024-25_T2.xls'}
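The two listings above do not cover the same terms: there are 16 class timings files but only 12 BOSS results files. A minimal sketch of how the two key sets could be compared before merging is shown here. It uses abbreviated stand-in dictionaries, and the names `timings`, `boss`, `only_timings`, `only_boss`, and `mergeable` are illustrative, not part of the notebook.

```python
# Abbreviated stand-ins for class_timings_files and overall_boss_results_files
timings = {
    "2021-22_T1": "classTimings/2021-22_T1AddedInfo.csv",
    "2021-22_T2": "classTimings/2021-22_T2AddedInfo.csv",
    "2021-22_T3A": "classTimings/2021-22_T3AAddedInfo.csv",
}
boss = {
    "2021-22_T2": "overallBossResults/2021-22_T2.xls",
}

# Dictionary key views support set operations directly
only_timings = sorted(timings.keys() - boss.keys())
only_boss = sorted(boss.keys() - timings.keys())
mergeable = sorted(timings.keys() & boss.keys())

print("Timings without BOSS results:", only_timings)
print("BOSS results without timings:", only_boss)
print("Terms that can be merged:", mergeable)
```

Running the same comparison on the real dictionaries would show which terms will be skipped before any file is read.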


---
### 2. Filtering and Processing

In [23]:
# Relevant columns to keep from classTimings
relevant_columns = [
    "Section", "Course Code", "Grading Basis", "class1_day", "class1_starttime", "class1_venue",
    "class2_day", "class2_starttime", "class2_venue", "class3_day", "class3_starttime", 
    "class3_venue", "exam_startdate", "exam_day", "exam_starttime"
]  # "SelectedClassNumber" and "SelectedAcadTerm" are excluded because they are randomised values used by BOSS.

# Process and merge the files
for term, boss_file_path in overall_boss_results_files.items():
    if term in class_timings_files:
        # Load data from the files (pandas reads .xls files through the xlrd package)
        boss_data = pd.read_excel(boss_file_path)
        class_data = pd.read_csv(class_timings_files[term])
        
        # Keep only relevant columns for classTimings
        class_data_filtered = class_data[relevant_columns]
        
        # Left-join on "Section" and "Course Code" so every BOSS row is kept, even without matching timings
        merged_data = pd.merge(boss_data, class_data_filtered, on=["Section", "Course Code"], how="left")
        
        # Save the merged file to the output folder
        output_path = os.path.join(output_folder, f"{term}_Merged.csv")
        merged_data.to_csv(output_path, index=False)
        print(f"Merged file created for term: {term}")
    else:
        print(f"No matching class timings file for term: {term}")

Merged file created for term: 2021-22_T2
Merged file created for term: 2021-22_T3B
Merged file created for term: 2022-23_T1
Merged file created for term: 2022-23_T2
Merged file created for term: 2022-23_T3A
Merged file created for term: 2022-23_T3B
Merged file created for term: 2023-24_T1
Merged file created for term: 2023-24_T2
Merged file created for term: 2023-24_T3A
Merged file created for term: 2023-24_T3B
Merged file created for term: 2024-25_T1
Merged file created for term: 2024-25_T2
Merging complete. Files are stored in the 'overallBossResultsWTimings' folder.
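A left join keeps every BOSS row, but it gives no signal when a row finds no timings, and it silently multiplies rows if the class timings data repeats a ("Section", "Course Code") pair. The sketch below shows one way to surface both cases with the `validate` and `indicator` parameters of `pd.merge`. The toy frames, their values, and the `bid_value` column are made up for illustration and are not taken from the real files.

```python
import pandas as pd

# Toy stand-ins for boss_data and class_data_filtered (values are made up)
boss = pd.DataFrame({
    "Course Code": ["ABC101", "ABC101", "XYZ202"],
    "Section": ["G1", "G2", "G1"],
    "bid_value": [25.0, 30.5, 18.0],  # hypothetical column
})
timings = pd.DataFrame({
    "Course Code": ["ABC101", "ABC101"],
    "Section": ["G1", "G2"],
    "class1_day": ["Mon", "Wed"],
})

merged = pd.merge(
    boss,
    timings,
    on=["Section", "Course Code"],
    how="left",
    validate="many_to_one",  # raises MergeError if timings repeats a key pair
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
)

# Rows marked "left_only" are BOSS rows that found no class timings
unmatched = merged[merged["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(merged)} BOSS rows have no class timings")
```

If this check were added to the loop, the `_merge` column would need to be dropped before writing the CSV to keep the output schema unchanged.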



---
### 3. Save Outputs

In [None]:
# Confirm completion
print(f"Merging complete. Files are stored in the '{output_folder}' folder.")

---

### **Output**
The merged files are saved in the `overallBossResultsWTimings` folder with the naming convention: `<term>_Merged.csv`.

---

### **Notes**
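1. Ensure that the `classTimings` folder contains the `<term>AddedInfo.csv` files and the `overallBossResults` folder contains the `<term>.xls` files.
2. Terms are matched by file name: the script strips `AddedInfo` and the extension to get the term key (for example `2021-22_T1`), so both folders must follow this naming convention.
3. Only BOSS results terms with no class timings file are reported in the console. Class timings files with no BOSS results file (here `2021-22_T1`, `2021-22_T3A`, `2024-25_T3A`, and `2024-25_T3B`) are skipped silently.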
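
The naming convention that term matching relies on can be made explicit. The following is a minimal sketch, assuming term keys always look like `2021-22_T1` or `2021-22_T3A` as in the file listings. The `TERM_PATTERN` regex and the `term_key` helper are hypothetical names introduced for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern, inferred from the file listings: e.g. 2021-22_T1, 2021-22_T3A
TERM_PATTERN = re.compile(r"\d{4}-\d{2}_T\d[A-Z]?")

def term_key(filename: str) -> Optional[str]:
    """Return the term key for a file name, or None if it does not fit the pattern."""
    # Drop the extension, then the "AddedInfo" suffix used by class timings files
    stem = Path(filename).stem.replace("AddedInfo", "")
    return stem if TERM_PATTERN.fullmatch(stem) else None

print(term_key("2021-22_T3AAddedInfo.csv"))
print(term_key("2021-22_T2.xls"))
print(term_key("notes.csv"))
```

A helper like this returns `None` for stray files instead of producing a key that can never match, which makes naming problems visible early.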