# **BidlySMU Prediction Example using V4**

<h2><span style="color:red">NOTE: use at your own discretion.</span></h2>

### **Objective**
This notebook predicts the minimum and median bids required for courses in the SMU bidding system using **CatBoost** regression models. The code below shows how to run the prediction on your own computer. **No prior experience is needed** beyond setting up the environment.

---

### **Methodology**
The notebook is structured as follows:
1. **Clone the Repository**
2. **Install Dependencies**
3. **Load the Pre-trained Models**
4. **Define New Data for Prediction**
5. **Predict Using the Models**

---

### **Data Required (column names are case-sensitive)**

| **Column Name**                | **Description** |
|--------------------------------|-----------------------------------------------------------|
| **`Term`**                     | Academic term of the course (1, 2, 3A or 3B). |
| **`Description`**              | Name of the course. |
| **`Section`**                  | Course section identifier. |
| **`Vacancy`**                  | Total available spots in the course. |
| **`Before Process Vacancy`**   | Number of available spots **before** the bidding process. |
| **`Instructor`**               | Name of the instructor. |
| **`Grading Basis`**            | Type of grading (e.g., Graded, Pass/Fail). |
| **`class1_day`**               | Day of the week for the first class session. |
| **`class1_starttime`**         | Start time for the first class session. |
| **`class1_venue`**             | Venue for the first class session. |
| **`class2_day`**               | Day of the week for the second class session (if applicable). |
| **`class2_starttime`**         | Start time for the second class session. |
| **`class2_venue`**             | Venue for the second class session. |
| **`class3_day`**               | Day of the week for the third class session (if applicable). |
| **`class3_starttime`**         | Start time for the third class session. |
| **`class3_venue`**             | Venue for the third class session. |
| **`exam_startdate`**           | Date of the exam in dd-mmm-yyyy format (e.g., 21-Apr-2025). |
| **`exam_day`**                 | Exam day of the week. |
| **`exam_starttime`**           | Exam start time. |
| **`AY`**                       | Academic year in which the course is offered. |
| **`Incoming Freshman`**        | Whether the course is for incoming freshmen (`yes` or `no`). |
| **`Incoming Exchange`**        | Whether the course is for incoming exchange students (`yes` or `no`). |
| **`Round`**                    | Bidding round (1, 1A, 1B, 1C, 2, 2A). |
| **`Window`**                   | Bidding window within the round (1, 2, 3, 4, 5). |
| **`SubjectArea`**              | Subject area of the course (e.g., IS, ECON). |
| **`CatalogueNo`**              | Course code (e.g., 453). |
| **🎯 Target Variables 🎯**      | **Predicted bid prices** |
| **`Min Bid`**                  | Minimum bid price required for the course. |
| **`Median Bid`**               | Median bid price required for the course. |

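Because the column names are case-sensitive, a quick check before prediction can catch a missing or miscapitalised column. The sketch below is illustrative only: `REQUIRED_COLUMNS` is copied from the table above (targets excluded), and `find_column_problems` is a hypothetical helper that is not part of the repository. It accepts any iterable of names, for example `new_data.columns`.

```python
# Input columns listed in the table above (target columns excluded).
REQUIRED_COLUMNS = [
    "Term", "Description", "Section", "Vacancy", "Before Process Vacancy",
    "Instructor", "Grading Basis",
    "class1_day", "class1_starttime", "class1_venue",
    "class2_day", "class2_starttime", "class2_venue",
    "class3_day", "class3_starttime", "class3_venue",
    "exam_startdate", "exam_day", "exam_starttime",
    "AY", "Incoming Freshman", "Incoming Exchange",
    "Round", "Window", "SubjectArea", "CatalogueNo",
]


def find_column_problems(columns):
    """Return (missing, wrong_case) for an iterable of column names.

    Hypothetical helper for illustration; matching is case-sensitive.
    """
    present = set(columns)
    lowered = {name.lower(): name for name in present}
    missing, wrong_case = [], []
    for required in REQUIRED_COLUMNS:
        if required in present:
            continue
        if required.lower() in lowered:
            # Same name, different capitalisation, e.g. "term" vs "Term".
            wrong_case.append((lowered[required.lower()], required))
        else:
            missing.append(required)
    return missing, wrong_case
```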
---

### **🛠️ Step 1: Clone the Repository**
To get started, **clone the GitHub repository** containing the pre-trained models:

```bash
git clone https://github.com/tanzhongyan/BidlySMU/
cd BidlySMU
```

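After cloning, it may be worth confirming that the model files are where the later steps expect them. The following is a minimal sketch, assuming the two `.cbm` files named in Step 3 sit in the current working directory; `missing_model_files` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path

# File names as used in Step 3 of this notebook.
MODEL_FILES = ("catboost_min_bid.cbm", "catboost_median_bid.cbm")


def missing_model_files(directory=".", names=MODEL_FILES):
    """Return the model files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```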
---

### **📥 Step 2: Install Dependencies**
Ensure you have **Python 3.8 or newer** installed. Check with:

```bash
python --version
```

Then, install all required dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

This will install:
- **Pandas**
- **CatBoost**
- **NumPy**
- **Other necessary dependencies**

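To confirm the environment from inside Python, a small check along these lines may help. It is a sketch under the assumption that the three packages named above are imported as `pandas`, `catboost`, and `numpy`; the helper names are hypothetical, and it does not check package versions.

```python
import importlib.util
import sys

# Import names of the packages listed above.
REQUIRED_MODULES = ("pandas", "catboost", "numpy")


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Missing modules:", missing_modules() or "none")
```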
---

### **📂 Step 3: Load the Pre-trained Models**
Since the `.cbm` models are already included in the repo, you can load them directly:

In [4]:
from catboost import CatBoostRegressor

# Load the Min Bid Model
model_min_bid = CatBoostRegressor()
model_min_bid.load_model("catboost_min_bid.cbm")

# Load the Median Bid Model
model_median_bid = CatBoostRegressor()
model_median_bid.load_model("catboost_median_bid.cbm")

print("✅ Models loaded successfully!")

✅ Models loaded successfully!


---

### **📊 Step 4: Define New Data for Prediction**
Create a new **course entry** in the same format as the training data:

In [5]:
import pandas as pd

# Define the new data instance
new_data = pd.DataFrame({
    'Term': [2],
    'Description': ["Enterprise Solution Management"],
    'Section': ["G1"],
    'Vacancy': [40],
    'Before Process Vacancy': [38],
    'Instructor': ["RAFAEL J. BARROS"],
    'Grading Basis': ["Graded"],
    'class1_day': ["Mon"],
    'class1_starttime': ["08:15"],
    'class1_venue': ["SOE/SCIS2 Seminar Room B1-2"],
    'class2_day': ["NA"],
    'class2_starttime': ["NA"],
    'class2_venue': ["NA"],
    'class3_day': ["NA"],
    'class3_starttime': ["NA"],
    'class3_venue': ["NA"],
    'exam_startdate': ["21-Apr-2025"], 
    'exam_day': ["Mon"],
    'exam_starttime': ["13:00"],
    'AY': [2024],
    'Incoming Freshman': ["no"],
    'Incoming Exchange': ["no"],
    'Round': ["1"],
    'Window': [1],
    'SubjectArea': ["IS"],
    'CatalogueNo': ["214"],
})

Run the **transformation function** below to standardise the data types:

In [6]:
# Ensure proper data types for all columns
def standardise_data_types(data):
    # Work on a copy so the caller's DataFrame is left unchanged
    data = data.copy()

    # Store categorical columns as generic Python objects
    categorical_cols = [
        'Term','Description','Section',
        'Instructor','Grading Basis','class1_day','class1_starttime','class1_venue',
        'class2_day','class2_starttime','class2_venue','class3_day','class3_starttime',
        'class3_venue','exam_startdate','exam_day','exam_starttime',
        'Incoming Freshman','Incoming Exchange','Round','SubjectArea','CatalogueNo'
    ]
    for col in categorical_cols:
        data[col] = data[col].astype('object')

    # Convert date columns to datetime
    data['exam_startdate'] = pd.to_datetime(data['exam_startdate'], errors='coerce')

    # Extract day of month and month from `exam_startdate`
    data['exam_date'] = data['exam_startdate'].dt.day
    data['exam_month'] = data['exam_startdate'].dt.month

    # Drop the original `exam_startdate` column
    data = data.drop(columns=['exam_startdate'])

    # Replace missing values (unparseable dates) with 0 and cast to int
    data['exam_date'] = data['exam_date'].fillna(0).astype(int)
    data['exam_month'] = data['exam_month'].fillna(0).astype(int)

    return data

In [7]:
new_data_preprocessed = standardise_data_types(new_data)

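To see what the date step produces for a single value, here is a standard-library approximation. It is only a sketch: `exam_day_and_month` is a hypothetical helper, it assumes English month abbreviations, and it is stricter than `pd.to_datetime`, which accepts more formats than dd-mmm-yyyy.

```python
from datetime import datetime


def exam_day_and_month(text):
    """Parse a dd-mmm-yyyy string and return (day, month).

    Returns (0, 0) when the text cannot be parsed, mirroring the
    fillna(0) fallback in standardise_data_types.
    """
    try:
        parsed = datetime.strptime(text, "%d-%b-%Y")
    except (TypeError, ValueError):
        return 0, 0
    return parsed.day, parsed.month


print(exam_day_and_month("21-Apr-2025"))  # (21, 4)
print(exam_day_and_month("NA"))           # (0, 0)
```
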
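If the data was loaded from a CSV file, pandas will have read every "NA" as a missing value (`NaN`). Convert these back to "NA", because **CatBoost does not accept missing values in categorical features**. For the sample row defined above, this step leaves the data unchanged.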

In [8]:
new_data_preprocessed = new_data_preprocessed.fillna("NA")

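The "NA" conversion is easiest to see with a CSV source. The sketch below uses a hypothetical two-line CSV held in memory; it shows the default `pd.read_csv` behaviour, the `fillna("NA")` repair used in the cell above, and the alternative of passing `keep_default_na=False` while reading.

```python
import io

import pandas as pd

# Hypothetical two-line CSV standing in for a real export.
csv_text = "class1_day,class2_day\nMon,NA\n"

# Default behaviour: the string "NA" is parsed as a missing value.
default_read = pd.read_csv(io.StringIO(csv_text))
print(default_read["class2_day"].isna().tolist())  # [True]

# Option 1: restore the placeholder after loading, as in the cell above.
restored = default_read.fillna("NA")
print(restored["class2_day"].tolist())  # ['NA']

# Option 2: keep "NA" as text while reading.
# Note: this also stops empty cells from becoming NaN.
as_text = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(as_text["class2_day"].tolist())  # ['NA']
```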
---

### **📌 Step 5: Predict Using the Models**
Ensure the column order matches the training data, then pass the preprocessed row to each model:

In [9]:
# Predict Min Bid Price
predicted_min_bid = model_min_bid.predict(new_data_preprocessed)[0]

# Predict Median Bid Price
predicted_median_bid = model_median_bid.predict(new_data_preprocessed)[0]

# Print Results
print(f"Predicted Min Bid: {predicted_min_bid:.2f}")
print(f"Predicted Median Bid: {predicted_median_bid:.2f}")

Predicted Min Bid: 24.65
Predicted Median Bid: 34.79

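The prediction cell relies on the columns already being in training order. One way to enforce that before calling `predict` is sketched below. `align_columns` is a hypothetical helper, and the commented call assumes the saved models expose their training column names through CatBoost's `feature_names_` attribute, which is only populated when the model was trained on named columns.

```python
import pandas as pd


def align_columns(frame, feature_names):
    """Return `frame` with exactly `feature_names`, in that order.

    Raises ValueError if any expected column is absent.
    """
    missing = [name for name in feature_names if name not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame[list(feature_names)]


# Hypothetical three-column example; a real call might look like
# align_columns(new_data_preprocessed, model_min_bid.feature_names_)
shuffled = pd.DataFrame({"Window": [1], "Term": [2], "Round": ["1"]})
aligned = align_columns(shuffled, ["Term", "Round", "Window"])
print(list(aligned.columns))  # ['Term', 'Round', 'Window']
```

Raising on a missing column, rather than silently filling it, keeps a malformed row from reaching the model.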

---

### **✅ Summary**
🎯 **What we did:**
1. **Cloned the repository**, which includes the `.cbm` model files.
2. **Installed dependencies** (`requirements.txt`).
3. **Loaded the pre-trained models** (`catboost_min_bid.cbm` & `catboost_median_bid.cbm`).
4. **Created a sample course entry** for prediction.
5. **Ran Min Bid & Median Bid predictions**.

🔹 **Now you’re ready to predict SMU course bidding prices instantly!** 🚀