# Delivery Time Prediction — XGBoost Tuned Model (Inference)

## 1. Import Library

In [1]:
import pandas as pd
import pickle

In [2]:
# Load the trained model
with open("best_xgboost_delivery_time_tuned.pkl", "rb") as file:
    model = pickle.load(file)

# Load the cleaned dataset
df = pd.read_csv("Food_Delivery_Times_Clean.csv")

df.head()


|   | order_id | distance_km | weather | traffic_level | time_of_day | vehicle_type | preparation_time_min | courier_experience_yrs | delivery_time_min |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 522 | 7.93 | Windy | Low | Afternoon | Scooter | 12 | 1.0 | 43 |
| 1 | 738 | 16.42 | Clear | Medium | Evening | Bike | 20 | 2.0 | 84 |
| 2 | 741 | 9.52 | Foggy | Low | Night | Scooter | 28 | 1.0 | 59 |
| 3 | 661 | 7.44 | Rainy | Medium | Afternoon | Scooter | 5 | 1.0 | 37 |
| 4 | 412 | 19.03 | Clear | Low | Morning | Bike | 16 | 5.0 | 68 |
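
Loading the pickle fails with a fairly bare error if the file path is wrong. The sketch below wraps the load in a hypothetical helper, `load_pickle`, that checks for the file first. The demonstration pickles a stand-in dictionary, not the real model, because the model file itself is not part of this notebook text.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load a pickled object, failing early with a clear message if the file is absent."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    with path.open("rb") as file:
        # Unpickling can execute arbitrary code, so only load files from a trusted source.
        return pickle.load(file)


# Demonstration with a stand-in object (assumption: the real model is not available here)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_model.pkl"
    with demo_path.open("wb") as file:
        pickle.dump({"name": "stand-in model"}, file)
    restored = load_pickle(demo_path)

print(restored)
```

With the real file, the call would presumably be `model = load_pickle("best_xgboost_delivery_time_tuned.pkl")`.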


## 2. Predict Delivery Time for New Orders


In [3]:
# Create new data (for example, 3 new orders)
# The columns must match those used when training the model
new_data = pd.DataFrame({
    "distance_km": [5.2, 1.8, 12.0],
    "weather": ["Clear", "Rainy", "Cloudy"],
    "traffic_level": ["Medium", "High", "Low"],
    "time_of_day": ["Lunch", "Evening", "Morning"],
    "vehicle_type": ["Motorbike", "Bicycle", "Car"],
    "preparation_time_min": [15, 8, 25],
    "courier_experience_yrs": [2, 0.5, 5]
})
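
Because the model was trained on specific category values, it may be worth checking new orders for values it has not seen. The sketch below is one way to do that. `find_unseen_categories` is a hypothetical helper, and the reference frame holds only the five rows shown by `df.head()` above, so a value flagged here may still exist elsewhere in the full dataset.

```python
import pandas as pd

CATEGORICAL_COLUMNS = ["weather", "traffic_level", "time_of_day", "vehicle_type"]


def find_unseen_categories(reference, candidate, columns):
    """Return {column: sorted values present in candidate but absent from reference}."""
    unseen = {}
    for col in columns:
        known = set(reference[col].dropna().unique())
        extra = set(candidate[col].dropna().unique()) - known
        if extra:
            unseen[col] = sorted(extra)
    return unseen


# Assumption: a stand-in for the full training data, copied from the df.head() output
reference_sample = pd.DataFrame({
    "weather": ["Windy", "Clear", "Foggy", "Rainy", "Clear"],
    "traffic_level": ["Low", "Medium", "Low", "Medium", "Low"],
    "time_of_day": ["Afternoon", "Evening", "Night", "Afternoon", "Morning"],
    "vehicle_type": ["Scooter", "Bike", "Scooter", "Scooter", "Bike"],
})

# The categorical columns of the three new orders
new_orders = pd.DataFrame({
    "weather": ["Clear", "Rainy", "Cloudy"],
    "traffic_level": ["Medium", "High", "Low"],
    "time_of_day": ["Lunch", "Evening", "Morning"],
    "vehicle_type": ["Motorbike", "Bicycle", "Car"],
})

unseen = find_unseen_categories(reference_sample, new_orders, CATEGORICAL_COLUMNS)
for column, values in unseen.items():
    print(f"{column}: {values}")
```

In practice the full `df` would be passed as the reference. How the model treats an unseen value depends on the encoder used during training, which is not shown in this notebook.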


In [4]:
# --- 1. Feature engineering for the new data ---
# Add the derived features that the model expects from training

new_data["prep_to_deliv_ratio"] = new_data["preparation_time_min"] / (new_data["distance_km"] + 1)  # +1 avoids division by zero
new_data["speed_km_per_min"] = new_data["distance_km"] / (new_data["preparation_time_min"] + 1)

# Classify courier experience level using left-inclusive bins
new_data["experience_level"] = pd.cut(
    new_data["courier_experience_yrs"],
    bins=[0, 2, 5, 10, 20],
    labels=["Newbie", "Intermediate", "Experienced", "Veteran"],
    right=False  # so that a value of 2.0 falls into Intermediate
)
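
The same three steps can be gathered into one function so that every batch of new orders goes through identical logic. This is a minimal sketch: `add_inference_features` is a hypothetical name, and the fourth sample row (20 years of experience) is a made-up edge case added to show how the left-inclusive bins behave at the upper boundary.

```python
import pandas as pd


def add_inference_features(orders):
    """Return a copy of `orders` with the three derived columns added."""
    out = orders.copy()
    out["prep_to_deliv_ratio"] = out["preparation_time_min"] / (out["distance_km"] + 1)
    out["speed_km_per_min"] = out["distance_km"] / (out["preparation_time_min"] + 1)
    out["experience_level"] = pd.cut(
        out["courier_experience_yrs"],
        bins=[0, 2, 5, 10, 20],
        labels=["Newbie", "Intermediate", "Experienced", "Veteran"],
        right=False,  # intervals are [0, 2), [2, 5), [5, 10), [10, 20)
    )
    return out


sample = pd.DataFrame({
    "distance_km": [5.2, 1.8, 12.0, 3.0],
    "preparation_time_min": [15, 8, 25, 10],
    "courier_experience_yrs": [2, 0.5, 5, 20],  # 20 is an illustrative edge case
})

features = add_inference_features(sample)
print(features[["courier_experience_yrs", "experience_level"]])
```

With `right=False` the last interval is `[10, 20)`, so a courier with exactly 20 years falls outside every bin and gets a missing `experience_level`. Widening the last bin edge would be one way to handle that case, if such values can occur.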


### Note: Feature Engineering Differences Between Training and Inference

During *training*, the actual `delivery_time_min` is available, so the following features are computed from it:
- **`speed_km_per_min`** is computed as `distance_km / delivery_time_min`
- **`prep_to_deliv_ratio`** is computed as `preparation_time_min / delivery_time_min`

At *inference* (prediction) time, however, `delivery_time_min` is not yet known, because it is the value being predicted.
Both features are therefore built from approximations that do not use the target label, for example:
- `prep_to_deliv_ratio = preparation_time_min / (distance_km + 1)`
- `speed_km_per_min = distance_km / (preparation_time_min + 1)`

The goal is to **prevent data leakage** at prediction time while keeping the feature structure consistent, so that the model can still produce predictions. These proxies are defined differently from the training-time features, so their values are not on the same scale and the resulting predictions should be interpreted with caution.
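
To make the difference concrete, the sketch below evaluates both sets of formulas on the first row of the `df.head()` output (7.93 km, 12 minutes of preparation, 43 minutes of delivery). The formulas are the ones listed in this note. The variable names are chosen here for illustration only.

```python
# Values copied from row 0 of the df.head() output
distance_km = 7.93
preparation_time_min = 12
delivery_time_min = 43

# Training-time definitions (they use the target, delivery_time_min)
train_speed = distance_km / delivery_time_min
train_ratio = preparation_time_min / delivery_time_min

# Inference-time proxies (no target involved)
infer_speed = distance_km / (preparation_time_min + 1)
infer_ratio = preparation_time_min / (distance_km + 1)

print(f"speed_km_per_min:    training={train_speed:.3f}  inference={infer_speed:.3f}")
print(f"prep_to_deliv_ratio: training={train_ratio:.3f}  inference={infer_ratio:.3f}")
```

For this row the training-style values are roughly 0.18 and 0.28, while the proxies are roughly 0.61 and 1.34, so the two versions of each feature differ by a factor of about three to five.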


In [5]:
# Predict the delivery time (minutes)
predictions = model.predict(new_data)
new_data["predicted_delivery_time_min"] = predictions
new_data


|   | distance_km | weather | traffic_level | time_of_day | vehicle_type | preparation_time_min | courier_experience_yrs | prep_to_deliv_ratio | speed_km_per_min | experience_level | predicted_delivery_time_min |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.2 | Clear | Medium | Lunch | Motorbike | 15 | 2.0 | 2.419355 | 0.325 | Intermediate | 24.380713 |
| 1 | 1.8 | Rainy | High | Evening | Bicycle | 8 | 0.5 | 2.857143 | 0.2 | Newbie | 19.655823 |
| 2 | 12.0 | Cloudy | Low | Morning | Car | 25 | 5.0 | 1.923077 | 0.461538 | Experienced | 34.576996 |
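
Since the columns must match those used in training, a small guard before `model.predict` can catch a missing or misordered column early. The sketch below assumes the model expects the ten feature columns built in the cells above, in that order. Both `EXPECTED_COLUMNS` and `align_columns` are hypothetical names introduced here.

```python
import pandas as pd

# Assumption: the feature columns passed to model.predict in this notebook
EXPECTED_COLUMNS = [
    "distance_km", "weather", "traffic_level", "time_of_day", "vehicle_type",
    "preparation_time_min", "courier_experience_yrs",
    "prep_to_deliv_ratio", "speed_km_per_min", "experience_level",
]


def align_columns(frame, expected):
    """Return `frame` restricted to `expected` columns in that order; raise if any are missing."""
    missing = [col for col in expected if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return frame[expected]


# The first new order, deliberately built with its columns in reverse order
order = pd.DataFrame({
    "experience_level": ["Intermediate"],
    "speed_km_per_min": [0.325],
    "prep_to_deliv_ratio": [2.419355],
    "courier_experience_yrs": [2.0],
    "preparation_time_min": [15],
    "vehicle_type": ["Motorbike"],
    "time_of_day": ["Lunch"],
    "traffic_level": ["Medium"],
    "weather": ["Clear"],
    "distance_km": [5.2],
})

aligned = align_columns(order, EXPECTED_COLUMNS)
print(list(aligned.columns))
```

The aligned frame could then be passed to `model.predict`. Selecting the expected columns explicitly also drops extras such as a previously added `predicted_delivery_time_min` column, which matters if the prediction cell is run more than once.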


In [6]:
#==========================
# Insights from the predictions
#==========================

for i, row in new_data.iterrows():
    print(
        f"Order #{i+1}: {row['vehicle_type']}, {row['distance_km']} km "
        f"in {row['weather']} weather → estimated delivery time ≈ {row['predicted_delivery_time_min']:.1f} minutes"
    )


Order #1: Motorbike, 5.2 km in Clear weather → estimated delivery time ≈ 24.4 minutes
Order #2: Bicycle, 1.8 km in Rainy weather → estimated delivery time ≈ 19.7 minutes
Order #3: Car, 12.0 km in Cloudy weather → estimated delivery time ≈ 34.6 minutes
