
# Predict with XGBoost (Parquet, categorical dtypes)

This notebook mirrors `predict_xgb_from_parquet.py`:
- Loads `xgb_model.joblib`, `X_submit_clean.parquet` (categorical dtypes), and `data/test.csv` for the `Id` column.
- Writes the submission file `submission_XGB.csv` (columns `Id`, `SalePrice`) to the path set in `OUT_SUB`.
- If the model was trained on `log1p(y)`, set `LOG_TARGET=True` to apply `expm1` to predictions.
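
The `LOG_TARGET` switch relies on `expm1` being the inverse of `log1p`. A minimal sketch of that round trip, using made-up prices rather than real model output:

```python
import numpy as np

# Hypothetical sale prices, for illustration only (not taken from the dataset)
y = np.array([100000.0, 150000.0, 200000.0])

# Training on a log target means the model is fit on log1p(y) = log(1 + y)
y_log = np.log1p(y)

# Predictions from such a model are on the log scale; expm1 maps them back to prices
y_back = np.expm1(y_log)

print(np.allclose(y_back, y))
```

If the model was trained on the raw `SalePrice`, applying `expm1` would inflate the predictions badly, so `LOG_TARGET` has to match how the model was actually trained.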


In [1]:

# Optional installs if your kernel is missing dependencies:
# %pip install -U xgboost joblib pandas pyarrow


In [6]:

import os
import numpy as np
import pandas as pd
from joblib import load

# ---- Paths (relative) ----
MODEL_PATH   = "../xgb_model/xgb_model.joblib"                # trained model path
X_SUBMIT_PQ  = "../xgb_clean_outputs/X_submit_clean.parquet"  # cleaned test features
TEST_RAW_CSV = "../data/test.csv"                             # raw test for Id
OUT_SUB      = "../data/submission_XGB.csv"                   # output submission

# If model was trained on log1p(y), set True to inverse-transform predictions
LOG_TARGET = False

def ensure_dir_for_file(p):
    d = os.path.dirname(p)
    if d:
        os.makedirs(d, exist_ok=True)


## Load model & data, then predict

In [7]:

print("[INFO] Loading model:", MODEL_PATH)
model = load(MODEL_PATH)

print("[INFO] Reading cleaned features (parquet):", X_SUBMIT_PQ)
X_submit = pd.read_parquet(X_SUBMIT_PQ)

print("[INFO] Reading raw test.csv for Id:", TEST_RAW_CSV)
df_test = pd.read_csv(TEST_RAW_CSV)
if "Id" not in df_test.columns:
    raise ValueError("Raw test.csv must contain an 'Id' column.")

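# The Id column is attached by position, so both inputs must have the same number of rows
if len(X_submit) != len(df_test):
    raise ValueError(
        f"Row mismatch: {len(X_submit)} feature rows vs {len(df_test)} rows in test.csv."
    )

print("[INFO] Predicting...")
raw_pred = model.predict(X_submit)
# Undo the log1p target transform only if the model was trained on log1p(y)
pred = np.expm1(raw_pred) if LOG_TARGET else raw_pred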

submission = pd.DataFrame({"Id": df_test["Id"].values, "SalePrice": pred})
ensure_dir_for_file(OUT_SUB)
submission.to_csv(OUT_SUB, index=False)
print(f"[OK] Saved submission to: {OUT_SUB}")

submission.head()


[INFO] Loading model: ../xgb_model/xgb_model.joblib
[INFO] Reading cleaned features (parquet): ../xgb_clean_outputs/X_submit_clean.parquet
[INFO] Reading raw test.csv for Id: ../data/test.csv
[INFO] Predicting...
[OK] Saved submission to: ../data/submission_XGB.csv


configuration generated by an older version of XGBoost, please export the model by calling
`Booster.save_model` from that version first, then load it back in current version. See:

    https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html

for more details about differences between saving model and serializing.
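
The warning above is about loading a pickled (here, joblib) model under a different XGBoost version than the one that created it. One way to follow its advice is sketched below. It assumes the loaded object exposes `save_model`, which both `xgboost.Booster` and the scikit-learn style wrappers do; the helper name `export_model` and the `.json` path are hypothetical, and the export step has to run in an environment that still has the older XGBoost installed.

```python
def export_model(model, out_path):
    """Save a fitted model in XGBoost's native format instead of a pickle.

    Assumes `model` has a `save_model(path)` method (a Booster or a
    scikit-learn style wrapper). Run this under the XGBoost version that
    originally trained and pickled the model.
    """
    model.save_model(out_path)
    return out_path


# Usage sketch (hypothetical output path), in the environment with the older XGBoost:
#   from joblib import load
#   export_model(load("../xgb_model/xgb_model.joblib"), "../xgb_model/xgb_model.json")
#
# Then, in the current environment, assuming the model was an XGBRegressor:
#   import xgboost as xgb
#   model = xgb.XGBRegressor()
#   model.load_model("../xgb_model/xgb_model.json")
```

Unlike a pickle, the native format is meant to stay loadable across XGBoost releases, which is why the warning recommends it.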



|   | Id | SalePrice |
|---|------|---------------|
| 0 | 1461 | 128570.609375 |
| 1 | 1462 | 167721.09375 |
| 2 | 1463 | 189531.875 |
| 3 | 1464 | 187859.734375 |
| 4 | 1465 | 191698.6875 |
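
Before uploading, a few cheap checks on the submission frame can catch misaligned rows or a forgotten inverse transform. The sketch below assumes the frame has exactly the columns `Id` and `SalePrice`; the helper name `check_submission` and the toy frame are illustrative only and are not part of the original script.

```python
import numpy as np
import pandas as pd


def check_submission(sub, expected_rows=None):
    """Raise ValueError if the submission frame looks malformed."""
    if list(sub.columns) != ["Id", "SalePrice"]:
        raise ValueError(f"Unexpected columns: {list(sub.columns)}")
    if expected_rows is not None and len(sub) != expected_rows:
        raise ValueError(f"Expected {expected_rows} rows, got {len(sub)}")
    if sub["Id"].duplicated().any():
        raise ValueError("Duplicate Id values")
    prices = sub["SalePrice"].to_numpy(dtype=float)
    if not np.isfinite(prices).all():
        raise ValueError("SalePrice contains NaN or inf")
    if (prices <= 0).any():
        raise ValueError("SalePrice contains non-positive values")
    return True


# Toy frame for illustration (values are made up)
toy = pd.DataFrame({"Id": [1, 2, 3], "SalePrice": [120000.0, 95000.5, 210000.0]})
check_submission(toy, expected_rows=3)
```

In the notebook this would be called as `check_submission(submission, expected_rows=len(df_test))`. Predictions clustered around 12 rather than in the hundreds of thousands would suggest log-scale output that still needs `expm1`.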
