YiPhoebe/house_prices

🏠 House Price Prediction - Kaggle Competition

📌 Project Overview

This project is based on Kaggle's House Prices - Advanced Regression Techniques competition. The goal is to optimize a model that predicts house sale prices using a range of machine learning techniques.

The model was improved across two notebooks (house_price.ipynb and house_price_add.ipynb):

  • house_price.ipynb: basic data preprocessing and modeling (Baseline Model)
  • house_price_add.ipynb: adds feature engineering, multicollinearity handling, and XGBoost hyperparameter tuning

🗂 Data Description

  • train.csv: dataset for model training (1,460 samples)
  • test.csv: test data for which house prices are predicted
  • sample_submission.csv: example of the Kaggle submission format

🔧 Data Preprocessing

1️⃣ Missing Value Handling

  • Categorical variables such as GarageType and BsmtQual → filled with 'None'
  • Numeric variables such as LotFrontage, GarageYrBlt, and MasVnrArea → filled with the median
  • Electrical → filled with the mode
  • Unnecessary variables dropped (PoolQC, MiscFeature, Alley, Fence)
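The imputation rules above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column groups contain only the columns named in this README, and the helper name `impute_missing` and the toy DataFrame are assumptions made for the example.

```python
import pandas as pd

# Column groups taken from the README; the notebooks may cover more columns.
NONE_COLS = ["GarageType", "BsmtQual"]
MEDIAN_COLS = ["LotFrontage", "GarageYrBlt", "MasVnrArea"]
MODE_COLS = ["Electrical"]
DROP_COLS = ["PoolQC", "MiscFeature", "Alley", "Fence"]


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the missing-value rules applied (hypothetical helper)."""
    # Drop the columns treated as unnecessary; ignore any that are absent.
    out = df.drop(columns=DROP_COLS, errors="ignore")
    for col in NONE_COLS:
        if col in out:
            # A missing category usually means "no garage" / "no basement".
            out[col] = out[col].fillna("None")
    for col in MEDIAN_COLS:
        if col in out:
            out[col] = out[col].fillna(out[col].median())
    for col in MODE_COLS:
        if col in out:
            # mode() ignores NaN; take the first (most frequent) value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Toy data standing in for train.csv.
demo = pd.DataFrame({
    "GarageType": ["Attchd", None, "Detchd"],
    "LotFrontage": [60.0, None, 80.0],
    "Electrical": ["SBrkr", "SBrkr", None],
    "PoolQC": [None, None, "Gd"],
})
clean = impute_missing(demo)
print(clean)
```

When the same rules are applied to test.csv, it is generally safer to reuse the medians and modes computed on train.csv rather than recomputing them on the test set.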

2️⃣ Feature Engineering (further improved in house_price_add.ipynb)

  • Create BuildingAge (current year - YearBuilt)
  • Add a Remodeled variable (YearBuilt vs. YearRemodAdd)
  • Apply a log transform to the target (Log_SalePrice)
  • Remove high-VIF (multicollinear) features and apply PCA (TotalLivingArea → PCA_LivingArea)
  • Remove outliers (extreme values excluded using the IQR)
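A rough sketch of the derived features and the IQR filter is shown below. Several details are assumptions rather than facts from the notebook: `Remodeled` is taken to be 1 when YearRemodAdd differs from YearBuilt, the log transform is assumed to be `np.log1p` (because the submission code inverts it with `np.expm1`), the outlier column `GrLivArea` and the 1.5 multiplier are illustrative choices, and both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, current_year: int) -> pd.DataFrame:
    """Add BuildingAge, Remodeled and Log_SalePrice (hypothetical helper)."""
    out = df.copy()
    out["BuildingAge"] = current_year - out["YearBuilt"]
    # Assumption: a house counts as remodeled when the two years differ.
    out["Remodeled"] = (out["YearRemodAdd"] != out["YearBuilt"]).astype(int)
    if "SalePrice" in out:
        # Assumption: log1p, matching the expm1 used when writing the submission.
        out["Log_SalePrice"] = np.log1p(out["SalePrice"])
    return out


def drop_iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value in col lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


# Toy rows standing in for train.csv.
demo = pd.DataFrame({
    "YearBuilt": [2000, 1990, 1975, 2005, 1960],
    "YearRemodAdd": [2000, 2005, 1975, 2005, 1995],
    "GrLivArea": [1500, 1400, 1200, 1600, 5600],
    "SalePrice": [200000, 180000, 150000, 210000, 900000],
})
feats = add_features(demo, current_year=2025)
filtered = drop_iqr_outliers(feats, "GrLivArea")
print(filtered[["BuildingAge", "Remodeled", "Log_SalePrice"]])
```

Passing the year explicitly keeps BuildingAge reproducible; deriving it from the system clock would shift the feature every January.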

📊 Model Training

1️⃣ Baseline Model (house_price.ipynb)

  • Linear Regression
  • RandomForest Regressor
  • XGBoost Regressor

2️⃣ Advanced Model (house_price_add.ipynb)

  • Retrain after applying feature engineering
  • Evaluate models with K-Fold cross-validation
  • Additionally apply XGBoost hyperparameter tuning
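The K-Fold evaluation could look something like the sketch below. It runs on synthetic data so that it is self-contained; the helper name `cv_rmse`, the number of folds, the seeds, and the synthetic target are all assumptions, and only scikit-learn models are used to avoid depending on xgboost here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def cv_rmse(model, X, y, n_splits=5, seed=42):
    """Mean RMSE over shuffled K-Fold splits (hypothetical helper)."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn maximizes scores, so RMSE is exposed as a negative value.
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


# Synthetic stand-in for the preprocessed features and the log-price target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.3, -0.2, 0.1, 0.0, 0.05]) + 12.0 + rng.normal(scale=0.1, size=200)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
for name, model in models.items():
    print(f"{name}: CV RMSE = {cv_rmse(model, X, y):.3f}")
```

Because the target is already on the log scale, the RMSE returned here is directly comparable to the RMSE(Log(SalePrice)) figures reported in the evaluation section.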

3️⃣ XGBoost Hyperparameter Tuning (house_price_add.ipynb)

  • Search for the best hyperparameters with GridSearchCV
  • Retrain the model with the best hyperparameters found
param_grid = {
    "n_estimators": [300, 500, 1000],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 6, 9],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0]
}
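Before running a grid like this, it can help to check how many model fits it implies. The sketch below counts the combinations with scikit-learn's `ParameterGrid` and wraps `GridSearchCV` in a small helper; the helper name `make_search`, the 5-fold setting, and the scoring choice are assumptions, since the README does not state them.

```python
from sklearn.model_selection import GridSearchCV, ParameterGrid

param_grid = {
    "n_estimators": [300, 500, 1000],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 6, 9],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}

# Five parameters with three values each: 3 ** 5 = 243 candidates.
n_candidates = len(ParameterGrid(param_grid))
cv_folds = 5  # assumption: the README does not give the number of folds
total_fits = n_candidates * cv_folds
print(n_candidates, total_fits)  # 243 1215


def make_search(estimator, grid, cv=cv_folds):
    """Build a grid search scored by RMSE (hypothetical helper)."""
    return GridSearchCV(
        estimator,
        grid,
        cv=cv,
        scoring="neg_root_mean_squared_error",
        n_jobs=-1,
    )
```

With xgboost installed, something like `make_search(XGBRegressor(), param_grid).fit(X, y)` would run the search, and the fitted object's `best_estimator_` is presumably what the submission code refers to as `best_xgb_model`. If 1,215 fits is too slow, `RandomizedSearchCV` over the same ranges is a common alternative.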

📈 Model Evaluation

  • Performance comparison by RMSE on Log(SalePrice):

| Model | RMSE |
| ----------------- | ----- |
| Linear Regression | 0.155 |
| Random Forest | 0.139 |
| XGBoost | 0.129 |
| Tuned XGBoost | 0.120 ✅ |
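For reference, the metric in the table can be computed along these lines. This is a sketch that assumes prices are compared after `np.log1p`, consistent with the `np.expm1` call in the submission code; the function name `rmse_log` and the sample prices are made up for illustration.

```python
import numpy as np


def rmse_log(y_true, y_pred):
    """RMSE between log1p(actual price) and log1p(predicted price)."""
    log_true = np.log1p(np.asarray(y_true, dtype=float))
    log_pred = np.log1p(np.asarray(y_pred, dtype=float))
    return float(np.sqrt(np.mean((log_true - log_pred) ** 2)))


# Made-up prices in dollars, for illustration only.
actual = [200_000, 150_000, 310_000]
predicted = [210_000, 140_000, 300_000]
print(round(rmse_log(actual, predicted), 4))
```

Because the error is measured on the log scale, it behaves like a relative error: a score of 0.120 corresponds to predictions that are typically off by roughly 12 to 13 percent, regardless of whether the house is cheap or expensive.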

📤 Kaggle Submission

import numpy as np
import pandas as pd


def create_submission(model, X_test, test_df, model_name):
    # The model predicts log prices; expm1 inverts the log transform.
    y_pred = model.predict(X_test)
    y_pred = np.expm1(y_pred)

    submission = pd.DataFrame({"Id": test_df["Id"], "SalePrice": y_pred})
    file_name = f"submission_{model_name}.csv"
    submission.to_csv(file_name, index=False)

    print(f"✅ {model_name} submission file saved: {file_name}")


create_submission(best_xgb_model, X_test_preprocessed, test_df, "Tuned_XGBoost")

  • Upload submission_Tuned_XGBoost.csv to Kaggle to check the score

  1. Compared the RMSE of the LinearRegression, RandomForestRegressor, and XGBoost regression models and submitted the best one

    [Image: 1741094368059]

  2. Added new variables to the regression models and tuned the hyperparameters (reducing VIF actually lowered model accuracy, so it was not handled separately)

    [Image: 1741094462881]
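Since the second result mentions VIF, here is one way the variance inflation factors could be inspected before deciding whether to act on them. This is only a sketch: it uses the identity that the VIFs equal the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R²) for each feature regressed on the others), the helper name `vif_table` is hypothetical, and the synthetic columns, including how TotalLivingArea is composed, are assumptions rather than the notebook's definitions.

```python
import numpy as np
import pandas as pd


def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return pd.Series(vif, index=X.columns, name="VIF").sort_values(ascending=False)


# Synthetic data: TotalLivingArea is (assumed to be) close to the sum of two
# floor areas, so those three columns are strongly collinear.
rng = np.random.default_rng(0)
n = 500
first = rng.normal(1000, 300, n)
second = rng.normal(500, 200, n)
demo = pd.DataFrame({
    "1stFlrSF": first,
    "2ndFlrSF": second,
    "TotalLivingArea": first + second + rng.normal(0, 50, n),
    "LotArea": rng.normal(10000, 3000, n),
})
vif = vif_table(demo)
print(vif)
```

A VIF above about 10 is a common rule of thumb for problematic collinearity. Tree ensembles such as XGBoost are usually much less sensitive to it than linear models, which may be one reason removing high-VIF features did not help here.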

💻 How to Run

1️⃣ Install the required libraries

pip install pandas numpy scikit-learn xgboost seaborn matplotlib

2️⃣ Launch Jupyter Notebook

jupyter notebook
  • Run house_price.ipynb → evaluate the baseline models
  • Run house_price_add.ipynb → apply feature engineering + XGBoost tuning
  • Train the optimized model, then submit to Kaggle
