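## Boston House Price Prediction

Write a program that reads `boston.csv` (the Boston housing dataset), which contains 506 records with 14 columns each, and builds a multiple linear regression model to predict house prices.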

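## Column Descriptions
| Column | Description |
| -------- | ---- |
| CRIM | Per-capita crime rate by town |
| ZN | Proportion of residential land zoned for lots over 25,000 sq. ft. |
| INDUS | Proportion of non-retail business acres per town |
| CHAS | Whether the tract borders the Charles River (1 = yes, 0 = no) |
| NOX | Nitric oxide concentration (parts per 10 million) |
| RM | Average number of rooms per dwelling |
| AGE | Proportion of owner-occupied units built before 1940 |
| DIS | Weighted distance to five Boston employment centers |
| RAD | Index of accessibility to radial highways |
| TAX | Full-value property-tax rate per $10,000 |
| PTRATIO | Pupil-teacher ratio by town |
| B | 1000 * (Bk − 0.63)^2, where Bk is the proportion of Black residents by town |
| LSTAT | Percentage of the population with lower socioeconomic status |
| MEDV | Target: median value of owner-occupied homes, in thousands of dollars |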
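The B column in the table is a derived quantity rather than a raw proportion. A minimal sketch of the formula, assuming Bk is given as a fraction between 0 and 1 (the helper name `b_feature` is hypothetical), shows that the transform is not monotonic: under this formula the value 396.9, which appears in many rows of the data, corresponds to Bk = 0, and two Bk values equally far from 0.63 map to the same B.

```python
def b_feature(bk: float) -> float:
    """Compute the B column from Bk, the proportion of Black residents (0 to 1).

    Hypothetical helper that mirrors the formula in the column table.
    """
    if not 0.0 <= bk <= 1.0:
        raise ValueError("bk must be a proportion between 0 and 1")
    return 1000 * (bk - 0.63) ** 2


# Bk = 0 gives the largest value reachable for Bk in [0, 1], i.e. 396.9
print(round(b_feature(0.0), 4))
# Values symmetric around 0.63 collapse to the same B
print(round(b_feature(0.53), 4), round(b_feature(0.73), 4))
```

Because of this symmetry, B cannot be inverted back to a unique Bk, which is worth keeping in mind when interpreting its regression coefficient.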

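## Analysis Steps
1. Build a multiple linear regression model that uses all other columns of the dataset as features to predict MEDV.
2. Split the dataset into a training set and a test set, with the test set holding 20% of the data and `random_state=1`.
3. On the test set, report the model's Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
4. Predict the house price for a given input:
    - Round all floating-point values to four decimal places.
    - Input: [0.00632, 18.00, 2.310, 0.0, 0.5380, 6.5750, 65.20, 4.0900, 1.0, 296.0, 15.30, 396.90, 4.98].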
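The three metrics in step 3 follow directly from the prediction errors e_i = y_i − ŷ_i: MAE is the mean of |e_i|, MSE is the mean of e_i², and RMSE is the square root of MSE, which puts it back in the units of MEDV (thousands of dollars). A minimal NumPy sketch of these definitions, with the four-decimal rounding from step 4 applied to the results (the helper name `regression_report` is hypothetical), might look like this:

```python
import numpy as np


def regression_report(y_true, y_pred, decimals=4):
    """Return MAE, MSE and RMSE, each rounded to `decimals` places.

    Hypothetical helper; sklearn.metrics provides the same MAE and MSE.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred              # residuals e_i
    mae = np.mean(np.abs(errors))         # mean absolute error
    mse = np.mean(errors ** 2)            # mean squared error
    rmse = np.sqrt(mse)                   # same units as the target
    return {
        "MAE": round(float(mae), decimals),
        "MSE": round(float(mse), decimals),
        "RMSE": round(float(rmse), decimals),
    }


print(regression_report([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
# → {'MAE': 0.5, 'MSE': 0.375, 'RMSE': 0.6124}
```

Squaring before averaging is what makes MSE and RMSE more sensitive than MAE to a few large errors.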

In [1]:
# Suppress warning messages
import warnings
warnings.filterwarnings("ignore")
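`warnings.filterwarnings("ignore")` silences every warning for the rest of the session, including ones that point to real problems. When only one call is noisy, a narrower option is to suppress warnings inside a `warnings.catch_warnings()` context, which restores the previous filters on exit. A minimal sketch (the helpers `call_quietly` and `noisy_sum` are hypothetical):

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs) with warnings suppressed only for this call.

    Hypothetical helper; the previous filter state is restored afterwards.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)


def noisy_sum(a, b):
    """Demo function that always emits a warning."""
    warnings.warn("demo warning", UserWarning)
    return a + b


# Record warnings around the call to show that none escape the helper
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = call_quietly(noisy_sum, 2, 3)

print(result, len(caught))  # → 5 0
```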

In [3]:
import numpy as np
import pandas as pd

df = pd.read_csv('boston.csv')
df

,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.0900,1,296,15.3,396.90,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.90,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.90,5.33,36.2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
501,0.06263,0.0,11.93,0,0.573,6.593,69.1,2.4786,1,273,21.0,391.99,9.67,22.4
502,0.04527,0.0,11.93,0,0.573,6.120,76.7,2.2875,1,273,21.0,396.90,9.08,20.6
503,0.06076,0.0,11.93,0,0.573,6.976,91.0,2.1675,1,273,21.0,396.90,5.64,23.9
504,0.10959,0.0,11.93,0,0.573,6.794,89.3,2.3889,1,273,21.0,393.45,6.48,22.0

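Before modelling, it can help to confirm that the file matches the column table: the 14 expected names, MEDV last, and no missing values. The same check would also flag an index column accidentally written into the CSV, which pandas reads back as `Unnamed: 0`. A minimal sketch, assuming the column names in the table (the function `check_boston_frame` is hypothetical), demonstrated on a tiny synthetic frame rather than the real file:

```python
import numpy as np
import pandas as pd

EXPECTED_COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]


def check_boston_frame(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if len(df.columns) and df.columns[-1] != "MEDV":
        problems.append("MEDV is not the last column")
    n_nan = int(df.isna().sum().sum())
    if n_nan:
        problems.append(f"{n_nan} missing values")
    return problems


# Synthetic two-row frame with the expected layout (not real data)
demo = pd.DataFrame(np.ones((2, len(EXPECTED_COLUMNS))), columns=EXPECTED_COLUMNS)
print(check_boston_frame(demo))  # → []
```

In the notebook itself, the call would be `check_boston_frame(df)` right after `pd.read_csv`.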

In [4]:
# MEDV is the prediction target; every other column is a feature
X = df.drop(columns='MEDV')
y = df['MEDV']

# Show the first two rows
X.head(2)

,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14


In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Split into training and test sets (80/20, reproducible with random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, 
    y, 
    test_size=0.2, 
    random_state=1
)

# Build and fit the linear regression model
lm = LinearRegression()
lm.fit(X_train, y_train)
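With 506 rows and `test_size=0.2`, scikit-learn rounds the test-set size up, so the split should contain 404 training rows and 102 test rows; fixing `random_state` puts the same rows in the test set on every run. A small sketch on synthetic data of the same length (not the real dataset) that checks both properties:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as boston.csv
n_rows = 506
X_demo = np.arange(n_rows * 2, dtype=float).reshape(n_rows, 2)
y_demo = np.arange(n_rows, dtype=float)

split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))  # → 404 102

# Same random_state, same partition
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # → True
```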

In [6]:
import joblib

# Save the trained model to disk
joblib.dump(lm, 'linear_regression_for_boston.pkl')
print("Model saved!")

Model saved!


In [7]:
lr_model = joblib.load('linear_regression_for_boston.pkl')
print("Model loaded!")

Model loaded!
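Since the rest of the notebook evaluates the reloaded model, one way to confirm that `joblib.dump` and `joblib.load` preserved it is to compare predictions before and after the round trip. A minimal sketch on synthetic data, writing to a temporary directory so that it does not overwrite `linear_regression_for_boston.pkl`:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free linear data (not the Boston dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0

model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The reloaded model should reproduce the original predictions exactly
identical = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(identical)  # → True
```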


In [9]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Evaluate on the held-out test set; sklearn metrics take (y_true, y_pred)
y_pred = lr_model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print(f'MAE: {mae:.2f}')
print(f'MSE: {mse:.2f}')
print(f'RMSE: {mse**0.5:.2f}')

MAE: 3.75
MSE: 23.38
RMSE: 4.84
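These error values are easier to interpret next to a trivial baseline. RMSE is always at least as large as MAE, and the gap widens when a few large errors dominate. One common sanity check, sketched here with scikit-learn's `DummyRegressor` (which predicts the training-set mean for every row), is to confirm that the model's test error is clearly below the baseline's. The data below is synthetic; in this notebook the same comparison would use `X_train`, `X_test`, `y_train` and `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with a linear signal plus noise (not the Boston dataset)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, -1.0, 2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)

results = {}
for name, est in [("mean baseline", DummyRegressor(strategy="mean")),
                  ("linear regression", LinearRegression())]:
    pred = est.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    results[name] = (mean_absolute_error(y_te, pred), mse, np.sqrt(mse))
    mae, _, rmse = results[name]
    print(f"{name}: MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")
```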


In [12]:
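# Feature order must match the training columns:
# CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT
X_new = pd.DataFrame(
    [[0.00632, 18.00, 2.310, 0, 0.5380, 6.5750, 65.20, 4.0900, 1, 296.0, 15.30, 396.90, 4.98]],
    columns=X.columns,
)

# Predict the house price (MEDV, in thousands of dollars)
prediction = lr_model.predict(X_new)

# Show the prediction
print(f'{prediction[0]:.2f}')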

30.07
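A fitted `LinearRegression` predicts with a single linear formula, ŷ = intercept_ + Σ coef_j · x_j, so the value printed above could be reproduced by hand from `lm.intercept_` and `lm.coef_`. A minimal sketch of that check on a small synthetic model; the column names reuse three of the dataset's features purely as labels, and the values and coefficients are made up:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data labelled with a few Boston column names (values are made up)
rng = np.random.default_rng(2)
train = pd.DataFrame(rng.normal(size=(30, 3)), columns=["RM", "LSTAT", "PTRATIO"])
target = 5.0 * train["RM"] - 0.6 * train["LSTAT"] - 0.9 * train["PTRATIO"] + 22.0

model = LinearRegression().fit(train, target)

# One new observation, passed as a DataFrame so feature names match training
x_new = pd.DataFrame([[6.575, 4.98, 15.3]], columns=train.columns)

by_model = model.predict(x_new)[0]
by_hand = model.intercept_ + np.dot(model.coef_, x_new.iloc[0].to_numpy())
print(round(by_model, 4), round(by_hand, 4))
```

Passing a DataFrame with the training column names, rather than a bare list, also avoids scikit-learn's feature-name warning.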
