
# 🏠 Ames Housing Price Prediction using Machine Learning

This project uses the **Ames Housing Dataset**, a well-known real estate dataset from Ames, Iowa (USA), 
to predict **house sale prices** based on various property features such as size, quality, and location.

The main goal is to build and evaluate a **machine learning regression model** that accurately predicts housing prices.



# 🧾 Dataset Description: AMES_Final_DF.csv

**Dataset Name:** `AMES_Final_DF.csv`  
**Source:** Derived from the Ames Housing Dataset, compiled by Dean De Cock (Truman State University) from Ames City Assessor's Office records

### 📊 Description
This dataset contains detailed information about residential homes in Ames, Iowa.  
It includes various **physical, locational, and quality attributes** that can be used to predict the **final sale price** of each property.

### 📁 Key Columns
- **SalePrice:** Target variable representing the sale price of the house.
- **OverallQual:** Overall material and finish quality.
- **GrLivArea:** Above ground living area (in square feet).
- **GarageCars:** Size of garage in car capacity.
- **TotalBsmtSF:** Total square feet of basement area.
- **YearBuilt:** Original construction year.
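- **Neighborhood:** Physical locations within Ames city boundaries.

Note: in `AMES_Final_DF.csv` itself, column names contain spaces (for example `Lot Area`, `Overall Qual`, `Year Built`), and categorical variables have been one-hot encoded into 0/1 dummy columns such as `Sale Type_New` and `Sale Condition_Normal`, as the `df.head()` output below shows. Categorical features like `Neighborhood` are therefore presumably stored as dummy columns rather than as a single column.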
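The dummy columns visible in the `df.head()` output (`Sale Condition_AdjLand` through `Sale Condition_Partial`, with no `Sale Condition_Abnorml` column even though `Abnorml` is a level of Sale Condition in the original Ames data) are consistent with pandas one-hot encoding with `drop_first=True`. The preprocessing that produced `AMES_Final_DF.csv` is not shown in this notebook, so the following is only a sketch under that assumption, on a small made-up frame:

```python
import pandas as pd

# Toy frame with made-up values; "Sale Condition" mimics one categorical Ames column.
raw = pd.DataFrame({
    "Lot Area": [9000, 11000, 8500, 12000, 10000],
    "Sale Condition": ["Normal", "Abnorml", "Partial", "Normal", "Family"],
})

# One-hot encode the text column. drop_first=True drops the alphabetically first
# level ("Abnorml"), which becomes the implicit baseline; dtype=int gives 0/1 values.
encoded = pd.get_dummies(raw, drop_first=True, dtype=int)

print(list(encoded.columns))
# → ['Lot Area', 'Sale Condition_Family', 'Sale Condition_Normal', 'Sale Condition_Partial']
```

With `drop_first=True`, a row whose dummies are all 0 (the second row here) represents the dropped `Abnorml` level, which avoids perfectly collinear dummy columns in a linear model.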

### 🎯 Objective
Use this dataset to **train a regression model** capable of predicting the sale price of houses based on their characteristics.

### ⚙️ File Information
- File: `AMES_Final_DF.csv`
- The number of rows and columns can be checked with `df.shape` after loading the data (the notebook below only displays `df.head()`).


```python
# Why: Import the Python libraries used for data manipulation, visualization, and modeling.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
```


```python
# Why: Load the Ames housing dataset into a DataFrame to begin analysis and modeling.
df = pd.read_csv("AMES_Final_DF.csv")
df.head()
```

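|   | Lot Frontage | Lot Area | Overall Qual | Overall Cond | Year Built | Year Remod/Add | Mas Vnr Area | BsmtFin SF 1 | BsmtFin SF 2 | Bsmt Unf SF | ... | Sale Type_ConLw | Sale Type_New | Sale Type_Oth | Sale Type_VWD | Sale Type_WD | Sale Condition_AdjLand | Sale Condition_Alloca | Sale Condition_Family | Sale Condition_Normal | Sale Condition_Partial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 141.0 | 31770 | 6 | 5 | 1960 | 1960 | 112.0 | 639.0 | 0.0 | 441.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 80.0 | 11622 | 5 | 6 | 1961 | 1961 | 0.0 | 468.0 | 144.0 | 270.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | 81.0 | 14267 | 6 | 6 | 1958 | 1958 | 108.0 | 923.0 | 0.0 | 406.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 93.0 | 11160 | 7 | 5 | 1968 | 1968 | 0.0 | 1065.0 | 0.0 | 1045.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | 74.0 | 13830 | 5 | 5 | 1997 | 1998 | 0.0 | 791.0 | 0.0 | 137.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |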


```python
# Why: Split the dataset into training and test sets to evaluate model performance on unseen data.
X = df.drop('SalePrice', axis=1)
y = df['SalePrice']

# Hold out 10% of the rows as the test set; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)
```
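`test_size=0.1` holds out 10% of the rows, and fixing `random_state=101` makes the split reproducible across runs. A minimal sketch on a synthetic 100-row frame (random numbers, not the Ames data) that checks both properties:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Ames features and target (100 rows, 3 features).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
y = pd.Series(rng.normal(size=100), name="SalePrice")

# Same arguments as the notebook: 10% test set, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)
print(X_train.shape, X_test.shape)  # → (90, 3) (10, 3)

# Re-running with the same random_state selects exactly the same test rows.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.1, random_state=101)
print(X_test.index.equals(X_test_again.index))  # → True
```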


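```python
# Why: Standardize features (zero mean, unit variance) so the ElasticNet penalty
# treats all coefficients on a comparable scale.
scaler = StandardScaler()
```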

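```python
# Why: Fit the scaler on the training data only, then apply the same transformation
# to the test data so that no information from the test set leaks into training.
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
```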
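Fitting the scaler on the training set only means the test set is transformed with the training mean and standard deviation. A small synthetic check (made-up arrays, not the Ames features) showing that `transform` reuses the training statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train = rng.normal(loc=50.0, scale=10.0, size=(90, 2))  # synthetic training features
X_test = rng.normal(loc=55.0, scale=10.0, size=(10, 2))   # synthetic test features, shifted mean

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data only
scaled_X_test = scaler.transform(X_test)        # reuses the training statistics

# Training columns are centered with unit variance; test columns generally are not.
print(scaled_X_train.mean(axis=0).round(6), scaled_X_train.std(axis=0).round(6))

# The same test transform written out by hand (StandardScaler uses the population std).
manual_test = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(scaled_X_test, manual_test))  # → True
```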

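```python
# Why: Create a base ElasticNet model with scikit-learn's defaults
# (alpha=1.0, l1_ratio=0.5, max_iter=1000); GridSearchCV below tunes alpha and l1_ratio.
base_elastic_model = ElasticNet()
```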

```python
# Why: Tune hyperparameters with GridSearchCV to find the best combination of
# alpha (regularization strength) and l1_ratio (balance between L1 and L2 regularization)
# for the ElasticNet model. Each combination is scored with 5-fold cross-validation,
# which helps improve accuracy and guard against overfitting.
param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100],
              'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}

grid_model = GridSearchCV(estimator=base_elastic_model, param_grid=param_grid,
                          scoring='neg_mean_squared_error', cv=5, verbose=1)
```
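For reference, scikit-learn's `ElasticNet` minimizes the objective below, where $n$ is the number of training rows, $w$ the coefficient vector, $\alpha$ corresponds to `alpha` and $\rho$ to `l1_ratio`. Setting $\rho = 1$ leaves only the L1 (Lasso) penalty, and $\rho = 0$ leaves only an L2 penalty:

$$
\min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 \;+\; \alpha\,\rho\,\lVert w\rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w\rVert_2^2
$$

The grid size also determines the number of fits GridSearchCV reports; with the default `refit=True`, one extra fit of the best candidate on the whole training set follows:

$$
\text{candidates} = 6 \times 7 = 42, \qquad \text{CV fits} = 42 \times 5 = 210
$$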



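```python
# Why: Run the grid search on the scaled training data. After cross-validation,
# GridSearchCV refits the best parameter combination on the full training set.
grid_model.fit(scaled_X_train, y_train)
```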

Fitting 5 folds for each of 42 candidates, totalling 210 fits


(Output truncated: many repeated scikit-learn convergence warnings pointing at `model = cd_fast.enet_coordinate_descent(`, meaning that for some candidates coordinate descent stopped at `max_iter` before reaching its tolerance.)
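The repeated `cd_fast.enet_coordinate_descent` lines in the fit output come from scikit-learn's `ConvergenceWarning`. One common remedy is a larger `max_iter` than the default 1000. A sketch on synthetic `make_regression` data (not the Ames file) that records convergence warnings instead of printing them; `max_iter=10_000` is an illustrative value, not a tuned one, and `fit_and_count_warnings` is a hypothetical helper:

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized regression data (not the Ames file).
X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)


def fit_and_count_warnings(max_iter):
    """Hypothetical helper: fit an ElasticNet and return (model, number of ConvergenceWarnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=max_iter).fit(X, y)
    n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, n_convergence


# Larger iteration budget than the default max_iter=1000 (illustrative value).
model, n_warnings = fit_and_count_warnings(max_iter=10_000)
# n_iter_ below max_iter means the solver reached its tolerance before the cap.
print(model.n_iter_, n_warnings)
```

In the notebook this would correspond to `ElasticNet(max_iter=10_000)` as the base model; re-running the grid search with it may change the selected parameters and the test RMSE.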

```python
# Why: Show the hyperparameter combination with the best cross-validated score.
grid_model.best_params_
```

{'alpha': 100, 'l1_ratio': 1}
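Both selected values sit on the boundary of the searched grid: `alpha=100` is the largest alpha tried, and `l1_ratio=1` (a pure Lasso penalty) is the largest value `l1_ratio` can take. When the best value lands on an edge, a better value may lie outside the grid. A small hypothetical helper (not part of scikit-learn) that flags this, using the grid and result from the cells above:

```python
def params_on_grid_edge(best_params, param_grid):
    """Hypothetical helper: report whether each best value is the smallest or largest value searched."""
    edges = {}
    for name, values in param_grid.items():
        edges[name] = best_params[name] in (min(values), max(values))
    return edges


param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100], 'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}
best_params = {'alpha': 100, 'l1_ratio': 1}  # copied from grid_model.best_params_ above
print(params_on_grid_edge(best_params, param_grid))
# → {'alpha': True, 'l1_ratio': True}
```

Since `l1_ratio` cannot exceed 1, only the `alpha` range can usefully be extended, for example with larger values such as `[100, 200, 500, 1000]` (illustrative, not tested here).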

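```python
# Why: Predict prices for the held-out test set and measure the error with RMSE
# (root mean squared error), which is in the same units as SalePrice (dollars).
y_pred = grid_model.predict(scaled_X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
rmse
```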

20558.508566893157
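In the notebook, the scaler is fitted on the full training set before `GridSearchCV` splits it into folds, so each validation fold has already influenced the scaling statistics used during cross-validation. Wrapping the scaler and model in a scikit-learn `Pipeline` refits the scaler inside every fold. A sketch of that variant on synthetic `make_regression` data (not the Ames file), scored with the same RMSE; the step names `scaler` and `elastic` are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the Ames features and SalePrice.
X, y = make_regression(n_samples=300, n_features=20, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)

# Scaler and model in one estimator: the scaler is refitted on each CV training fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("elastic", ElasticNet(max_iter=10_000)),
])

# Pipeline parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "elastic__alpha": [0.1, 1, 5, 10, 50, 100],
    "elastic__l1_ratio": [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],
}
grid = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
grid.fit(X_train, y_train)  # raw (unscaled) features go in; the pipeline scales them

y_pred = grid.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(grid.best_params_, rmse)
```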

```python
# Why: Evaluate how close the model's predictions are to the actual house prices.
# mean_squared_error(y_test, y_pred) computes the Mean Squared Error (MSE):
# the average of the squared differences between predicted and actual prices.
# A lower MSE means the model is performing better.
mean_squared_error(y_test, y_pred)
```


422652274.4950194
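MSE is expressed in squared dollars, which is why the value is so large; its square root returns to dollars and reproduces the RMSE computed earlier. With $m$ test houses, actual prices $y_i$ and predictions $\hat{y}_i$:

$$
\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

$$
\sqrt{422\,652\,274.50} \approx 20\,558.51\ \text{dollars}
$$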

```python
# Why: np.mean(df['SalePrice']) computes the average (mean) sale price of all homes in the dataset.
np.mean(df['SalePrice'])

# The average SalePrice is ≈ $180,800
```

180815.53743589742
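Comparing the test RMSE with the average sale price gives a rough sense of relative error. The sketch below only divides the two numbers printed above; it is a quick scale comparison, not a metric from the original analysis, and note that the mean is taken over the whole dataset rather than the test set:

```python
# Values copied from the notebook outputs above.
rmse = 20558.508566893157        # test-set RMSE, in dollars
mean_price = 180815.53743589742  # mean SalePrice over the whole dataset, in dollars

relative_error = rmse / mean_price
print(f"RMSE is {relative_error:.1%} of the mean sale price")
# → RMSE is 11.4% of the mean sale price
```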