### Importing Required Libraries
- pandas for data manipulation
- numpy for numerical operations
- matplotlib for basic plotting
- seaborn for advanced visualization
- scikit-learn for machine learning


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import QuantileTransformer

### Loading Training Dataset
Loading the training data for model development


In [2]:
train_df = pd.read_csv("/kaggle/input/datathon2025/train.csv")
train_df

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,1456,60,RL,62.0,7917,Pave,,Reg,Lvl,AllPub,...,0,,,,0,8,2007,WD,Normal,175000
1456,1457,20,RL,85.0,13175,Pave,,Reg,Lvl,AllPub,...,0,,MnPrv,,0,2,2010,WD,Normal,210000
1457,1458,70,RL,66.0,9042,Pave,,Reg,Lvl,AllPub,...,0,,GdPrv,Shed,2500,5,2010,WD,Normal,266500
1458,1459,20,RL,68.0,9717,Pave,,Reg,Lvl,AllPub,...,0,,,,0,4,2010,WD,Normal,142125


### Basic Data Exploration
Examining basic statistics and information about the dataset


In [3]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

### Summary Statistics
Computing descriptive statistics for the numeric columns


In [4]:
train_df.describe()

Unnamed: 0,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,...,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SalePrice
count,1460.0,1460.0,1201.0,1460.0,1460.0,1460.0,1460.0,1460.0,1452.0,1460.0,...,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0
mean,730.5,56.89726,70.049958,10516.828082,6.099315,5.575342,1971.267808,1984.865753,103.685262,443.639726,...,94.244521,46.660274,21.95411,3.409589,15.060959,2.758904,43.489041,6.321918,2007.815753,180921.19589
std,421.610009,42.300571,24.284752,9981.264932,1.382997,1.112799,30.202904,20.645407,181.066207,456.098091,...,125.338794,66.256028,61.119149,29.317331,55.757415,40.177307,496.123024,2.703626,1.328095,79442.502883
min,1.0,20.0,21.0,1300.0,1.0,1.0,1872.0,1950.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2006.0,34900.0
25%,365.75,20.0,59.0,7553.5,5.0,5.0,1954.0,1967.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2007.0,129975.0
50%,730.5,50.0,69.0,9478.5,6.0,5.0,1973.0,1994.0,0.0,383.5,...,0.0,25.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0,163000.0
75%,1095.25,70.0,80.0,11601.5,7.0,6.0,2000.0,2004.0,166.0,712.25,...,168.0,68.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0,214000.0
max,1460.0,190.0,313.0,215245.0,10.0,9.0,2010.0,2010.0,1600.0,5644.0,...,857.0,547.0,552.0,508.0,480.0,738.0,15500.0,12.0,2010.0,755000.0


### Missing Values
Listing every column that contains missing values, with its null count and dtype


In [5]:
for col in train_df.columns:
    if train_df[col].isnull().sum() != 0:
        print(col, train_df[col].isnull().sum(), train_df[col].dtype)

LotFrontage 259 float64
Alley 1369 object
MasVnrType 872 object
MasVnrArea 8 float64
BsmtQual 37 object
BsmtCond 37 object
BsmtExposure 38 object
BsmtFinType1 37 object
BsmtFinType2 38 object
Electrical 1 object
FireplaceQu 690 object
GarageType 81 object
GarageYrBlt 81 float64
GarageFinish 81 object
GarageQual 81 object
GarageCond 81 object
PoolQC 1453 object
Fence 1179 object
MiscFeature 1406 object
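
The same information can be summarized as a table that also reports the missing fraction per column, which makes it easier to decide which columns to drop. This is a minimal sketch on a small stand-in frame; the helper name `missing_summary` and the toy values are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return null count, null fraction, and dtype for columns with any nulls."""
    counts = df.isnull().sum()
    summary = pd.DataFrame(
        {
            "n_missing": counts,
            "frac_missing": counts / len(df),
            "dtype": df.dtypes.astype(str),
        }
    )
    # Keep only columns that actually have missing values, worst first.
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


# Toy stand-in for train_df (hypothetical values).
toy = pd.DataFrame(
    {
        "LotFrontage": [65.0, np.nan, 68.0, 60.0],
        "Alley": [np.nan, np.nan, np.nan, "Grvl"],
        "LotArea": [8450, 9600, 11250, 9550],
    }
)
summary = missing_summary(toy)
print(summary)
```

On the real data, calling `missing_summary(train_df)` would list the same 19 columns as the loop above, sorted by how many values are missing.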


### Dropping Sparse Columns
Dropping the six columns with the most missing values (each is null in at least 690 of the 1460 rows) and re-checking the schema


In [6]:
train_df.drop(columns=["Alley", "MasVnrType", "FireplaceQu", "PoolQC", "MiscFeature", "Fence"], inplace=True)
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 75 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   LotShape       1460 non-null   object 
 7   LandContour    1460 non-null   object 
 8   Utilities      1460 non-null   object 
 9   LotConfig      1460 non-null   object 
 10  LandSlope      1460 non-null   object 
 11  Neighborhood   1460 non-null   object 
 12  Condition1     1460 non-null   object 
 13  Condition2     1460 non-null   object 
 14  BldgType       1460 non-null   object 
 15  HouseStyle     1460 non-null   object 
 16  OverallQual    1460 non-null   int64  
 17  OverallCond    1460 non-null   int64  
 18  YearBuil

### Dropping the Identifier Column
Removing Id, which only numbers the rows and is not a feature


In [7]:
train_df.drop(columns=["Id"], inplace=True)

### Selecting Numeric Columns
Collecting the non-object columns that will be normalized


In [8]:
# Every non-object column is treated as numeric and will be normalized.
columns_to_norm = [col for col in train_df.columns if train_df[col].dtype != "object"]
columns_to_norm

['MSSubClass',
 'LotFrontage',
 'LotArea',
 'OverallQual',
 'OverallCond',
 'YearBuilt',
 'YearRemodAdd',
 'MasVnrArea',
 'BsmtFinSF1',
 'BsmtFinSF2',
 'BsmtUnfSF',
 'TotalBsmtSF',
 '1stFlrSF',
 '2ndFlrSF',
 'LowQualFinSF',
 'GrLivArea',
 'BsmtFullBath',
 'BsmtHalfBath',
 'FullBath',
 'HalfBath',
 'BedroomAbvGr',
 'KitchenAbvGr',
 'TotRmsAbvGrd',
 'Fireplaces',
 'GarageYrBlt',
 'GarageCars',
 'GarageArea',
 'WoodDeckSF',
 'OpenPorchSF',
 'EnclosedPorch',
 '3SsnPorch',
 'ScreenPorch',
 'PoolArea',
 'MiscVal',
 'MoSold',
 'YrSold',
 'SalePrice']

### Quantile Normalization
Mapping each numeric column, including the target SalePrice, to an approximately normal distribution with QuantileTransformer


In [11]:
scaler = QuantileTransformer(output_distribution = "normal")
train_df[columns_to_norm] = scaler.fit_transform(train_df[columns_to_norm])
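
Because `columns_to_norm` includes `SalePrice`, the target is transformed together with the features, so any model trained on this frame predicts in the transformed space. One possible arrangement, sketched below on synthetic data, fits a separate transformer for the target so that predictions can be mapped back with `inverse_transform`. The names `feature_scaler` and `target_scaler`, the synthetic values, and `n_quantiles=100` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
# Synthetic stand-in for the numeric part of train_df (hypothetical values).
toy = pd.DataFrame(
    {
        "LotArea": rng.lognormal(mean=9.0, sigma=0.5, size=200),
        "SalePrice": rng.lognormal(mean=12.0, sigma=0.4, size=200),
    }
)

feature_cols = ["LotArea"]
# n_quantiles must not exceed the number of rows; 100 is an arbitrary choice here.
feature_scaler = QuantileTransformer(n_quantiles=100, output_distribution="normal", random_state=0)
target_scaler = QuantileTransformer(n_quantiles=100, output_distribution="normal", random_state=0)

X = feature_scaler.fit_transform(toy[feature_cols])
y = target_scaler.fit_transform(toy[["SalePrice"]])

# Map transformed targets (or model predictions) back to the original price scale.
# The round trip is approximate because the transformer interpolates between quantiles.
recovered = target_scaler.inverse_transform(y)
prices = toy["SalePrice"].to_numpy()
max_rel_error = np.max(np.abs(recovered[:, 0] - prices) / prices)
print(f"max relative round-trip error: {max_rel_error:.4f}")
```

Keeping the two transformers separate also means the feature transformer can be reused on a test set that has no `SalePrice` column.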

### Skewness After Normalization
Checking how skewed each numeric column remains after the transform


In [12]:
for col in columns_to_norm:
    print(col, train_df[col].skew())

MSSubClass -0.25867094906524224
LotFrontage -1.1739411031821216
LotArea 0.00021211443840980519
OverallQual 0.990487386661579
OverallCond 1.482322069996893
YearBuilt -0.026424849921664807
YearRemodAdd -1.4796733518801706
MasVnrArea 0.428125188963714
BsmtFinSF1 -0.6195740752177061
BsmtFinSF2 2.4552313928430567
BsmtUnfSF -1.75730656510441
TotalBsmtSF -1.4527520279195127
1stFlrSF -0.0020245500107543115
2ndFlrSF 0.32726534385281447
LowQualFinSF 7.3781246921908545
GrLivArea 0.0014099806402547614
BsmtFullBath 0.3695859634063412
BsmtHalfBath 3.903720040047887
FullBath 1.5689781541202856
HalfBath 0.5922736602676492
BedroomAbvGr -0.4444463505369875
KitchenAbvGr 3.728210112498181
TotRmsAbvGrd 0.05251536798201041
Fireplaces -0.03950766063903659
GarageYrBlt 0.12516672113872288
GarageCars -1.9731943711759312
GarageArea -1.7883329419850655
WoodDeckSF 0.14763316974622923
OpenPorchSF -0.1214380547294336
EnclosedPorch 2.076965475893514
3SsnPorch 7.698203176316242
ScreenPorch 3.14548030588986
PoolArea 14
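
Several columns remain strongly skewed after the transform, for example 3SsnPorch and ScreenPorch, whose 75th percentile in the summary statistics is 0. A likely explanation is that a quantile transform cannot spread tied values apart: all the zeros share the column minimum, and with `output_distribution="normal"` scikit-learn clips that minimum to roughly -5.2. The sketch below reproduces the effect on a hypothetical zero-inflated column; the column name and values are made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm
from sklearn.preprocessing import QuantileTransformer

# Hypothetical zero-inflated column: 95 zeros and 5 positive values.
values = np.concatenate([np.zeros(95), [40.0, 120.0, 180.0, 200.0, 320.0]])
col = pd.DataFrame({"porch_area": values})

qt = QuantileTransformer(n_quantiles=100, output_distribution="normal")
transformed = pd.Series(qt.fit_transform(col)[:, 0])

# All zeros share the minimum, which is clipped near norm.ppf(1e-7).
print("value assigned to zeros:", transformed.iloc[0])
print("norm.ppf(1e-7):         ", norm.ppf(1e-7))
print("skew after transform:   ", transformed.skew())
```

For columns like this, a binary "has feature" indicator may be a more useful encoding than a normalized area, though that is a modeling choice the notebook does not make.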

### Inspecting Categorical Columns
Listing the unique values of each object column


In [13]:
for col in train_df.columns:
    if train_df[col].dtype == "object":
        print(col, train_df[col].unique())

MSZoning ['RL' 'RM' 'C (all)' 'FV' 'RH']
Street ['Pave' 'Grvl']
LotShape ['Reg' 'IR1' 'IR2' 'IR3']
LandContour ['Lvl' 'Bnk' 'Low' 'HLS']
Utilities ['AllPub' 'NoSeWa']
LotConfig ['Inside' 'FR2' 'Corner' 'CulDSac' 'FR3']
LandSlope ['Gtl' 'Mod' 'Sev']
Neighborhood ['CollgCr' 'Veenker' 'Crawfor' 'NoRidge' 'Mitchel' 'Somerst' 'NWAmes'
 'OldTown' 'BrkSide' 'Sawyer' 'NridgHt' 'NAmes' 'SawyerW' 'IDOTRR'
 'MeadowV' 'Edwards' 'Timber' 'Gilbert' 'StoneBr' 'ClearCr' 'NPkVill'
 'Blmngtn' 'BrDale' 'SWISU' 'Blueste']
Condition1 ['Norm' 'Feedr' 'PosN' 'Artery' 'RRAe' 'RRNn' 'RRAn' 'PosA' 'RRNe']
Condition2 ['Norm' 'Artery' 'RRNn' 'Feedr' 'PosN' 'PosA' 'RRAn' 'RRAe']
BldgType ['1Fam' '2fmCon' 'Duplex' 'TwnhsE' 'Twnhs']
HouseStyle ['2Story' '1Story' '1.5Fin' '1.5Unf' 'SFoyer' 'SLvl' '2.5Unf' '2.5Fin']
RoofStyle ['Gable' 'Hip' 'Gambrel' 'Mansard' 'Flat' 'Shed']
RoofMatl ['CompShg' 'WdShngl' 'Metal' 'WdShake' 'Membran' 'Tar&Grv' 'Roll'
 'ClyTile']
Exterior1st ['VinylSd' 'MetalSd' 'Wd Sdng' 'HdBoard' 'BrkF

### Example Label Mapping
Building a label-to-integer mapping for MSZoning, with codes assigned in sorted label order


In [14]:
example_dict = dict(zip(train_df["MSZoning"].sort_values().unique(), range(train_df["MSZoning"].nunique())))
example_dict

{'C (all)': 0, 'FV': 1, 'RH': 2, 'RL': 3, 'RM': 4}

### Encoding Categorical Columns
Applying the same sorted-label mapping to every object column


In [15]:
for col in train_df.columns:
    if train_df[col].dtype == "object":
        # Map sorted unique labels to 0..n-1; NaN gets no entry in the mapping and stays NaN.
        labels = train_df[col].sort_values().unique()
        train_df[col] = train_df[col].map(dict(zip(labels, range(train_df[col].nunique()))))
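
The dictionary-based encoding assigns integer codes in sorted label order. A comparable result can be sketched with pandas categoricals, which also make the handling of missing labels explicit: `cat.codes` uses -1 for NaN, whereas `map` leaves NaN in place. The toy series below is hypothetical. Either way, the codes impose an alphabetical ordering that has no meaning for nominal categories.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing label.
zoning = pd.Series(["RL", "RM", "C (all)", "FV", "RH", np.nan, "RL"], dtype="object")

# Dictionary approach: sorted unique labels -> 0..n-1.
labels = sorted(zoning.dropna().unique())
mapping = {label: code for code, label in enumerate(labels)}
mapped = zoning.map(mapping)  # NaN stays NaN, so the result is float64

# Categorical approach: same ordering, but NaN becomes -1 and the dtype stays integer.
codes = zoning.astype("category").cat.codes

print(mapping)
print(pd.DataFrame({"label": zoning, "mapped": mapped, "codes": codes}))
```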

### Inspecting the Transformed Data
Displaying the frame after normalization and encoding


In [16]:
train_df

Unnamed: 0,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,...,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,0.321611,3,-0.163824,-0.387078,1,3,3,0,4,0,...,-5.199338,-5.199338,-5.199338,-5.199338,-5.199338,-1.579968,0.111890,8,4,0.624748
1,-5.199338,3,0.627196,0.062770,1,3,3,0,2,0,...,-5.199338,-5.199338,-5.199338,-5.199338,-5.199338,-0.483658,-0.445919,8,4,0.311577
2,0.321611,3,-0.041413,0.583496,1,0,3,0,4,0,...,-5.199338,-5.199338,-5.199338,-5.199338,-5.199338,0.928105,0.111890,8,4,0.764710
3,0.672129,3,-0.487893,0.024235,1,0,3,0,0,0,...,2.433949,-5.199338,-5.199338,-5.199338,-5.199338,-1.579968,-5.199338,8,0,-0.408805
4,0.321611,3,0.795322,1.265328,1,0,3,0,2,0,...,-5.199338,-5.199338,-5.199338,-5.199338,-5.199338,5.199338,0.111890,8,4,1.031509
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,0.321611,3,-0.299230,-0.567211,1,3,3,0,4,0,...,-5.199338,-5.199338,-5.199338,-5.199338,-5.199338,0.705530,-0.445919,8,4,0.175278
1456,-5.199338,3,0.868016,1.040242,1,3,3,0,4,0,...,-5.199338,-5.199338,-5.199338,-5.199338,-5.199338,-1.579968,5.199338,8,4,0.637923
1457,0.672129,3,-0.103057,-0.147316,1,3,3,0,4,0,...,-5.199338,-5.199338,-5.199338,-5.199338,2.856945,-0.483658,5.199338,8,4,1.175153
1458,-5.199338,3,-0.041413,0.096000,1,3,3,0,4,0,...,1.283836,-5.199338,-5.199338,-5.199338,-5.199338,-0.857097,5.199338,8,4,-0.362719


### Verifying Column Types
Confirming that every remaining column is numeric after encoding


In [17]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 74 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   MSSubClass     1460 non-null   float64
 1   MSZoning       1460 non-null   int64  
 2   LotFrontage    1201 non-null   float64
 3   LotArea        1460 non-null   float64
 4   Street         1460 non-null   int64  
 5   LotShape       1460 non-null   int64  
 6   LandContour    1460 non-null   int64  
 7   Utilities      1460 non-null   int64  
 8   LotConfig      1460 non-null   int64  
 9   LandSlope      1460 non-null   int64  
 10  Neighborhood   1460 non-null   int64  
 11  Condition1     1460 non-null   int64  
 12  Condition2     1460 non-null   int64  
 13  BldgType       1460 non-null   int64  
 14  HouseStyle     1460 non-null   int64  
 15  OverallQual    1460 non-null   float64
 16  OverallCond    1460 non-null   float64
 17  YearBuilt      1460 non-null   float64
 18  YearRemo
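
The `info()` output shows LotFrontage with 1201 non-null values out of 1460, so missing values are still present after encoding. Many scikit-learn estimators reject NaN inputs, so some imputation step is usually needed before training. The sketch below uses median imputation on a toy frame; the choice of the median and the toy values are assumptions, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the encoded frame (hypothetical values).
toy = pd.DataFrame(
    {
        "LotFrontage": [-0.16, np.nan, -0.04, 0.63, np.nan],
        "LotArea": [-0.39, 0.06, 0.58, 0.02, 1.27],
    }
)

# Fill each numeric column's gaps with that column's median.
medians = toy.median(numeric_only=True)
filled = toy.fillna(medians)

print(filled)
print("remaining nulls:", int(filled.isnull().sum().sum()))
```

If a separate test set is processed later, reusing the `medians` computed on the training data keeps the two sets consistent.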

### Correlation Analysis
Analyzing relationships between variables


In [18]:
corr = train_df.corr()
corr.style.background_gradient(cmap='coolwarm')


Unnamed: 0,MSSubClass,MSZoning,LotFrontage,LotArea,Street,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
MSSubClass,1.0,0.08412,-0.306217,-0.243761,-0.01989,0.087111,-0.06114,-0.033094,0.036507,-0.007002,-0.051409,-0.031085,-0.030382,0.423117,0.434397,0.020299,0.005521,-0.161557,-0.162409,-0.172349,-0.030616,0.004814,-0.029977,-0.060289,0.003215,-0.044264,-0.03576,-0.009981,-0.057879,-0.008846,0.146254,-0.161746,0.075111,-0.108567,-0.090337,-0.305368,0.09147,0.008637,-0.175639,-0.02807,-0.362868,0.608781,0.102923,0.207134,-0.105325,-0.008696,0.125798,0.303015,0.108144,0.234057,0.001784,0.188978,0.000427,0.020268,0.196577,-0.070624,0.011235,-0.09156,-0.132882,-0.0978,-0.071053,-0.155834,0.011415,0.01358,0.11036,-0.0418,-0.016758,0.028934,-0.00586,0.007799,-0.035742,0.05564,-0.039946,-0.064742
MSZoning,0.08412,1.0,-0.1868,-0.097602,0.087654,0.061887,-0.017854,-0.001192,-0.009895,-0.022055,-0.249679,-0.027874,0.044606,0.00569,-0.105315,-0.138446,0.171435,-0.323688,-0.180693,-0.000301,0.005133,-0.008558,0.006963,-0.048967,0.200536,-0.096041,-0.235174,0.128144,-0.011191,0.036216,0.02262,-0.052281,-0.032548,0.03142,-0.052969,-0.085421,0.056866,0.134279,-0.049523,-0.070811,-0.058332,-0.024291,-0.003592,-0.103621,-0.020102,0.007496,-0.130719,-0.135545,-0.015555,0.051411,0.128976,-0.050286,-0.095722,-0.019675,0.133178,-0.284021,0.164604,-0.12103,-0.16322,-0.160666,-0.089703,-0.100366,-0.006268,-0.178319,0.132607,0.002288,0.012707,-0.003145,-0.00137,-0.019287,-0.024208,0.097437,0.009494,-0.200019
LotFrontage,-0.306217,-0.1868,1.0,0.68224,-0.037607,-0.170342,-0.04348,,-0.193646,0.062575,0.157571,-0.001896,-0.001774,-0.520703,-0.013445,0.274011,-0.046687,0.14007,0.13319,0.184382,0.081554,0.124647,0.148759,0.138088,-0.203255,0.051978,0.124957,-0.175805,0.060495,-0.106952,-0.00902,0.084812,-0.024381,0.034443,0.144783,0.325544,-0.027449,-0.117009,0.059304,0.049154,0.485533,-0.034368,-0.003128,0.383685,0.076937,-0.002478,0.212374,0.02601,0.290813,0.017541,-0.192376,0.365345,0.017315,0.25418,-0.29516,0.104521,-0.243153,0.280981,0.319136,0.052027,0.045312,0.0935,0.06716,0.166075,-0.036032,0.062284,0.05014,0.120239,0.025817,-0.000135,0.020207,-0.032042,0.082498,0.402344
LotArea,-0.243761,-0.097602,0.68224,1.0,-0.075504,-0.304357,-0.066653,0.033103,-0.197694,0.198814,0.110944,0.076311,0.029128,-0.503167,0.014072,0.245488,-0.013422,0.072815,0.106532,0.153519,0.167421,0.074593,0.083225,0.121703,-0.145231,0.018291,0.047551,-0.141905,0.003565,-0.160067,-0.036188,0.112863,-0.057332,0.08186,0.078222,0.319772,-0.015818,-0.058341,0.089537,0.057698,0.483172,0.043286,-0.010427,0.444938,0.110852,0.046987,0.218107,0.100706,0.285,-0.020469,-0.14936,0.40363,-0.016783,0.331359,-0.253743,0.028965,-0.170801,0.279717,0.319201,0.042155,0.037024,0.051617,0.141255,0.154558,-0.046243,0.062581,0.093503,0.097203,0.064938,-0.022853,-0.025859,-0.014159,0.060001,0.451068
Street,-0.01989,0.087654,-0.037607,-0.075504,1.0,-0.010224,0.115995,0.001682,0.01396,-0.17936,-0.011561,-0.071657,0.002039,-0.018243,0.023704,0.053601,0.046004,0.025104,0.034054,-0.019732,0.008081,0.002505,0.006166,0.011496,0.049976,0.005874,0.035277,-0.0304,-0.018543,0.071793,-0.015487,-0.003669,0.060838,-0.044454,0.05788,0.002975,0.007904,-0.053995,0.069869,0.021311,0.005685,0.036747,0.008624,0.052378,-0.038434,0.015629,0.03402,0.027711,0.028031,0.012598,-0.025307,0.048073,-0.016444,0.01756,-0.004083,0.029798,-0.001227,0.002851,-0.013031,-0.013404,-0.011558,0.024521,0.01343,0.006221,0.026104,0.008279,-0.022829,0.004437,-0.156658,0.000827,-0.043262,0.014339,0.006064,0.056402
LotShape,0.087111,0.061887,-0.170342,-0.304357,-0.010224,1.0,0.085434,-0.036101,0.221102,-0.099951,-0.038894,-0.115003,-0.043768,0.116262,-0.104026,-0.167944,0.004784,-0.199184,-0.16545,0.003182,-0.071174,-0.020463,-0.027951,-0.108164,0.148818,-0.029497,-0.135124,0.153535,-0.015909,0.135248,0.077698,-0.109422,0.023044,-0.033456,-0.040069,-0.177005,0.075894,0.096248,-0.115256,-0.097195,-0.177789,-0.008526,0.038822,-0.17644,-0.079893,-0.026435,-0.130005,-0.113678,-0.052018,0.094845,0.122182,-0.121201,-0.029321,-0.205147,0.18774,-0.166385,0.225746,-0.146368,-0.151097,-0.099481,-0.073016,-0.113698,-0.165381,-0.13413,0.104488,-0.035563,-0.046665,-0.020891,-0.025179,-0.040915,0.036651,-0.000911,-0.038118,-0.287498
LandContour,-0.06114,-0.017854,-0.04348,-0.066653,0.115995,0.085434,1.0,0.008238,-0.025527,-0.374267,0.019116,0.024801,-0.016185,0.051143,0.075234,0.020161,-0.021922,0.13396,0.129157,-0.004246,-0.020229,-0.011809,-0.034082,0.100874,-0.003613,0.009804,0.053478,-0.003584,0.023412,0.045079,-0.076111,0.009089,-0.031329,0.032121,0.039475,0.028398,0.015746,-0.066276,0.10541,0.08259,-0.024388,-0.042173,-0.068374,-0.040034,0.021588,0.018497,0.01896,0.032016,-0.034285,-0.036669,0.030813,-0.043143,0.036113,-0.037813,-0.094565,0.10386,-0.074908,0.061385,0.063913,0.013051,0.007847,0.140921,0.021015,0.068853,-0.076115,-0.039971,0.000398,-0.017423,0.014993,-0.001275,0.004352,-0.025754,0.033809,0.033328
Utilities,-0.033094,-0.001192,,0.033103,0.001682,-0.036101,0.008238,1.0,-0.032589,-0.005909,0.046809,-0.00095,-0.000831,-0.010778,0.054283,-0.001018,0.01199,-0.013142,-0.007166,-0.012868,-0.003293,-0.029686,-0.0321,0.03886,0.017369,0.009535,-0.014377,0.022579,0.007557,0.016995,-0.010639,0.009015,-0.02007,0.071153,-0.000517,-0.010532,-0.003221,0.00695,0.006907,-0.091725,0.015197,-0.022655,-0.003515,-0.006159,-0.021938,0.105769,-0.019106,-0.020095,0.004394,-0.005134,-0.010717,0.010544,0.006702,0.022695,-0.004214,-0.021458,-0.006082,0.007486,0.008043,0.005986,0.005161,0.007586,-0.024848,0.027202,-0.010639,-0.003374,0.092624,-0.001808,-0.005019,-0.08185,0.009802,-0.12677,-0.089701,-0.012373
LotConfig,0.036507,-0.009895,-0.193646,-0.197694,0.01396,0.221102,-0.025527,-0.032589,1.0,-0.007256,-0.036597,0.021457,0.033868,0.107229,-0.032945,-0.03636,-0.024019,0.018278,-0.026206,-0.010364,-0.068465,0.023316,0.005546,0.004114,-0.002503,0.034898,-0.011755,0.020952,0.033736,0.014787,0.017112,0.017798,-0.003044,0.002395,0.003633,-0.038937,-2.4e-05,-0.010217,-0.003729,-0.025483,-0.062898,-0.028445,-0.006462,-0.077668,-0.016782,-0.009142,-0.021893,-0.027581,-0.058754,0.001059,-0.010437,-0.047787,-0.021119,-0.048627,0.017912,0.013714,0.017835,-0.056051,-0.06395,0.02196,0.030535,-0.045058,-0.006693,-0.033923,-0.051909,-0.018509,0.004519,-0.050041,-0.022665,0.021278,-0.001418,0.014325,0.051579,-0.074686
LandSlope,-0.007002,-0.022055,0.062575,0.198814,-0.17936,-0.099951,-0.374267,-0.005909,-0.007256,1.0,-0.080405,-0.016762,-0.026322,-0.053582,-0.031793,-0.076873,-0.003189,-0.078264,-0.036159,-0.027739,0.178678,-0.04566,-0.032878,-0.05178,0.02104,0.000834,-0.027782,0.013013,0.0034,-0.164784,-0.06071,0.100685,-0.064024,0.077008,-0.112408,0.033576,0.005856,0.035674,-0.010849,-0.014302,0.065585,-0.009252,0.006706,0.034613,0.096449,0.072577,-0.048526,-0.001011,-0.053562,-0.033764,-0.002735,-0.047494,-0.106851,0.076739,0.000398,-0.079202,0.024092,-0.005383,0.000556,0.015275,-0.017684,-0.024538,0.050716,-0.028968,0.01154,0.026513,0.061731,-0.015589,0.047583,0.016679,0.00508,0.054858,-0.043095,0.039596
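
A 74-column correlation matrix is hard to scan by eye. For a first look at feature relevance, it is often enough to rank the features by the absolute value of their correlation with the target. The sketch below does this on synthetic data; the column names are reused from the dataset for readability, but the values and relationships are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in for the preprocessed frame (hypothetical relationships).
quality = rng.normal(size=n)
area = rng.normal(size=n)
noise = rng.normal(size=n)
toy = pd.DataFrame(
    {
        "OverallQual": quality,
        "GrLivArea": area,
        "MoSold": noise,
        "SalePrice": 2.0 * quality + 1.0 * area + rng.normal(scale=0.5, size=n),
    }
)

corr = toy.corr()
# Correlation of every feature with the target, strongest (by magnitude) first.
target_corr = corr["SalePrice"].drop("SalePrice")
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Applied to the real frame, the same three lines after `corr = train_df.corr()` would give a one-column view of the SalePrice row of the matrix.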


### Skewness of All Columns
Checking skewness across every column, including the encoded categorical ones


In [19]:
for col in train_df.columns:
    print(col, train_df[col].skew())

MSSubClass -0.25867094906524224
MSZoning -1.7353953794159185
LotFrontage -1.1739411031821216
LotArea 0.00021211443840980519
Street -15.518769523446206
LotShape -0.6101746987339193
LandContour -3.1624994215215656
Utilities 38.209946349085605
LotConfig -1.1356318684354356
LandSlope 4.813682424489448
Neighborhood 0.04212153010649173
Condition1 3.019195845658258
Condition2 13.17184395130842
BldgType 2.245648012243153
HouseStyle 0.30675459298372293
OverallQual 0.990487386661579
OverallCond 1.482322069996893
YearBuilt -0.026424849921664807
YearRemodAdd -1.4796733518801706
RoofStyle 1.4737963742380584
RoofMatl 8.109402021334287
Exterior1st -0.7263135664402055
Exterior2nd -0.6929626745550396
MasVnrArea 0.428125188963714
ExterQual -1.8302652187854904
ExterCond -2.5653047441855623
Foundation 0.09121737920959147
BsmtQual -1.3139874551044863
BsmtCond -3.4019256531985613
BsmtExposure -1.149105813699179
BsmtFinType1 -0.015451330578594902
BsmtFinSF1 -0.6195740752177061
BsmtFinType2 -3.571682824671813

### Basic Data Exploration
Examining basic statistics and information about the dataset


In [20]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 74 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   MSSubClass     1460 non-null   float64
 1   MSZoning       1460 non-null   int64  
 2   LotFrontage    1201 non-null   float64
 3   LotArea        1460 non-null   float64
 4   Street         1460 non-null   int64  
 5   LotShape       1460 non-null   int64  
 6   LandContour    1460 non-null   int64  
 7   Utilities      1460 non-null   int64  
 8   LotConfig      1460 non-null   int64  
 9   LandSlope      1460 non-null   int64  
 10  Neighborhood   1460 non-null   int64  
 11  Condition1     1460 non-null   int64  
 12  Condition2     1460 non-null   int64  
 13  BldgType       1460 non-null   int64  
 14  HouseStyle     1460 non-null   int64  
 15  OverallQual    1460 non-null   float64
 16  OverallCond    1460 non-null   float64
 17  YearBuilt      1460 non-null   float64
 18  YearRemo

### Saving the Preprocessed Data
Writing the transformed training set to preprocessed_train.csv


In [21]:
train_df.to_csv("preprocessed_train.csv")
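
One detail worth knowing when this file is read back: `to_csv` writes the row index by default, and `read_csv` then loads it as an extra column named `Unnamed: 0`. The sketch below shows the difference using in-memory buffers and a toy frame (hypothetical values), so no files are written.

```python
import io

import pandas as pd

toy = pd.DataFrame({"LotArea": [8450, 9600], "SalePrice": [208500, 181500]})

# Default: the RangeIndex is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_default = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_default.columns))
print(list(reloaded_clean.columns))
```

Passing `index=False` when saving, or `index_col=0` when loading, avoids carrying the extra column into later steps.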
