### General Information

| **Label**             | **Description**                                                                                        |
|-----------------------|--------------------------------------------------------------------------------------------------------|
| OutcomeVariable       | The primary metric we aim to predict or analyze for each item in the dataset.                          |
| BuildingCategory      | A numerical code categorizing the broad structural type of the item.                                   |
| ZoningClassification  | The designated usage category for the area where the item is located.                                  |
| StreetLineLength      | The length in linear units of the item's frontage on a street.                                         |
| ParcelSize            | The total area of the land associated with the item, measured in square units.                         |
| RoadAccessType        | The type of road providing access to the item's location.                                              |
| AlleyAccessType       | The type of alley access available to the item's location (if any).                                    |
| ParcelShape           | The general geometric shape of the land parcel.                                                        |
| TerrainFlatness       | A description of the land's levelness or slope.                                                        |
| UtilityAvailability   | The types of utility services available at the item's location.                                        |
| ParcelSettings        | Specific configurations or features of the land parcel.                                                |
| TerrainSlope          | The degree of inclination of the land.                                                                 |
| District              | Geographic sub-regions or zones within a larger area.                                                  |
| RoadProximity1        | Indicates the item's proximity to a major road or transportation route.                                |
| RoadProximity2        | Indicates proximity to a second major road or route, if applicable.                                    |

### Item Characteristics

| **Label**             | **Description**                                                                                        |
|-----------------------|--------------------------------------------------------------------------------------------------------|
| DwellingType          | The specific architectural classification of the structure.                                            |
| DwellingStyle         | The architectural style of the structure.                                                              |
| MaterialQuality       | A rating of the overall quality of materials used in the item's construction.                          |
| ConditionRating       | A general assessment of the item's current state of repair and upkeep.                                 |
| ConstructionYear      | The year in which the item was originally built.                                                       |
| RenovationYear        | The year of a significant remodel or addition to the item.                                             |
| RoofType              | The style or design of the roof.                                                                       |
| RoofMaterial          | The primary material used in the roof's construction.                                                  |
| ExteriorCladding1     | The dominant material covering the exterior of the item.                                               |
| ExteriorCladding2     | A secondary exterior covering material, if present.                                                   |
| MasonryType           | The type of decorative masonry work used on the item's exterior.                                       |
| MasonrySize           | The area covered by masonry work, measured in square units.                                            |
| ExteriorQuality       | The quality of the materials used for the exterior cladding.                                           |
| ExteriorCondition     | The current state of the exterior cladding.                                                            |
| FoundationType        | The type of foundation supporting the item's structure.                                                |

### Interior Features

| **Label**                   | **Description**                                                                                        |
|-----------------------------|--------------------------------------------------------------------------------------------------------|
| BasementHeight              | A classification of the basement's height relative to ground level.                                    |
| BasementCondition           | The general condition of the basement space.                                                           |
| BasementAccess              | Describes whether the basement has walkout access or is at garden level.                               |
| BasementFinish1             | The quality of finish in the main finished area of the basement.                                       |
| BasementFinishedArea1       | The size of the main finished area in the basement, measured in square units.                          |
| BasementFinish2             | The quality of finish in a second finished area of the basement (if applicable).                       |
| BasementFinishedArea2       | The size of a second finished area in the basement, measured in square units.                          |
| BasementUnfinishedArea      | The size of the unfinished portion of the basement, measured in square units.                          |
| TotalBasementArea           | The total area of the basement space, including finished and unfinished areas, measured in square units. |
| HeatingType                 | The system used for heating the item.                                                                  |
| HeatingQuality              | The quality and condition of the heating system.                                                       |
| AirConditioning             | Indicates whether the item has central air conditioning.                                               |
| ElectricalSystem            | The type of electrical system installed.                                                               |
| GroundFloorArea             | The area of the ground floor, measured in square units.                                                |
| UpperFloorArea              | The area of the upper floor(s), measured in square units.                                              |
| LowQualityArea              | Area of the item finished to a lower standard, measured in square units.                               |
| LivingArea                  | The total living space above ground level, measured in square units.                                   |
| BasementFullBathrooms       | The number of full bathrooms in the basement.                                                          |
| BasementHalfBathrooms       | The number of half bathrooms in the basement.                                                          |
| FullBathrooms               | The number of full bathrooms above ground level.                                                       |
| HalfBathrooms               | The number of half bathrooms above ground level.                                                       |
| Bedrooms                    | The number of bedrooms in the item.                                                                    |
| Kitchens                    | The number of kitchens in the item.                                                                    |
| KitchenQuality              | The quality of the kitchen's finishes and fixtures.                                                    |
| TotalRooms                  | The total number of rooms above ground level, excluding bathrooms.                                     |
| FunctionalityRating         | A rating of the item's overall functionality and layout.                                               |
| FireplaceCount              | The number of fireplaces in the item.                                                                  |
| FireplaceQuality            | The quality of the fireplace(s).                                                                       |

### Additional Features

| **Label**               | **Description**                                                                                        |
|-------------------------|--------------------------------------------------------------------------------------------------------|
| GarageLocation          | The location of the garage relative to the item.                                                       |
| GarageConstructionYear  | The year the garage was built.                                                                         |
| GarageInterior          | The interior finish of the garage.                                                                     |
| GarageCapacity          | The number of vehicles the garage can accommodate.                                                     |
| GarageSize              | The area of the garage, measured in square units.                                                      |
| GarageQuality           | The quality of the garage's construction.                                                              |
| GarageCondition         | The current condition of the garage.                                                                   |
| DrivewayType            | Indicates whether the driveway is paved.                                                               |
| WoodDeckArea            | The area of the wood deck, measured in square units.                                                   |
| OpenPorchArea           | The area of the open porch, measured in square units.                                                  |
| EnclosedPorchArea       | The area of the enclosed porch, measured in square units.                                              |
| ThreeSeasonPorchArea    | The area of the three-season porch, measured in square units.                                          |
| ScreenPorchArea         | The area of the screen porch, measured in square units.                                                |
| PoolSize                | The area of the pool, measured in square units.                                                        |
| PoolQuality             | The quality of the pool.                                                                               |
| FenceQuality            | The quality of the fence.                                                                              |
| AdditionalFeature       | A miscellaneous feature not covered in other categories.                                               |
| AdditionalFeatureValue  | The monetary value of the additional feature.                                                          |

### Sale Information

| **Label**      | **Description**                                                                                        |
|----------------|--------------------------------------------------------------------------------------------------------|
| SaleMonth      | The month in which the item was sold.                                                                  |
| SaleYear       | The year in which the item was sold.                                                                   |
| SaleType       | The method or type of sale transaction.                                                                |
| SaleCondition  | The condition of the sale.                                                                             |

- **🏡 Zoning Classifications Explained**
    - **RL (Residential Low Density):** This zone typically allows for single-family homes on larger lots, promoting a spacious and peaceful living environment. [source](https://monica.im/s/C7jUcgHpBW)
    - **RM (Residential Medium Density):** This zone accommodates a mix of housing types, including single-family homes, duplexes, and townhouses, offering a balance between density and affordability. [source](https://monica.im/s/C7jUcgHpBW)
    - **FV (Floating Village Residential):** This zone covers planned, village-style residential developments; the code does not refer to flood-prone land. [source](https://monica.im/s/C7jUcgHpBW)

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

In [2]:
df = pd.read_csv('train.csv')

In [3]:
df.describe()

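|       | Id | BuildingCategory | StreetLineLength | ParcelSize | MaterialQuality | ConditionRating | ConstructionYear | RenovationYear | MasonrySize | BasementFinishedArea1 | ... | WoodDeckArea | OpenPorchArea | EnclosedPorchArea | ThreeSeasonPorchArea | ScreenPorchArea | PoolSize | AdditionalFeatureValue | SaleMonth | SaleYear | OutcomeVariable |
|-------|----|------------------|------------------|------------|-----------------|-----------------|------------------|----------------|-------------|-----------------------|-----|--------------|---------------|-------------------|----------------------|-----------------|----------|------------------------|-----------|----------|-----------------|
| count | 1460.0 | 1460.0 | 1201.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1452.0 | 1460.0 | ... | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 | 1460.0 |
| mean  | 730.5 | 56.89726 | 70.049958 | 10516.828082 | 6.099315 | 5.575342 | 1971.267808 | 1984.865753 | 103.685262 | 443.639726 | ... | 94.244521 | 46.660274 | 21.95411 | 3.409589 | 15.060959 | 2.758904 | 43.489041 | 6.321918 | 2007.815753 | 180921.19589 |
| std   | 421.610009 | 42.300571 | 24.284752 | 9981.264932 | 1.382997 | 1.112799 | 30.202904 | 20.645407 | 181.066207 | 456.098091 | ... | 125.338794 | 66.256028 | 61.119149 | 29.317331 | 55.757415 | 40.177307 | 496.123024 | 2.703626 | 1.328095 | 79442.502883 |
| min   | 1.0 | 20.0 | 21.0 | 1300.0 | 1.0 | 1.0 | 1872.0 | 1950.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2006.0 | 34900.0 |
| 25%   | 365.75 | 20.0 | 59.0 | 7553.5 | 5.0 | 5.0 | 1954.0 | 1967.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 2007.0 | 129975.0 |
| 50%   | 730.5 | 50.0 | 69.0 | 9478.5 | 6.0 | 5.0 | 1973.0 | 1994.0 | 0.0 | 383.5 | ... | 0.0 | 25.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 2008.0 | 163000.0 |
| 75%   | 1095.25 | 70.0 | 80.0 | 11601.5 | 7.0 | 6.0 | 2000.0 | 2004.0 | 166.0 | 712.25 | ... | 168.0 | 68.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 2009.0 | 214000.0 |
| max   | 1460.0 | 190.0 | 313.0 | 215245.0 | 10.0 | 9.0 | 2010.0 | 2010.0 | 1600.0 | 5644.0 | ... | 857.0 | 547.0 | 552.0 | 508.0 | 480.0 | 738.0 | 15500.0 | 12.0 | 2010.0 | 755000.0 |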


In [4]:
df.dtypes

Id                        int64
BuildingCategory          int64
ZoningClassification     object
StreetLineLength        float64
ParcelSize                int64
                         ...   
SaleMonth                 int64
SaleYear                  int64
SaleType                 object
SaleCondition            object
OutcomeVariable           int64
Length: 81, dtype: object
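
The later cells fill numeric columns with means or medians and text columns with modes or sentinel labels, so it can help to split the columns by dtype first. A minimal sketch on a hypothetical three-column frame (`toy` and its values are invented for illustration; only the column names come from the data dictionary):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the real frame has 81 columns.
toy = pd.DataFrame({
    "Id": [1, 2],
    "ZoningClassification": ["RL", "RM"],
    "StreetLineLength": [65.0, np.nan],
})

# Numeric columns are candidates for mean/median imputation.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
# Everything else is treated as categorical (mode or sentinel label).
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Using `exclude="number"` for the second list avoids depending on how a given pandas version labels string columns.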

In [5]:
nullSum = df.isnull().sum()
nullSum[nullSum!=0]

StreetLineLength           259
AlleyAccessType           1369
MasonryType                872
MasonrySize                  8
BasementHeight              37
BasementCondition           37
BasementAccess              38
BasementFinish1             37
BasementFinish2             38
ElectricalSystem             1
FireplaceQuality           690
GarageLocation              81
GarageConstructionYear      81
GarageInterior              81
GarageQuality               81
GarageCondition             81
PoolQuality               1453
FenceQuality              1179
AdditionalFeature         1406
dtype: int64
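
Raw counts are easier to act on next to the share of rows they represent: a column missing in most rows usually needs a different treatment from one missing in a handful. A small sketch, assuming a hypothetical helper named `missing_report` and a four-row toy frame rather than the real data:

```python
import numpy as np
import pandas as pd

def missing_report(frame):
    """Return missing counts and row shares for columns with at least one null."""
    counts = frame.isnull().sum()
    counts = counts[counts != 0]
    report = pd.DataFrame({
        "missing": counts,
        "share": (counts / len(frame)).round(3),
    })
    return report.sort_values("missing", ascending=False)

# Hypothetical toy data; values are invented for illustration.
toy = pd.DataFrame({
    "PoolQuality": [np.nan, np.nan, np.nan, "Gd"],
    "StreetLineLength": [60.0, np.nan, 80.0, 70.0],
    "ParcelSize": [8450, 9600, 11250, 9550],
})

report = missing_report(toy)
print(report)
```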

In [6]:
# df['StreetLineLength'].fillna(format(df['StreetLineLength'].mean(),'.1f'),inplace=True)

In [78]:
df1 = df.copy()
df2 = df.copy()

In [28]:
# Fill missing frontage with the column mean, rounded to one decimal.
df1['StreetLineLength'] = df1['StreetLineLength'].fillna(round(df1['StreetLineLength'].mean(), 1))

In [29]:
# Fill the single missing electrical system with the most frequent value.
df1['ElectricalSystem'] = df1['ElectricalSystem'].fillna(df1['ElectricalSystem'].mode()[0])

In [30]:
df1['AdditionalFeature'] = df1['AdditionalFeature'].fillna('NO AF')

In [31]:
# Label missing categories explicitly instead of guessing a value.
for col in ['AlleyAccessType', 'FireplaceQuality', 'PoolQuality', 'FenceQuality']:
    df1[col] = df1[col].fillna('UNKNOWN')

In [32]:
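df1['MasonryType'] = df1['MasonryType'].fillna(df1['MasonryType'].mode()[0])
# Naive baseline: zeros and missing sizes both become the column mean.
masonry_mean = round(df1['MasonrySize'].mean(), 1)
df1['MasonrySize'] = df1['MasonrySize'].replace(0, masonry_mean).fillna(masonry_mean)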

In [33]:
# Fill each basement categorical with its own most frequent value.
for col in ['BasementHeight', 'BasementCondition', 'BasementAccess',
            'BasementFinish1', 'BasementFinish2']:
    df1[col] = df1[col].fillna(df1[col].mode()[0])

In [34]:
df1['TotalBasementArea']=df1['TotalBasementArea'].replace(0,round(df1['TotalBasementArea'].mean(),1))


In [35]:
# Fill each garage column with its own mode, not a basement column's mode.
for col in ['GarageLocation', 'GarageConstructionYear', 'GarageInterior',
            'GarageQuality', 'GarageCondition']:
    df1[col] = df1[col].fillna(df1[col].mode()[0])

In [36]:
df1['GarageCapacity']=df1['GarageCapacity'].replace(0,round(df1['GarageCapacity'].mean(),1)).astype(int)
df1['GarageSize']=df1['GarageSize'].replace(0,round(df1['GarageSize'].mean(),1)).astype(int)

In [37]:
df1.to_csv('randomly.csv')

In [79]:
# Median frontage of each row's district, aligned to the rows of df2.
median_street_length_by_district = df2.groupby('District')['StreetLineLength'].transform('median')

# Fill only the missing frontage values with their district median.
df2['StreetLineLength'] = df2['StreetLineLength'].fillna(median_street_length_by_district)
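
The district-level fill above leaves a gap when every row in a district is missing its frontage, because that district has no median to borrow. Whether this happens in `train.csv` is not shown here, so the following is only a sketch on hypothetical toy data, with a fallback to the overall median as one possible choice:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: district "C" has no observed frontage at all.
toy = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B", "C"],
    "StreetLineLength": [60.0, np.nan, 80.0, 50.0, np.nan, np.nan],
})

# Per-row district median; NaN for districts with no observed values.
district_median = toy.groupby("District")["StreetLineLength"].transform("median")

# First use the district median, then fall back to the overall median.
filled = toy["StreetLineLength"].fillna(district_median)
filled = filled.fillna(toy["StreetLineLength"].median())

print(filled.tolist())
```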

In [80]:
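# Missing alley access is treated as its own category.
df2['AlleyAccessType'] = df2['AlleyAccessType'].fillna('NoAlleyAccess')

In [81]:
# Missing masonry type is treated as "no masonry".
df2['MasonryType'] = df2['MasonryType'].fillna('None')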

In [82]:
temp = df2.groupby('MasonryType')['MasonrySize'].value_counts()
print(temp)

MasonryType  MasonrySize
BrkCmn       41.0           1
             66.0           1
             67.0           1
             70.0           1
             113.0          1
                           ..
Stone        760.0          1
             762.0          1
             788.0          1
             796.0          1
             860.0          1
Name: MasonrySize, Length: 401, dtype: int64


In [83]:
check1 = df2.loc[df2['MasonrySize'].isnull(), 'MasonryType']

In [84]:
# Rows without a recorded masonry size are given an area of zero.
df2['MasonrySize'] = df2['MasonrySize'].fillna(0)
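
Cell 83 selects the `MasonryType` of the rows whose `MasonrySize` is missing but never displays it. Filling the size with 0 is only consistent if those rows carry the `'None'` type, so the check is worth making explicit. A minimal sketch on invented toy rows (the real result for `df2` is not shown in this notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mirroring the two masonry columns.
toy = pd.DataFrame({
    "MasonryType": ["BrkFace", np.nan, "Stone", np.nan],
    "MasonrySize": [196.0, np.nan, 350.0, 0.0],
})
toy["MasonryType"] = toy["MasonryType"].fillna("None")

# Which masonry types do the rows with a missing size have?
types_when_size_missing = toy.loc[toy["MasonrySize"].isnull(), "MasonryType"]
print(types_when_size_missing.value_counts())

# Fill with zero only when every such row has no masonry.
all_none = bool(types_when_size_missing.eq("None").all())
if all_none:
    toy["MasonrySize"] = toy["MasonrySize"].fillna(0)
```

If some of those rows had a real masonry type, a per-type median would be a more defensible fill than zero.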

In [85]:
def fillMode(column):
    """Fill missing values in df2[column] with that column's most frequent value."""
    df2[column] = df2[column].fillna(df2[column].mode()[0])

In [86]:
fillMode('BasementHeight')
fillMode('BasementCondition')
fillMode('BasementAccess')
fillMode('BasementFinish1')
fillMode('BasementFinish2')
# Replace zero areas with the median of the same column.
df2['BasementUnfinishedArea']=df2['BasementUnfinishedArea'].replace(0,df2['BasementUnfinishedArea'].median()).astype(int)
df2['TotalBasementArea']=df2['TotalBasementArea'].replace(0,df2['TotalBasementArea'].median()).astype(int)


In [87]:
fillMode('ElectricalSystem')

In [89]:
# Missing fireplace quality is treated as "no fireplace".
df2['FireplaceQuality'] = df2['FireplaceQuality'].fillna('NoFireplace')

In [90]:
fillMode('GarageLocation')
fillMode('GarageConstructionYear')
fillMode('GarageInterior')
fillMode('GarageQuality')
fillMode('GarageCondition')

In [91]:
# Missing pool quality is treated as "no pool".
df2['PoolQuality'] = df2['PoolQuality'].fillna('NoPool')

In [92]:
fillMode('FenceQuality')
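
`FenceQuality` is missing in 1179 of 1460 rows, so a mode fill assigns the most common fence quality to most of the dataset. If a missing value means the property has no fence (an assumption, not something this notebook establishes), a sentinel label in the style of `'NoPool'` keeps that information. The sketch below compares the two on invented values; `'NoFence'` is a hypothetical label:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: half of the rows have no recorded fence quality.
fence = pd.Series(["MnPrv", np.nan, np.nan, "GdPrv", "MnPrv", np.nan])

# Mode fill: every missing row inherits the most frequent quality.
mode_filled = fence.fillna(fence.mode()[0])

# Sentinel fill: missing rows stay distinguishable as their own category.
sentinel_filled = fence.fillna("NoFence")

mode_counts = mode_filled.value_counts()
sentinel_counts = sentinel_filled.value_counts()
print(mode_counts)
print(sentinel_counts)
```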