# Project - Complementi di Basi di Dati, Academic Year 2023/2024

### Libraries

About each library:
- **pandas**:
  - Documentation: [pandas documentation](https://pandas.pydata.org/docs/)
  - Installation:
    ```bash
    pip install pandas
    ```

- **joblib**:
  - Documentation: [joblib documentation](https://joblib.readthedocs.io/)
  - Installation:
    ```bash
    pip install joblib
    ```

- **fuzzywuzzy**:
  - Documentation: No dedicated documentation site; usage is described in the project README on [GitHub](https://github.com/seatgeek/fuzzywuzzy).
  - Installation (with `python-Levenshtein`, an optional dependency that speeds up matching):
    ```bash
    pip install fuzzywuzzy python-Levenshtein
    ```

- **scikit-learn** (includes `LabelEncoder`, `train_test_split`, `GradientBoostingRegressor`, `mean_squared_error`):
  - Documentation: [scikit-learn documentation](https://scikit-learn.org/stable/documentation.html)
  - Installation:
    ```bash
    pip install scikit-learn
    ```

- **scikit-optimize (skopt)**:
  - Documentation: [scikit-optimize documentation](https://scikit-optimize.github.io/stable/)
  - Installation:
    ```bash
    pip install scikit-optimize
    ```

- **SQLAlchemy**:
  - Documentation: [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/14/)
  - Installation:
    ```bash
    pip install sqlalchemy
    ```

In [57]:
# Data manipulation
import pandas as pd
# Persist the trained model so it does not have to be retrained on every run
import joblib
# Fuzzy string matching, used to correct typos in categorical values
from fuzzywuzzy import process
# scikit-learn and scikit-optimize, used to train the predictive model
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
# SQLAlchemy, used to connect to the PostgreSQL database
from sqlalchemy import create_engine, text
from sqlalchemy.engine import URL

# Set to True to retrain the predictive model
RETRAIN_MODEL = False
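
The `joblib` import and the `RETRAIN_MODEL` flag suggest a load-or-train pattern: reuse a saved model unless retraining is requested. The training code is not shown in this section, so the following is only a minimal sketch under assumptions. `train_model`, `load_or_train`, and the cache path are hypothetical names, and the standard-library `pickle` module stands in for `joblib` to keep the sketch dependency-free (`joblib.dump(value, filename)` and `joblib.load(filename)` would play the same roles).

```python
import pickle
import tempfile
from pathlib import Path

RETRAIN_MODEL = False  # same meaning as the flag in the cell above


def train_model():
    """Hypothetical placeholder for the expensive training step."""
    return {"kind": "placeholder-model", "n_estimators": 100}


def load_or_train(model_path, retrain):
    """Return (model, status); reuse the cached file unless retraining is requested."""
    model_path = Path(model_path)
    if not retrain and model_path.exists():
        with model_path.open("rb") as fh:
            return pickle.load(fh), "loaded"
    model = train_model()
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    return model, "trained"


# Hypothetical cache location inside a fresh temporary directory
cache_dir = Path(tempfile.mkdtemp())
model_file = cache_dir / "model.pkl"

first_model, first_status = load_or_train(model_file, RETRAIN_MODEL)
second_model, second_status = load_or_train(model_file, RETRAIN_MODEL)
print(first_status, second_status)
```

The first call finds no cached file and trains; the second call reads the file written by the first. Passing `retrain=True` forces training even when the file exists.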

### Data initialization

These are the variables used in the initialization and cleaning of the data.

In [58]:
# Valid values for each categorical column (None marks a numeric column)
data_description = {
    "MSSubClass": {
        20: "1-STORY 1946 & NEWER ALL STYLES",
        30: "1-STORY 1945 & OLDER",
        40: "1-STORY W/FINISHED ATTIC ALL AGES",
        45: "1-1/2 STORY - UNFINISHED ALL AGES",
        50: "1-1/2 STORY FINISHED ALL AGES",
        60: "2-STORY 1946 & NEWER",
        70: "2-STORY 1945 & OLDER",
        75: "2-1/2 STORY ALL AGES",
        80: "SPLIT OR MULTI-LEVEL",
        85: "SPLIT FOYER",
        90: "DUPLEX - ALL STYLES AND AGES",
        120: "1-STORY PUD (Planned Unit Development) - 1946 & NEWER",
        150: "1-1/2 STORY PUD - ALL AGES",
        160: "2-STORY PUD - 1946 & NEWER",
        180: "PUD - MULTILEVEL - INCL SPLIT LEV/FOYER",
        190: "2 FAMILY CONVERSION - ALL STYLES AND AGES"
    },
    # Identifies the type of dwelling involved in the sale

    "MSZoning": {
        "A": "Agriculture",
        "C (all)": "Commercial",
        "FV": "Floating Village Residential",
        "I": "Industrial",
        "RH": "Residential High Density",
        "RL": "Residential Low Density",
        "RP": "Residential Low Density Park",
        "RM": "Residential Medium Density"
    },
    # Identifies the general zoning classification of the sale

    "LotFrontage": None,
    # Linear feet of street connected to property

    "LotArea": None,
    # Lot size in square feet

    "Street": {
        "Grvl": "Gravel",
        "Pave": "Paved"
    },
    # Type of road access to property

    "Alley": {
        "Grvl": "Gravel",
        "Pave": "Paved",
        "NA": "No alley access"
    },
    # Type of alley access to property

    "LotShape": {
        "Reg": "Regular",
        "IR1": "Slightly irregular",
        "IR2": "Moderately irregular",
        "IR3": "Irregular"
    },
    # General shape of property

    "LandContour": {
        "Lvl": "Near Flat/Level",
        "Bnk": "Banked - Quick and significant rise from street grade to building",
        "HLS": "Hillside - Significant slope from side to side",
        "Low": "Depression"
    },
    # Flatness of the property

    "Utilities": {
        "AllPub": "All public Utilities (E,G,W,& S)",
        "NoSewr": "Electricity, Gas, and Water (Septic Tank)",
        "NoSeWa": "Electricity and Gas Only",
        "ELO": "Electricity only"
    },
    # Type of utilities available

    "LotConfig": {
        "Inside": "Inside lot",
        "Corner": "Corner lot",
        "CulDSac": "Cul-de-sac",
        "FR2": "Frontage on 2 sides of property",
        "FR3": "Frontage on 3 sides of property"
    },
    # Lot configuration

    "LandSlope": {
        "Gtl": "Gentle slope",
        "Mod": "Moderate Slope",
        "Sev": "Severe Slope"
    },
    # Slope of property

    "Neighborhood": {
        "Blmngtn": "Bloomington Heights",
        "Blueste": "Bluestem",
        "BrDale": "Briardale",
        "BrkSide": "Brookside",
        "ClearCr": "Clear Creek",
        "CollgCr": "College Creek",
        "Crawfor": "Crawford",
        "Edwards": "Edwards",
        "Gilbert": "Gilbert",
        "IDOTRR": "Iowa DOT and Rail Road",
        "MeadowV": "Meadow Village",
        "Mitchel": "Mitchell",
        "Names": "North Ames",
        "NoRidge": "Northridge",
        "NPkVill": "Northpark Villa",
        "NridgHt": "Northridge Heights",
        "NWAmes": "Northwest Ames",
        "OldTown": "Old Town",
        "SWISU": "South & West of Iowa State University",
        "Sawyer": "Sawyer",
        "SawyerW": "Sawyer West",
        "Somerst": "Somerset",
        "StoneBr": "Stone Brook",
        "Timber": "Timberland",
        "Veenker": "Veenker"
    },
    # Physical locations within Ames city limits

    "Condition1": {
        "Artery": "Adjacent to arterial street",
        "Feedr": "Adjacent to feeder street",
        "Norm": "Normal",
        "RRNn": "Within 200' of North-South Railroad",
        "RRAn": "Adjacent to North-South Railroad",
        "PosN": "Near positive off-site feature--park, greenbelt, etc.",
        "PosA": "Adjacent to positive off-site feature",
        "RRNe": "Within 200' of East-West Railroad",
        "RRAe": "Adjacent to East-West Railroad"
    },
    # Proximity to various conditions

    "Condition2": {
        "Artery": "Adjacent to arterial street",
        "Feedr": "Adjacent to feeder street",
        "Norm": "Normal",
        "RRNn": "Within 200' of North-South Railroad",
        "RRAn": "Adjacent to North-South Railroad",
        "PosN": "Near positive off-site feature--park, greenbelt, etc.",
        "PosA": "Adjacent to positive off-site feature",
        "RRNe": "Within 200' of East-West Railroad",
        "RRAe": "Adjacent to East-West Railroad"
    },
    # Proximity to various conditions (if more than one is present)

    "BldgType": {
        "1Fam": "Single-family Detached",
        "2FmCon": "Two-family Conversion; originally built as one-family dwelling",
        "Duplx": "Duplex",
        "TwnhsE": "Townhouse End Unit",
        "TwnhsI": "Townhouse Inside Unit"
    },
    # Type of dwelling

    "HouseStyle": {
        "1Story": "One story",
        "1.5Fin": "One and one-half story: 2nd level finished",
        "1.5Unf": "One and one-half story: 2nd level unfinished",
        "2Story": "Two story",
        "2.5Fin": "Two and one-half story: 2nd level finished",
        "2.5Unf": "Two and one-half story: 2nd level unfinished",
        "SFoyer": "Split Foyer",
        "SLvl": "Split Level"
    },
    # Style of dwelling

    "OverallQual": None,
    # Rates the overall material and finish of the house

    "OverallCond": None,
    # Rates the overall condition of the house

    "YearBuilt": None,
    # Original construction date

    "YearRemodAdd": None,
    # Remodel date (same as construction date if no remodeling or additions)

    "RoofStyle": {
        "Flat": "Flat",
        "Gable": "Gable",
        "Gambrel": "Gambrel (Barn)",
        "Hip": "Hip",
        "Mansard": "Mansard",
        "Shed": "Shed"
    },
    # Type of roof

    "RoofMatl": {
        "ClyTile": "Clay or Tile",
        "CompShg": "Standard (Composite) Shingle",
        "Membran": "Membrane",
        "Metal": "Metal",
        "Roll": "Roll",
        "Tar&Grv": "Gravel & Tar",
        "WdShake": "Wood Shakes",
        "WdShngl": "Wood Shingles"
    },
    # Roof material

    "Exterior1st": {
        "AsbShng": "Asbestos Shingles",
        "AsphShn": "Asphalt Shingles",
        "BrkComm": "Brick Common",
        "BrkFace": "Brick Face",
        "CBlock": "Cinder Block",
        "CemntBd": "Cement Board",
        "HdBoard": "Hard Board",
        "ImStucc": "Imitation Stucco",
        "MetalSd": "Metal Siding",
        "Other": "Other",
        "Plywood": "Plywood",
        "PreCast": "PreCast",
        "Stone": "Stone",
        "Stucco": "Stucco",
        "VinylSd": "Vinyl Siding",
        "Wd Sdng": "Wood Siding",
        "WdShing": "Wood Shingles"
    },
    # Exterior covering on house

    "Exterior2nd": {
        "AsbShng": "Asbestos Shingles",
        "AsphShn": "Asphalt Shingles",
        "BrkComm": "Brick Common",
        "BrkFace": "Brick Face",
        "CBlock": "Cinder Block",
        "CemntBd": "Cement Board",
        "HdBoard": "Hard Board",
        "ImStucc": "Imitation Stucco",
        "MetalSd": "Metal Siding",
        "Other": "Other",
        "Plywood": "Plywood",
        "PreCast": "PreCast",
        "Stone": "Stone",
        "Stucco": "Stucco",
        "VinylSd": "Vinyl Siding",
        "Wd Sdng": "Wood Siding",
        "WdShing": "Wood Shingles"
    },
    # Exterior covering on house (if more than one material)

    "MasVnrType": {
        "BrkCmn": "Brick Common",
        "BrkFace": "Brick Face",
        "CBlock": "Cinder Block",
        "None": "None",
        "Stone": "Stone",
        "NA": "No veneer"
    },
    # Masonry veneer type

    "MasVnrArea": None,
    # Masonry veneer area in square feet

    "ExterQual": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Average/Typical",
        "Fa": "Fair",
        "Po": "Poor"
    },
    # Evaluates the quality of the material on the exterior

    "ExterCond": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Average/Typical",
        "Fa": "Fair",
        "Po": "Poor"
    },
    # Evaluates the present condition of the material on the exterior

    "Foundation": {
        "BrkTil": "Brick & Tile",
        "CBlock": "Cinder Block",
        "PConc": "Poured Concrete",
        "Slab": "Slab",
        "Stone": "Stone",
        "Wood": "Wood"
    },
    # Type of foundation

    "BsmtQual": {
        "Ex": "Excellent (100+ inches)",
        "Gd": "Good (90-99 inches)",
        "TA": "Average/Typical (80-89 inches)",
        "Fa": "Fair (70-79 inches)",
        "Po": "Poor (<70 inches)",
        "NA": "No Basement"
    },
    # Evaluates the height of the basement

    "BsmtCond": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Typical - slight dampness allowed",
        "Fa": "Fair - dampness or some cracking",
        "Po": "Poor - Severe cracking, settling, or wetness",
        "NA": "No Basement"
    },
    # Evaluates the general condition of the basement

    "BsmtExposure": {
        "Gd": "Good Exposure",
        "Av": "Average Exposure (split levels or foyers typically score average or above)",
        "Mn": "Minimum Exposure",
        "No": "No Exposure",
        "NA": "No Basement"
    },
    # Refers to walkout or garden level walls

    "BsmtFinType1": {
        "GLQ": "Good Living Quarters",
        "ALQ": "Average Living Quarters",
        "BLQ": "Below Average Living Quarters",
        "Rec": "Average Rec Room",
        "LwQ": "Low Quality",
        "Unf": "Unfinished",
        "NA": "No Basement"
    },
    # Rating of basement finished area

    "BsmtFinSF1": None,
    # Type 1 finished square feet

    "BsmtFinType2": {
        "GLQ": "Good Living Quarters",
        "ALQ": "Average Living Quarters",
        "BLQ": "Below Average Living Quarters",
        "Rec": "Average Rec Room",
        "LwQ": "Low Quality",
        "Unf": "Unfinished",
        "NA": "No Basement"
    },
    # Rating of basement finished area (if multiple types)

    "BsmtFinSF2": None,
    # Type 2 finished square feet

    "BsmtUnfSF": None,
    # Unfinished square feet of basement area

    "TotalBsmtSF": None,
    # Total square feet of basement area

    "Heating": {
        "Floor": "Floor Furnace",
        "GasA": "Gas forced warm air furnace",
        "GasW": "Gas hot water or steam heat",
        "Grav": "Gravity furnace",
        "OthW": "Hot water or steam heat other than gas",
        "Wall": "Wall furnace"
    },
    # Type of heating

    "HeatingQC": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Average/Typical",
        "Fa": "Fair",
        "Po": "Poor"
    },
    # Heating quality and condition

    "CentralAir": {
        "N": "No",
        "Y": "Yes"
    },
    # Central air conditioning

    "Electrical": {
        "SBrkr": "Standard Circuit Breakers & Romex",
        "FuseA": "Fuse Box over 60 AMP and all Romex wiring (Average)",
        "FuseF": "60 AMP Fuse Box and mostly Romex wiring (Fair)",
        "FuseP": "60 AMP Fuse Box and mostly knob & tube wiring (poor)",
        "Mix": "Mixed"
    },
    # Electrical system

    "1stFlrSF": None,
    # First Floor square feet

    "2ndFlrSF": None,
    # Second floor square feet

    "LowQualFinSF": None,
    # Low quality finished square feet (all floors)

    "GrLivArea": None,
    # Above grade (ground) living area square feet

    "BsmtFullBath": None,
    # Basement full bathrooms

    "BsmtHalfBath": None,
    # Basement half bathrooms

    "FullBath": None,
    # Full bathrooms above grade

    "HalfBath": None,
    # Half baths above grade

    "BedroomAbvGr": None,
    # Bedrooms above grade (does NOT include basement bedrooms)

    "KitchenAbvGr": None,
    # Kitchens above grade

    "KitchenQual": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Typical/Average",
        "Fa": "Fair",
        "Po": "Poor"
    },
    # Kitchen quality

    "TotRmsAbvGrd": None,
    # Total rooms above grade (does not include bathrooms)

    "Functional": {
        "Typ": "Typical Functionality",
        "Min1": "Minor Deductions 1",
        "Min2": "Minor Deductions 2",
        "Mod": "Moderate Deductions",
        "Maj1": "Major Deductions 1",
        "Maj2": "Major Deductions 2",
        "Sev": "Severely Damaged",
        "Sal": "Salvage only"
    },
    # Home functionality (Assume typical unless deductions are warranted)

    "Fireplaces": None,
    # Number of fireplaces

    "FireplaceQu": {
        "Ex": "Excellent - Exceptional Masonry Fireplace",
        "Gd": "Good - Masonry Fireplace in main level",
        "TA": "Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement",
        "Fa": "Fair - Prefabricated Fireplace in basement",
        "Po": "Poor - Ben Franklin Stove",
        "NA": "No Fireplace"
    },
    # Fireplace quality

    "GarageType": {
        "2Types": "More than one type of garage",
        "Attchd": "Attached to home",
        "Basment": "Basement Garage",
        "BuiltIn": "Built-In (Garage part of house - typically has room above garage)",
        "CarPort": "Car Port",
        "Detchd": "Detached from home",
        "NA": "No Garage"
    },
    # Garage location

    "GarageYrBlt": None,
    # Year garage was built

    "GarageFinish": {
        "Fin": "Finished",
        "RFn": "Rough Finished",
        "Unf": "Unfinished",
        "NA": "No Garage"
    },
    # Interior finish of the garage

    "GarageCars": None,
    # Size of garage in car capacity

    "GarageArea": None,
    # Size of garage in square feet

    "GarageQual": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Typical/Average",
        "Fa": "Fair",
        "Po": "Poor",
        "NA": "No Garage"
    },
    # Garage quality

    "GarageCond": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Typical/Average",
        "Fa": "Fair",
        "Po": "Poor",
        "NA": "No Garage"
    },
    # Garage condition

    "PavedDrive": {
        "Y": "Paved",
        "P": "Partial Pavement",
        "N": "Dirt/Gravel"
    },
    # Paved driveway

    "WoodDeckSF": None,
    # Wood deck area in square feet

    "OpenPorchSF": None,
    # Open porch area in square feet

    "EnclosedPorch": None,
    # Enclosed porch area in square feet

    "3SsnPorch": None,
    # Three season porch area in square feet

    "ScreenPorch": None,
    # Screen porch area in square feet

    "PoolArea": None,
    # Pool area in square feet

    "PoolQC": {
        "Ex": "Excellent",
        "Gd": "Good",
        "TA": "Average/Typical",
        "Fa": "Fair",
        "NA": "No Pool"
    },
    # Pool quality

    "Fence": {
        "GdPrv": "Good Privacy",
        "MnPrv": "Minimum Privacy",
        "GdWo": "Good Wood",
        "MnWw": "Minimum Wood/Wire",
        "NA": "No Fence"
    },
    # Fence quality

    "MiscFeature": {
        "Elev": "Elevator",
        "Gar2": "2nd Garage (if not described in garage section)",
        "Othr": "Other",
        "Shed": "Shed (over 100 SF)",
        "TenC": "Tennis Court",
        "NA": "None"
    },
    # Miscellaneous feature not covered in other categories

    "MiscVal": None,
    # Value of miscellaneous feature

    "MoSold": None,
    # Month Sold (MM)

    "YrSold": None,
    # Year Sold (YYYY)

    "SaleType": {
        "WD": "Warranty Deed - Conventional",
        "CWD": "Warranty Deed - Cash",
        "VWD": "Warranty Deed - VA Loan",
        "New": "Home just constructed and sold",
        "COD": "Court Officer Deed/Estate",
        "Con": "Contract 15% Down payment regular terms",
        "ConLw": "Contract Low Down payment and low interest",
        "ConLI": "Contract Low Interest",
        "ConLD": "Contract Low Down",
        "Oth": "Other"
    },
    # Type of sale

    "SaleCondition": {
        "Normal": "Normal Sale",
        "Abnorml": "Abnormal Sale - trade, foreclosure, short sale",
        "AdjLand": "Adjoining Land Purchase",
        "Alloca": "Allocation - two linked properties with separate deeds, typically condo with a garage unit",
        "Family": "Sale between family members",
        "Partial": "Home was not completed when last assessed (associated with New Homes)"
    },
    # Condition of sale

    "SalePrice": None
    # Price of sale
    # Was not in the original file description but it was in the db file
}

# If a value is missing, take it from rows that share the same values in these columns
columns_to_group = {
    'MSSubClass': ['BldgType', 'HouseStyle'],  # Identifies the type of dwelling involved in the sale
    'MSZoning': ['Neighborhood', 'LotConfig'],  # Identifies the general zoning classification of the sale
    'LotFrontage': 0,  # Linear feet of street connected to property
    'LotArea': [],  # Lot size in square feet
    'Street': [],  # Type of road access to property
    'Alley': [],  # Type of alley access to property
    'LotShape': [],  # General shape of property
    'LandContour': [],  # Flatness of the property
    'Utilities': [],  # Type of utilities available
    'LotConfig': [],  # Lot configuration
    'LandSlope': [],  # Slope of property
    'Neighborhood': [],  # Physical locations within Ames city limits
    'Condition1': ['Condition2'],  # Proximity to various conditions
    'Condition2': ['Condition1'],  # Proximity to various conditions (if more than one is present)
    'BldgType': ['MSSubClass', 'HouseStyle'],  # Type of dwelling
    'HouseStyle': ['MSSubClass', 'BldgType'],  # Style of dwelling
    'OverallQual': [],  # Rates the overall material and finish of the house
    'OverallCond': [],  # Rates the overall condition of the house
    'YearBuilt': [],  # Original construction date
    'YearRemodAdd': [],  # Remodel date (same as construction date if no remodeling or additions)
    'RoofStyle': [],  # Type of roof
    'RoofMatl': [],  # Roof material
    'Exterior1st': ['Exterior2nd'],  # Exterior covering on house
    'Exterior2nd': ['Exterior1st'],  # Exterior covering on house (if more than one material)
    'MasVnrType': ['MasVnrArea'],  # Masonry veneer type
    'MasVnrArea': ['MasVnrType'],  # Masonry veneer area in square feet
    'ExterQual': [],  # Evaluates the quality of the material on the exterior
    'ExterCond': [],  # Evaluates the present condition of the material on the exterior
    'Foundation': [],  # Type of foundation
    'BsmtQual': ['BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2'],  # Evaluates the height of the basement
    'BsmtCond': ['BsmtQual', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2'],  # Evaluates the general condition of the basement
    'BsmtExposure': ['BsmtQual', 'BsmtCond', 'BsmtFinType1', 'BsmtFinType2'],  # Refers to walkout or garden level walls
    'BsmtFinType1': ['BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType2'],  # Rating of basement finished area
    'BsmtFinSF1': ['BsmtFinType1', 'TotalBsmtSF'],  # Type 1 finished square feet
    'BsmtFinType2': ['BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1'],  # Rating of basement finished area (if multiple types)
    'BsmtFinSF2': ['BsmtFinType2', 'TotalBsmtSF'],  # Type 2 finished square feet
    'BsmtUnfSF': ['TotalBsmtSF'],  # Unfinished square feet of basement area
    'TotalBsmtSF': ['BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF'],  # Total square feet of basement area
    'Heating': [],  # Type of heating
    'HeatingQC': [],  # Heating quality and condition
    'CentralAir': [],  # Central air conditioning
    'Electrical': [],  # Electrical system
    '1stFlrSF': [],  # First Floor square feet
    '2ndFlrSF': [],  # Second floor square feet
    'LowQualFinSF': [],  # Low quality finished square feet (all floors)
    'GrLivArea': ['1stFlrSF', '2ndFlrSF', 'LowQualFinSF'],  # Above grade (ground) living area square feet
    'BsmtFullBath': [],  # Basement full bathrooms
    'BsmtHalfBath': [],  # Basement half bathrooms
    'FullBath': [],  # Full bathrooms above grade
    'HalfBath': [],  # Half baths above grade
    'BedroomAbvGr': [],  # Bedrooms above grade (does NOT include basement bedrooms)
    'KitchenAbvGr': [],  # Kitchens above grade
    'KitchenQual': [],  # Kitchen quality
    'TotRmsAbvGrd': ['BedroomAbvGr'],  # Total rooms above grade (does not include bathrooms)
    'Functional': [],  # Home functionality (Assume typical unless deductions are warranted)
    'Fireplaces': [],  # Number of fireplaces
    'FireplaceQu': ['Fireplaces'],  # Fireplace quality
    'GarageType': ['GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond'],  # Garage location
    'GarageYrBlt': ['GarageType', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond'],  # Year garage was built
    'GarageFinish': ['GarageType', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond'],  # Interior finish of the garage
    'GarageCars': ['GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageArea', 'GarageQual', 'GarageCond'],  # Size of garage in car capacity
    'GarageArea': ['GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageQual', 'GarageCond'],  # Size of garage in square feet
    'GarageQual': ['GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageCond'],  # Garage quality
    'GarageCond': ['GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual'],  # Garage condition
    'PavedDrive': [],  # Paved driveway
    'WoodDeckSF': [],  # Wood deck area in square feet
    'OpenPorchSF': [],  # Open porch area in square feet
    'EnclosedPorch': [],  # Enclosed porch area in square feet
    '3SsnPorch': [],  # Three season porch area in square feet
    'ScreenPorch': [],  # Screen porch area in square feet
    'PoolArea': ['PoolQC'],  # Pool area in square feet
    'PoolQC': ['PoolArea'],  # Pool quality
    'Fence': [],  # Fence quality
    'MiscFeature': ['MiscVal'],  # Miscellaneous feature not covered in other categories
    'MiscVal': ['MiscFeature'],  # Dollar value of miscellaneous feature
    'MoSold': [],  # Month Sold (MM)
    'YrSold': [],  # Year Sold (YYYY)
    'SaleType': [],  # Type of sale
    'SaleCondition': []  # Condition of sale
}
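
The cleaning function that consumes `columns_to_group` is not shown in this section, so the following is only a sketch of one way such a mapping could be used: a missing value is filled with the most common value among rows that agree on all of the listed columns. The choice of the most common value, the helper `fill_from_group`, and the sample rows are assumptions made for illustration. Plain dictionaries are used instead of a DataFrame to keep the sketch dependency-free.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value
rows = [
    {"BldgType": "1Fam", "HouseStyle": "2Story", "MSSubClass": 60},
    {"BldgType": "1Fam", "HouseStyle": "2Story", "MSSubClass": 60},
    {"BldgType": "1Fam", "HouseStyle": "1Story", "MSSubClass": 20},
    {"BldgType": "1Fam", "HouseStyle": "2Story", "MSSubClass": None},
]

# One entry copied from columns_to_group
sample_groups = {"MSSubClass": ["BldgType", "HouseStyle"]}


def fill_from_group(rows, target, group_columns):
    """Fill missing `target` values with the most common value among rows
    that agree on every column in `group_columns`."""
    counts = {}
    for row in rows:
        if row[target] is not None:
            key = tuple(row[col] for col in group_columns)
            counts.setdefault(key, Counter())[row[target]] += 1
    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        key = tuple(row[col] for col in group_columns)
        if new_row[target] is None and key in counts:
            new_row[target] = counts[key].most_common(1)[0][0]
        filled.append(new_row)
    return filled


filled_rows = rows
for target, group_columns in sample_groups.items():
    filled_rows = fill_from_group(filled_rows, target, group_columns)
print(filled_rows[3]["MSSubClass"])
```

Rows whose group has no known value are left as missing, so a fallback (for example a column-wide default) would still be needed for them.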

# Map ordered categorical string values to integers (lowest rank first, starting at 0 or 1)
ordinal_encodings = {
    'Alley': {'NA': 0, 'Grvl': 1, 'Pave': 2},
    'BsmtCond': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'BsmtExposure': {'NA': 0, 'No': 1, 'Mn': 2, 'Av': 3, 'Gd': 4},
    'BsmtFinType1': {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6},
    'BsmtFinType2': {'NA': 0, 'Unf': 1, 'LwQ': 2, 'Rec': 3, 'BLQ': 4, 'ALQ': 5, 'GLQ': 6},
    'BsmtQual': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'CentralAir': {'N': 0, 'Y': 1},
    'ExterCond': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'ExterQual': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'Fence': {'NA': 0, 'MnWw': 1, 'GdWo': 2, 'MnPrv': 3, 'GdPrv': 4},
    'FireplaceQu': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'Functional': {'Sal': 1, 'Sev': 2, 'Maj2': 3, 'Maj1': 4, 'Mod': 5, 'Min2': 6, 'Min1': 7, 'Typ': 8},
    'GarageCond': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'GarageQual': {'NA': 0, 'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'GarageFinish': {'NA': 0, 'Unf': 1, 'RFn': 2, 'Fin': 3},
    'HeatingQC': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'KitchenQual': {'Po': 1, 'Fa': 2, 'TA': 3, 'Gd': 4, 'Ex': 5},
    'LandContour': {'Low': 1, 'HLS': 2, 'Bnk': 3, 'Lvl': 4},
    'LotShape': {'IR3': 1, 'IR2': 2, 'IR1': 3, 'Reg': 4},
    'PavedDrive': {'N': 0, 'P': 1, 'Y': 2},
    'PoolQC': {'NA': 0, 'Fa': 1, 'TA': 2, 'Gd': 3, 'Ex': 4},
    'Street': {'Grvl': 0, 'Pave': 1},
    'Utilities': {'ELO': 1, 'NoSeWa': 2, 'NoSewr': 3, 'AllPub': 4}
}
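
As a minimal sketch of how these mappings can be applied, the example below replaces each label with its integer rank. The sample rows and the helper `encode_row` are hypothetical, and plain dictionaries stand in for the DataFrame to keep the sketch dependency-free.

```python
# Two mappings copied from ordinal_encodings
sample_encodings = {
    "KitchenQual": {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5},
    "GarageFinish": {"NA": 0, "Unf": 1, "RFn": 2, "Fin": 3},
}

# Hypothetical rows standing in for the real data
rows = [
    {"KitchenQual": "Gd", "GarageFinish": "RFn"},
    {"KitchenQual": "TA", "GarageFinish": "NA"},
    {"KitchenQual": "Ex", "GarageFinish": "Fin"},
]


def encode_row(row, encodings):
    """Return a copy of `row` with each encoded column's label replaced by its rank."""
    encoded = dict(row)
    for column, mapping in encodings.items():
        if column in encoded:
            encoded[column] = mapping[encoded[column]]
    return encoded


encoded_rows = [encode_row(row, sample_encodings) for row in rows]
print(encoded_rows)
```

On a pandas DataFrame the equivalent lookup is `df[column].map(mapping)`; labels missing from the mapping become NaN there, whereas this sketch raises a `KeyError`.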

# Weight assigned to each column
weights = {
    'MSSubClass': 0,
    'MSZoning': 0,
    'LotFrontage': 0,
    'LotArea': 1,
    'Street': 0,
    'Alley': 0,
    'LotShape': 0,
    'LandContour': 0,
    'Utilities': 0,
    'LotConfig': 0,
    'LandSlope': 0,
    'Neighborhood': 0,
    'Condition1': 0,
    'Condition2': 0,
    'BldgType': 0,
    'HouseStyle': 0,
    'OverallQual': 0,
    'OverallCond': 0,
    'YearBuilt': 0,
    'YearRemodAdd': 0,
    'RoofStyle': 0,
    'RoofMatl': 0,
    'Exterior1st': 0,
    'Exterior2nd': 0,
    'MasVnrType': 0,
    'MasVnrArea': 0,
    'ExterQual': 0,
    'ExterCond': 0,
    'Foundation': 0,
    'BsmtQual': 0,
    'BsmtCond': 0,
    'BsmtExposure': 0,
    'BsmtFinType1': 0,
    'BsmtFinSF1': 0,
    'BsmtFinType2': 0,
    'BsmtFinSF2': 0,
    'BsmtUnfSF': 0,
    'TotalBsmtSF': 0,
    'Heating': 0,
    'HeatingQC': 0,
    'CentralAir': 0,
    'Electrical': 0,
    '1stFlrSF': 0,
    '2ndFlrSF': 0,
    'LowQualFinSF': 0,
    'GrLivArea': 0,
    'BsmtFullBath': 0,
    'BsmtHalfBath': 0,
    'FullBath': 0,
    'HalfBath': 0,
    'BedroomAbvGr': 0,
    'KitchenAbvGr': 0,
    'KitchenQual': 0,
    'TotRmsAbvGrd': 0,
    'Functional': 0,
    'Fireplaces': 0,
    'FireplaceQu': 1,
    'GarageType': 0,
    'GarageYrBlt': 0,
    'GarageFinish': 0,
    'GarageCars': 0,
    'GarageArea': 0,
    'GarageQual': 0,
    'GarageCond': 0,
    'PavedDrive': 0,
    'WoodDeckSF': 0,
    'OpenPorchSF': 0,
    'EnclosedPorch': 0,
    '3SsnPorch': 0,
    'ScreenPorch': 0,
    'PoolArea': 0,
    'PoolQC': 1,
    'Fence': 0,
    'MiscFeature': 0,
    'MiscVal': 0,
    'MoSold': 1,
    'YrSold': 1,
    'SaleType': 0,
    'SaleCondition': 0
}

In [59]:
# Read the data from the .csv files
train_df = pd.read_csv('DATA/train.csv')
test_df = pd.read_csv('DATA/test.csv') 
print(train_df)

        Id  MSSubClass MSZoning  LotFrontage  LotArea Street Alley LotShape  \
0        1          60       RL         65.0     8450   Pave   NaN      Reg   
1        2          20       RL         80.0     9600   Pave   NaN      Reg   
2        3          60       RL         68.0    11250   Pave   NaN      IR1   
3        4          70       RL         60.0     9550   Pave   NaN      IR1   
4        5          60       RL         84.0    14260   Pave   NaN      IR1   
...    ...         ...      ...          ...      ...    ...   ...      ...   
1455  1456          60       RL         62.0     7917   Pave   NaN      Reg   
1456  1457          20       RL         85.0    13175   Pave   NaN      Reg   
1457  1458          70       RL         66.0     9042   Pave   NaN      Reg   
1458  1459          20       RL         68.0     9717   Pave   NaN      Reg   
1459  1460          20       RL         75.0     9937   Pave   NaN      Reg   

     LandContour Utilities  ... PoolArea PoolQC  Fe

In [60]:
print(train_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

In [62]:
missing_values_train = train_df.isnull().sum()
print("Missing values in training data:")
print(missing_values_train[missing_values_train > 0])

Missing values in training data:
LotFrontage      259
Alley           1369
MasVnrType       872
MasVnrArea         8
BsmtQual          37
BsmtCond          37
BsmtExposure      38
BsmtFinType1      37
BsmtFinType2      38
Electrical         1
FireplaceQu      690
GarageType        81
GarageYrBlt       81
GarageFinish      81
GarageQual        81
GarageCond        81
PoolQC          1453
Fence           1179
MiscFeature     1406
dtype: int64


In [63]:
missing_values_test = test_df.isnull().sum()
print("Missing values in test data:")
print(missing_values_test[missing_values_test > 0])

Missing values in test data:
MSZoning           4
LotFrontage      227
Alley           1352
Utilities          2
Exterior1st        1
Exterior2nd        1
MasVnrType       894
MasVnrArea        15
BsmtQual          44
BsmtCond          45
BsmtExposure      44
BsmtFinType1      42
BsmtFinSF1         1
BsmtFinType2      42
BsmtFinSF2         1
BsmtUnfSF          1
TotalBsmtSF        1
BsmtFullBath       2
BsmtHalfBath       2
KitchenQual        1
Functional         2
FireplaceQu      730
GarageType        76
GarageYrBlt       78
GarageFinish      78
GarageCars         1
GarageArea         1
GarageQual        78
GarageCond        78
PoolQC          1456
Fence           1169
MiscFeature     1408
SaleType           1
dtype: int64


In [64]:
# Get all the columns with NA as a valid value
def get_NA_columns(data_description):
    return_columns = []
    for column, desc_dict in data_description.items():
        if desc_dict is not None and "NA" in desc_dict:
            return_columns.append(column)
    return return_columns
    
columns_to_check = get_NA_columns(data_description)
# Replace NaN in those columns with the valid "NA" category
for col in columns_to_check:
    test_df[col] = test_df[col].fillna("NA")

# Do the same for the training data
for col in columns_to_check:
    train_df[col] = train_df[col].fillna("NA")
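
As a minimal sketch of what `get_NA_columns` selects, assume that `data_description` maps each column name to a dictionary of allowed codes, or to `None` for free-form numeric columns. The toy dictionary below is hypothetical and only mirrors that assumed shape.

```python
# Toy description (hypothetical): column -> {code: meaning}, or None for numeric columns
toy_description = {
    "Alley": {"Grvl": "Gravel", "Pave": "Paved", "NA": "No alley access"},
    "Street": {"Grvl": "Gravel", "Pave": "Paved"},
    "LotArea": None,
}

def get_NA_columns(data_description):
    # Keep only the columns whose allowed codes include the literal "NA"
    return [
        column
        for column, codes in data_description.items()
        if codes is not None and "NA" in codes
    ]

na_columns = get_NA_columns(toy_description)
print(na_columns)
```

Only `Alley` is selected here, because it is the only column where "NA" carries a meaning of its own instead of marking a missing value.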

In [65]:
def validate_and_correct_value(column, value, data_description):
    # Validate the value and correct it if it is not valid for the given column
    if column in data_description:
        if value in data_description[column]:
            return value
        else:
            # It makes no sense to fuzzy-match an undefined value
            if pd.isna(value):
                return 'nan'  # String sentinel for "not defined"; isnull() will not flag it
            # Find the closest valid value using fuzzy matching
            closest_match, score = process.extractOne(value, data_description[column].keys())
            if score > 80:  # Similarity threshold
                return closest_match
            else:
                return 'nan'  # No close match: treat the value as not defined
    return value
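
The typo correction above relies on `fuzzywuzzy`. To illustrate the idea without that dependency, here is a rough sketch that swaps in the standard library's `difflib.get_close_matches`. The helper name `correct_typo`, the 0.8 cutoff and the sample codes are assumptions for illustration, and difflib's ratio is not the same score that fuzzywuzzy reports.

```python
from difflib import get_close_matches

def correct_typo(value, valid_codes, cutoff=0.8):
    # Return the value unchanged when it is already a valid code
    if value in valid_codes:
        return value
    # Otherwise look for the single closest valid code above the cutoff
    matches = get_close_matches(value, list(valid_codes), n=1, cutoff=cutoff)
    return matches[0] if matches else None

valid = ["CollgCr", "Veenker", "Crawfor", "NoRidge"]
print(correct_typo("Veenkr", valid))  # a near miss of "Veenker"
print(correct_typo("Zzz", valid))     # nothing similar enough
```

Returning `None` for unmatched values keeps them detectable by `isnull()` if they are written back into a DataFrame, which may or may not be what the pipeline wants.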

In [66]:
def clean_and_validate_data(dataFrame, data_description, columns_to_group):
    # Iterate over every column of the DataFrame to verify its values
    for col in dataFrame.columns:
        if col in data_description:  # Check if the column is described
            if data_description[col] is not None:
                # Flag every row that is null or whose value is not in the column's definition
                invalid_rows = ~dataFrame[col].isin(data_description[col].keys())
                if invalid_rows.any():  # If there are invalid values
                    # Run each invalid value through validate_and_correct_value,
                    # which uses a fuzzy search to check whether it was a typo
                    dataFrame.loc[invalid_rows, col] = dataFrame.loc[invalid_rows, col].apply(
                        lambda x: validate_and_correct_value(col, x, data_description)
                    )

            # After the correction, make an educated guess for the values still missing
            filter_notnull = dataFrame[col].notnull()

            # If the entry is not a list, it is the default value to use when null
            if type(columns_to_group.get(col)) is not list:
                default_value = columns_to_group.get(col)
                # Replace the null values with the default value
                dataFrame.loc[dataFrame[col].isnull(), col] = default_value
            else:
                if columns_to_group.get(col):  # Check if there are correlated columns
                    for related_col in columns_to_group[col]:
                        if related_col in dataFrame.columns:
                            filter_notnull = filter_notnull & dataFrame[related_col].notnull()

                # Calculate the mode over the rows that satisfy the filter
                if filter_notnull.any():
                    mode_value = dataFrame.loc[filter_notnull, col].mode()[0]

                    # Replace the null values with the computed mode
                    dataFrame.loc[dataFrame[col].isnull(), col] = mode_value
    return dataFrame

In [67]:
# Clean the data
test_df = clean_and_validate_data(test_df, data_description, columns_to_group)
train_df = clean_and_validate_data(train_df, data_description, columns_to_group)

In [68]:
# Nothing should be missing now
missing_values_test = test_df.isnull().sum()
print("Missing values in test data:")
print(missing_values_test[missing_values_test > 0])

Missing values in test data:
Series([], dtype: int64)


In [69]:
missing_values_train = train_df.isnull().sum()
print("Missing values in training data:")
print(missing_values_train[missing_values_train > 0])

Missing values in training data:
Series([], dtype: int64)


### Creation and connection to the DB

Here we connect to the PostgreSQL server and create a new database, dropping any existing database with the same name first.

In [70]:
# Connection parameters for the PostgreSQL server
db_url = URL.create(
    drivername="postgresql+psycopg2",
    username="postgres",
    password="unimib",
    host="localhost",
    port=5432,
    database="postgres"
)

# Database name
new_db_name = 'ames_housing_db'

try:
    # Creation of the engine for the connection to the database
    engine = create_engine(db_url, isolation_level="AUTOCOMMIT")
    with engine.connect() as conn:
        # Check if the database exists
        result = conn.execute(text(f"SELECT 1 FROM pg_database WHERE datname = '{new_db_name}'"))
        exists = result.scalar() is not None

        # If it does, drop it
        if exists:
            conn.execute(text(f"DROP DATABASE {new_db_name} WITH (FORCE)"))
            print(f"Database '{new_db_name}' deleted successfully.")

        # Create the database
        conn.execute(text(f"CREATE DATABASE {new_db_name}"))
        print(f"Database '{new_db_name}' successfully created.")

except Exception as e:
    print(f"Error: {e}")


Database 'ames_housing_db' deleted successfully.
Database 'ames_housing_db' successfully created.
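
The database name above is interpolated directly into the SQL text, since identifiers cannot be passed as bound parameters. That is fine for a hard-coded constant. If the name ever came from user input, one possible safeguard is to validate it first. The helper `safe_identifier` below is hypothetical and deliberately strict: it is a sketch, not a complete quoting routine.

```python
import re

def safe_identifier(name):
    # Accept only plain identifiers: a lowercase letter or underscore first,
    # then lowercase letters, digits or underscores
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"Unsafe identifier: {name!r}")
    return name

new_db_name = safe_identifier("ames_housing_db")
print(f"CREATE DATABASE {new_db_name}")
```

A name such as `x; DROP DATABASE postgres` would raise a `ValueError` instead of reaching the server.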


In [71]:
# Connect to the newly created PostgreSQL database
engine = create_engine('postgresql://postgres:unimib@localhost/ames_housing_db')

# Upload the two tables into the database
train_df.to_sql('housing_data_train', engine, index=False, if_exists='replace')
test_df.to_sql('housing_data_test', engine, index=False, if_exists='replace')


235

We create the results table (`housing_data_prediction`) up front, with the prediction and error columns left empty, so that we can later fill them in with `UPDATE` queries to showcase how it's done.

In [72]:
results_df = pd.read_csv('DATA/sample_submission.csv')

# Rename the 'SalePrice' column to 'SalePrice_actual'
results_df.rename(columns={'SalePrice': 'SalePrice_actual'}, inplace=True)
results_df["SalePrice_predicted"] = float('NaN')
results_df["AbsError"] = float('NaN')
results_df["PercentError"] = float('NaN')

table_name = 'housing_data_prediction'

# Insert the rows with the prediction and error columns still empty, so that we can show the UPDATE later
results_df.to_sql(table_name, engine, if_exists='replace', index=False)

459

### Predictive model

In [73]:
# Load a table from PostgreSQL into a DataFrame
def load_data(engine, table_name, data_description):
    query = f'SELECT * FROM {table_name}'
    data = pd.read_sql(query, engine)

    # Replace NaN with "NA" in the columns where "NA" is a valid category
    columns_to_check = get_NA_columns(data_description)
    for col in columns_to_check:
        if col in data:
            data[col] = data[col].fillna("NA")

    return data

In [74]:
# Prepare the data for the predictive model: it cannot handle strings, so they are transformed into numbers
def prepare_data_for_prediction(data):
    # Transform string variables into numerical values
    for col in data.columns:
        if col in ordinal_encodings:
            data[col] = data[col].map(lambda x: ordinal_encodings[col].get(x, 0))
        elif data[col].dtype == 'object':
            lbl = LabelEncoder()
            data[col] = lbl.fit_transform(data[col].astype(str))
            
    # Multiply each column by its corresponding weight
    for col, weight in weights.items():
        if col in data.columns:
            data[col] = data[col] * weight
    return data
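
One caveat with the cell above: `LabelEncoder` is refit on every call, so the integer assigned to a category depends on which categories happen to be present in that particular table. A possible alternative, sketched below with plain dictionaries, is to learn the mapping once from the training values and reuse it for the test values. The helper names `fit_codes` and `apply_codes` and the `-1` code for unseen categories are assumptions, not part of the original notebook.

```python
def fit_codes(train_values):
    # Sort the distinct categories so the mapping is deterministic
    return {category: code for code, category in enumerate(sorted(set(train_values)))}

def apply_codes(values, codes, unseen=-1):
    # Reuse the training mapping; categories never seen in training get `unseen`
    return [codes.get(value, unseen) for value in values]

train_zoning = ["RL", "RM", "FV", "RL", "RH"]
test_zoning = ["RM", "RL", "C (all)"]

codes = fit_codes(train_zoning)
print(codes)
print(apply_codes(test_zoning, codes))
```

With this approach, "RL" maps to the same integer in both tables even though the test sample contains a category that the training sample lacks.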


In [75]:
# Save the trained model
def save_model(best_model, model_file):
    joblib.dump(best_model, model_file)

# Load the trained model
def load_model(model_file):
    model = joblib.load(model_file)
    return model

In [76]:
# Find the best params for the predictive model using a Bayesian optimization
def find_best_params(data):
    # Split data into features (X) and target (y)
    X = data.drop(['SalePrice', 'Id'], axis=1)
    y = data['SalePrice']

    # Define the hyperparameters for the Bayesian optimization
    search_space = {
        'n_estimators': Integer(100, 400),
        'learning_rate': Real(0.01, 1.0, 'log-uniform'),
        'max_depth': Integer(3, 10),
        'min_samples_split': Integer(2, 20),
        'min_samples_leaf': Integer(1, 10),
        'max_features': Categorical(['sqrt', 'log2'])
    }

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize the Gradient Boosting Regressor
    regressor = GradientBoostingRegressor(random_state=42)

    # Initialize BayesSearchCV with the regressor and the hyperparameter search space
    opt = BayesSearchCV(
        regressor,
        search_space,
        n_iter=50,
        scoring='neg_mean_squared_error',
        cv=5,
        random_state=42
    )

    # Find the best parameters
    opt.fit(X_train, y_train)

    print("Best parameters found: ", opt.best_params_)
    return opt.best_params_
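
The `'log-uniform'` prior on `learning_rate` means that candidate values are spread evenly across orders of magnitude rather than across the raw interval. Below is a small sketch of that idea using only the standard library. The function name `sample_log_uniform` is hypothetical, and this is not a description of how skopt samples internally.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    # Draw uniformly in log space, then map back to the original scale
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(42)
samples = [sample_log_uniform(0.01, 1.0, rng) for _ in range(10_000)]

# About half of the draws should fall below the geometric midpoint 0.1
share_below = sum(s < 0.1 for s in samples) / len(samples)
print(f"share below 0.1: {share_below:.2f}")
```

A plain uniform prior over the same interval would put only about 9% of the draws below 0.1, so small learning rates would rarely be explored.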

In [77]:
# Train a GradientBoostingRegressor with the tuned hyperparameters and save it to disk
def train_model(data, model_file):
    # Split data into features (X) and target (y)
    X = data.drop(['SalePrice', 'Id'], axis=1)
    y = data['SalePrice']

    # Split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Find the best parameters for the GradientBoostingRegressor using a Bayesian optimization
    best_params = find_best_params(data)

    # Initialize the Gradient Boosting Regressor with the best parameters
    gb_reg = GradientBoostingRegressor(**best_params, random_state=42)

    # Train the model
    gb_reg.fit(X_train, y_train)

    # Evaluate on the training split and on the held-out split
    mse_train = mean_squared_error(y_train, gb_reg.predict(X_train))
    mse_test = mean_squared_error(y_test, gb_reg.predict(X_test))
    print(f'Model trained with MSE: {mse_train} (train), {mse_test} (held-out)')

    save_model(gb_reg, model_file)


In [78]:
table_name = 'housing_data_train'

data_train = load_data(engine, table_name, data_description)

In [79]:
model_file = 'DATA/model.pkl'
data_train = prepare_data_for_prediction(data_train)

if RETRAIN_MODEL:
    train_model(data_train, model_file)

# Load the trained model so that it can be used for predictions
model = load_model(model_file)

In [80]:
# Predict the sale prices and return them as a DataFrame keyed by Id
def predict_results(model, data):
    data = prepare_data_for_prediction(data)
    ids = data['Id'].values
    predictions = model.predict(data.drop(['Id'], axis=1))
    results = pd.DataFrame({'Id': ids, 'SalePrice': predictions})
    return results

In [81]:
# Load the test data
table_name = 'housing_data_test'
data_test = load_data(engine, table_name, data_description)

# Predict on the test DataFrame
predicted_data = predict_results(model, data_test)

# Rename the 'SalePrice' column to 'SalePrice_predicted'
predicted_data.rename(columns={'SalePrice': 'SalePrice_predicted'}, inplace=True)

In [82]:
# Update operation using SQLAlchemy's text() function
table_name = 'housing_data_prediction'

with engine.connect() as connection:
    for index, row in predicted_data.iterrows():
        update_query = text(f"""
            UPDATE {table_name}
            SET "SalePrice_predicted" = :sale_price_predicted
            WHERE "Id" = :id
        """)
        
        params = {
            'id': row['Id'],
            'sale_price_predicted': row['SalePrice_predicted']
        }
        
        connection.execute(update_query, params)
    # Commit the transaction explicitly
    connection.commit()
    print("Update operation completed successfully.")

Update operation completed successfully.
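
The loop above issues one `UPDATE` per row. The same pattern (placeholder columns first, then a parameterized `UPDATE`) can also be sent as a single batch. The sketch below uses the standard library's `sqlite3` with an in-memory database purely as a stand-in for PostgreSQL. The table name `prediction` and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE prediction ("Id" INTEGER PRIMARY KEY, "SalePrice_predicted" REAL)')
# Rows start with a NULL placeholder, like the NaN columns in the results table
conn.executemany('INSERT INTO prediction ("Id") VALUES (?)', [(1461,), (1462,), (1463,)])

# One parameter dict per row, applied in a single executemany call
params = [
    {"id": 1461, "price": 125000.0},
    {"id": 1462, "price": 158500.5},
]
conn.executemany(
    'UPDATE prediction SET "SalePrice_predicted" = :price WHERE "Id" = :id',
    params,
)
conn.commit()

rows = conn.execute('SELECT "Id", "SalePrice_predicted" FROM prediction ORDER BY "Id"').fetchall()
print(rows)
conn.close()
```

With SQLAlchemy, passing a list of parameter dictionaries to `connection.execute(update_query, params_list)` should have a similar effect, with the values still bound as parameters rather than interpolated into the SQL string.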


In [83]:
# Load the results table, which now includes the predictions
table_name = 'housing_data_prediction'
merged_results = load_data(engine, table_name, data_description)

In [84]:
# Find the absolute and relative error
merged_results['AbsError'] = abs(merged_results['SalePrice_actual'] - merged_results['SalePrice_predicted'])
merged_results['PercentError'] = (merged_results['AbsError'] / merged_results['SalePrice_actual']) * 100

In [85]:
# Update the database one final time
table_name = 'housing_data_prediction'
with engine.connect() as connection:
    for index, row in merged_results.iterrows():
        update_query = text(f"""
            UPDATE {table_name}
            SET "AbsError" = :abs_error, "PercentError" = :percent_error
            WHERE "Id" = :id
        """)
        
        params = {
            'id': row['Id'],
            'abs_error': row['AbsError'],
            'percent_error': row['PercentError']
        }
        
        connection.execute(update_query, params)
    # Commit the transaction explicitly
    connection.commit()
    print("Update operation completed successfully.")

Update operation completed successfully.


In [86]:
# Calculate the mean percentage error
mean_error = merged_results['PercentError'].mean()

# Count the predictions with an error greater than 20%
high_error_count = (merged_results['PercentError'] > 20).sum()

# Total number of predictions
total_predictions = merged_results.shape[0]

# Print the 10 worst results
print(merged_results.sort_values(by='PercentError', ascending=False).head(10))

        Id  SalePrice_actual  SalePrice_predicted       AbsError  PercentError
1355  2816     190054.639237        298201.177237  108146.537999     56.902867
806   2267     158011.114890        241572.006121   83560.891231     52.882920
112   1573     162015.318512        243754.882352   81739.563841     50.451750
685   2146     162820.928265        244023.731167   81202.802903     49.872460
1157  2618     218648.131133        323100.945353  104452.814220     47.772105
105   1566     170401.630997        249197.528196   78795.897199     46.241281
1053  2514     160927.610449        234732.414357   73804.803908     45.862114
1343  2804     205328.078219        298201.177237   92873.099017     45.231563
276   1737     175943.371817        252202.538429   76259.166613     43.343018
203   1664     170935.859218        244004.407132   73068.547914     42.746179
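
As a quick check of the two error columns, the first row of the table above (Id 2816) can be recomputed by hand. The sketch below simply copies the two prices from that row and applies the same formulas as the notebook.

```python
# Values copied from the first row of the table above (Id 2816)
actual = 190054.639237
predicted = 298201.177237

# Same formulas as the AbsError and PercentError columns
abs_error = abs(actual - predicted)
percent_error = abs_error / actual * 100

print(f"AbsError: {abs_error:.2f}")
print(f"PercentError: {percent_error:.2f}%")
```

Because the percentage is computed relative to `SalePrice_actual`, over-predictions can exceed 100%, while under-predictions of a positive price stay at or below 100%.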


In [87]:
print(f'\nMean percentage error: {mean_error:.2f}%')


Mean percentage error: 11.38%


In [88]:
print(f'Number of predictions with errors > 20%: {high_error_count} of {total_predictions} ({high_error_count / total_predictions:.2%})')

Number of predictions with errors > 20%: 208 of 1459 (14.26%)


In [89]:
# Dispose engine to close connections
engine.dispose()

### Contributors

**Roberto**, **Boccaccio** and **869135**.
**Weronika**, **Pyrka** and **908115**.
