## Problem Statement

You are tasked with predicting house prices from features such as the area of the house, the number of bedrooms and bathrooms, and the number of parking spaces. Your objective is to compare several machine learning models (Linear Regression, Decision Tree, and Random Forest) on this prediction task and to identify the features that influence the price the most.

### Tasks

- Explore and understand the dataset through visualizations and statistical analysis.
- Build and evaluate different machine learning models for price prediction.
- Share insights on the features that most affect house prices, using the feature importances from the Decision Tree and Random Forest models.
- Compare the accuracy measures of the Linear Regression and Random Forest models.
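
The modelling and comparison tasks above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual solution: the data is synthetic (generated so that price depends mostly on area), the choice of R² and mean absolute error as "accuracy measures" is an assumption, and the hyperparameters are arbitrary defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the housing dataset):
# price depends strongly on area and weakly on bedrooms, plus noise.
rng = np.random.default_rng(0)
area = rng.uniform(1500, 10000, size=300)
bedrooms = rng.integers(1, 6, size=300)
price = 500 * area + 100_000 * bedrooms + rng.normal(0, 200_000, size=300)
X = np.column_stack([area, bedrooms])

# Hold out 20% of the rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, price, test_size=0.2, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and record the same two metrics on the validation rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    scores[name] = {
        "r2": r2_score(y_val, pred),
        "mae": mean_absolute_error(y_val, pred),
    }

# Impurity-based importances from the fitted forest, keyed by feature name.
importances = dict(
    zip(["area", "bedrooms"], models["random_forest"].feature_importances_)
)
print(scores)
print(importances)
```

Scoring both models on the same held-out rows keeps the comparison fair; on the real dataset the feature list would include every column from the data dictionary after encoding.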

### Data Dictionary

| **Index** | **Feature**        | **Description**                                             |
|-----------|--------------------|-------------------------------------------------------------|
| 1         | `price`            | The price of the house (in local currency)                  |
| 2         | `area`             | Total area of the house (in square feet)                    |
| 3         | `bedrooms`         | Number of bedrooms                                          |
| 4         | `bathrooms`        | Number of bathrooms                                         |
| 5         | `stories`          | Number of stories (floors) in the house                     |
| 6         | `mainroad`         | Whether the house is facing the main road (`yes`/`no`)      |
| 7         | `guestroom`        | Presence of a guest room (`yes`/`no`)                       |
| 8         | `basement`         | Presence of a basement (`yes`/`no`)                         |
| 9         | `hotwaterheating`  | Presence of hot water heating (`yes`/`no`)                  |
| 10        | `airconditioning`  | Presence of air conditioning (`yes`/`no`)                   |
| 11        | `parking`          | Number of parking spaces                                    |
| 12        | `prefarea`         | Whether the house is located in a preferred area (`yes`/`no`)|
| 13        | `furnishingstatus` | Furnishing status of the house (`furnished`/`semi-furnished`/`unfurnished`) |
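
Several of the columns above are text (`yes`/`no` flags and the three-level `furnishingstatus`), so they typically need a numeric encoding before they can be passed to a regression model. The sketch below shows one possible encoding on a tiny made-up sample that only borrows the column names from the table; the values are invented for illustration, and one-hot encoding is an assumed choice rather than the notebook's prescribed method.

```python
import pandas as pd

# Made-up rows using a subset of the column names from the data dictionary.
sample = pd.DataFrame({
    "price": [4200000, 3500000, 6100000],
    "area": [5000, 3600, 7200],
    "mainroad": ["yes", "no", "yes"],
    "furnishingstatus": ["furnished", "unfurnished", "semi-furnished"],
})

# The full dataset has more yes/no columns (guestroom, basement, ...).
binary_cols = ["mainroad"]

encoded = sample.copy()
# Map yes/no flags to 1/0.
encoded[binary_cols] = (encoded[binary_cols] == "yes").astype(int)
# One-hot encode the three-level furnishing status into 0/1 columns.
encoded = pd.get_dummies(encoded, columns=["furnishingstatus"], dtype=int)

print(encoded.columns.tolist())
```

For a linear model, passing `drop_first=True` to `pd.get_dummies` avoids perfectly collinear dummy columns; tree-based models are not affected by this.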


In [1]:
import pandas as pd



### Data Fetch

In [11]:
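def get_data():
    """Download the housing data and split it into training rows and held-out submission rows."""
    data_url = "https://raw.githubusercontent.com/vasudevgupta31/acadamic_datasets/refs/heads/master/Housing_LR.csv"
    data = pd.read_csv(data_url)
    # Row labels (from the default integer index) reserved for the submission set.
    submission_rows = [276, 540, 364, 437, 288, 198, 173, 167,  72, 298, 457,   2, 512,
                       101, 538, 465, 507, 360, 530, 138, 380, 129, 206, 221, 389, 123, 307]
    # Inside query(), `index` refers to the DataFrame index and `@name` to a local variable.
    training_data = data.query("index not in @submission_rows").reset_index(drop=True)
    test_data = data.query("index in @submission_rows").reset_index(drop=True)

    return training_data, test_data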

In [12]:
train_df, test_df = get_data()
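
`get_data()` holds out the 27 listed row labels as the submission set and keeps the rest for training. Since the function needs network access, the same index-based split is sketched below on a small synthetic frame. `split_by_index` is a hypothetical helper written for this illustration, and it uses `Index.isin` as an equivalent alternative to the `query` calls above.

```python
import pandas as pd


def split_by_index(data, holdout_rows):
    """Split a frame into (train, holdout) by index label (hypothetical helper)."""
    is_holdout = data.index.isin(holdout_rows)
    train = data[~is_holdout].reset_index(drop=True)
    holdout = data[is_holdout].reset_index(drop=True)
    return train, holdout


# Synthetic frame with a default 0..9 integer index.
toy = pd.DataFrame({"price": range(10), "area": range(100, 110)})
toy_train, toy_test = split_by_index(toy, [2, 5, 7])

print(toy_train.shape, toy_test.shape)  # (7, 2) (3, 2)
```

On the real data, a similar check is to confirm that `len(train_df) + len(test_df)` equals the number of rows in the downloaded CSV and that `test_df` has 27 rows.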