<h1 style="text-align: center; font-family: 'Times New Roman', serif; font-weight: bold; font-size: 60px;">Predicting Apartment Prices in Buenos Aires, Argentina</h1>


<div style="width: 90%; font-family: Times New Roman, serif; text-justify: inter-word; margin: 0 auto; font-size: 20px; text-align: justify;">

## **Summary**

The `RealEstateApp` class is a Dash-based application that predicts real estate prices. It builds a web interface in which users enter property details such as surface area and geographic coordinates. From these inputs, the application retrieves the corresponding address and predicts the property price using a pre-trained machine learning model. The interface has three parts: form inputs for property details, a map for selecting coordinates, and an output area that displays the predicted price along with detailed information about the property's municipality.

The app relies on the `CoordinateConverter` class, which uses the Nominatim geolocator from the `geopy` library to convert latitude and longitude into human-readable addresses. `RealEstateApp` uses this converter to check whether the selected location falls within a specific region (e.g., Buenos Aires) and to provide contextual information about the municipality, including statistical summaries. Dash callbacks tie these pieces together so that the interface updates when the user clicks on the map or requests a price prediction. The app also caches frequently accessed data, such as addresses and predictions, to avoid redundant processing.
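
The reverse-geocoding step can be illustrated with a minimal sketch. This is not the project's actual `CoordinateConverter`: the constructor arguments, the `in_region` method, the substring-based region check, and the use of `functools.lru_cache` for caching are all assumptions made for illustration. Only `Nominatim(user_agent=...)` and its `reverse` method come from `geopy` itself.

```python
from functools import lru_cache
from typing import Callable, Optional


class CoordinateConverter:
    """Reverse-geocode (latitude, longitude) pairs into addresses (illustrative sketch)."""

    def __init__(self, reverse: Optional[Callable] = None, region: str = "Buenos Aires"):
        if reverse is None:
            # Imported lazily so the class can also be exercised with a stub geocoder.
            from geopy.geocoders import Nominatim
            reverse = Nominatim(user_agent="real-estate-app-sketch").reverse
        self._reverse = reverse
        self.region = region
        # Per-instance cache (an assumption): repeated lookups of the same
        # point skip the network call.
        self.get_address = lru_cache(maxsize=256)(self._lookup)

    def _lookup(self, lat: float, lon: float) -> Optional[str]:
        # geopy returns a Location object, or None when nothing is found.
        location = self._reverse((lat, lon), exactly_one=True)
        return location.address if location is not None else None

    def in_region(self, lat: float, lon: float) -> bool:
        # Simplified check: the region name must appear in the address string.
        address = self.get_address(lat, lon)
        return address is not None and self.region in address
```

With the default constructor, `CoordinateConverter().get_address(lat, lon)` would query the public Nominatim service, which requires network access and is rate-limited; passing a custom `reverse` callable avoids that in tests.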

Together, the `RealEstateApp` and `CoordinateConverter` classes provide an interactive tool for real estate price prediction. `RealEstateApp` manages the user experience by integrating layout, input handling, and data visualization, while `CoordinateConverter` supplies the address information that restricts predictions to the supported region and gives them local context.

</div>

<div style="width: 90%; font-family: Times New Roman, serif; text-justify: inter-word; margin: 0 auto; font-size: 20px;">

## **Wrangling the data**

</div>


In [1]:
from classes.RealEstateDataWragler import DataWrangler

# The wrangler reads the CSV, filters and cleans it, and returns a DataFrame.
wrangler = DataWrangler('data/buenos-aires-real-estate.csv')
df = wrangler.wrangle()

2024-08-13 20:01:06,851 - DEBUG - Reading CSV file from data/buenos-aires-real-estate.csv
2024-08-13 20:01:07,046 - DEBUG - Subsetting data: Selecting only Apartments in Capital Federal, less than $400,000
2024-08-13 20:01:07,085 - DEBUG - Removing outliers: Strategy --> Quantiles based on get_quantile_category() analysis
2024-08-13 20:01:07,087 - DEBUG - Extracting relevant information from columns
2024-08-13 20:01:07,106 - DEBUG - Dropping null columns: Strategy --> get_nan_columns() analysis (50%+ null columns will be dropped.)
2024-08-13 20:01:07,110 - DEBUG - Dropping irrelevant columns.
2024-08-13 20:01:07,115 - DEBUG - Removing empty rows from municipality column.
2024-08-13 20:01:07,119 - DEBUG - Dropping null values.
2024-08-13 20:01:07,121 - DEBUG - Wrangling complete, returning dataframe with proper headings
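
The steps reported in the log can be approximated with a short pandas sketch. It is not the `DataWrangler` implementation: the function name, the column names (`property_type`, `place_with_parent_names`, `price_aprox_usd`, `surface_covered_in_m2`), and the 10%/90% quantile band are assumptions chosen for illustration.

```python
import pandas as pd


def wrangle_sketch(df: pd.DataFrame, low: float = 0.1, high: float = 0.9) -> pd.DataFrame:
    """Subset, trim outliers, and drop sparse columns (hypothetical column names)."""
    # Keep apartments in Capital Federal priced under 400,000 USD.
    mask = (
        (df["property_type"] == "apartment")
        & df["place_with_parent_names"].str.contains("Capital Federal", na=False)
        & (df["price_aprox_usd"] < 400_000)
    )
    df = df[mask]

    # Trim surface-area outliers that fall outside the chosen quantile band.
    lo, hi = df["surface_covered_in_m2"].quantile([low, high])
    df = df[df["surface_covered_in_m2"].between(lo, hi)]

    # Drop columns where at least half the values are missing, then incomplete rows.
    df = df.loc[:, df.isna().mean() < 0.5]
    return df.dropna()
```

The order matters here: quantiles are computed after subsetting, so the outlier thresholds reflect only the apartments of interest rather than the whole dataset.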


<div style="width: 90%; font-family: Times New Roman, serif; text-justify: inter-word; margin: 0 auto; font-size: 20px;">

## **Training the model**

</div>

In [2]:
from classes.RealEstatePredictor import RealEstateRegressor

regressor = RealEstateRegressor(df)
regressor.run_pipeline()

2024-08-13 20:01:08,249 - DEBUG - RealEstateRegressor initialized.
2024-08-13 20:01:08,250 - DEBUG - Pipeline started.
2024-08-13 20:01:08,250 - DEBUG - Processing data...
2024-08-13 20:01:08,252 - DEBUG - Features and target variable prepared.
2024-08-13 20:01:08,255 - DEBUG - Data split into training and testing sets.
2024-08-13 20:01:08,258 - DEBUG - OLS regression model fitted to identify influential points.
2024-08-13 20:01:11,859 - DEBUG - Influential points removed from training data.
2024-08-13 20:01:11,860 - DEBUG - Box-Cox transformation applied to features.
2024-08-13 20:01:11,863 - DEBUG - Data scaled using StandardScaler.
2024-08-13 20:01:11,864 - DEBUG - Data processing complete.
2024-08-13 20:01:11,864 - DEBUG - Fitting model...
2024-08-13 20:01:15,313 - DEBUG - Gradient Boosting Regressor model fitted.
2024-08-13 20:01:15,313 - DEBUG - Model fitting complete.
2024-08-13 20:01:15,314 - DEBUG - Scaler stored at results/scaler.pkl.
2024-08-13 20:01:15,338 - DEBUG - Model s
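
The preprocessing and fitting stages named in the log (Box-Cox transformation, standard scaling, gradient boosting) can be sketched with scikit-learn. This is an illustration under stated assumptions, not the `RealEstateRegressor` code: the data is synthetic, the stages are chained in a single `Pipeline` for brevity (the log suggests the project stores the scaler and model separately), and the OLS-based removal of influential points is omitted.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic stand-in data: one strictly positive feature (Box-Cox requires this).
rng = np.random.default_rng(42)
X = rng.uniform(30, 150, size=(300, 1))            # surface areas in m2
y = 2_000 * X[:, 0] + rng.normal(0, 5_000, 300)    # prices in USD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = make_pipeline(
    PowerTransformer(method="box-cox", standardize=False),  # Box-Cox on features
    StandardScaler(),                                       # zero mean, unit variance
    GradientBoostingRegressor(random_state=42),
)
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")
```

Keeping the transformers and the regressor in one pipeline ensures that the parameters learned on the training set are reused unchanged at prediction time.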

<div style="width: 90%; font-family: Times New Roman, serif; text-justify: inter-word; margin: 0 auto; font-size: 20px;">

## **Restoring the model**

</div>

In [3]:
import pickle

# Context managers close the files as soon as the objects are loaded.
with open('results/model.pkl', 'rb') as f:
    model = pickle.load(f)
with open('results/scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
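
Once restored, the two objects could be combined along the following lines. The helper `predict_price` is hypothetical, and so is the feature order (surface area, latitude, longitude). Any other preprocessing applied during training, such as the Box-Cox step reported in the training log, would need to be reproduced before scaling.

```python
def predict_price(model, scaler, surface_m2: float, lat: float, lon: float) -> float:
    """Scale one observation and return the model's price estimate (sketch)."""
    # Assumed feature order; it must match the order used during training.
    features = [[surface_m2, lat, lon]]
    scaled = scaler.transform(features)
    # predict() returns one value per row; a single row was passed in.
    return float(model.predict(scaled)[0])
```

The function only needs objects that expose `transform` and `predict`, so it works with the unpickled scikit-learn objects or with simple stand-ins during testing.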

<div style="width: 90%; font-family: Times New Roman, serif; text-justify: inter-word; margin: 0 auto; font-size: 20px;">

## **Deploying the app**

</div>

In [4]:
from classes.RealEstateAppDeployer import RealEstateApp

# Keep a reference to the app object, then start the server.
app = RealEstateApp(df, model, scaler)
app.run()

2024-08-13 20:01:16,473 - DEBUG - Converted retries value: 2 -> Retry(total=2, connect=None, read=None, redirect=None, status=None)
2024-08-13 20:01:16,474 - DEBUG - Converted retries value: 2 -> Retry(total=2, connect=None, read=None, redirect=None, status=None)
2024-08-13 20:01:16,531 - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8050
2024-08-13 20:01:16,537 - DEBUG - http://127.0.0.1:8050 "GET /_alive_8a6d6a06-f298-4c64-adc8-04322d2664ee HTTP/11" 200 5
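
For readers unfamiliar with Dash, the form-and-callback pattern that an app like this uses can be sketched as follows. This is a simplified illustration, not the `RealEstateApp` layout: the component ids, the `build_app` and `format_price` helpers, and the `predict` callable are assumptions, and plain numeric inputs stand in for the map-based coordinate selection.

```python
def format_price(price: float) -> str:
    """Render a predicted price for the output area."""
    return f"Predicted price: ${price:,.2f}"


def build_app(predict):
    """Build a small Dash app; `predict(surface, lat, lon)` must return a price."""
    # Imported here so format_price can be reused without Dash installed.
    from dash import Dash, Input, Output, State, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Input(id="surface", type="number", placeholder="Surface (m2)"),
        dcc.Input(id="lat", type="number", placeholder="Latitude"),
        dcc.Input(id="lon", type="number", placeholder="Longitude"),
        html.Button("Predict", id="predict-btn", n_clicks=0),
        html.Div(id="prediction"),
    ])

    @app.callback(
        Output("prediction", "children"),
        Input("predict-btn", "n_clicks"),   # the button click triggers the callback
        State("surface", "value"),          # State values are read but do not trigger
        State("lat", "value"),
        State("lon", "value"),
        prevent_initial_call=True,
    )
    def on_predict(n_clicks, surface, lat, lon):
        if None in (surface, lat, lon):
            return "Please fill in surface, latitude, and longitude."
        return format_price(predict(surface, lat, lon))

    return app
```

Using `State` for the three inputs means the prediction runs only when the button is clicked, not on every keystroke.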
