xGabrielR/Plus-House-Sales

Plus House Sales

plus

0. Plus Houses Info

Plus House is a real estate company located in the south of Brazil, with around 10 employees. The company was founded by Jeorge Rodrigo.

What is a Real Estate Company?
Basically, the company works with properties such as houses in different styles: duplex, single-family, townhouse end unit and much more.

In other words, the company receives property sellers of different categories (studio, house, flat, kitnets...). Depending on the characteristics of the property, the company itself buys the property to resell on the site. The customer searches the site or mobile application for the property of their choice, registers, and from there enters a process called waterfall (visit, images of the chosen portfolio...). Basically, the money comes mainly from the customer experience of buying the property. That is why technology matters so much for pricing these properties and even for segmenting a list of possible customers who access the site.

During a brainstorm meeting, one of the brokers raised a new problem: the price variation is very different from the other portfolios, causing doubt in the broker who receives these prices and in the seller, who does not understand why their property was classified in a specific price range.

How to better price the properties already listed for sale?

How to predict the price of properties given their characteristics?

The CEO wants to see the price of some specific properties he plans to sell in 2010.

Real Estate Business Model
Plus House has a slow business model, as a person usually buys only one property at a time and it can often be the biggest purchase of their life. For this reason, a team is prepared for every stage of a property sale: marketing, visitation, brokers, designers, engineers and even the team that will carry out repairs on the property.

Metrics & Assumptions

  • Market Share: Other enterprises in the region.
  • Customers & Market Size: People over 25 years old.
  • Marketing Channel:
    1. Offline: Physical agencies, interviews...
    2. Online: Plus Houses App, regional marketing.
  • Customers:
    1. The main objective is new customers.
  • Website and App:
    1. Page Speed Score and Bounce Rate are very important! (If the site / app is bad, the customer may never come back.)
  1. Do customers have a good experience in the app or website?
  2. Can older customers navigate the site or app?
  3. Can a customer come back and buy another portfolio*?
  4. How is the customer experience in the buying and selling process?
  5. How does the process of visiting the portfolios work?
  6. Do the images in the app / site have good viewing quality? ...

The dataset is based on House Prices at Kaggle.

The first deploy is a web app.

plus_app.mp4

At This Link

The second deploy is executable software.

In Dev

1. Business Problem

The Plus House CEO would like to predict how much the properties of his choice cost, starting from 2010.

2. Solution Strategy & Assumptions

First CRISP Cycle

2.1. After Brainstorm Interview

How to better price the properties already listed for sale?

How to predict the price of properties given their characteristics?

2.2. Data Product

An AI model to forecast sales, accessible on a smartphone.

    Data Cleaning & Descriptive Statistics.
    The first real step is to download the dataset, import it into Jupyter and go through seven steps to change data types, check data dimensions, fill in NA values... For the first statistical dataframe, I used simple descriptive statistics to check how the data is organized.
    Feature Engineering.
    In this step, I used coggle.it to make a mind map and used the mind map to create a hypothesis list. From this list, I created some new features based on month and Lot Size.
    Data Filtering.
    A simple way to reduce the dimensionality of the dataset, because the dataset has 81 features and approximately 1400 rows, which is a serious problem.
    Exploratory Data Analysis.
    Validation of the whole hypothesis list against the data and the 81 individual features.
    Data Preparation.
    Prepare and split, using the year 2010 as the base.
    Machine Learning Modeling.
    Selection of four ML models: a baseline, a linear model and two tree-based models.
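
The split step above can be sketched as a year-based holdout. This is only an illustration: it assumes that "base of Year 2010" means the 2010 sales are held out for testing, and the keys `yr_sold` and `sale_price` and the toy rows are hypothetical, not taken from the notebooks.

```python
# Toy rows; "yr_sold" and "sale_price" are assumed names, values are made up.
rows = [
    {"yr_sold": 2008, "sale_price": 208500},
    {"yr_sold": 2009, "sale_price": 181500},
    {"yr_sold": 2010, "sale_price": 223500},
    {"yr_sold": 2010, "sale_price": 140000},
]


def split_by_year(rows, holdout_year):
    """Return (train, test): rows sold before holdout_year vs. rows sold in it."""
    train = [r for r in rows if r["yr_sold"] < holdout_year]
    test = [r for r in rows if r["yr_sold"] == holdout_year]
    return train, test


train, test = split_by_year(rows, holdout_year=2010)
print(len(train), len(test))
```

Splitting by year instead of at random keeps future sales out of the training set, which is one way to avoid the data leakage mentioned later in the modeling section.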

Second CRISP Cycle

In the second cycle, I focused on feature engineering, creating five more features to train the model, one of which I dropped.

I used the same data preparation as in the first cycle. In the next cycles I can change the encoding and create new features.

3. EDA Insights

After brainstorming and hypothesis validation, some insights appeared.

Top 3 Insights

  • Plus House does not sell more each year.
  • year

  • Excellent-quality properties do not have higher prices.
  • quality

  • Properties in the Timberland neighborhood have higher prices.
  • timberland

4. Data Preparation

Used individual data preparation for feature selection.

    Categorical Data.
    Used Frequency Encoding and Ordinal Encoding for all categorical data.
    Normalization.
    After inspecting the QQ plot, normalization was not applied, because the data does not follow a normal distribution.
    Nature Transformation.
    Used sin/cos transformations for month data.
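
The sin/cos transformation for month data can be sketched as follows. This is a minimal illustration and not the notebook code; the function names `encode_month` and `month_distance` are hypothetical.

```python
import math


def encode_month(month):
    """Map a month number (1..12) to a point on the unit circle."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)


def month_distance(a, b):
    """Euclidean distance between two encoded months."""
    return math.dist(encode_month(a), encode_month(b))


# December and January end up as close as January and February,
# which a raw 1..12 encoding cannot express.
print(month_distance(12, 1), month_distance(1, 2), month_distance(1, 7))
```

The point of the transformation is that the model sees the year as a cycle: month 12 is a neighbor of month 1 instead of being 11 units away.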

4.1. Frequency Encoding

It is an encoding method that takes into account the number of times each value appears. For example, in 10 records, of which 5 are blue and 5 are red, the frequency of each value is 0.5 (50%).
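
A minimal sketch of frequency encoding on the color example, using only the standard library. The helper `frequency_encode` is a hypothetical name; the notebooks may well use pandas for this step.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category by its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


colors = ["blue"] * 5 + ["red"] * 5
encoded = frequency_encode(colors)
print(encoded[0], encoded[-1])  # 0.5 0.5
```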

4.2. QQPlot

With a QQ plot (quantile-quantile plot) it is possible to observe how close the tested distribution is to a normal distribution. The distribution is considered normal when the blue line matches the red line. There are other ways of doing this check, such as statistical tests.

f
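
The points behind a normal QQ plot can be computed with the standard library alone. The sketch below is an illustration under assumptions: the sample values are made up, `qq_points` is a hypothetical helper, and the plotting itself (for example with a plotting library) is left out.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    ordered = sorted(sample)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    # Standardize so the observed values are comparable to N(0, 1) quantiles.
    observed = [(x - mu) / sigma for x in ordered]
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Made-up, right-skewed sample: the largest value sits far above the line.
sample = [100, 110, 120, 130, 150, 200, 400]
points = qq_points(sample)
for theo, obs in points:
    print(round(theo, 2), round(obs, 2))
```

If the data were normal, the two values in each pair would be roughly equal. Here the last pair diverges strongly, which is the kind of pattern that shows up as the points leaving the reference line.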

4.3. Feature Selection

XGBoost and Random Forest feature importance is a fast and effective way to see which features matter. Feature selection is a second way to choose features, improving model performance while following the principle of Occam's Razor.

XGBoost Feature Selection on Second Cycle.

xgb

  1. Overall Qual: Suggested by XGBoost and Random Forest, and has a positive correlation.
  2. Exter Qual (evaluates the quality of the material on the exterior): Suggested by XGBoost; with the Ordinal Encoder it had good importance.
  3. Total Sqft: Suggested by XGBoost, feature engineering (living_area + bsmt).
  4. Total Abv Grade: Feature engineering feature.
  5. Total Bath: Feature engineering feature.
  6. Garage Multy Car: Feature engineering feature.
  7. Land Slope: Visual linear dependence with sales; people prefer houses without a slope.
  8. Bldg Type: Type of house.
  9. Exter Cond: Condition of the exterior of the house.
  10. Neighborhood: Neighborhood where the house is located.
  11. Central Air: Whether the house has central air or not.
  12. Garage Finish: Whether the house has a finished garage or not.
  13. Condition2: General condition of the house.
  14. Foundation: The type of foundation of the house.
  15. Bsmt Cond: Overall condition of the basement.
  16. Heating Qc: Quality of the heating method (Dense Glass...)
  17. Paved Drive: Type of paved driveway (Dirt, Partial or Paved)
  18. Fireplace Qu: Quality of the fireplace of the house.
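
As an illustration of the engineered features in the list above, the sketch below builds Total Sqft as `living_area + bsmt`, which is the only formula the list states. The Total Bath definition (a half bath counted as 0.5), the key names and the toy values are assumptions made for this example only.

```python
# Toy property record; key names and values are hypothetical.
house = {"living_area": 1710, "bsmt": 856, "full_bath": 2, "half_bath": 1}


def add_engineered_features(row):
    """Return a copy of the row with the derived features added."""
    out = dict(row)
    # Stated in the feature list: total_sqft = living_area + bsmt.
    out["total_sqft"] = row["living_area"] + row["bsmt"]
    # Assumed definition, not confirmed by the source: half baths count as 0.5.
    out["total_bath"] = row["full_bath"] + 0.5 * row["half_bath"]
    return out


features = add_engineered_features(house)
print(features["total_sqft"], features["total_bath"])  # 2566 2.5
```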

5. Machine Learning Models

I used three models: SVR (Support Vector Regression), Random Forest and XGBoost (gradient-boosted decision trees).

models_performace

Model performance on the second cycle.

model_performace

I selected XGBoost over the other two for production. In the hyperparameter fine-tuning step I used a tuning technique called Random Search and tested the trained model on the dataset with data leakage and on the dataset without data leakage. The details are in the notebook m03_machine_learningII.
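
The idea behind Random Search can be sketched without any ML library: sample random parameter combinations, score each one, and keep the best. Everything specific here is an assumption. The search space is hypothetical (the README does not list the tuned parameters), and `toy_error` is a stand-in for training XGBoost and measuring its validation error.

```python
import random

# Hypothetical search space, for illustration only.
search_space = {
    "n_estimators": [100, 300, 500, 1000],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.03, 0.1],
}


def toy_error(params):
    """Stand-in for a validation error; a real run would fit and score a model."""
    return (
        abs(params["max_depth"] - 5)
        + abs(params["learning_rate"] - 0.03) * 10
        + abs(params["n_estimators"] - 500) / 500
    )


def random_search(space, score, n_iter, seed=0):
    """Try n_iter random combinations and return the one with the lowest error."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        error = score(candidate)
        if error < best_error:
            best_params, best_error = candidate, error
    return best_params, best_error


best_params, best_error = random_search(search_space, toy_error, n_iter=20)
print(best_params, best_error)
```

Compared with an exhaustive grid, Random Search evaluates only `n_iter` combinations, so the tuning cost is fixed in advance regardless of how large the search space is.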

model_c

Second Cycle Tuned Model.

model_tunned

6. Business Results

This step converts the model performance into money!

Below is the model performance for two of the hardest shops to forecast. There are stores where the algorithm cannot predict sales well, so the RMSE is high and the MAE is greater too. To avoid this, the model needs more training and better features. There are two columns, worst and best scenario, which are respectively the sum and the subtraction of the MAE for each model forecast.

Below is the sum of sales for each scenario.
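
A minimal sketch of how the two scenario columns can be derived from the MAE. The prices and predictions are made-up numbers, and the variable names are hypothetical. Which of the two columns is labeled "worst" and which "best" follows the convention of the results table, so the sketch only names them by operation.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up sale prices and forecasts, for illustration only.
actual = [200000, 150000, 310000, 120000]
predicted = [190000, 170000, 300000, 120000]

mae = mean_absolute_error(actual, predicted)

# One column adds the MAE to each forecast, the other subtracts it.
forecast_plus_mae = [p + mae for p in predicted]
forecast_minus_mae = [p - mae for p in predicted]

print(mae)                      # 10000.0
print(sum(predicted))           # 780000
print(sum(forecast_plus_mae))   # 820000.0
print(sum(forecast_minus_mae))  # 740000.0
```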

result

In the most realistic scenario on the second cycle

sales

First and Second Cycles -> Error Rate of Model

second_results

7. Second Cycle Summary

The objective of this second cycle is feature engineering and EDA, focusing on individual analysis of the 81 features.

    Reduced the MAPE of Property 812 from 0.92 to 0.38.
    The greatest MAPE on the dataset is 0.61, a reduction of 0.33 in the MAPE error rate.
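
For reference, MAPE is the mean of the absolute errors divided by the actual values. The sketch below uses made-up prices chosen so that a single property reproduces the 0.92 and 0.38 figures; they are not the real values of Property 812.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.38 = 38%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative numbers only: one property with an actual price of 100000.
first_cycle = mape([100000], [192000])
second_cycle = mape([100000], [138000])
print(round(first_cycle, 2), round(second_cycle, 2))  # 0.92 0.38
```

Because the error is relative to the actual price, MAPE makes cheap and expensive properties comparable, which is why it is a convenient per-property error rate.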

8. Model Deployment

For deployment I selected Heroku as the cloud base, free and available 24/7.

I made a Streamlit app for the CFO to check the sales on smartphone and desktop.

9. References
