This project began as a way to learn machine learning algorithms and data preprocessing techniques such as exploratory data analysis, feature engineering, feature selection, and feature scaling, and finally to build machine learning models.

zulkarnainprastyo/deploy-ml_boston-house-price

Deploy-ML_Boston-House-Price

image

Project Background

The main purpose of the Machine Learning Process course is to prepare students to apply end-to-end machine learning workflows to real-world tasks, from the business problem through to service deployment.

This case study is based on the well-known Boston housing data, which contains 506 records, one per census tract in the Boston area. The task is to build a machine learning model that predicts the median value of owner-occupied homes (MEDV) from the other characteristics of each tract. The case study walks through a step-by-step approach to building such a predictive model, and the same flow can serve as a template for other supervised regression problems. The steps are as follows:

* Reading the data in Python

* Defining the problem statement

* Identifying the Target variable

* Looking at the distribution of Target variable

* Basic Data exploration

* Rejecting useless columns

* Visual Exploratory Data Analysis for data distribution (histograms and bar charts)

* Feature Selection based on data distribution

* Outlier treatment

* Missing Values treatment

* Visual correlation analysis

* Statistical correlation analysis (Feature Selection)

* Converting data to numeric for ML

* Sampling and K-fold cross validation

* Trying multiple Regression algorithms

* Selecting the best Model

* Deploying the best model in production
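
The sampling, cross-validation, and model-selection steps above can be sketched without any third-party library. This is a minimal illustration, not the project's code: the data is synthetic, and the two "models" (a mean baseline and a one-feature least-squares line) as well as every function name are assumptions chosen to keep the example short.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=42):
    """Shuffle row indices once and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean_baseline(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    avg = mean(ys)
    return lambda x: avg


def fit_simple_linear(xs, ys):
    """Ordinary least squares with a single feature: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_validate(fit, xs, ys, k=5):
    """Average RMSE over k folds; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [model(xs[i]) for i in held_out]
        scores.append(rmse([ys[i] for i in held_out], preds))
    return mean(scores)


# Synthetic stand-in data (NOT the Boston data): one feature and a noisy target.
rng = random.Random(0)
rooms = [rng.uniform(4.0, 8.0) for _ in range(100)]
price = [9.0 * r - 34.0 + rng.gauss(0.0, 2.0) for r in rooms]

results = {
    "mean_baseline": cross_validate(fit_mean_baseline, rooms, price),
    "simple_linear": cross_validate(fit_simple_linear, rooms, price),
}
best_model = min(results, key=results.get)  # lowest average RMSE wins
print(results, best_model)
```

In a real run, the candidate list would hold the regression algorithms under comparison, and the selected model would be the one carried forward to deployment.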

Data Description


The business meaning of each column in the data is as below:

  • CRIM - per capita crime rate by town
  • ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
  • INDUS - proportion of non-retail business acres per town.
  • CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
  • NOX - nitric oxides concentration (parts per 10 million)
  • RM - average number of rooms per dwelling
  • AGE - proportion of owner-occupied units built prior to 1940
  • DIS - weighted distances to five Boston employment centres
  • RAD - index of accessibility to radial highways
  • TAX - full-value property-tax rate per 10,000 dollars
  • PTRATIO - pupil/teacher ratio by town
  • B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
  • LSTAT - % lower status of the population
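
As a rough illustration of how these columns might be checked before modelling, the sketch below validates a single row against the column list. The function name and the specific checks (all features numeric and non-negative, CHAS restricted to 0 or 1) are assumptions for illustration, and the sample row uses example values only.

```python
# The 13 feature columns, in the order listed in the data description.
FEATURE_COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def validate_record(record):
    """Return a list of problems found in one row; an empty list means no problem was found."""
    problems = []
    for name in FEATURE_COLUMNS:
        if name not in record:
            problems.append(f"missing column: {name}")
        elif isinstance(record[name], bool) or not isinstance(record[name], (int, float)):
            problems.append(f"non-numeric value in {name}")
    if problems:
        return problems  # range checks only make sense on complete numeric rows

    # CHAS is a dummy variable: 1 if the tract bounds the river, 0 otherwise.
    if record["CHAS"] not in (0, 1):
        problems.append("CHAS must be 0 or 1")
    # Assumption: rates, proportions, and distances should never be negative.
    for name in FEATURE_COLUMNS:
        if record[name] < 0:
            problems.append(f"{name} must not be negative")
    return problems


# Example row with illustrative values.
sample = {
    "CRIM": 0.00632, "ZN": 18.0, "INDUS": 2.31, "CHAS": 0, "NOX": 0.538,
    "RM": 6.575, "AGE": 65.2, "DIS": 4.09, "RAD": 1, "TAX": 296.0,
    "PTRATIO": 15.3, "B": 396.9, "LSTAT": 4.98,
}
print(validate_record(sample))
```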

Work Instructions


Step 1. Select the Dataset
Step 2. State the Business Problem
Step 3. Implement an End-to-End Machine Learning Workflow
Step 4. Perform and Summarize the Analysis

Project Outcome


* Block data preparation diagram

Let's check the description of the dataset with print(boston.DESCR):

.. _boston_dataset:

Boston house prices dataset

Data Set Characteristics:

:Number of Instances: 506 

:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

:Attribute Information (in order):
    - CRIM     per capita crime rate by town
    - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
    - INDUS    proportion of non-retail business acres per town
    - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    - NOX      nitric oxides concentration (parts per 10 million)
    - RM       average number of rooms per dwelling
    - AGE      proportion of owner-occupied units built prior to 1940
    - DIS      weighted distances to five Boston employment centres
    - RAD      index of accessibility to radial highways
    - TAX      full-value property-tax rate per $10,000
    - PTRATIO  pupil-teacher ratio by town
    - B        1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
    - LSTAT    % lower status of the population
    - MEDV     Median value of owner-occupied homes in $1000's

:Missing Attribute Values: None

:Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset. https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression problems.

.. topic:: References

  • Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
  • Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

image

image

* Block preprocessing diagram

X_train

image

X_test

image
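
A preprocessing block that produces X_train and X_test typically splits the rows first and then fits any scaling on the training split only. The sketch below shows that pattern with the standard library; the function names, the 80/20 split, and the toy rows are assumptions, not the project's actual preprocessing.

```python
import random
from statistics import mean, pstdev


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Learn per-column mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return means, stds


def transform(rows, scaler):
    """Standardize rows with statistics learned elsewhere (no refitting)."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy numeric rows standing in for the feature table.
rows = [[float(i), float(i * 2 % 7)] for i in range(50)]

X_train, X_test = train_test_split(rows)
scaler = fit_scaler(X_train)            # fitted on the training split only
X_train_scaled = transform(X_train, scaler)
X_test_scaled = transform(X_test, scaler)  # reuses the training statistics
print(len(X_train_scaled), len(X_test_scaled))
```

Fitting the scaler on the training split alone keeps information from the test rows out of the model inputs.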

* Block diagram features engineering.

image

image
image
image
image
image
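
One common feature-engineering step, matching the statistical correlation analysis listed in the workflow, is to keep only the features whose correlation with the target is strong enough. The following is a minimal sketch under stated assumptions: the 0.5 threshold, the function names, and the handful of illustrative rows are not taken from the project.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(columns, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    return selected, scores


# A handful of illustrative rows (far too few for a real analysis).
columns = {
    "RM": [6.575, 6.421, 7.185, 6.998, 7.147, 6.430],
    "LSTAT": [4.98, 9.14, 4.03, 2.94, 5.33, 5.21],
    "CHAS": [0, 0, 0, 0, 0, 0],
}
medv = [24.0, 21.6, 34.7, 33.4, 36.2, 28.7]

selected, scores = select_features(columns, medv)
print(selected)
```

A constant column such as CHAS in this tiny sample carries no signal, so it scores 0.0 and is dropped.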

* Block diagram modeling and its evaluation

X_train

image

X_test

image
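
Evaluating a regression model on both X_train and X_test usually means computing the same error metrics on each split and comparing them. The sketch below computes MAE, RMSE, and R² with the standard library; the function name and all numbers are illustrative assumptions, not results from this project.

```python
def regression_report(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up targets and predictions for a training and a test split.
train_report = regression_report([24.0, 21.6, 34.7, 33.4], [25.0, 20.6, 33.7, 34.4])
test_report = regression_report([36.2, 28.7, 22.9, 27.1], [33.2, 30.7, 25.9, 25.1])

# A much larger error on the test split than on the training split hints at overfitting.
gap = test_report["rmse"] - train_report["rmse"]
print(train_report, test_report, gap)
```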

* Format Message to make predictions via API

image

* Message response format of API

* Flask
* sklearn
* pandas
* numpy
* matplotlib
* seaborn
* gunicorn

* How to run machine learning services on a local computer:

* Retraining model

image

image

* Running API

image
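
The request and response formats of the API are not spelled out in text here, so the sketch below only illustrates the general shape such a prediction endpoint could take. Every name in it is hypothetical: the JSON field names are assumed to mirror the dataset columns, `placeholder_model` stands in for the trained regressor, and `handle_predict` is a plain function that a Flask route could call with the raw request body.

```python
import json

# Assumed request fields: one numeric value per feature column.
FEATURE_COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def placeholder_model(features):
    """Stand-in for the trained model; returns a fixed value, NOT a real prediction."""
    return 22.5


def handle_predict(body, predict=placeholder_model):
    """Parse a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}

    missing = [name for name in FEATURE_COLUMNS if name not in payload]
    if missing:
        return 400, {"error": "missing fields", "fields": missing}

    try:
        features = [float(payload[name]) for name in FEATURE_COLUMNS]
    except (TypeError, ValueError):
        return 400, {"error": "all fields must be numeric"}

    return 200, {"prediction": predict(features)}


# Example request with illustrative values.
request_body = json.dumps({
    "CRIM": 0.00632, "ZN": 18.0, "INDUS": 2.31, "CHAS": 0, "NOX": 0.538,
    "RM": 6.575, "AGE": 65.2, "DIS": 4.09, "RAD": 1, "TAX": 296.0,
    "PTRATIO": 15.3, "B": 396.9, "LSTAT": 4.98,
})
status, response = handle_predict(request_body)
print(status, json.dumps(response))
```

Keeping validation in a plain function like this makes it testable without starting a web server; the web framework only needs to pass the body in and return the status and JSON.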

Software And Tools Requirements

  1. GitHub account
  2. Heroku account
  3. VS Code IDE
  4. Git CLI

Create a new environment

conda create -p venv python==3.7 -y
