# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: First, second or third owner


### 1. Import required libraries

In [3]:
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import sklearn
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

warnings.filterwarnings('ignore')

### 2. Load the dataset

In [4]:
df = pd.read_csv('car+data.csv')
df.head()

| | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |
| 2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |
| 3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |
| 4 | swift | 2014 | 4.60 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |


### 3. Check the shape and basic information of the dataset.

In [5]:
df.shape

(301, 9)

### 4. Check for duplicate records in the dataset and drop any that are present

In [6]:
len(df[df.duplicated()])

2

In [7]:
df.drop_duplicates(inplace=True)

In [8]:
len(df[df.duplicated()])

0

### 5. Drop the columns that are redundant for the analysis

In [9]:
# Car_Name is dropped as redundant; `columns=` already implies axis=1.
df = df.drop(columns=['Car_Name'])

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [10]:
df['age_of_the_car'] = 2023 - df['Year']

In [11]:
df.drop(['Year'], axis = 1, inplace=True)

In [12]:
df

| | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 | 9 |
| 1 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 | 10 |
| 2 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 | 6 |
| 3 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 | 12 |
| 4 | 4.60 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 296 | 9.50 | 11.60 | 33988 | Diesel | Dealer | Manual | 0 | 7 |
| 297 | 4.00 | 5.90 | 60000 | Petrol | Dealer | Manual | 0 | 8 |
| 298 | 3.35 | 11.00 | 87934 | Petrol | Dealer | Manual | 0 | 14 |
| 299 | 11.50 | 12.50 | 9000 | Diesel | Dealer | Manual | 0 | 6 |


In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 299 entries, 0 to 300
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Selling_Price   299 non-null    float64
 1   Present_Price   299 non-null    float64
 2   Kms_Driven      299 non-null    int64  
 3   Fuel_Type       299 non-null    object 
 4   Seller_Type     299 non-null    object 
 5   Transmission    299 non-null    object 
 6   Owner           299 non-null    int64  
 7   age_of_the_car  299 non-null    int64  
dtypes: float64(2), int64(3), object(3)
memory usage: 21.0+ KB


### 7. Encode the categorical columns

In [14]:
labelEncode = LabelEncoder()

In [15]:
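# LabelEncoder sorts the class labels, so the codes follow alphabetical order
# (CNG=0, Diesel=1, Petrol=2), not the order printed by unique().
print(df['Fuel_Type'].unique())
labelEncode.fit(df['Fuel_Type'])
labelEncode.transform(labelEncode.classes_)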

['Petrol' 'Diesel' 'CNG']


array([0, 1, 2])

In [16]:
print(df['Seller_Type'].unique())
labelEncode.fit(df['Seller_Type'])
labelEncode.transform(labelEncode.classes_)

['Dealer' 'Individual']


array([0, 1])

In [17]:
print(df['Transmission'].unique())
labelEncode.fit(df['Transmission'])
labelEncode.transform(labelEncode.classes_)

['Manual' 'Automatic']


array([0, 1])

In [18]:
category = ['Fuel_Type', 'Seller_Type', 'Transmission']
labelEncode = LabelEncoder()

# Refit the encoder on each categorical column and replace it with its integer codes.
for col in category:
    df[col] = labelEncode.fit_transform(df[col])
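
`LabelEncoder` assigns integer codes in the sorted order of the class labels, which is not necessarily the order returned by `unique()`. The plain-Python sketch below reproduces that assignment for the three categorical columns; `label_codes` is a hypothetical helper written for illustration and is not part of scikit-learn.

```python
def label_codes(values):
    """Mimic LabelEncoder: codes follow the sorted order of the unique labels."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


# Labels as printed by unique() in the cells above.
fuel_codes = label_codes(["Petrol", "Diesel", "CNG"])
seller_codes = label_codes(["Dealer", "Individual"])
transmission_codes = label_codes(["Manual", "Automatic"])

print(fuel_codes)          # {'CNG': 0, 'Diesel': 1, 'Petrol': 2}
print(seller_codes)        # {'Dealer': 0, 'Individual': 1}
print(transmission_codes)  # {'Automatic': 0, 'Manual': 1}
```

These codes agree with the encoded feature table in step 8, where Petrol rows carry 2 and Manual rows carry 1. Any front end that sends pre-encoded values to the model has to use the same codes.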


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 299 entries, 0 to 300
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Selling_Price   299 non-null    float64
 1   Present_Price   299 non-null    float64
 2   Kms_Driven      299 non-null    int64  
 3   Fuel_Type       299 non-null    int32  
 4   Seller_Type     299 non-null    int32  
 5   Transmission    299 non-null    int32  
 6   Owner           299 non-null    int64  
 7   age_of_the_car  299 non-null    int64  
dtypes: float64(2), int32(3), int64(3)
memory usage: 17.5 KB


### 8. Separate the target and independent features.

In [20]:
# Predict the selling price from all remaining columns.
# X = independent features, y = target
X = df.drop(['Selling_Price'], axis=1)
y = df['Selling_Price']

X

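| | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | age_of_the_car |
|---|---|---|---|---|---|---|---|
| 0 | 5.59 | 27000 | 2 | 0 | 1 | 0 | 9 |
| 1 | 9.54 | 43000 | 1 | 0 | 1 | 0 | 10 |
| 2 | 9.85 | 6900 | 2 | 0 | 1 | 0 | 6 |
| 3 | 4.15 | 5200 | 2 | 0 | 1 | 0 | 12 |
| 4 | 6.87 | 42450 | 1 | 0 | 1 | 0 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 296 | 11.60 | 33988 | 1 | 0 | 1 | 0 | 7 |
| 297 | 5.90 | 60000 | 2 | 0 | 1 | 0 | 8 |
| 298 | 11.00 | 87934 | 2 | 0 | 1 | 0 | 14 |
| 299 | 12.50 | 9000 | 1 | 0 | 1 | 0 | 6 |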


In [21]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0)

In [22]:
print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)

(209, 7) (90, 7)
(209,) (90,)


### 9. Split the data into train and test.

In [23]:
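# train_test_split was already imported in step 1.
# This split (random_state=42) overwrites the earlier one made with random_state=0,
# so it is the one the model below is trained and evaluated on.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)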

### 10. Build a Random Forest Regressor model and check the r2-score for train and test

In [24]:
regressor = RandomForestRegressor(n_estimators=100,random_state=42)
regressor.fit(X_train, y_train)

In [25]:
y_train_pred = regressor.predict(X_train)
y_test_pred = regressor.predict(X_test)

r2_train = r2_score(y_train,y_train_pred)
r2_test = r2_score(y_test,y_test_pred)

print('r2-score train:',r2_train)
print('r2-score test',r2_test)

r2-score train: 0.9863510256793296
r2-score test 0.3091186642496888
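
The training score (about 0.99) is far above the test score (about 0.31), a gap that usually points to overfitting. To make the metric itself concrete, here is a minimal sketch of the R² formula, 1 - SS_res / SS_tot, in plain Python. The helper name `r2` and the toy numbers are assumptions for illustration and are not taken from the car dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot


# Toy values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
perfect = [3.0, 5.0, 7.0, 9.0]    # predictions equal to the targets
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicts the mean of y_true

print(r2(y_true, perfect))    # 1.0
print(r2(y_true, mean_only))  # 0.0
```

A score of 1.0 means the predictions match the targets exactly, and 0.0 means the model does no better than always predicting the mean.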


### 11. Save the model as a pickle file with a .pkl extension

In [26]:
import pickle

# Save the trained model into the web app folder; the with-block closes the file.
with open('../lab5-webapp/model.pkl', 'wb') as f:
    pickle.dump(regressor, f)

# # Loading model to compare the results
# model = pickle.load(open('model.pkl','rb'))

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment, and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

In [None]:
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>Used Cars</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-9ndCyUaIbzAi2FUVXJi0CjmCapSmO7SnpJef0486qhLnuZ2cdeRhO02iuK6FUUVM" crossorigin="anonymous">
    <style>
        #firstLine {
            margin-top: 2%;
            margin-bottom: 0%;
            text-align: center
        }

        .formBox{
            margin: auto;
            width: 50%;
            padding: 15px;
        }

        #formStyle{
            display: flex;
            flex-direction: column;
            align-items: flex-start;
            margin:auto;
            width:35%;
            padding: 5%;
        }

        #formStyle button{
            margin: auto;
            border-radius: 12px;
        }
        
        h3{
            margin: 0 ;
            text-align: center;
        }

    </style>
</head>

<body>
    <div id="firstLine">
        <h2>The Car Price predictor</h2>
    </div>
    <div class="formBox">
        <form id="formStyle" action="{{ url_for('predict')}}" method="post">
            <label class="form-label">Present Price Of the Car</label>
            <input name="Present_Price" type="number" step="any" class="form-control form-control-sm" placeholder="Price in Lakhs"><br>

            <label class="form-label">Kilometers Driven</label>
            <input name="Kms_Driven" type="number" class="form-control form-control-sm"><br>

            <label class="form-label">Owner (0/1/3) </label>
            <input name="Owner" type="number" class="form-control form-control-sm"><br>

            <label class="form-label">Age of the Car</label>
            <input name="age_of_the_car" type="number" class="form-control form-control-sm" placeholder="Age in years"><br>

            <label class="form-label">Fuel Type</label>
            <select class="form-select form-select-sm" name="Fuel_Type">
                <!-- Values match the LabelEncoder codes used in training: CNG=0, Diesel=1, Petrol=2 -->
                <option value="2">Petrol</option>
                <option value="1">Diesel</option>
                <option value="0">CNG</option>
            </select>
            <br>
            <label class="form-label">Seller Type</label>
            <select class="form-select form-select-sm" name="Seller_Type">
                <option value="0">Dealer</option>
                <option value="1">Individual</option>
            </select>
            <br>
            <label class="form-label">Transmission Type</label>
            <select class="form-select form-select-sm" name="Transmission">
                <!-- Values match the LabelEncoder codes used in training: Automatic=0, Manual=1 -->
                <option value="1">Manual</option>
                <option value="0">Automatic</option>
            </select>
            <br>
            <br>
            <button type="submit" class="btn btn-primary"> Predict Selling Price </button>
        </form>
    </div>
    <div>
        <h3>{{prediction_text}}</h3>
    </div>
</body>

</html>

### b) Create app.py file and write the predict function

In [None]:
from flask import Flask, render_template, request
import pickle

app = Flask(__name__)

# Load the trained model once at startup; model.pkl must sit next to app.py.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/', methods=['GET'])
def home():
    return render_template('index.html')

@app.route('/predict', methods=['GET', 'POST'])
def predict():
    if request.method == 'GET':
        return render_template('index.html')

    # Form values arrive as strings, so cast each one to a number.
    Present_Price = float(request.form['Present_Price'])
    Kms_Driven = int(request.form['Kms_Driven'])
    Fuel_Type = int(request.form['Fuel_Type'])
    Seller_Type = int(request.form['Seller_Type'])
    Transmission = int(request.form['Transmission'])
    Owner = int(request.form['Owner'])
    age_of_the_car = int(request.form['age_of_the_car'])

    # The feature order must match the columns of X used for training.
    features = [[Present_Price, Kms_Driven, Fuel_Type, Seller_Type,
                 Transmission, Owner, age_of_the_car]]
    prediction = model.predict(features)
    output = round(prediction[0], 2)
    return render_template('index.html', prediction_text="You can sell your car at {} lakhs".format(output))

if __name__ == "__main__":
    app.run(debug=True)
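
The predict function depends on two things: every form value must be converted from a string to a number, and the values must be passed in the same column order as the training features. A minimal sketch of a helper that does both is shown below, using a plain dict in place of `request.form`. The names `FEATURE_ORDER`, `FLOAT_FIELDS`, and `parse_car_form` are hypothetical and not part of Flask.

```python
# Column order of X at training time (taken from the feature table in step 8).
FEATURE_ORDER = ["Present_Price", "Kms_Driven", "Fuel_Type", "Seller_Type",
                 "Transmission", "Owner", "age_of_the_car"]

# Only Present_Price is fractional; the other fields are integer codes or counts.
FLOAT_FIELDS = {"Present_Price"}


def parse_car_form(form):
    """Hypothetical helper: turn a mapping of form strings into one model input row."""
    row = []
    for name in FEATURE_ORDER:
        raw = form.get(name, "")
        try:
            value = float(raw) if name in FLOAT_FIELDS else int(raw)
        except (TypeError, ValueError):
            raise ValueError(f"Invalid or missing value for {name}: {raw!r}")
        row.append(value)
    return [row]  # 2-D shape, as expected by model.predict


# Example input mirroring the first row of the dataset.
example = {"Present_Price": "5.59", "Kms_Driven": "27000", "Fuel_Type": "2",
           "Seller_Type": "0", "Transmission": "1", "Owner": "0",
           "age_of_the_car": "9"}
print(parse_car_form(example))  # [[5.59, 27000, 2, 0, 1, 0, 9]]
```

Inside the route, such a helper could be called as `model.predict(parse_car_form(request.form))`, with the `ValueError` caught to show a message on the page instead of a server error.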


### 13. Deploy your app on Heroku. (write commands for deployment)
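
A Heroku Python app normally needs two extra files in the project root: a `requirements.txt` listing the packages to install and a `Procfile` declaring the web process. The sketch below writes both, assuming the Flask object is named `app` inside `app.py` and that gunicorn is the web server. The helper name `write_deploy_files` and the unpinned package list are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Assumed web process: gunicorn serves the Flask object `app` from app.py.
PROCFILE = "web: gunicorn app:app\n"

# Assumed, unpinned package list. In practice, pin scikit-learn to the version
# used for training, because pickled models are version-sensitive.
REQUIREMENTS = ["flask", "gunicorn", "scikit-learn"]


def write_deploy_files(project_dir):
    """Hypothetical helper: create Procfile and requirements.txt in project_dir."""
    root = Path(project_dir)
    (root / "Procfile").write_text(PROCFILE)
    (root / "requirements.txt").write_text("\n".join(REQUIREMENTS) + "\n")
    return sorted(p.name for p in root.iterdir())


# Demonstration in a temporary folder; pass the real project folder instead.
with tempfile.TemporaryDirectory() as tmp:
    print(write_deploy_files(tmp))  # ['Procfile', 'requirements.txt']
```

With these files committed to a Git repository alongside `app.py`, `model.pkl`, and the `templates` folder, a typical deployment runs `heroku login`, then `heroku create <app-name>`, then `git push heroku main` from the project folder.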

### 14. Paste the URL of the Heroku application below. When submitting the solution, include this notebook along with the source code.

### Happy Learning :)