# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometres driven and transmission type.
2. Deploy the model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling Price (target)**: Selling price of the car in lakhs
- **Present Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometres driven
- **Fuel_Type**: Petrol, Diesel or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: Number of previous owners, stored as an integer (0 for the first owner)


### 1. Import required libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import KFold, cross_validate
from sklearn.metrics import r2_score
import warnings
warnings.filterwarnings(action='ignore')

### 2. Load the dataset

In [3]:
df = pd.read_csv('car+data.csv')

### 3. Check the shape and basic information of the dataset.

In [4]:
df.head()

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.6,6.87,42450,Diesel,Dealer,Manual,0


In [5]:
df.shape

(301, 9)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### 4. Check for duplicate records in the dataset and drop any that are present

In [7]:
df[df.duplicated()]

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Seller_Type,Transmission,Owner
17,ertiga,2016,7.75,10.79,43000,Diesel,Dealer,Manual,0
93,fortuner,2015,23.0,30.61,40000,Diesel,Dealer,Automatic,0


In [8]:
df2 = df.drop_duplicates()

In [9]:
df2.shape

(299, 9)

### 5. Drop the columns that are redundant for the analysis

In [10]:
df2=df2.drop(['Owner','Seller_Type'],axis=1)

In [11]:
df2.head()

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Transmission
0,ritz,2014,3.35,5.59,27000,Petrol,Manual
1,sx4,2013,4.75,9.54,43000,Diesel,Manual
2,ciaz,2017,7.25,9.85,6900,Petrol,Manual
3,wagon r,2011,2.85,4.15,5200,Petrol,Manual
4,swift,2014,4.6,6.87,42450,Diesel,Manual


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [12]:
df2['age_of_the_car'] = 2023-df2['Year']

In [13]:
df2=df2.drop(['Year'],axis=1)

In [14]:
df2.head()

Unnamed: 0,Car_Name,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Transmission,age_of_the_car
0,ritz,3.35,5.59,27000,Petrol,Manual,9
1,sx4,4.75,9.54,43000,Diesel,Manual,10
2,ciaz,7.25,9.85,6900,Petrol,Manual,6
3,wagon r,2.85,4.15,5200,Petrol,Manual,12
4,swift,4.6,6.87,42450,Diesel,Manual,9
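
The age feature depends on the reference year chosen at training time (2023 in the cell above), so anything that later feeds the model, including the "Age of Car" field of the web form, should be computed against the same year. A minimal sketch, where `REFERENCE_YEAR` and `car_age` are illustrative names that are not part of the notebook:

```python
# Reference year used when the feature was built (2023 in the notebook).
REFERENCE_YEAR = 2023


def car_age(purchase_year: int, reference_year: int = REFERENCE_YEAR) -> int:
    """Return the age of a car in years relative to a fixed reference year."""
    if purchase_year > reference_year:
        raise ValueError("purchase year lies after the reference year")
    return reference_year - purchase_year


# Purchase years of the first five rows shown by df.head()
years = [2014, 2013, 2017, 2011, 2014]
ages = [car_age(y) for y in years]
print(ages)  # [9, 10, 6, 12, 9]
```

The printed ages match the `age_of_the_car` column in the output above.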


### 7. Encode the categorical columns

In [15]:
from sklearn.preprocessing import LabelEncoder

cate=['Car_Name','Fuel_Type','Transmission']
lbl_encode = LabelEncoder()
for i in cate:
    # The encoder is refit on each column, so only the last column's mapping stays in lbl_encode
    df2[i]=lbl_encode.fit_transform(df2[i])

In [16]:
df2.head()

Unnamed: 0,Car_Name,Selling_Price,Present_Price,Kms_Driven,Fuel_Type,Transmission,age_of_the_car
0,90,3.35,5.59,27000,2,1,9
1,93,4.75,9.54,43000,1,1,10
2,68,7.25,9.85,6900,2,1,6
3,96,2.85,4.15,5200,2,1,12
4,92,4.6,6.87,42450,1,1,9
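
`LabelEncoder` numbers the distinct labels of a column in sorted order, which is why Petrol became 2, Diesel 1 and Manual 1 in the output above. The web app later receives raw text, so it needs the same mapping. The sketch below reproduces the sorted-order rule in plain Python; `build_mapping` is a hypothetical helper, and the category lists are taken from the dataset description rather than read from the data.

```python
def build_mapping(labels):
    """Map each distinct label to its index in sorted order,
    mirroring how LabelEncoder assigns integer codes."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


fuel_codes = build_mapping(["Petrol", "Diesel", "CNG"])
transmission_codes = build_mapping(["Manual", "Automatic"])

print(fuel_codes)          # {'CNG': 0, 'Diesel': 1, 'Petrol': 2}
print(transmission_codes)  # {'Automatic': 0, 'Manual': 1}
```

Dictionaries like these could be stored next to the model file so that the prediction code can translate form input into the codes the model was trained on. The comparison is case-sensitive, so "petrol" and "Petrol" would count as different labels.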


### 8. Separate the target and independent features.

In [17]:
X = df2.drop('Selling_Price',axis=1)
y = df2['Selling_Price']

### 9. Split the data into train and test.

In [18]:
# X and y were defined in the previous step; hold out 30% of the rows for testing
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=0)

print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)

(209, 6) (90, 6)
(209,) (90,)
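
The shapes follow from how `train_test_split` handles a fractional `test_size`: the test count is rounded up and the remaining rows form the training set. A quick check of the arithmetic for the 299 de-duplicated rows:

```python
import math

n_samples = 299   # rows left after dropping duplicates
test_size = 0.3

n_test = math.ceil(test_size * n_samples)   # fractional test size is rounded up
n_train = n_samples - n_test                # the rest is used for training

print(n_train, n_test)  # 209 90
```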


### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [20]:
X_train.head()

Unnamed: 0,Car_Name,Present_Price,Kms_Driven,Fuel_Type,Transmission,age_of_the_car
283,69,11.8,9010,2,1,7
45,68,12.04,15000,2,0,9
243,84,7.6,7000,2,1,7
191,10,0.57,25000,2,1,11
154,61,0.88,8000,2,1,9


In [21]:
X_test.head()

Unnamed: 0,Car_Name,Present_Price,Kms_Driven,Fuel_Type,Transmission,age_of_the_car
208,84,8.1,3435,2,1,6
190,14,0.75,60000,2,1,15
12,68,9.94,15000,2,0,8
221,84,6.79,32000,2,0,10
239,75,4.43,23709,2,1,11


In [22]:
# Building Random forest regressor
rf = RandomForestRegressor()
rf.fit(X_train,y_train)

In [23]:
# Checking r2 score
y_train_pred = rf.predict(X_train)
y_test_pred = rf.predict(X_test)

r2_train = r2_score(y_train,y_train_pred)
r2_test = r2_score(y_test,y_test_pred)

print('r2-score train:',r2_train)
print('r2-score test',r2_test)

r2-score train: 0.9772395084069588
r2-score test 0.9040683952959521
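
For reference, the R2 score is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes it by hand; the numbers are made up for illustration and are not taken from the dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up selling prices (lakhs) and predictions, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [3.5, 4.5, 7.5, 8.5]

print(round(r2(y_true, y_pred), 2))  # 0.95
```

A score of 1.0 means perfect predictions. A train score noticeably above the test score, as in the output above, usually suggests some overfitting, which is common for an untuned random forest on a small dataset.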


### 11. Save the model as a pickle file with the .pkl extension

In [24]:
import pickle
# Saving model to disk
with open('model.pkl','wb') as f:
    pickle.dump(rf, f)

# Loading model back to check that it can be restored
with open('model.pkl','rb') as f:
    model = pickle.load(f)

### 12. Create a new folder/project in Visual Studio or PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

In [None]:
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <title>Predict Car Sell Price</title>
</head>

<body>
    <h1>Car Details</h1>
    <form action="{{ url_for('predict')}}" method="POST">
        <p><input type="text" aria-label="Car Name" name="car_name" placeholder="Car Name (encoded number)"></p>
        <p><input type="text" aria-label="Present Price" name="present_price" placeholder="Present Price (lakhs)"></p>
        <p><input type="text" aria-label="Kms Driven" name="kms_driven" placeholder="Kms Driven"></p>
        <p><input type="text" aria-label="Fuel Type" name="fuel_type" placeholder="Fuel Type (0 = CNG, 1 = Diesel, 2 = Petrol)"></p>
        <p><input type="text" aria-label="Transmission" name="transmission" placeholder="Transmission (0 = Automatic, 1 = Manual)"></p>
        <p><input type="text" aria-label="Age of Car" name="age_of_car" placeholder="Age of Car (years)"></p>

        <p><input type="submit" value="Submit"></p>
    </form>
    <div><strong> <span style="color: #0a0506">{{ prediction_text }}</span></strong></div>
</body>

</html>

### b) Create app.py file and write the predict function

In [None]:
import pickle
from flask import Flask, request, render_template
import numpy as np

app = Flask(__name__)
# Pick up template edits without restarting; must be set before app.run()
app.config['TEMPLATES_AUTO_RELOAD'] = True

# Load the trained random forest saved in step 11
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    '''
    For rendering results on HTML GUI
    '''
    # Form fields arrive in the order they appear in index.html, which matches the
    # training columns; categorical fields must be entered as their encoded numbers
    features = [float(x) for x in request.form.values()]
    final_features = [np.array(features)]
    # The model was trained on unscaled features, so no scaler is applied here
    prediction = model.predict(final_features)

    output = round(prediction[0], 2)

    return render_template('index.html', prediction_text='Price of the Car is {} lakhs'.format(output))


if __name__ == "__main__":
    app.run(debug=True)
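
`request.form.values()` depends on the order of the inputs in the HTML form and assumes every field is already numeric. A more defensive option is to read the fields by name, in the training column order, and translate the text fields with the codes produced by the encoding step. The sketch below is plain Python so it can be tried without Flask; `parse_features`, `FUEL_CODES` and `TRANSMISSION_CODES` are hypothetical names, and `car_name` is assumed to arrive as its integer code because the full car-name mapping is not shown in the notebook.

```python
# Codes as produced by the label-encoding step (sorted label order),
# keyed in lower case so that user input is matched case-insensitively.
FUEL_CODES = {"cng": 0, "diesel": 1, "petrol": 2}
TRANSMISSION_CODES = {"automatic": 0, "manual": 1}


def parse_features(form):
    """Turn a form-like mapping into one feature row in training column order:
    Car_Name, Present_Price, Kms_Driven, Fuel_Type, Transmission, age_of_the_car."""
    fuel = form["fuel_type"].strip().lower()
    transmission = form["transmission"].strip().lower()
    if fuel not in FUEL_CODES or transmission not in TRANSMISSION_CODES:
        raise ValueError("unknown fuel type or transmission")
    return [
        float(form["car_name"]),        # assumed to be the encoded car name
        float(form["present_price"]),
        float(form["kms_driven"]),
        float(FUEL_CODES[fuel]),
        float(TRANSMISSION_CODES[transmission]),
        float(form["age_of_car"]),
    ]


# First row of the dataset (ritz), entered the way a user might type it
row = parse_features({
    "car_name": "90",
    "present_price": "5.59",
    "kms_driven": "27000",
    "fuel_type": "Petrol",
    "transmission": "manual",
    "age_of_car": "9",
})
print(row)  # [90.0, 5.59, 27000.0, 2.0, 1.0, 9.0]
```

Inside `predict`, the row could then be passed to the model as `model.predict([parse_features(request.form)])`, with the `ValueError` caught and shown to the user as a message.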


### 13. Deploy your app on Heroku. (write commands for deployment)

Commands are as follows:
* heroku login
* heroku create carprice-predictor
* pip install gunicorn (run inside the project folder)
* Create a file named Procfile (no extension) containing the line: web: gunicorn app:app --log-file -
* Create a runtime.txt file and paste the Python version details into it
* pip freeze > requirements.txt
* git init
* git add .
* git commit -m "first commit"
* heroku git:remote -a carprice-predictor
* git push heroku master
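
On Windows, `echo>Procfile` writes the text "ECHO is on." into the file instead of leaving it empty, so the two deployment files can also be generated from Python. A minimal sketch, assuming the Flask object is named `app` inside `app.py` as in step 12; `write_deploy_files` is a hypothetical helper, and the version written to `runtime.txt` is simply the one running the script, which Heroku may or may not support.

```python
import platform
import tempfile
from pathlib import Path


def write_deploy_files(project_dir):
    """Write a Procfile and a runtime.txt into project_dir and return their paths."""
    project = Path(project_dir)
    procfile = project / "Procfile"      # the file name has no extension
    runtime = project / "runtime.txt"
    # 'app:app' means: module app.py, Flask object named app
    procfile.write_text("web: gunicorn app:app --log-file -\n")
    runtime.write_text(f"python-{platform.python_version()}\n")
    return procfile, runtime


# Demonstrated in a temporary directory; pass the project folder in practice.
with tempfile.TemporaryDirectory() as tmp:
    procfile, runtime = write_deploy_files(tmp)
    procfile_text = procfile.read_text()
    runtime_text = runtime.read_text()

print(procfile_text.strip())
print(runtime_text.strip())
```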

### 14. Paste the URL of the Heroku application below. When submitting the solution, submit this notebook along with the source code.

The app could not be deployed because the Heroku account required payment details to be added and did not accept India-issued cards.

### Happy Learning :)