# <font color=darkblue> Machine Learning model deployment with Flask framework on Heroku</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the machine learning model with the Flask framework on Heroku.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com:
- **Car_Name**: Name of the car
- **Year**: Year of Purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: first, second or third owner


### 1. Import required libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


### 2. Load the dataset

In [2]:
# Load the dataset (pandas was already imported as pd in step 1)
df = pd.read_csv('car+data.csv')

# Display the first few rows of the dataset
print(df.head())


  Car_Name  Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type   
0     ritz  2014           3.35           5.59       27000    Petrol  \
1      sx4  2013           4.75           9.54       43000    Diesel   
2     ciaz  2017           7.25           9.85        6900    Petrol   
3  wagon r  2011           2.85           4.15        5200    Petrol   
4    swift  2014           4.60           6.87       42450    Diesel   

  Seller_Type Transmission  Owner  
0      Dealer       Manual      0  
1      Dealer       Manual      0  
2      Dealer       Manual      0  
3      Dealer       Manual      0  
4      Dealer       Manual      0  


### 3. Check the shape and basic information of the dataset.

In [3]:
# Check the shape of the dataset
print("Shape of the dataset:", df.shape)

# Check the basic information of the dataset
print("\nBasic Information:")
print(df.info())


Shape of the dataset: (301, 9)

Basic Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB
None



In [5]:
# List the distinct car names (df was already loaded in step 2)
car_names = df['Car_Name'].unique()
unique_count = len(car_names)

print("Unique Car Names:")
for car_name in car_names:
    print(car_name)

print("Total Unique Car Names:", unique_count)


Unique Car Names:
ritz
sx4
ciaz
wagon r
swift
vitara brezza
s cross
alto 800
ertiga
dzire
alto k10
ignis
800
baleno
omni
fortuner
innova
corolla altis
etios cross
etios g
etios liva
corolla
etios gd
camry
land cruiser
Royal Enfield Thunder 500
UM Renegade Mojave
KTM RC200
Bajaj Dominar 400
Royal Enfield Classic 350
KTM RC390
Hyosung GT250R
Royal Enfield Thunder 350
KTM 390 Duke 
Mahindra Mojo XT300
Bajaj Pulsar RS200
Royal Enfield Bullet 350
Royal Enfield Classic 500
Bajaj Avenger 220
Bajaj Avenger 150
Honda CB Hornet 160R
Yamaha FZ S V 2.0
Yamaha FZ 16
TVS Apache RTR 160
Bajaj Pulsar 150
Honda CBR 150
Hero Extreme
Bajaj Avenger 220 dtsi
Bajaj Avenger 150 street
Yamaha FZ  v 2.0
Bajaj Pulsar  NS 200
Bajaj Pulsar 220 F
TVS Apache RTR 180
Hero Passion X pro
Bajaj Pulsar NS 200
Yamaha Fazer 
Honda Activa 4G
TVS Sport 
Honda Dream Yuga 
Bajaj Avenger Street 220
Hero Splender iSmart
Activa 3g
Hero Passion Pro
Honda CB Trigger
Yamaha FZ S 
Bajaj Pulsar 135 LS
Activa 4g
Honda CB Unicorn
Hero 

### 4. Check for duplicate records in the dataset and drop them if present

In [6]:
# Check for duplicate records
print("Number of duplicate records:", df.duplicated().sum())

# Drop duplicate records
df.drop_duplicates(inplace=True)


Number of duplicate records: 2


### 5. Drop the columns that are redundant for the analysis

In [7]:
# Drop the 'Car_Name' column (free-text names with many distinct values)
df = df.drop('Car_Name', axis=1)

# Print the modified DataFrame
print(df)


     Year  Selling_Price  Present_Price  Kms_Driven Fuel_Type Seller_Type   
0    2014           3.35           5.59       27000    Petrol      Dealer  \
1    2013           4.75           9.54       43000    Diesel      Dealer   
2    2017           7.25           9.85        6900    Petrol      Dealer   
3    2011           2.85           4.15        5200    Petrol      Dealer   
4    2014           4.60           6.87       42450    Diesel      Dealer   
..    ...            ...            ...         ...       ...         ...   
296  2016           9.50          11.60       33988    Diesel      Dealer   
297  2015           4.00           5.90       60000    Petrol      Dealer   
298  2009           3.35          11.00       87934    Petrol      Dealer   
299  2017          11.50          12.50        9000    Diesel      Dealer   
300  2016           5.30           5.90        5464    Petrol      Dealer   

    Transmission  Owner  
0         Manual      0  
1         Manual      0

### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [8]:
# Extract 'age_of_the_car' feature
current_year = 2023  # Update with the current year
df['age_of_the_car'] = current_year - df['Year']

# Drop the 'Year' feature
df.drop('Year', axis=1, inplace=True)

print(df.info())


<class 'pandas.core.frame.DataFrame'>
Index: 299 entries, 0 to 300
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Selling_Price   299 non-null    float64
 1   Present_Price   299 non-null    float64
 2   Kms_Driven      299 non-null    int64  
 3   Fuel_Type       299 non-null    object 
 4   Seller_Type     299 non-null    object 
 5   Transmission    299 non-null    object 
 6   Owner           299 non-null    int64  
 7   age_of_the_car  299 non-null    int64  
dtypes: float64(2), int64(3), object(3)
memory usage: 21.0+ KB
None
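
The cell above hardcodes `current_year = 2023`. As a minimal sketch, the same step could be wrapped in a helper that defaults to the current calendar year. The helper name `add_car_age`, its `reference_year` parameter, and the toy data are assumptions for illustration and are not part of the original notebook.

```python
from datetime import date

import pandas as pd

# Toy frame standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({"Year": [2014, 2013, 2017]})


def add_car_age(frame, reference_year=None):
    """Return a copy with 'age_of_the_car' added and 'Year' dropped."""
    if reference_year is None:
        # Fall back to the current calendar year when none is given.
        reference_year = date.today().year
    out = frame.copy()
    out["age_of_the_car"] = reference_year - out["Year"]
    return out.drop(columns="Year")


# Passing 2023 reproduces the ages computed in the notebook cell.
aged = add_car_age(toy, reference_year=2023)
print(aged)
```

Pinning `reference_year` keeps the feature consistent between training and prediction, since a model trained with ages relative to one year would see shifted values if the year silently changed.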


### 7. Encode the categorical columns

In [9]:
from sklearn.preprocessing import LabelEncoder

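# Initialize LabelEncoder (it is refit for each column below)
label_encoder = LabelEncoder()

# Encode categorical columns using label encoding.
# Classes are numbered in sorted order, e.g. Diesel -> 1 and Petrol -> 2.
df['Fuel_Type'] = label_encoder.fit_transform(df['Fuel_Type'])
df['Seller_Type'] = label_encoder.fit_transform(df['Seller_Type'])
df['Transmission'] = label_encoder.fit_transform(df['Transmission'])
# 'Owner' is already an integer column; encoding it only renumbers its values from 0
df['Owner'] = label_encoder.fit_transform(df['Owner'])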

# Print the encoded dataset
print(df)


     Selling_Price  Present_Price  Kms_Driven  Fuel_Type  Seller_Type   
0             3.35           5.59       27000          2            0  \
1             4.75           9.54       43000          1            0   
2             7.25           9.85        6900          2            0   
3             2.85           4.15        5200          2            0   
4             4.60           6.87       42450          1            0   
..             ...            ...         ...        ...          ...   
296           9.50          11.60       33988          1            0   
297           4.00           5.90       60000          2            0   
298           3.35          11.00       87934          2            0   
299          11.50          12.50        9000          1            0   
300           5.30           5.90        5464          2            0   

     Transmission  Owner  age_of_the_car  
0               1      0               9  
1               1      0             
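
Because the cell above reuses one `LabelEncoder` for every column, only the last fit is kept, and the text-to-integer mapping is not available later (for example, when the web form has to send the same codes to the model). A possible variation, sketched here on toy data, keeps one encoder per column and prints the mapping. The `encoders` and `mappings` names and the toy frame are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual", "Manual"],
})

# Keep a separate fitted encoder for each categorical column.
encoders = {}
for column in ["Fuel_Type", "Transmission"]:
    encoder = LabelEncoder()
    toy[column] = encoder.fit_transform(toy[column])
    encoders[column] = encoder

# LabelEncoder assigns codes in sorted class order; expose that mapping.
mappings = {
    column: {str(label): code for code, label in enumerate(enc.classes_)}
    for column, enc in encoders.items()
}
print(mappings)
```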

### 8. Separate the target and independent features.

In [10]:
# Separate the target variable and independent features
X = df.drop('Selling_Price', axis=1)
y = df['Selling_Price']


### 9. Split the data into train and test.

In [11]:
from sklearn.model_selection import train_test_split

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
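
As a quick sanity check on the split, the sketch below applies the same `test_size=0.2` and `random_state=42` to placeholder arrays with 299 rows (the row count after dropping duplicates) and prints the resulting shapes. The `X_demo`/`y_demo` arrays are stand-ins, not the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the de-duplicated dataset.
X_demo = np.arange(299 * 2).reshape(299, 2)
y_demo = np.arange(299)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The test set size is rounded up, and the rest goes to the training set.
print(X_tr.shape, X_te.shape)
```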


### 10. Build a Random Forest Regressor model and check the R2 score for train and test.

In [12]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# The categorical columns were label-encoded in step 7 and the data was split
# in step 9, so X_train, X_test, y_train and y_test are reused here.

# Create a Random Forest Regressor model
# (no random_state is set, so the scores vary slightly between runs)
rf_model = RandomForestRegressor()

# Fit the model on the training data
rf_model.fit(X_train, y_train)

# Make predictions on the training and testing data
train_predictions = rf_model.predict(X_train)
test_predictions = rf_model.predict(X_test)

# Calculate R2 score for train and test
train_r2_score = r2_score(y_train, train_predictions)
test_r2_score = r2_score(y_test, test_predictions)

# Print the R2 scores
print("R2 score for train:", train_r2_score)
print("R2 score for test:", test_r2_score)


R2 score for train: 0.9835563811461037
R2 score for test: 0.586749478458209
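
The model above is created without a `random_state`, so rerunning the cell can give slightly different scores. The sketch below, on synthetic data, shows one way to make the evaluation repeatable and to report RMSE next to R2. The synthetic data, the `fit_and_score` helper, and `n_estimators=50` are assumptions for illustration; the numbers it prints say nothing about the car dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 299 rows and 7 features, like the notebook's X.
X_demo, y_demo = make_regression(
    n_samples=299, n_features=7, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)


def fit_and_score(seed):
    """Fit a seeded forest and return (test R2, test RMSE)."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X_tr, y_tr)
    preds = model.predict(X_te)
    rmse = float(np.sqrt(mean_squared_error(y_te, preds)))
    return float(r2_score(y_te, preds)), rmse


# Two runs with the same seed should report identical metrics.
first_run = fit_and_score(seed=42)
second_run = fit_and_score(seed=42)
print(first_run, second_run)
```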


### 11. Create a pickle file with the .pkl extension

In [None]:
# The project is uploaded separately
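
Since the cell above is left empty, here is a minimal sketch of how a fitted model could be written to `model.pkl` and read back with the standard `pickle` module. The stand-in model, the random training data, and the temporary directory are assumptions for illustration; in the notebook the object to save would be `rf_model`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in model trained on random data (the notebook would use rf_model).
rng = np.random.default_rng(0)
X_demo = rng.random((50, 7))
y_demo = X_demo.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Write the fitted model to model.pkl inside a temporary folder.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as the Flask app would do at startup.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print(np.allclose(model.predict(X_demo), restored.predict(X_demo)))
```

A pickle should be loaded with the same scikit-learn version that created it, and only from a trusted source, because unpickling can execute arbitrary code.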

### 12. Create a new folder/project in Visual Studio/PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.

### b) Create app.py file and write the predict function
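
One way the form and the predict function could fit together is sketched below. To stay self-contained it renders the form from an inline template string instead of `templates/index.html`, and it falls back to a stand-in model when `model.pkl` is not found. The route names, the `FORM` markup, the `load_model` helper, and the fallback model are assumptions for illustration. The feature order follows the columns of `X` in the notebook, and the form expects the already-encoded numeric values.

```python
import os
import pickle

import numpy as np
from flask import Flask, render_template_string, request
from sklearn.ensemble import RandomForestRegressor

# Feature order must match the columns the model was trained on.
FEATURES = ["Present_Price", "Kms_Driven", "Fuel_Type", "Seller_Type",
            "Transmission", "Owner", "age_of_the_car"]

# Inline stand-in for templates/index.html (hypothetical markup).
FORM = """
<form method="post" action="/predict">
  {% for name in features %}
  <label>{{ name }} <input name="{{ name }}" required></label><br>
  {% endfor %}
  <button type="submit">Predict</button>
</form>
{% if prediction is not none %}
<p>Predicted selling price: {{ prediction }} lakhs</p>
{% endif %}
"""


def load_model(path="model.pkl"):
    """Load the pickled model, or build a stand-in if the file is missing."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    # Fallback for demonstration only: a tiny model fit on random data.
    rng = np.random.default_rng(0)
    X_demo = rng.random((30, len(FEATURES)))
    return RandomForestRegressor(n_estimators=5, random_state=0).fit(
        X_demo, X_demo.sum(axis=1)
    )


app = Flask(__name__)
model = load_model()


@app.route("/")
def home():
    # Show the empty form.
    return render_template_string(FORM, features=FEATURES, prediction=None)


@app.route("/predict", methods=["POST"])
def predict():
    # Read the form fields in training order and convert them to floats.
    values = [float(request.form[name]) for name in FEATURES]
    prediction = round(float(model.predict([values])[0]), 2)
    return render_template_string(FORM, features=FEATURES, prediction=prediction)
```

Saved as `app.py`, this sketch could be started locally with `flask --app app run`. In the real project the markup would live in `templates/index.html` and be rendered with `render_template`.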

### 13. Deploy your app on Heroku. (write commands for deployment)
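
A Heroku deployment of a Flask app typically needs a `Procfile` and a `requirements.txt` next to `app.py`. The sketch below writes both files and prints one possible command sequence without running it. The package list, the use of `gunicorn`, the commit message, the branch name `main`, and the temporary project folder are assumptions for illustration and should be adapted to the actual project.

```python
import tempfile
from pathlib import Path

# Stand-in for the project folder that contains app.py and model.pkl.
project_dir = Path(tempfile.mkdtemp())

# Procfile: tells Heroku to serve the `app` object in app.py with gunicorn.
(project_dir / "Procfile").write_text("web: gunicorn app:app\n")

# requirements.txt: packages the app needs (pin versions in a real project).
packages = ["flask", "gunicorn", "scikit-learn", "numpy"]
(project_dir / "requirements.txt").write_text("\n".join(packages) + "\n")

# Commands to run in a terminal from the project folder (printed, not executed).
DEPLOY_COMMANDS = [
    "heroku login",
    "git init",
    "git add .",
    'git commit -m "Deploy used car price app"',
    "heroku create",
    "git push heroku main",
]
for command in DEPLOY_COMMANDS:
    print(command)
```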

### 14. Paste the URL of the Heroku application below, and submit this notebook along with the source code.

### Happy Learning :)