# <font color=darkblue> Machine Learning model deployment with Flask framework</font>

## <font color=Blue>Used Cars Price Prediction Application</font>

### Objective:
1. Build a machine learning regression model that predicts the selling price of used cars from input features such as fuel type, kilometers driven, and transmission type.
2. Deploy the model with the Flask framework.

### Dataset Information:
#### Dataset Source: https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho?select=CAR+DETAILS+FROM+CAR+DEKHO.csv
This dataset contains information about used cars listed on www.cardekho.com.
- **Car_Name**: Name of the car
- **Year**: Year of purchase
- **Selling_Price (target)**: Selling price of the car in lakhs
- **Present_Price**: Present price of the car in lakhs
- **Kms_Driven**: Kilometers driven
- **Fuel_Type**: Petrol, Diesel, or CNG
- **Seller_Type**: Dealer or Individual
- **Transmission**: Manual or Automatic
- **Owner**: Ownership history (first, second, or third owner), stored as an integer


### 1. Import required libraries

In [None]:
import pandas as pd

### 2. Load the dataset

In [1]:
import pandas as pd

# Path to the CSV file (adjust for your machine)

file_path = 'C:/Users/hp/Downloads/car+data.csv'

# Load the dataset
df = pd.read_csv(file_path)



### 3. Check the shape and basic information of the dataset.

In [2]:
import pandas as pd

# Path to the CSV file
file_path = r'C:/Users/hp/Downloads/car+data.csv'

# Load the dataset
df = pd.read_csv(file_path)

# Check the shape of the dataset
print("Shape of the dataset:", df.shape)

# Display basic information about the dataset
print("\nBasic Information about the dataset:")
df.info()  # info() prints the summary itself and returns None


Shape of the dataset: (301, 9)

Basic Information about the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 301 entries, 0 to 300
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Car_Name       301 non-null    object 
 1   Year           301 non-null    int64  
 2   Selling_Price  301 non-null    float64
 3   Present_Price  301 non-null    float64
 4   Kms_Driven     301 non-null    int64  
 5   Fuel_Type      301 non-null    object 
 6   Seller_Type    301 non-null    object 
 7   Transmission   301 non-null    object 
 8   Owner          301 non-null    int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 21.3+ KB


### 4. Check for duplicate records in the dataset and drop them if present

In [3]:
import pandas as pd

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Check for duplicate records
duplicate_rows = df[df.duplicated()]

if not duplicate_rows.empty:
    print("Duplicate records found. Dropping duplicates...")
    # Drop duplicates
    df.drop_duplicates(inplace=True)
    print("Duplicates dropped. Shape of the dataset:", df.shape)
else:
    print("No duplicate records found.")

# Optionally, you can save the cleaned dataset back to a file if needed
# df.to_csv("cleaned_car_data.csv", index=False)


Duplicate records found. Dropping duplicates...
Duplicates dropped. Shape of the dataset: (299, 9)
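
To see what `duplicated()` and `drop_duplicates()` do in isolation, here is a minimal sketch on a small made-up frame. The rows are invented for illustration and are not taken from the CarDekho file; reassigning the result instead of using `inplace=True` is only a style choice.

```python
import pandas as pd

# Made-up rows for illustration; the second and third rows are identical.
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "sx4", "ciaz"],
    "Year": [2014, 2013, 2013, 2017],
    "Selling_Price": [3.35, 4.75, 4.75, 7.25],
})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row;
# reset_index(drop=True) renumbers the remaining rows from 0.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print("Duplicate rows:", n_duplicates)
print("Shape after dropping:", deduplicated.shape)
```

Note that every later cell in this notebook reloads the CSV, so the de-duplicated frame only carries forward if the same `df` is reused.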


### 5. Drop the columns that are redundant for the analysis

In [6]:
import pandas as pd

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Display the columns to identify which ones are redundant
print("Original columns:")
print(df.columns)

# Placeholder names; replace them with the columns judged redundant
columns_to_drop = ['Column1', 'Column2', 'Column3']

# Check if columns_to_drop exist in df.columns before dropping
columns_to_drop_existing = [col for col in columns_to_drop if col in df.columns]

if columns_to_drop_existing:
    # Drop the redundant columns
    df.drop(columns=columns_to_drop_existing, inplace=True)
    print("\nColumns after dropping redundant columns:")
    print(df.columns)
else:
    print("\nColumns to drop not found in the dataset.")

# Optionally, you can save the modified dataset back to a file if needed
# df.to_csv("updated_car_data.csv", index=False)


Original columns:
Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')

Columns to drop not found in the dataset.
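
The names in `columns_to_drop` above are placeholders, which is why the cell reports that nothing was found. A minimal sketch with a real column name follows. Treating `Car_Name` as redundant is an assumption made here (it is a free-text label with many distinct values), not something the notebook states, and the rows are made up.

```python
import pandas as pd

# Made-up rows that reuse three of the dataset's column names.
toy = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz"],
    "Year": [2014, 2013, 2017],
    "Selling_Price": [3.35, 4.75, 7.25],
})

# Assumption: Car_Name is the column considered redundant for modelling.
columns_to_drop = ["Car_Name"]

# errors="ignore" skips any listed name that is not present in the frame.
reduced = toy.drop(columns=columns_to_drop, errors="ignore")

print(list(reduced.columns))
```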


### 6. Extract a new feature called 'age_of_the_car' from the feature 'Year' and drop the feature 'Year'

In [7]:
import pandas as pd
import datetime as dt

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Display the columns to understand the structure
print("Original columns:")
print(df.columns)

# Check if 'year' or equivalent column exists
if 'year' in df.columns:
    # Calculate current year
    current_year = dt.datetime.now().year

    # Calculate age of the car
    df['age_of_the_car'] = current_year - df['year']

    # Drop the 'year' column
    df.drop(columns=['year'], inplace=True)

    # Display the updated columns after dropping 'year'
    print("\nColumns after dropping 'year' and adding 'age_of_the_car':")
    print(df.columns)

    # Optionally, you can save the modified dataset back to a file if needed
    # df.to_csv("updated_car_data.csv", index=False)
else:
    print("Column 'year' not found in the dataset. Verify the column name or dataset contents.")


Original columns:
Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')
Column 'year' not found in the dataset. Verify the column name or dataset contents.
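
The check above looks for a lowercase `year`, while the column in this dataset is named `Year`, so the branch that builds the new feature never ran. A minimal sketch of the intended step on made-up rows, using the capitalised name:

```python
import datetime as dt

import pandas as pd

# Made-up rows for illustration.
toy = pd.DataFrame({
    "Year": [2014, 2013, 2017],
    "Kms_Driven": [27000, 43000, 6900],
})

# Age is measured against the year in which the code runs.
reference_year = dt.datetime.now().year
toy["age_of_the_car"] = reference_year - toy["Year"]

# The original column is no longer needed once the age is derived.
toy = toy.drop(columns=["Year"])

print(toy)
```

Because `reference_year` comes from the system clock, the feature changes from one calendar year to the next. Fixing it to a constant is one way to keep training and prediction consistent.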


### 7. Encode the categorical columns

In [9]:
import pandas as pd

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Display the columns to identify categorical variables
print("Original columns:")
print(df.columns)

# Replace with actual categorical column names
categorical_columns = ['ActualCategoricalColumn1', 'ActualCategoricalColumn2']

# Check if categorical columns exist in df.columns
existing_categorical_columns = [col for col in categorical_columns if col in df.columns]

if existing_categorical_columns:
    # One-hot encode categorical columns
    df_encoded = pd.get_dummies(df, columns=existing_categorical_columns, drop_first=True)

    # Display the updated columns after encoding
    print("\nColumns after encoding categorical columns:")
    print(df_encoded.columns)

    # Optionally, you can save the encoded dataset back to a file if needed
    # df_encoded.to_csv("encoded_car_data.csv", index=False)
else:
    print("No valid categorical columns found in the dataset.")



Original columns:
Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')
No valid categorical columns found in the dataset.
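
The list `categorical_columns` above still holds placeholder names, so no encoding took place. Judging by the `object` dtypes reported earlier, the likely candidates are `Fuel_Type`, `Seller_Type`, and `Transmission` (`Car_Name` is also an `object` column, but one-hot encoding it would add one column per distinct name). A minimal sketch on made-up rows, assuming those three columns:

```python
import pandas as pd

# Made-up rows for illustration.
toy = pd.DataFrame({
    "Kms_Driven": [27000, 43000, 6900, 5200],
    "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol"],
    "Seller_Type": ["Dealer", "Individual", "Dealer", "Dealer"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual"],
})

categorical_columns = ["Fuel_Type", "Seller_Type", "Transmission"]

# drop_first=True removes the first (alphabetical) level of each column,
# so k categories become k - 1 indicator columns.
# dtype=int gives 0/1 values instead of booleans.
encoded = pd.get_dummies(toy, columns=categorical_columns, drop_first=True, dtype=int)

print(list(encoded.columns))
```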


### 8. Separate the target and independent features.

In [11]:
import pandas as pd

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Display the columns to understand the structure
print("Original columns:")
print(df.columns)

# Example: Assuming 'target_column' is your target variable (replace with actual target column name)
target_column = 'actual_target_column_name'

# Check if the target column exists in the DataFrame
if target_column in df.columns:
    # Separate the target variable (dependent feature) and independent features
    y = df[target_column]  # Target variable
    X = df.drop(columns=[target_column])  # Independent features

    # Display the shapes of X and y to verify separation
    print("\nShape of X (independent features):", X.shape)
    print("Shape of y (target variable):", y.shape)
else:
    print(f"Target column '{target_column}' not found in the dataset. Verify column name.")



Original columns:
Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')
Target column 'actual_target_column_name' not found in the dataset. Verify column name.
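
The dataset description marks `Selling_Price` as the target, but the cell above still uses a placeholder name. A minimal sketch of the separation with the real column name, on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration.
toy = pd.DataFrame({
    "Present_Price": [5.59, 9.54, 9.85, 4.15],
    "Kms_Driven": [27000, 43000, 6900, 5200],
    "Owner": [0, 0, 0, 0],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85],
})

target_column = "Selling_Price"

y = toy[target_column]                 # target (a Series)
X = toy.drop(columns=[target_column])  # independent features (a DataFrame)

print("Shape of X:", X.shape)
print("Shape of y:", y.shape)
```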


### 9. Split the data into train and test.

In [13]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Display the columns to understand the structure
print("Original columns:")
print(df.columns)

# Replace with the actual name of your target variable column
target_column = 'actual_target_column_name'

# Check if the target column exists in the DataFrame
if target_column in df.columns:
    # Separate the target variable (dependent feature) and independent features
    y = df[target_column]  # Target variable
    X = df.drop(columns=[target_column])  # Independent features

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Print the shapes of the train and test sets to verify
    print("Shape of X_train:", X_train.shape)
    print("Shape of X_test:", X_test.shape)
    print("Shape of y_train:", y_train.shape)
    print("Shape of y_test:", y_test.shape)
else:
    print(f"Target column '{target_column}' not found in the dataset. Verify column name.")


Original columns:
Index(['Car_Name', 'Year', 'Selling_Price', 'Present_Price', 'Kms_Driven',
       'Fuel_Type', 'Seller_Type', 'Transmission', 'Owner'],
      dtype='object')
Target column 'actual_target_column_name' not found in the dataset. Verify column name.
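
The split above never ran because of the placeholder target name. The sketch below shows the same `train_test_split` call on ten made-up rows with `Selling_Price` as the target. With `test_size=0.3`, scikit-learn rounds the test share up, so ten rows give three test rows and seven training rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up rows for illustration.
toy = pd.DataFrame({
    "Present_Price": [5.6, 9.5, 9.9, 4.2, 6.9, 9.8, 8.1, 8.6, 8.9, 3.6],
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450, 2071, 18796, 33429, 20273, 42367],
    "Selling_Price": [3.4, 4.8, 7.3, 2.9, 4.6, 9.3, 6.8, 6.5, 8.8, 2.3],
})

X = toy.drop(columns=["Selling_Price"])
y = toy["Selling_Price"]

# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print("Train:", X_train.shape, y_train.shape)
print("Test:", X_test.shape, y_test.shape)
```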


### 10. Build a Random Forest Regressor model and check the R² score for train and test

In [14]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Load the dataset
file_path = "C:/Users/hp/Downloads/car+data.csv"
df = pd.read_csv(file_path)

# Assuming 'target_column' is your target variable
target_column = 'target_variable'  # Replace with actual target column name

# Check if the target column exists in the DataFrame
if target_column in df.columns:
    # Separate the target variable (dependent feature) and independent features
    y = df[target_column]  # Target variable
    X = df.drop(columns=[target_column])  # Independent features

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Build the Random Forest Regressor model
    rf_regressor = RandomForestRegressor(random_state=42)

    # Train the model
    rf_regressor.fit(X_train, y_train)

    # Predict on training and test sets
    y_train_pred = rf_regressor.predict(X_train)
    y_test_pred = rf_regressor.predict(X_test)

    # Calculate R^2 score for training and test sets
    r2_train = r2_score(y_train, y_train_pred)
    r2_test = r2_score(y_test, y_test_pred)

    # Print R^2 scores
    print(f"R^2 score on training set: {r2_train:.2f}")
    print(f"R^2 score on test set: {r2_test:.2f}")

else:
    print(f"Target column '{target_column}' not found in the dataset. Verify column name.")


Target column 'target_variable' not found in the dataset. Verify column name.
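
Since the cell above stopped at the placeholder target name, no model was trained. The sketch below runs the whole sequence (encode, split, fit, score) on synthetic data, so the printed scores say nothing about the real CarDekho data. The price formula, the row count, and the reduced set of columns are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (values are generated, not real).
rng = np.random.default_rng(0)
n_rows = 200
synthetic = pd.DataFrame({
    "Present_Price": rng.uniform(1, 20, n_rows),
    "Kms_Driven": rng.integers(1_000, 150_000, n_rows),
    "age_of_the_car": rng.integers(1, 15, n_rows),
    "Fuel_Type": rng.choice(["Petrol", "Diesel", "CNG"], n_rows),
    "Transmission": rng.choice(["Manual", "Automatic"], n_rows),
})
# Assumed relationship: price rises with Present_Price and falls with age.
synthetic["Selling_Price"] = (
    0.7 * synthetic["Present_Price"]
    - 0.2 * synthetic["age_of_the_car"]
    + rng.normal(0, 0.5, n_rows)
).clip(lower=0.1)

# Random forests need numeric inputs, so the text columns are encoded first.
encoded = pd.get_dummies(
    synthetic, columns=["Fuel_Type", "Transmission"], drop_first=True, dtype=int
)
X = encoded.drop(columns=["Selling_Price"])
y = encoded["Selling_Price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
print(f"R^2 score on training set: {r2_train:.2f}")
print(f"R^2 score on test set: {r2_test:.2f}")
```

A training score well above the test score would suggest overfitting; on the real data that gap is the main thing to look at.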


### 11. Create a pickle file with the .pkl extension

In [15]:
import pickle

# Example dictionary (replace with your object, e.g., trained model)
data = {'name': 'John', 'age': 30, 'city': 'New York'}

# Path to save the .pkl file
file_path = 'model.pkl'

# Write to the .pkl file
with open(file_path, 'wb') as file:
    pickle.dump(data, file)

print(f"Saved pickle file to '{file_path}'")


Saved pickle file to 'model.pkl'
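
The cell above pickles a sample dictionary; for deployment, the pickled object would be the trained model. A minimal sketch of that round trip follows. It assumes a small forest fitted on generated numbers and a temporary folder so that no file is left behind; in the project itself the path would simply be `model.pkl`.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny generated training set, only so that there is a fitted model to save.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 50)
model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"

    # Serialise the fitted model to disk.
    with open(model_path, "wb") as file:
        pickle.dump(model, file)

    # Load it back, as app.py would do at start-up.
    with open(model_path, "rb") as file:
        restored = pickle.load(file)

original_predictions = model.predict(X[:5])
restored_predictions = restored.predict(X[:5])
print("Predictions match:", np.allclose(original_predictions, restored_predictions))
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used to load a model should match the one used to save it.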


### 12. Create a new folder/project in Visual Studio or PyCharm that contains the "model.pkl" file. *Make sure you are using a virtual environment and install the required packages.*

### a) Create a basic HTML form for the frontend

Create a file **index.html** in the templates folder and copy the following code.
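
The form markup is not included in this notebook. As a stand-in, the sketch below writes a minimal `templates/index.html` from Python. The helper `write_index_template`, the field names (chosen to mirror the dataset columns), and the `/predict` action are assumptions, not the original template.

```python
import tempfile
from pathlib import Path

# Hypothetical minimal form; field names mirror the dataset columns.
INDEX_HTML = """<!DOCTYPE html>
<html>
  <head><title>Used Cars Price Prediction</title></head>
  <body>
    <h2>Used Cars Price Prediction</h2>
    <form action="/predict" method="post">
      <label>Year of purchase <input type="number" name="Year" required></label><br>
      <label>Present price (lakhs) <input type="number" step="0.01" name="Present_Price" required></label><br>
      <label>Kilometers driven <input type="number" name="Kms_Driven" required></label><br>
      <label>Owner <input type="number" name="Owner" min="0" required></label><br>
      <label>Fuel type
        <select name="Fuel_Type">
          <option>Petrol</option><option>Diesel</option><option>CNG</option>
        </select>
      </label><br>
      <label>Seller type
        <select name="Seller_Type">
          <option>Dealer</option><option>Individual</option>
        </select>
      </label><br>
      <label>Transmission
        <select name="Transmission">
          <option>Manual</option><option>Automatic</option>
        </select>
      </label><br>
      <button type="submit">Predict</button>
    </form>
  </body>
</html>
"""


def write_index_template(project_dir):
    """Create <project_dir>/templates/index.html and return its path."""
    templates_dir = Path(project_dir) / "templates"
    templates_dir.mkdir(parents=True, exist_ok=True)
    index_path = templates_dir / "index.html"
    index_path.write_text(INDEX_HTML, encoding="utf-8")
    return index_path


# Demonstrated in a throwaway folder; in a real project, pass the folder that holds app.py.
with tempfile.TemporaryDirectory() as tmp_dir:
    written_path = write_index_template(tmp_dir)
    written_html = written_path.read_text(encoding="utf-8")
    print(written_path.name, "written,", len(written_html), "characters")
```

Flask looks for templates in a folder named `templates` next to the application file, which is why the helper creates that folder.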

### b) Create the app.py file and write the predict function

In [17]:
from flask import Flask, request, jsonify

# Initialize Flask application
app = Flask(__name__)

# Sample prediction function (replace with your actual prediction logic)
def predict(input_data):
    # Sample logic (replace with your actual model prediction)
    result = {'prediction': 'Sample prediction result'}
    return result

# Define a route for prediction
@app.route('/predict', methods=['POST'])
def predict_route():
    try:
        # Get data from request
        input_data = request.json
        
        # Call prediction function
        prediction_result = predict(input_data)
        
        # Return prediction result as JSON response
        return jsonify(prediction_result), 200
    except Exception as e:
        # Handle exceptions
        return jsonify({'error': str(e)}), 400

# Run the application
if __name__ == '__main__':
    app.run(debug=True)


 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat


SystemExit: 1

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
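
The `predict` function above returns a fixed string. The sketch below shows one way the route could call a model instead. `create_app`, `FEATURES`, and `StubModel` are hypothetical names introduced here; `StubModel` only stands in for the object that `pickle.load` would return from `model.pkl`, and the feature order is an assumption that would have to match the training columns.

```python
from flask import Flask, jsonify, request

# Assumed feature order; it must match the columns the model was trained on.
FEATURES = ["Present_Price", "Kms_Driven", "Owner", "age_of_the_car"]


class StubModel:
    """Stand-in for the model loaded from model.pkl (hypothetical)."""

    def predict(self, rows):
        # Toy rule: present price minus 0.5 lakh per year of age, never below 0.
        return [max(row[0] - 0.5 * row[3], 0.0) for row in rows]


def create_app(model):
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_route():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            return jsonify({"error": "Expected a JSON object"}), 400
        missing = [name for name in FEATURES if name not in payload]
        if missing:
            return jsonify({"error": f"Missing fields: {missing}"}), 400
        try:
            row = [float(payload[name]) for name in FEATURES]
        except (TypeError, ValueError):
            return jsonify({"error": "All fields must be numeric"}), 400
        prediction = float(model.predict([row])[0])
        return jsonify({"prediction": prediction}), 200

    return app


app = create_app(StubModel())

# Exercise the route with Flask's test client, without starting a server.
with app.test_client() as client:
    response = client.post(
        "/predict",
        json={"Present_Price": 5.59, "Kms_Driven": 27000, "Owner": 0, "age_of_the_car": 4},
    )
    result = response.get_json()
    print(response.status_code, result)
```

In `app.py`, `StubModel()` would be replaced by the unpickled model and the file would end with `app.run(...)`. The `SystemExit: 1` shown above is typical when `app.run(debug=True)` is started inside a notebook, because the debug reloader tries to restart the process; running `python app.py` from a terminal avoids this.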


### 13. Run app.py, which renders the index.html page, then enter the input values to get the prediction

In [18]:
from flask import Flask, render_template, request, jsonify

# Initialize Flask application
app = Flask(__name__)

# Sample prediction function (replace with your actual prediction logic)
def predict(input_data):
    # Sample logic (replace with your actual model prediction)
    result = {'prediction': 'Sample prediction result'}
    return result

# Route to render index.html
@app.route('/')
def home():
    return render_template('index.html')

# Route to handle prediction
@app.route('/predict', methods=['POST'])
def predict_route():
    try:
        # Get data from request
        input_data = request.form  # Use request.form to get form data
        
        # Call prediction function
        prediction_result = predict(input_data)
        
        # Return prediction result as JSON response
        return jsonify(prediction_result), 200
    except Exception as e:
        # Handle exceptions
        return jsonify({'error': str(e)}), 400

# Run the application
if __name__ == '__main__':
    app.run(debug=True)


 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with stat


SystemExit: 1
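
Values read from `request.form` arrive as strings, so they need converting before a model can use them. The sketch below shows one possible conversion. `form_to_features` is a hypothetical helper, the form field names are assumed to mirror the dataset columns, and the indicator column names assume one-hot encoding with `drop_first=True` on `Fuel_Type`, `Seller_Type`, and `Transmission`.

```python
import datetime as dt


def form_to_features(form, reference_year=None):
    """Convert submitted form strings into the numeric row a model would expect."""
    if reference_year is None:
        reference_year = dt.datetime.now().year
    return {
        "Present_Price": float(form["Present_Price"]),
        "Kms_Driven": float(form["Kms_Driven"]),
        "Owner": int(form["Owner"]),
        # Same derivation as the age_of_the_car feature in step 6.
        "age_of_the_car": reference_year - int(form["Year"]),
        # Indicator columns; the dropped levels (CNG, Dealer, Automatic) are all zeros.
        "Fuel_Type_Diesel": int(form["Fuel_Type"] == "Diesel"),
        "Fuel_Type_Petrol": int(form["Fuel_Type"] == "Petrol"),
        "Seller_Type_Individual": int(form["Seller_Type"] == "Individual"),
        "Transmission_Manual": int(form["Transmission"] == "Manual"),
    }


# A plain dict stands in for request.form here; both support form["name"] lookups.
submitted = {
    "Present_Price": "5.59",
    "Kms_Driven": "27000",
    "Owner": "0",
    "Year": "2020",
    "Fuel_Type": "Petrol",
    "Seller_Type": "Dealer",
    "Transmission": "Manual",
}
features = form_to_features(submitted, reference_year=2024)
print(features)
```

Inside the route, `form_to_features(request.form)` would produce the row, which can then be wrapped in a one-row DataFrame with the training column order before calling `model.predict`.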

### Happy Learning :)