# Car Price Regression with PyCaret
This notebook uses PyCaret to build a regression model for predicting car prices based on various features from the `CarPrice_Assignment.csv` dataset.

**Problem Statement:**
A Chinese automobile company wants to understand the factors affecting car prices in the US market. The goal is to model car prices using available features to help management understand pricing dynamics and inform business strategy.

**Dataset:**
- `CarPrice_Assignment.csv`: 205 cars, each described by 25 feature columns (15 numeric, 10 categorical) plus the target column `price`.

**Steps:**
1. Load the dataset
2. Set up a PyCaret regression experiment
3. Compare models and select the best
4. Evaluate and save the best model
5. Predict car prices on the hold-out set

In [1]:
# Import required libraries
import pandas as pd
import numpy as np

In [2]:
# Load the dataset
data = pd.read_csv('CarPrice_Assignment.csv')
data.head()

Unnamed: 0,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,1,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,2,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,3,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,...,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,4,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,...,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,5,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,...,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


In [3]:
# PyCaret Regression Setup
from pycaret.regression import RegressionExperiment
reg = RegressionExperiment()
reg.setup(data=data, target='price', session_id=123, normalize=True, transformation=True)

Unnamed: 0,Description,Value
0,Session id,123
1,Target,price
2,Target type,Regression
3,Original data shape,"(205, 26)"
4,Transformed data shape,"(205, 51)"
5,Transformed train set shape,"(143, 51)"
6,Transformed test set shape,"(62, 51)"
7,Numeric features,15
8,Categorical features,10
9,Preprocess,True


<pycaret.regression.oop.RegressionExperiment at 0x7f8f027bc410>
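
One thing the setup summary does not make obvious: `car_ID` is a row identifier, yet it is counted among the 15 numeric features (it appears in the numerical imputer's column list in the saved pipeline output further down). A minimal sketch of excluding it, assuming PyCaret 3.x, where `setup` accepts an `ignore_features` list. The helper names `setup_kwargs` and `build_experiment` are hypothetical and not part of the original notebook:

```python
def setup_kwargs(id_columns=("car_ID",)):
    """Keyword arguments for setup(): the notebook's settings plus ignore_features."""
    return {
        "target": "price",
        "session_id": 123,
        "normalize": True,
        "transformation": True,
        # Assumption: identifier columns carry no pricing signal and should be dropped.
        "ignore_features": list(id_columns),
    }


def build_experiment(data):
    """Run setup on `data` (a pandas DataFrame) with the ID column excluded."""
    # Imported lazily so setup_kwargs() can be inspected without PyCaret installed.
    from pycaret.regression import RegressionExperiment

    exp = RegressionExperiment()
    exp.setup(data=data, **setup_kwargs())
    return exp


print(setup_kwargs()["ignore_features"])
```

Calling `build_experiment(data)` in place of the setup cell should then report one fewer numeric feature. Whether the model scores change noticeably is something to check rather than assume.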

In [4]:
# Compare models and select the best
best_model = reg.compare_models()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
gbr,Gradient Boosting Regressor,1677.3872,5500513.454,2273.9023,0.9087,0.1428,0.118,0.043
rf,Random Forest Regressor,1738.4027,6149582.5049,2418.1159,0.8918,0.1549,0.1224,0.058
et,Extra Trees Regressor,1700.1131,6185635.8377,2397.954,0.8886,0.1622,0.1279,0.046
ada,AdaBoost Regressor,2090.7731,6941211.4207,2606.8217,0.867,0.1924,0.1715,0.041
dt,Decision Tree Regressor,2004.2252,9463170.2628,2938.7837,0.8421,0.2095,0.1473,0.032
en,Elastic Net,2454.9655,12157188.5585,3417.0587,0.7865,0.2726,0.1878,0.033
par,Passive Aggressive Regressor,2375.3291,12253798.897,3411.556,0.784,0.2279,0.17,0.043
br,Bayesian Ridge,2415.4104,11384414.2834,3262.4082,0.7829,0.2638,0.1828,0.033
omp,Orthogonal Matching Pursuit,2740.4964,13356831.6264,3558.6914,0.7489,0.2599,0.22,0.034
ridge,Ridge Regression,2668.3986,14434523.1941,3570.3492,0.7256,0.3415,0.2076,0.038
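
A detail worth noticing in this table: the RMSE column is not the square root of the MSE column. `compare_models` reports each metric averaged over cross-validation folds (10 by default), and the mean of per-fold square roots is never larger than the square root of the mean. The sketch below checks this on the `gbr` row and then reproduces the effect with made-up fold values; `fold_mses` is illustrative, not the actual fold results.

```python
import math

# Values copied from the gbr row of the comparison table.
gbr_mean_mse = 5500513.454
gbr_mean_rmse = 2273.9023

# Square root of the averaged MSE: larger than the averaged RMSE.
sqrt_of_mean_mse = math.sqrt(gbr_mean_mse)
print(f"sqrt(mean MSE) = {sqrt_of_mean_mse:.1f}, mean RMSE = {gbr_mean_rmse:.1f}")

# Hypothetical per-fold MSEs (NOT the real fold results), to show the mechanism.
fold_mses = [2.0e6, 4.0e6, 9.0e6, 7.0e6]
mean_of_rmses = sum(math.sqrt(m) for m in fold_mses) / len(fold_mses)
rmse_of_mean = math.sqrt(sum(fold_mses) / len(fold_mses))
print(f"mean of fold RMSEs = {mean_of_rmses:.1f}, sqrt of mean MSE = {rmse_of_mean:.1f}")
```

The gap grows with the spread of the fold scores, so a large difference between the two numbers hints at unstable performance across folds.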


In [None]:
# Evaluate the best model
reg.evaluate_model(best_model)


In [6]:
# Save the best model for future use
reg.save_model(best_model, 'best_car_price_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(include=['car_ID', 'symboling', 'wheelbase',
                                              'carlength', 'carwidth',
                                              'carheight', 'curbweight',
                                              'enginesize', 'boreratio',
                                              'stroke', 'compressionratio',
                                              'horsepower', 'peakrpm', 'citympg',
                                              'highwaympg'],
                                     transformer=SimpleImputer())),
                 ('categorical_imputer',
                  TransformerWrapper(inc...
                                                               use_cat_names=True))),
                 ('rest_encoding',
                  TransformerWrapper(include=['CarName'],
                                     transformer=TargetEncoder(cols=['C
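
To reuse the saved pipeline later, it can be loaded back and applied to new rows. A minimal sketch, assuming PyCaret 3.x's functional API (`load_model`, `predict_model`) and that `save_model` wrote `best_car_price_model.pkl` to the working directory. `saved_model_file` and `score_new_cars` are hypothetical helper names:

```python
from pathlib import Path


def saved_model_file(name="best_car_price_model"):
    """Path of the pickle that save_model(model, name) writes (name + '.pkl')."""
    return Path(f"{name}.pkl")


def score_new_cars(new_data, name="best_car_price_model"):
    """Load the saved pipeline and return new_data with a prediction_label column.

    new_data is a pandas DataFrame with the same feature columns as the
    training data; the price column is not required.
    """
    # Imported lazily so the path helper works without PyCaret installed.
    from pycaret.regression import load_model, predict_model

    if not saved_model_file(name).exists():
        raise FileNotFoundError(f"{saved_model_file(name)} not found; run save_model first")
    pipeline = load_model(name)  # pass the name without the .pkl extension
    return predict_model(pipeline, data=new_data)
```

Because the saved object includes the preprocessing steps, `new_data` should be passed in raw form, without manual scaling or encoding.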

In [7]:
# Score the hold-out test set; pass data=<DataFrame> to predict on new, unseen data instead
predictions = reg.predict_model(best_model)
predictions.head()

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Gradient Boosting Regressor,1098.1978,2609733.6283,1615.467,0.9203,0.1317,0.1042


Unnamed: 0,car_ID,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,...,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price,prediction_label
88,89,-1,mitsubishi mirage g4,gas,std,four,sedan,fwd,front,96.300003,...,spdi,3.17,3.46,7.5,116,5500,23,30,9279.0,9011.780403
72,73,3,buick skylark,gas,std,two,convertible,rwd,front,96.599998,...,mpfi,3.46,3.1,8.3,155,4750,16,18,35056.0,32303.35515
114,115,0,peugeot 505s turbo diesel,diesel,turbo,four,wagon,rwd,front,114.199997,...,idi,3.7,3.52,21.0,95,4150,25,25,17075.0,17259.228021
158,159,0,toyota corona,diesel,std,four,sedan,fwd,front,95.699997,...,idi,3.27,3.35,22.5,56,4500,34,36,7898.0,8415.955824
163,164,1,toyota corolla liftback,gas,std,two,sedan,rwd,front,94.5,...,2bbl,3.19,3.03,9.0,70,4800,29,34,8058.0,7979.098467
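
As a quick sanity check on the preview above, per-row errors can be computed directly from the `price` and `prediction_label` columns. The sketch below hard-codes the five displayed rows. They are only a sample of the 62 hold-out rows, so the resulting averages are not expected to match the hold-out metrics table.

```python
# (car name, actual price, predicted price) copied from the five rows shown above.
rows = [
    ("mitsubishi mirage g4", 9279.0, 9011.780403),
    ("buick skylark", 35056.0, 32303.35515),
    ("peugeot 505s turbo diesel", 17075.0, 17259.228021),
    ("toyota corona", 7898.0, 8415.955824),
    ("toyota corolla liftback", 8058.0, 7979.098467),
]

abs_errors = [abs(actual - pred) for _, actual, pred in rows]
pct_errors = [abs(actual - pred) / actual for _, actual, pred in rows]

for (name, actual, pred), err, pct in zip(rows, abs_errors, pct_errors):
    print(f"{name:<28} actual={actual:>8.0f} predicted={pred:>9.2f} "
          f"abs_error={err:>8.2f} ({pct:.1%})")

# Averages over this five-row sample only.
sample_mae = sum(abs_errors) / len(abs_errors)
sample_mape = sum(pct_errors) / len(pct_errors)
print(f"sample MAE = {sample_mae:.2f}, sample MAPE = {sample_mape:.4f}")
```

In this sample the largest absolute miss is on the most expensive car (`buick skylark`), which is the kind of pattern worth checking across the full hold-out set.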
