# Car Price Regression with PyCaret
This notebook uses PyCaret to build a regression model for predicting car prices based on various features from the `CarPrice_Assignment.csv` dataset.

**Problem Statement:**
A Chinese automobile company wants to understand the factors affecting car prices in the US market. The goal is to model car prices using available features to help management understand pricing dynamics and inform business strategy.

**Dataset:**
- `CarPrice_Assignment.csv`: Contains various features of cars and their prices.

**Steps:**
1. Load the dataset
2. Set up PyCaret regression experiment
3. Compare models and select the best
4. Evaluate and save the best model
5. Predict car prices on the hold-out set
6. Retrain the top model and launch a demo app

In [15]:
# Import required libraries
import pandas as pd
import numpy as np

In [16]:
# Load the dataset
data = pd.read_csv('CarPrice_Assignment.csv')
data = data.drop(columns=['car_ID'])
data.head()

,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,...,enginesize,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price
0,3,alfa-romero giulia,gas,std,two,convertible,rwd,front,88.6,168.8,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,13495.0
1,3,alfa-romero stelvio,gas,std,two,convertible,rwd,front,88.6,168.8,...,130,mpfi,3.47,2.68,9.0,111,5000,21,27,16500.0
2,1,alfa-romero Quadrifoglio,gas,std,two,hatchback,rwd,front,94.5,171.2,...,152,mpfi,2.68,3.47,9.0,154,5000,19,26,16500.0
3,2,audi 100 ls,gas,std,four,sedan,fwd,front,99.8,176.6,...,109,mpfi,3.19,3.4,10.0,102,5500,24,30,13950.0
4,2,audi 100ls,gas,std,four,sedan,4wd,front,99.4,176.6,...,136,mpfi,3.19,3.4,8.0,115,5500,18,22,17450.0


In [17]:
# PyCaret Regression Setup
from pycaret.regression import RegressionExperiment
reg = RegressionExperiment()
reg.setup(data=data, target='price', session_id=123, normalize=True, transformation=True)

,Description,Value
0,Session id,123
1,Target,price
2,Target type,Regression
3,Original data shape,"(205, 25)"
4,Transformed data shape,"(205, 50)"
5,Transformed train set shape,"(143, 50)"
6,Transformed test set shape,"(62, 50)"
7,Numeric features,14
8,Categorical features,10
9,Preprocess,True


<pycaret.regression.oop.RegressionExperiment at 0x7f940d5a8390>
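
The train and test shapes in the summary above are consistent with a 70/30 split, which matches PyCaret's default `train_size=0.7`. The sketch below reproduces the row counts with plain arithmetic. The rounding rule (floor for the training set, remainder for the hold-out set) is an assumption chosen to match the reported shapes, not something taken from PyCaret's source.

```python
import math

n_rows = 205          # rows in the original data, from the setup summary
train_fraction = 0.7  # assumed: PyCaret's default train_size

# Assumption: the training set gets the floor of the fractional count
# and the hold-out set gets whatever is left.
n_train = math.floor(n_rows * train_fraction)
n_test = n_rows - n_train

print(n_train, n_test)  # prints: 143 62
```

With only 62 hold-out rows, scores on the test set can shift noticeably from one `session_id` to another.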

In [18]:
# Compare models and select the best
best_model = reg.compare_models()

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
gbr,Gradient Boosting Regressor,1794.7802,6124204.8988,2395.3286,0.8994,0.1564,0.1276,0.037
et,Extra Trees Regressor,1720.0693,6138467.7043,2386.3566,0.8905,0.1593,0.1269,0.04
rf,Random Forest Regressor,1790.1093,6459039.4098,2462.9295,0.8879,0.1573,0.1262,0.043
ada,AdaBoost Regressor,2036.9578,6923120.432,2586.3322,0.8748,0.1875,0.1655,0.034
dt,Decision Tree Regressor,2110.6109,10247845.7634,2996.3348,0.8319,0.2043,0.1513,0.026
en,Elastic Net,2597.143,13404852.2061,3598.5825,0.763,0.2679,0.1983,0.027
par,Passive Aggressive Regressor,2488.4455,13232025.2141,3519.0233,0.7603,0.2425,0.1767,0.028
br,Bayesian Ridge,2573.1435,12978499.3978,3500.1029,0.7496,0.2751,0.1932,0.027
omp,Orthogonal Matching Pursuit,2864.4654,16811219.1035,3959.0047,0.7027,0.2716,0.2112,0.027
llar,Lasso Least Angle Regression,2721.8605,16059107.3763,3777.9649,0.6978,0.2864,0.1974,0.03
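
The table is ordered by R2, which is the default ranking metric for regression in `compare_models`, so `best_model` here is the Gradient Boosting Regressor. The top rows are close, though, and the winner depends on the metric. The sketch below re-ranks the first four rows using values copied from the table; the helper `best_by` is a hypothetical name introduced only for this illustration.

```python
# (model id, MAE, RMSE, R2, MAPE) copied from the top rows of the table above
rows = [
    ("gbr", 1794.7802, 2395.3286, 0.8994, 0.1276),
    ("et", 1720.0693, 2386.3566, 0.8905, 0.1269),
    ("rf", 1790.1093, 2462.9295, 0.8879, 0.1262),
    ("ada", 2036.9578, 2586.3322, 0.8748, 0.1655),
]


def best_by(column, higher_is_better=False):
    """Return the model id with the best value at the given tuple index."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda r: r[column])[0]


winners = {
    "R2": best_by(3, higher_is_better=True),  # higher is better
    "MAE": best_by(1),                        # lower is better
    "RMSE": best_by(2),
    "MAPE": best_by(4),
}
print(winners)
```

To rank by a different metric inside PyCaret, `compare_models` accepts a `sort` argument, for example `reg.compare_models(sort='MAE')`.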


In [None]:
# Evaluate the best model
reg.evaluate_model(best_model)

In [20]:
# Save the best model for future use
reg.save_model(best_model, 'best_car_price_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(include=['symboling', 'wheelbase',
                                              'carlength', 'carwidth',
                                              'carheight', 'curbweight',
                                              'enginesize', 'boreratio',
                                              'stroke', 'compressionratio',
                                              'horsepower', 'peakrpm', 'citympg',
                                              'highwaympg'],
                                     transformer=SimpleImputer())),
                 ('categorical_imputer',
                  TransformerWrapper(include=['Ca...
                                                               use_cat_names=True))),
                 ('rest_encoding',
                  TransformerWrapper(include=['CarName'],
                                     transformer=TargetEncoder(cols=['Ca
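
The saved file bundles the preprocessing pipeline with the estimator, so it can score raw rows in a later session without calling `setup` again. The sketch below assumes PyCaret 3.x and that `best_car_price_model.pkl` sits in the working directory; the wrapper name `predict_with_saved_model` is hypothetical.

```python
def predict_with_saved_model(new_rows, model_name="best_car_price_model"):
    """Load the saved pipeline and return predicted prices for new rows.

    `new_rows` is assumed to be a pandas DataFrame with the same feature
    columns used for training (the `price` column is not required).
    """
    # Imported inside the function so that defining it does not require PyCaret.
    from pycaret.regression import load_model, predict_model

    pipeline = load_model(model_name)                # preprocessing steps + estimator
    scored = predict_model(pipeline, data=new_rows)  # adds a `prediction_label` column
    return scored["prediction_label"]
```

For example, `predict_with_saved_model(data.drop(columns=['price']).head())` should return five predicted prices.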

In [21]:
# Predict car prices on the hold-out test set (pass data=<DataFrame> to score new/unseen data)
predictions = reg.predict_model(best_model)
predictions.head()

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Gradient Boosting Regressor,1084.8522,2805510.6913,1674.9659,0.9143,0.1341,0.1018


,symboling,CarName,fueltype,aspiration,doornumber,carbody,drivewheel,enginelocation,wheelbase,carlength,...,fuelsystem,boreratio,stroke,compressionratio,horsepower,peakrpm,citympg,highwaympg,price,prediction_label
88,-1,mitsubishi mirage g4,gas,std,four,sedan,fwd,front,96.300003,172.399994,...,spdi,3.17,3.46,7.5,116,5500,23,30,9279.0,8885.697626
72,3,buick skylark,gas,std,two,convertible,rwd,front,96.599998,180.300003,...,mpfi,3.46,3.1,8.3,155,4750,16,18,35056.0,32339.032203
114,0,peugeot 505s turbo diesel,diesel,turbo,four,wagon,rwd,front,114.199997,198.899994,...,idi,3.7,3.52,21.0,95,4150,25,25,17075.0,17834.458932
158,0,toyota corona,diesel,std,four,sedan,fwd,front,95.699997,166.300003,...,idi,3.27,3.35,22.5,56,4500,34,36,7898.0,7923.261293
163,1,toyota corolla liftback,gas,std,two,sedan,rwd,front,94.5,168.699997,...,2bbl,3.19,3.03,9.0,70,4800,29,34,8058.0,8173.265215
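
To make the metric columns concrete, the sketch below computes MAE, RMSE, MAPE, and RMSLE from the `price` and `prediction_label` values of the five rows displayed above. It uses the standard textbook definitions, which are assumed to match what PyCaret reports. Because it covers only 5 of the 62 hold-out rows, the results will not equal the hold-out scores in the metrics table.

```python
import math

# (actual price, predicted price) for the five rows displayed above
pairs = [
    (9279.0, 8885.697626),
    (35056.0, 32339.032203),
    (17075.0, 17834.458932),
    (7898.0, 7923.261293),
    (8058.0, 8173.265215),
]

n = len(pairs)
mae = sum(abs(y - p) for y, p in pairs) / n            # mean absolute error
mse = sum((y - p) ** 2 for y, p in pairs) / n          # mean squared error
rmse = math.sqrt(mse)                                  # root mean squared error
mape = sum(abs(y - p) / abs(y) for y, p in pairs) / n  # mean absolute percentage error
rmsle = math.sqrt(                                     # root mean squared log error
    sum((math.log1p(p) - math.log1p(y)) ** 2 for y, p in pairs) / n
)

print(f"MAE={mae:.2f} RMSE={rmse:.2f} MAPE={mape:.4f} RMSLE={rmsle:.4f}")
```

RMSE exceeds MAE here because squaring weights the one large miss (the buick skylark row) more heavily than the small ones.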


In [22]:
# Train a Gradient Boosting Regressor and report its cross-validation scores per fold
gbr = reg.create_model('gbr')

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,1364.4562,3453431.4781,1858.3411,0.9466,0.1396,0.1192
1,1391.6395,4015722.4407,2003.9268,0.8289,0.1473,0.1058
2,1759.9534,7021520.6771,2649.8152,0.8963,0.1385,0.1178
3,1338.8303,3884817.9478,1970.9942,0.958,0.1121,0.0952
4,2121.7224,7987092.4168,2826.1444,0.908,0.2014,0.1392
5,2370.9389,9169228.1785,3028.0733,0.8253,0.2084,0.1802
6,2676.3963,12104275.3002,3479.1199,0.8482,0.1973,0.1714
7,1261.3397,2557671.5965,1599.2722,0.8951,0.1214,0.1036
8,1301.4194,2739179.9528,1655.0468,0.9607,0.1276,0.1103
9,2361.1056,8309108.9997,2882.5525,0.927,0.1699,0.1336
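
Comparing these folds with the `gbr` row of the `compare_models` table suggests that each score reported there is the mean over the 10 folds. The sketch below checks this for MSE and RMSE using values copied from the table above. It also shows that the reported RMSE is an average of per-fold RMSEs, so it is not the square root of the reported MSE.

```python
import math
from statistics import mean

# Per-fold values copied from the create_model output above
fold_mse = [3453431.4781, 4015722.4407, 7021520.6771, 3884817.9478, 7987092.4168,
            9169228.1785, 12104275.3002, 2557671.5965, 2739179.9528, 8309108.9997]
fold_rmse = [1858.3411, 2003.9268, 2649.8152, 1970.9942, 2826.1444,
             3028.0733, 3479.1199, 1599.2722, 1655.0468, 2882.5525]

mean_mse = mean(fold_mse)
mean_rmse = mean(fold_rmse)
rmse_of_mean_mse = math.sqrt(mean_mse)

# The first two values match the gbr row in the compare_models table.
print(round(mean_mse, 4), round(mean_rmse, 4))
# The square root of the mean MSE is larger than the mean of the fold RMSEs.
print(round(rmse_of_mean_mse, 1))
```

The per-fold RMSE ranges from about 1,599 to about 3,479, which is a reminder that with roughly 14 validation rows per fold a single fold's score is noisy.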


In [24]:
# Launch a local Gradio app for interactive predictions with the trained model
reg.create_app(gbr)

* Running on local URL:  http://127.0.0.1:7861
* To create a public link, set `share=True` in `launch()`.


