# Real-Time Price Prediction for E-Commerce

This notebook demonstrates data preprocessing, machine learning model
training, and price prediction using an e-commerce dataset.

## Data Loading

In this step, we load the e-commerce dataset and inspect its structure.


In [2]:
# E-commerce Price Prediction – Data Understanding
import pandas as pd

df = pd.read_csv("../data/ecommerce_data.csv", encoding="latin1")
df.head()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
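The `encoding="latin1"` argument matters when a CSV contains bytes that are not valid UTF-8, which is what pandas assumes by default. That `ecommerce_data.csv` is such a file is an assumption here; the notebook does not say why Latin-1 was chosen. Latin-1 maps each of the 256 possible byte values to a character, so decoding with it never fails. A minimal sketch with a made-up two-row CSV held in memory:

```python
import io

import pandas as pd

# Made-up CSV bytes: 0xC9 is "É" in Latin-1 but is invalid UTF-8 here
raw = b"Description,UnitPrice\nCAF\xc9 MUG,2.55\nTEA CUP,1.25\n"

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 decodes any byte sequence, so this read succeeds
demo = pd.read_csv(io.BytesIO(raw), encoding="latin1")

print(utf8_ok)                     # False
print(demo.loc[0, "Description"])  # CAFÉ MUG
```

The trade-off is that Latin-1 will silently mis-decode a file that really is UTF-8 (multi-byte characters turn into pairs of odd symbols), so it is worth spot-checking text columns such as `Description` after loading.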


In [3]:
df.shape

(541909, 8)

In [4]:
df.columns

Index(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'],
      dtype='object')

In [5]:
df.dtypes

InvoiceNo       object
StockCode       object
Description     object
Quantity         int64
InvoiceDate     object
UnitPrice      float64
CustomerID     float64
Country         object
dtype: object

In [6]:
df.describe()

| | Quantity | UnitPrice | CustomerID |
|---|---|---|---|
| count | 541909.0 | 541909.0 | 406829.0 |
| mean | 9.55225 | 4.611114 | 15287.69057 |
| std | 218.081158 | 96.759853 | 1713.600303 |
| min | -80995.0 | -11062.06 | 12346.0 |
| 25% | 1.0 | 1.25 | 13953.0 |
| 50% | 3.0 | 2.08 | 15152.0 |
| 75% | 10.0 | 4.13 | 16791.0 |
| max | 80995.0 | 38970.0 | 18287.0 |
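The summary shows negative minima for `Quantity` (-80995) and `UnitPrice` (-11062.06), which do not describe ordinary sales; the notebook keeps these rows. If they should be excluded, one possible filter (not part of the original notebook) keeps only rows where both values are positive. A sketch on a hypothetical toy frame standing in for `df`:

```python
import pandas as pd

# Toy stand-in for df; all values are made up for illustration
toy = pd.DataFrame({
    "Quantity":   [6, -6, 3, 1],
    "UnitPrice":  [2.55, 2.55, 0.0, -11062.06],
    "CustomerID": [17850.0, 17850.0, None, None],
})

# Keep only rows that look like an actual sale: positive quantity and price
clean = toy[(toy["Quantity"] > 0) & (toy["UnitPrice"] > 0)]
print(len(clean))  # 1
```

Applied to the real data this would be `df = df[(df["Quantity"] > 0) & (df["UnitPrice"] > 0)]`, which would change the row count and every result further down.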


In [7]:
df.isnull().sum()

InvoiceNo           0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
UnitPrice           0
CustomerID     135080
Country             0
dtype: int64
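Relative to the 541,909 rows, the 135,080 missing `CustomerID` values are about 24.9% of the data, and the 1,454 missing `Description` values are about 0.27%. `isna().mean()` gives these shares directly, because the mean of a boolean column is the fraction of `True` values. A toy illustration with made-up values:

```python
import pandas as pd

# Hypothetical frame with gaps in both columns
toy = pd.DataFrame({
    "Description": ["MUG", None, "CUP", "BOWL"],
    "CustomerID": [17850.0, None, None, 13047.0],
})

# Mean of a boolean mask = fraction of True values; x100 for percent
pct_missing = toy.isna().mean() * 100
print(pct_missing)  # Description -> 25.0, CustomerID -> 50.0

# Same arithmetic for the real CustomerID count reported above
print(round(135080 / 541909 * 100, 1))  # 24.9
```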

## Data Preprocessing

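This section extracts the year and month from `InvoiceDate`,
drops the text and identifier columns, and one-hot encodes any
remaining categorical features so that machine learning models can
process them. After the drops every remaining feature is already
numeric, so `pd.get_dummies` leaves `X` unchanged. Rows with a
missing `CustomerID` or a negative `Quantity` or `UnitPrice` are
not removed.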


In [8]:
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
df['year'] = df['InvoiceDate'].dt.year
df['month'] = df['InvoiceDate'].dt.month
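A value such as `12/1/2010 8:26` can be read month-first (1 December) or day-first (12 January). The notebook lets `pd.to_datetime` infer the layout. Passing an explicit `format` pins the interpretation and skips per-value inference; the month-first format string below is an assumption based on the sample rows, not something the notebook specifies:

```python
import pandas as pd

# Two values in the InvoiceDate layout (the second one is made up)
raw_dates = pd.Series(["12/1/2010 8:26", "1/12/2011 10:05"])

# Assumed month-first layout; single-digit month, day and hour are accepted
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y %H:%M")

year = parsed.dt.year    # 2010, 2011
month = parsed.dt.month  # 12, 1
print(parsed.dt.strftime("%Y-%m-%d %H:%M").tolist())
# ['2010-12-01 08:26', '2011-01-12 10:05']
```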



In [9]:
df = df.drop(
    columns=[
        'InvoiceDate',
        'Description',
        'InvoiceNo',
        'StockCode',
        'Country'
    ],
    errors='ignore'
)


In [22]:
df.columns

Index(['Quantity', 'UnitPrice', 'CustomerID', 'year', 'month'], dtype='object')

In [10]:
# Separate features and target
X = df.drop('UnitPrice', axis=1)
y = df['UnitPrice']
# One-hot encode categorical columns; every remaining column is numeric here, so this is a no-op
X = pd.get_dummies(X, drop_first=True)
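`pd.get_dummies` only encodes object, string, and categorical columns, which is why it returns the numeric `X` unchanged. Had a categorical column such as `Country` been kept, `drop_first=True` would create one indicator column per category except the alphabetically first one, which becomes the implicit baseline. A toy sketch with made-up values:

```python
import pandas as pd

toy = pd.DataFrame({
    "Quantity": [6, 8, 2, 3],
    "Country": ["United Kingdom", "France", "Germany", "France"],
})

# With a categorical column: "France" (first alphabetically) is dropped as the baseline
encoded = pd.get_dummies(toy, drop_first=True)
print(list(encoded.columns))
# ['Quantity', 'Country_Germany', 'Country_United Kingdom']

# With only numeric columns, as in the notebook, nothing is encoded
numeric_only = pd.get_dummies(toy[["Quantity"]], drop_first=True)
print(list(numeric_only.columns))  # ['Quantity']
```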

In [11]:
from sklearn.model_selection import train_test_split

# 80/20 train/test split; a fixed random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
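With a fractional `test_size`, scikit-learn rounds the test-set size up and gives the remainder to training, so the 541,909 rows should split into 108,382 test rows and 433,527 training rows. A small check on a synthetic array, which also confirms that the same `random_state` reproduces the same split:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 rows, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two calls with the same seed should return identical pieces
a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = a

print(len(X_tr), len(X_te))  # 8 2
print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True

# Expected sizes for the full dataset: test = ceil(0.2 * n), train = n - test
n = 541909
n_test = math.ceil(0.2 * n)
print(n - n_test, n_test)  # 433527 108382
```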

In [12]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [13]:
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)


In [14]:
print(X_train.dtypes)
print(y_train.dtype)


Quantity        int64
CustomerID    float64
year            int32
month           int32
dtype: object
float64


## Model Training

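The data was already split 80/20 into training and test sets above
(`test_size=0.2`, `random_state=42`). In this step, a Random Forest
Regressor with 100 trees is fitted on the training set to predict
`UnitPrice` from `Quantity`, `CustomerID`, `year`, and `month`.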
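A random forest regressor fits many decision trees, each on a bootstrap sample of the training rows (`bootstrap=True` in the parameters below), and predicts the average of the trees' outputs. The sketch below uses synthetic data rather than the e-commerce dataset and checks that averaging the individual trees in `estimators_` reproduces `predict`:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: target driven mostly by the first feature, plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(0, 1, size=200)

forest = RandomForestRegressor(n_estimators=10, random_state=42)
forest.fit(X_demo, y_demo)

rows = X_demo[:5]
# One row of predictions per tree: shape (10, 5)
per_tree = np.stack([tree.predict(rows) for tree in forest.estimators_])
forest_pred = forest.predict(rows)

print(np.allclose(per_tree.mean(axis=0), forest_pred))  # True
```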


In [15]:
rf_model.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| n_estimators | 100 |
| criterion | 'squared_error' |
| max_depth | None |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 1.0 |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |


In [16]:
y_pred = rf_model.predict(X_test)

## Model Evaluation & Results

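The trained model is evaluated on the held-out test set with three
regression metrics: Mean Absolute Error (MAE), Mean Squared Error
(MSE), and the R² score. Lower MAE and MSE are better; an R² of 1.0
means perfect predictions, and 0.0 means no better than always
predicting the mean test price.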
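For reference, with $y_i$ the actual price, $\hat{y}_i$ the prediction, and $\bar{y}$ the mean actual price over $n$ test rows: MAE $= \frac{1}{n}\sum_i |y_i - \hat{y}_i|$, MSE $= \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. A plain-Python sketch of these formulas on made-up numbers:

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) using the textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2


mae_demo, mse_demo, r2_demo = regression_metrics([3, 5, 2, 7], [2.5, 5, 4, 8])
print(mae_demo, mse_demo, round(r2_demo, 4))  # 0.875 1.3125 0.6441
```

With default arguments, scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` compute the same quantities.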


In [17]:
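mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
# Store in r2, not r2_score, so the imported function is not overwritten
# (otherwise re-running this cell raises a TypeError)
r2 = r2_score(y_test, y_pred)

print('mae:', mae)
print('mse:', mse)
print('r2_score:', r2)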

mae: 3.505479275643778
mse: 5613.112700940072
r2_score: 0.2269014273827601
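The gap between MAE (about 3.51) and MSE (about 5613) is easier to read after taking the square root of MSE, which puts it back in price units. Because squaring weights large errors heavily, an RMSE far above the MAE points to a small number of very large errors, plausibly tied to the extreme `UnitPrice` values in the summary statistics (max 38970, min -11062.06). The R² of about 0.23 means the model explains roughly 23% of the variance in test prices. The arithmetic:

```python
import math

# Values printed by the evaluation cell above
mse_reported = 5613.112700940072
mae_reported = 3.505479275643778

rmse = math.sqrt(mse_reported)        # back in price units
print(round(rmse, 2))                 # 74.92
print(round(rmse / mae_reported, 1))  # 21.4
```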


In [18]:
results = X_test.copy()

In [19]:
results.head()

| | Quantity | CustomerID | year | month |
|---|---|---|---|---|
| 209268 | 24 | 17315.0 | 2011 | 6 |
| 207108 | 4 | 14031.0 | 2011 | 5 |
| 167085 | 4 | 14031.0 | 2011 | 4 |
| 471836 | 3 | 17198.0 | 2011 | 11 |
| 115865 | 2 | 13502.0 | 2011 | 3 |


In [20]:

results['Actual_Price'] = y_test.values
results['Predicted_Price'] = y_pred

results.head()

| | Quantity | CustomerID | year | month | Actual_Price | Predicted_Price |
|---|---|---|---|---|---|---|
| 209268 | 24 | 17315.0 | 2011 | 6 | 0.85 | 1.248028 |
| 207108 | 4 | 14031.0 | 2011 | 5 | 6.95 | 7.76539 |
| 167085 | 4 | 14031.0 | 2011 | 4 | 0.65 | 8.161651 |
| 471836 | 3 | 17198.0 | 2011 | 11 | 1.95 | 2.138721 |
| 115865 | 2 | 13502.0 | 2011 | 3 | 9.95 | 20.169174 |
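Absolute errors make the per-row comparison easier to scan. Among the five rows shown, the largest miss is row 115865 (actual 9.95, predicted about 20.17), followed by row 167085 (0.65 vs. about 8.16). A small sketch that rebuilds those rows and ranks them by absolute error; the same two lines work unchanged on the full `results` frame:

```python
import pandas as pd

# The five rows displayed above (index = original row label)
sample = pd.DataFrame(
    {
        "Actual_Price": [0.85, 6.95, 0.65, 1.95, 9.95],
        "Predicted_Price": [1.248028, 7.76539, 8.161651, 2.138721, 20.169174],
    },
    index=[209268, 207108, 167085, 471836, 115865],
)

# Absolute error per row, then sort largest first
sample["Abs_Error"] = (sample["Actual_Price"] - sample["Predicted_Price"]).abs()
ranked = sample.sort_values("Abs_Error", ascending=False)
print(ranked.index.tolist())  # [115865, 167085, 207108, 209268, 471836]
```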


In [21]:
import pickle

with open('price_model.pkl', 'wb') as file:
    pickle.dump(rf_model, file)

print('model saved successfully')

model saved successfully
