# RETAIL STORE INVENTORY FORECASTING 
 

## Introduction

Effective inventory management lets retailers meet customer demand while limiting the costs of overstocking and stockouts. With retail data increasingly available, machine learning can be used to analyze sales patterns and forecast demand. This project builds a regression model that predicts Units Sold from historical retail inventory data. The dataset contains 73,100 rows with features such as store and product identifiers, category, region, price, discount, demand forecast, weather, seasonality, and a holiday/promotion indicator. The notebook fits and evaluates a Linear Regression model; Decision Tree, Random Forest, and Gradient Boosting regressors are candidate models for the same task but are not trained here. The aim is to show how data-driven approaches can support inventory planning and decision-making in retail operations.

## Import Required Libraries 

In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

## Load the Dataset

In [2]:
df = pd.read_csv(r"D:\DATASCIENCE\Projects\retail_store_inventory.csv")
df

| | Date | Store ID | Product ID | Category | Region | Inventory Level | Units Sold | Units Ordered | Demand Forecast | Price | Discount | Weather Condition | Holiday/Promotion | Competitor Pricing | Seasonality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-01-2022 | S001 | P0001 | Groceries | North | 231 | 127 | 55 | 135.47 | 33.50 | 20 | Rainy | 0 | 29.69 | Autumn |
| 1 | 01-01-2022 | S001 | P0002 | Toys | South | 204 | 150 | 66 | 144.04 | 63.01 | 20 | Sunny | 0 | 66.16 | Autumn |
| 2 | 01-01-2022 | S001 | P0003 | Toys | West | 102 | 65 | 51 | 74.02 | 27.99 | 10 | Sunny | 1 | 31.32 | Summer |
| 3 | 01-01-2022 | S001 | P0004 | Toys | North | 469 | 61 | 164 | 62.18 | 32.72 | 10 | Cloudy | 1 | 34.74 | Autumn |
| 4 | 01-01-2022 | S001 | P0005 | Electronics | East | 166 | 14 | 135 | 9.26 | 73.64 | 0 | Sunny | 0 | 68.95 | Summer |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 73095 | 01-01-2024 | S005 | P0016 | Furniture | East | 96 | 8 | 127 | 18.46 | 73.73 | 20 | Snowy | 0 | 72.45 | Winter |
| 73096 | 01-01-2024 | S005 | P0017 | Toys | North | 313 | 51 | 101 | 48.43 | 82.57 | 10 | Cloudy | 0 | 83.78 | Autumn |
| 73097 | 01-01-2024 | S005 | P0018 | Clothing | West | 278 | 36 | 151 | 39.65 | 11.11 | 10 | Rainy | 0 | 10.91 | Winter |
| 73098 | 01-01-2024 | S005 | P0019 | Toys | East | 374 | 264 | 21 | 270.52 | 53.14 | 20 | Rainy | 0 | 55.80 | Spring |


## Display the first 5 rows 

In [3]:
df.head(5)

| | Date | Store ID | Product ID | Category | Region | Inventory Level | Units Sold | Units Ordered | Demand Forecast | Price | Discount | Weather Condition | Holiday/Promotion | Competitor Pricing | Seasonality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-01-2022 | S001 | P0001 | Groceries | North | 231 | 127 | 55 | 135.47 | 33.5 | 20 | Rainy | 0 | 29.69 | Autumn |
| 1 | 01-01-2022 | S001 | P0002 | Toys | South | 204 | 150 | 66 | 144.04 | 63.01 | 20 | Sunny | 0 | 66.16 | Autumn |
| 2 | 01-01-2022 | S001 | P0003 | Toys | West | 102 | 65 | 51 | 74.02 | 27.99 | 10 | Sunny | 1 | 31.32 | Summer |
| 3 | 01-01-2022 | S001 | P0004 | Toys | North | 469 | 61 | 164 | 62.18 | 32.72 | 10 | Cloudy | 1 | 34.74 | Autumn |
| 4 | 01-01-2022 | S001 | P0005 | Electronics | East | 166 | 14 | 135 | 9.26 | 73.64 | 0 | Sunny | 0 | 68.95 | Summer |


## Display the last 5 rows  

In [4]:
df.tail(5)

| | Date | Store ID | Product ID | Category | Region | Inventory Level | Units Sold | Units Ordered | Demand Forecast | Price | Discount | Weather Condition | Holiday/Promotion | Competitor Pricing | Seasonality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 73095 | 01-01-2024 | S005 | P0016 | Furniture | East | 96 | 8 | 127 | 18.46 | 73.73 | 20 | Snowy | 0 | 72.45 | Winter |
| 73096 | 01-01-2024 | S005 | P0017 | Toys | North | 313 | 51 | 101 | 48.43 | 82.57 | 10 | Cloudy | 0 | 83.78 | Autumn |
| 73097 | 01-01-2024 | S005 | P0018 | Clothing | West | 278 | 36 | 151 | 39.65 | 11.11 | 10 | Rainy | 0 | 10.91 | Winter |
| 73098 | 01-01-2024 | S005 | P0019 | Toys | East | 374 | 264 | 21 | 270.52 | 53.14 | 20 | Rainy | 0 | 55.8 | Spring |
| 73099 | 01-01-2024 | S005 | P0020 | Groceries | East | 117 | 6 | 165 | 2.33 | 78.39 | 20 | Rainy | 1 | 79.52 | Spring |


## Data Understanding 

In [5]:
df.shape

(73100, 15)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73100 entries, 0 to 73099
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Date                73100 non-null  object 
 1   Store ID            73100 non-null  object 
 2   Product ID          73100 non-null  object 
 3   Category            73100 non-null  object 
 4   Region              73100 non-null  object 
 5   Inventory Level     73100 non-null  int64  
 6   Units Sold          73100 non-null  int64  
 7   Units Ordered       73100 non-null  int64  
 8   Demand Forecast     73100 non-null  float64
 9   Price               73100 non-null  float64
 10  Discount            73100 non-null  int64  
 11  Weather Condition   73100 non-null  object 
 12  Holiday/Promotion   73100 non-null  int64  
 13  Competitor Pricing  73100 non-null  float64
 14  Seasonality         73100 non-null  object 
dtypes: float64(3), int64(5), object(7)
memory usage: 8.4+ MB

In [7]:
df.describe()

| | Inventory Level | Units Sold | Units Ordered | Demand Forecast | Price | Discount | Holiday/Promotion | Competitor Pricing |
|---|---|---|---|---|---|---|---|---|
| count | 73100.0 | 73100.0 | 73100.0 | 73100.0 | 73100.0 | 73100.0 | 73100.0 | 73100.0 |
| mean | 274.469877 | 136.46487 | 110.004473 | 141.49472 | 55.135108 | 10.009508 | 0.497305 | 55.146077 |
| std | 129.949514 | 108.919406 | 52.277448 | 109.254076 | 26.021945 | 7.083746 | 0.499996 | 26.191408 |
| min | 50.0 | 0.0 | 20.0 | -9.99 | 10.0 | 0.0 | 0.0 | 5.03 |
| 25% | 162.0 | 49.0 | 65.0 | 53.67 | 32.65 | 5.0 | 0.0 | 32.68 |
| 50% | 273.0 | 107.0 | 110.0 | 113.015 | 55.05 | 10.0 | 0.0 | 55.01 |
| 75% | 387.0 | 203.0 | 155.0 | 208.0525 | 77.86 | 15.0 | 1.0 | 77.82 |
| max | 500.0 | 499.0 | 200.0 | 518.55 | 100.0 | 20.0 | 1.0 | 104.94 |


## Check Missing Values  

In [8]:
df.isnull().sum()

Date                  0
Store ID              0
Product ID            0
Category              0
Region                0
Inventory Level       0
Units Sold            0
Units Ordered         0
Demand Forecast       0
Price                 0
Discount              0
Weather Condition     0
Holiday/Promotion     0
Competitor Pricing    0
Seasonality           0
dtype: int64
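
A null check does not catch values that are present but implausible. The `describe()` output above reports a minimum Demand Forecast of -9.99, so a range check may be worth adding. The following is a minimal sketch on a tiny hypothetical frame; only the column names come from the notebook, and `count_negative` is an illustrative helper, not part of the original analysis.

```python
import pandas as pd

# Hypothetical sample rows; the real data would come from `df`.
sample = pd.DataFrame({
    "Units Sold": [127, 150, 0],
    "Demand Forecast": [135.47, 144.04, -9.99],
    "Price": [33.50, 63.01, 10.00],
})


def count_negative(frame: pd.DataFrame) -> pd.Series:
    """Return the number of negative entries in each numeric column."""
    numeric = frame.select_dtypes(include="number")
    return (numeric < 0).sum()


negatives = count_negative(sample)
duplicate_rows = int(sample.duplicated().sum())  # fully repeated rows

print(negatives)
print("Duplicate rows:", duplicate_rows)
```

How to treat negative forecasts (clip to zero, drop, or keep) is a modeling decision the notebook does not state, so the sketch only counts them.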

## Encode Categorical Data

In [9]:
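# Label-encode every string column (including Date) as integer codes.
# Note: `le` is refit on each column, so only the last column's mapping is retained.
le = LabelEncoder()
for col in df.select_dtypes(include='object').columns:
    df[col] = le.fit_transform(df[col])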
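
The loop above also label-encodes `Date`. Because the year comes last in the date strings, the sorted integer codes do not follow chronological order, and label codes impose an arbitrary order on nominal columns such as `Region`. One possible alternative, sketched below on hypothetical rows, is to parse the date into calendar features and one-hot encode the nominal columns. The day-month-year format is an assumption (the visible rows do not disambiguate it), and this is not the encoding used to produce the results in this notebook.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the notebook's dataset.
sample = pd.DataFrame({
    "Date": ["01-01-2022", "15-06-2023", "01-01-2024"],
    "Region": ["North", "South", "North"],
    "Units Sold": [127, 150, 65],
})

# ASSUMPTION: dates are day-month-year; adjust `format` if they are not.
dates = pd.to_datetime(sample["Date"], format="%d-%m-%Y")

# Replace the raw date string with numeric calendar features.
encoded = sample.drop(columns="Date").assign(
    Year=dates.dt.year,
    Month=dates.dt.month,
    DayOfWeek=dates.dt.dayofweek,  # Monday = 0
)

# One-hot encode the nominal column instead of assigning ordered codes.
encoded = pd.get_dummies(encoded, columns=["Region"])

print(encoded)
```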


## Feature Selection

In [10]:

X = df.drop("Units Sold", axis=1)
y = df["Units Sold"]


## Train–Test Split

In [11]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
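
The split above is random, so rows from the same period can land in both the training and test sets. For a forecasting task, a chronological split (train on earlier dates, test on later ones) is a common alternative. The sketch below illustrates the idea on hypothetical data; `chronological_split` is an illustrative helper and was not used for the results reported in this notebook.

```python
import pandas as pd

# Hypothetical daily data standing in for the retail dataset.
sample = pd.DataFrame({
    "Date": pd.date_range("2022-01-01", periods=10, freq="D"),
    "Units Sold": [127, 150, 65, 61, 14, 8, 51, 36, 264, 6],
})


def chronological_split(frame, date_col, test_fraction=0.2):
    """Split so that every test row is at or after every training row in time."""
    ordered = frame.sort_values(date_col).reset_index(drop=True)
    n_test = max(1, round(len(ordered) * test_fraction))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


train, test = chronological_split(sample, "Date")
print(len(train), "training rows,", len(test), "test rows")
```

When many rows share a date, as in this dataset, splitting on a cutoff date rather than a row position would keep each day entirely on one side.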

## Feature Scaling

In [12]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
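
The scaler is fitted on the training split only and then applied to the test split, which keeps test statistics out of the fit. The same pattern can be packaged with scikit-learn's `make_pipeline`, so the scaler and model always travel together. The sketch below uses synthetic data (`X_demo`, `y_demo` are made up for illustration), not the retail dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() learns scaling statistics from the training split only;
# predict() and score() reuse those statistics on new rows.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)

score = model.score(X_te, y_te)  # R^2 on the held-out split
print("Held-out R2:", round(score, 4))
```

For ordinary least squares, standardizing the inputs does not change the predictions; it mainly makes the coefficients comparable and matters more for regularized or distance-based models.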

## Model Training 

### 🔹 Linear Regression

In [13]:
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
lr_pred = lr.predict(X_test_scaled)
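
The introduction also names Decision Tree, Random Forest, and Gradient Boosting regressors, while this notebook fits only Linear Regression. A comparison loop could look like the sketch below. It runs on synthetic data (`X_demo`, `y_demo` are made up), so the printed errors say nothing about how these models would rank on the retail dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with a mildly non-linear target, for illustration only.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 4))
y_demo = 5 * X_demo[:, 0] + X_demo[:, 1] ** 2 + rng.normal(scale=1.0, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

candidates = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each candidate on the same split and record its test-set MAE.
results = {}
for name, estimator in candidates.items():
    estimator.fit(X_tr, y_tr)
    results[name] = mean_absolute_error(y_te, estimator.predict(X_te))

for name, mae in results.items():
    print(f"{name:18s} MAE: {mae:.3f}")
```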

## Model Evaluation

In [16]:
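def evaluate(name, y_test, y_pred):
    """Print MAE, RMSE, and R2 for one model's test-set predictions."""
    print(name)
    print("MAE :", mean_absolute_error(y_test, y_pred))
    # RMSE is the square root of the mean squared error.
    print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
    print("R2  :", r2_score(y_test, y_pred))
    print("-"*30)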

evaluate("Linear Regression", y_test, lr_pred)


Linear Regression
MAE : 7.418172095182564
RMSE: 8.60058427065698
R2  : 0.9937523525599742
------------------------------
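
To make the three metrics concrete, the sketch below computes them by hand with NumPy on four made-up values. MAE and RMSE are in the same units as the target (units sold), while R² is the fraction of the target's variance that the predictions account for.

```python
import numpy as np

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 6.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

print("MAE :", mae)
print("RMSE:", rmse)
print("R2  :", r2)
```

RMSE squares the errors before averaging, so it penalizes large misses more heavily than MAE does.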


## Prediction on New Data 

In [17]:
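# Reuse the first row of X as a stand-in for new data; this row may also be in the training split.
new_data = X.iloc[[0]]
# Apply the scaler fitted on the training data before predicting.
new_data_scaled = scaler.transform(new_data)

prediction = lr.predict(new_data_scaled)

print("Predicted Units Sold:", round(prediction[0], 2))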



Predicted Units Sold: 130.19


## Conclusion

In this project, a Linear Regression model was trained to predict Units Sold in a retail store setting. The dataset contained no missing values, so preprocessing consisted of label-encoding the categorical features and standardizing the inputs. On the held-out 20% test split, the model reached an MAE of about 7.42 units, an RMSE of about 8.60 units, and an R² of about 0.994. The feature set includes the Demand Forecast column, which may account for much of this accuracy, so the score should be read with that in mind. The tree-based and ensemble models named in the introduction (Decision Tree, Random Forest, and Gradient Boosting) are not evaluated in this notebook, so no comparison with them can be drawn from these results. Overall, the results suggest that regression models can support inventory management by providing demand estimates that help retailers reduce losses and optimize stock levels.