## Import Libraries

In [None]:
import pandas as pd
from pathlib import Path


## Load Data

In [3]:
file_path = Path("..") / "data" / "train.csv"
df = pd.read_csv(file_path)



## Convert the `Date` Column to Datetime

In [9]:
df['Date'] = pd.to_datetime(df['Date'])


## Feature Engineering

In [10]:
df['DayOfWeek'] = df['Date'].dt.dayofweek
df['Promo'] = df['Promo'].astype(int)
df['SchoolHoliday'] = df['SchoolHoliday'].astype(int)
df['StateHoliday_a'] = (df['StateHoliday'] == 'a').astype(int)
df['StateHoliday_b'] = (df['StateHoliday'] == 'b').astype(int)
df['StateHoliday_c'] = (df['StateHoliday'] == 'c').astype(int)


## Store Selection

To focus our regression analysis, we selected **Store 262**, which has the **highest total sales** across the dataset. Choosing the top-selling store means we're analyzing a store with plenty of sales activity, which makes it a strong candidate for evaluating holiday effects.


In [14]:
store_sales = df.groupby('Store')['Sales'].sum().reset_index()
store_sales = store_sales.sort_values(by='Sales', ascending=False)

top_stores = store_sales.head(10)
top_stores

| | Store | Sales |
|---|---|---|
| 261 | 262 | 19516842 |
| 816 | 817 | 17057867 |
| 561 | 562 | 16927322 |
| 1113 | 1114 | 16202585 |
| 250 | 251 | 14896870 |
| 512 | 513 | 14252406 |
| 787 | 788 | 14082141 |
| 732 | 733 | 14067158 |
| 382 | 383 | 13489879 |
| 755 | 756 | 12911782 |
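
Total sales alone does not say how many usable days a store contributes. As a rough sketch, the ranking can be extended with a count of non-zero sales days per store. The `toy` frame and the `total_sales` and `open_days` column names are made up for illustration; only `Store` and `Sales` come from the notebook.

```python
import pandas as pd

# Toy data standing in for train.csv (values are invented for illustration).
toy = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Sales": [100, 0, 150, 300, 250, 0],
})

# Total sales and number of days with non-zero sales, per store.
summary = (
    toy.groupby("Store")
    .agg(
        total_sales=("Sales", "sum"),
        open_days=("Sales", lambda s: int((s > 0).sum())),
    )
    .sort_values("total_sales", ascending=False)
)
print(summary)
```

Applied to the real `df`, the same aggregation would show whether the top-selling store also has enough non-zero sales days to support a regression.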


## Select Store ID 262

In [15]:
store_id = 262
df_store = df[df['Store'] == store_id].copy()


## Remove Days with Zero Sales

In [16]:
df_store = df_store[df_store['Sales'] > 0]


## OLS Regression to Analyze Holiday Impact

We run a linear regression to estimate the effect of holidays and promotions on daily sales in Store 262. We include binary features for each holiday type (a, b, c), school holidays, promotions, and day of the week.
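
A note on the day-of-week feature: the cell that follows enters `DayOfWeek` as a single numeric column (0 to 6), which fits one linear slope across the week. A possible alternative, shown here only as a sketch and not as the approach used in this notebook, is to expand it into indicator columns so each weekday gets its own coefficient. The `toy` frame, the `DOW` prefix, and the `X_alt` name are assumptions for illustration.

```python
import pandas as pd

# Toy frame reusing the notebook's column names (values are invented).
toy = pd.DataFrame({
    "Promo": [1, 0, 1, 0],
    "DayOfWeek": [0, 1, 5, 1],  # 0 = Monday ... 6 = Sunday, as in dt.dayofweek
})

# One indicator column per weekday present in the data. drop_first=True
# drops the lowest code (Monday here), which becomes the baseline and
# avoids collinearity with the intercept.
dow = pd.get_dummies(toy["DayOfWeek"], prefix="DOW", drop_first=True, dtype=int)

# Alternative design matrix: the other features plus the weekday indicators.
X_alt = pd.concat([toy[["Promo"]], dow], axis=1)
print(X_alt.columns.tolist())
```

A matrix built this way could be passed through `sm.add_constant` and `sm.OLS` in the same way as `X`; the holiday coefficients may shift if sales vary non-linearly across the week.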


In [17]:
import statsmodels.api as sm

features = ['Promo', 'SchoolHoliday', 'StateHoliday_a', 'StateHoliday_b', 'StateHoliday_c', 'DayOfWeek']
X = df_store[features]
y = df_store['Sales']

X = sm.add_constant(X)  # Add intercept term
model = sm.OLS(y, X).fit()
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:                  Sales   R-squared:                       0.354
Model:                            OLS   Adj. R-squared:                  0.350
Method:                 Least Squares   F-statistic:                     85.34
Date:                Wed, 28 May 2025   Prob (F-statistic):           2.96e-85
Time:                        12:52:48   Log-Likelihood:                -9089.1
No. Observations:                 942   AIC:                         1.819e+04
Df Residuals:                     935   BIC:                         1.823e+04
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.621e+04    307.330     52.

### Interpretation: Holiday Impact on Sales

The regression analysis shows that all three types of state holidays (`a`, `b`, and `c`) have a **statistically significant positive effect** on daily sales in Store 262, holding promotions, school holidays, and day of the week constant. Values are in the units of the `Sales` column:

- **StateHoliday_a**: On average, sales are higher by about **12,570**
- **StateHoliday_b**: On average, sales are higher by about **15,170**
- **StateHoliday_c**: On average, sales are higher by about **8,707**

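These results suggest that Store 262 experiences **higher customer traffic or demand during state holidays**, after accounting for promotions, school holidays, and day of the week. The model explains about 35% of the variation in daily sales (R-squared of 0.354), so much of the day-to-day variation is driven by factors not included here.

We conclude that **state holidays are positively associated with increased sales** on days with non-zero sales (zero-sales days were removed before fitting), and should be considered an important factor in sales forecasting for this store.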
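
One way to sanity-check coefficients like these is to see how many days fall under each holiday type and how their raw mean sales compare with non-holiday days. The sketch below uses a made-up `toy` frame and assumes non-holiday rows are coded as `'0'`. The cast to `str` guards against a column that mixes `0` and `'0'`. The `diff_vs_none` column name is hypothetical.

```python
import pandas as pd

# Toy data reusing the notebook's column names (values are invented).
toy = pd.DataFrame({
    "StateHoliday": ["0", "0", "0", "a", "a", "b"],
    "Sales": [100, 120, 110, 200, 220, 300],
})

# Normalise the holiday code to strings so 0 and '0' form one group.
toy["StateHoliday"] = toy["StateHoliday"].astype(str)

# Number of days and mean sales per holiday type.
by_holiday = toy.groupby("StateHoliday")["Sales"].agg(["count", "mean"])

# Raw difference in mean sales relative to non-holiday days ('0').
by_holiday["diff_vs_none"] = by_holiday["mean"] - by_holiday.loc["0", "mean"]
print(by_holiday)
```

On the real `df_store`, a small `count` for a holiday type would mean its coefficient rests on only a few days and should be read with caution. The raw differences will not match the regression coefficients exactly, since the regression also adjusts for the other features.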
