## Import Libraries

In [None]:
import pandas as pd
from pathlib import Path


## Load Data

In [None]:
file_path = Path("..") / "data" / "train.csv"
df = pd.read_csv(file_path)

## Convert Date column `to_datetime` format

In [None]:
df['Date'] = pd.to_datetime(df['Date'])


## Feature Engineering

In [None]:
df['DayOfWeek'] = df['Date'].dt.dayofweek
df['Promo'] = df['Promo'].astype(int)
df['SchoolHoliday'] = df['SchoolHoliday'].astype(int)
df['StateHoliday_a'] = (df['StateHoliday'] == 'a').astype(int)
df['StateHoliday_b'] = (df['StateHoliday'] == 'b').astype(int)
df['StateHoliday_c'] = (df['StateHoliday'] == 'c').astype(int)


## Store Selection

To focus our regression analysis, we selected **Store 262**, which has the **highest total sales** across the dataset. This ensures we’re analyzing a store with sufficient data and clear patterns, making it a strong candidate for evaluating holiday effects.


In [None]:
store_sales = df.groupby('Store')['Sales'].sum().reset_index()
store_sales = store_sales.sort_values(by='Sales', ascending=False)

top_stores = store_sales.head(10)
top_stores

## Select Store ID 262

In [None]:
store_id = 262
df_store = df[df['Store'] == store_id].copy()


## Remove 0 Sales

In [None]:
df_store = df_store[df_store['Sales'] > 0]


## OLS Regression to Analyze Holiday Impact

We run a linear regression to estimate the effect of holidays and promotions on daily sales in Store 262. We include binary features for each holiday type (a, b, c), school holidays, promotions, and day of the week.


In [None]:
import statsmodels.api as sm

features = ['Promo', 'SchoolHoliday', 'StateHoliday_a', 'StateHoliday_b', 'StateHoliday_c', 'DayOfWeek']
X = df_store[features]
y = df_store['Sales']

X = sm.add_constant(X)  # Add intercept term
model = sm.OLS(y, X).fit()
print(model.summary())


### Interpretation: Holiday Impact on Sales

The regression analysis shows that all three types of state holidays (`a`, `b`, and `c`) have a **statistically significant positive effect** on daily sales in Store 262:

- **StateHoliday_a**: On average, sales increase by **12,570 units**
- **StateHoliday_b**: On average, sales increase by **15,170 units**
- **StateHoliday_c**: On average, sales increase by **8,707 units**

These results suggest that Store 262 experiences **higher customer traffic or demand during state holidays**, regardless of promotions or school breaks. 

We conclude that **state holidays are positively associated with increased sales**, and should be considered an important factor in sales forecasting for this store.
