# Weather Analysis

Quick analysis of the effects of precipitation and temperature on revenue for golf courses, waterparks, and campgrounds.

The overall conclusion is that, at a significance level of 0.05, temperature and precipitation do have a *statistically significant* effect on revenue for all facility types.

However, the models constructed have small R^2 values of 0.13 or less: for every facility type, our daily weather variables explain 13% or less of the variation in daily revenue. The estimated average effect of weather on daily revenue was much smaller for campgrounds than for either golf courses or waterparks. This makes intuitive sense, as campground visitors are likely to be more committed to a multiday plan and less likely to cancel due to weather, whereas golf course and waterpark visitors are more likely to cancel their daily plans based on the weather.
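To make the "explains 13% or less" reading concrete, here is a minimal sketch of how R^2 is computed from a fitted line: one minus the ratio of unexplained to total variation. The data below are synthetic numbers invented for illustration, not the parks data, and the helper `r_squared` is a hypothetical name rather than part of this notebook.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variation in y accounted for by the fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the fit
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Synthetic illustration only: every number here is made up.
rng = np.random.default_rng(0)
n = 500
temp_f = rng.normal(75, 8, n)
precip_in = rng.exponential(0.2, n)
noise = rng.normal(0, 300, n)              # large day-to-day noise unrelated to weather
revenue = 600 + 8 * temp_f - 250 * precip_in + noise

# Ordinary least squares with an intercept, via numpy's least-squares solver.
X = np.column_stack([np.ones(n), temp_f, precip_in])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
r2 = r_squared(revenue, X @ coef)
print(f"R^2 = {r2:.3f}")
```

When the noise term dominates, as it does in this toy setup, the weather coefficients can still be estimated clearly while R^2 stays small, which is the pattern described above.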

Further analysis of this dataset would involve looking for lagged effects of weather on revenue, to see whether the previous day's weather, or even a moving average of previous days' weather, has stronger explanatory power.
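One way such lagged features might be built is sketched below. It assumes a weather table with one row per consecutive calendar date and the columns `date`, `temp_f`, and `precip_in`; the helper name `add_weather_lags`, the 3-day window, and the toy values are all illustrative assumptions, not part of the analysis.

```python
import pandas as pd

def add_weather_lags(weather: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Return a copy of `weather` with previous-day and trailing-average columns.

    Assumes one row per calendar date with no gaps; shift(1) moves values by
    one row, so a missing date would silently misalign the lag.
    """
    out = (weather
           .assign(date=pd.to_datetime(weather["date"]))
           .sort_values("date")
           .reset_index(drop=True))
    for col in ["temp_f", "precip_in"]:
        prev = out[col].shift(1)                       # value from the previous day
        out[f"{col}_lag1"] = prev
        # Mean of the `window` days before today (today itself is excluded)
        out[f"{col}_ma{window}"] = prev.rolling(window).mean()
    return out

# Toy values for illustration only, not the real weather data.
toy_weather = pd.DataFrame({
    "date": ["2022-06-01", "2022-06-02", "2022-06-03", "2022-06-04", "2022-06-05"],
    "temp_f": [75, 80, 70, 65, 85],
    "precip_in": [0.42, 0.0, 0.1, 0.0, 0.3],
})
lagged = add_weather_lags(toy_weather)
print(lagged)
```

The resulting columns could then be merged onto the revenue data by `date` and added to the model formula, for example `revenue ~ temp_f + precip_in + precip_in_lag1`. The first rows contain NaN for the lagged columns and would be dropped by the regression.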

In [2]:
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.linear_model import RegressionResultsWrapper

In [3]:
processed_data_path = Path('../data/processed')

parks_data = pd.read_csv(processed_data_path / 'parks-data-long.csv')
weather = pd.read_csv(processed_data_path / 'weather.csv')
df = (pd
      .merge(parks_data, weather, on='date', how='left')
      .loc[lambda x: x['variable'] == 'revenue']
      .rename(columns={'value':'revenue'})
      .drop(columns=['variable'])
      .assign(month=lambda x: pd.to_datetime(x['date']).dt.month,
              year=lambda x: pd.to_datetime(x['date']).dt.year,
              weekday=lambda x: pd.to_datetime(x['date']).dt.day_name(),
      )
)

df.head()

Unnamed: 0,date,park_name,facility,revenue,temp_f,precip_in,month,year,weekday
2,2022-06-01,Groveland Oaks,campground,108,75,0.42,6,2022,Wednesday
3,2022-06-01,Addison Oaks,campground,80,75,0.42,6,2022,Wednesday
4,2022-06-01,Springfield Oaks,golf,1184,75,0.42,6,2022,Wednesday
5,2022-06-01,Glen Oaks,golf,1248,75,0.42,6,2022,Wednesday
8,2022-06-01,Red Oaks,waterpark,402,75,0.42,6,2022,Wednesday


In [4]:
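def build_regression_model(df: pd.DataFrame, facility_type: str) -> RegressionResultsWrapper:
    """Fit an OLS model of daily revenue on temperature and precipitation for one facility type."""
    # Work on a copy of the facility's rows so the caller's DataFrame is not modified
    subset = df[df['facility'] == facility_type].copy()

    # Convert month and weekday to categorical variables
    # (not used by the formula below, but available for terms such as C(month))
    subset['month'] = subset['month'].astype('category')
    subset['weekday'] = subset['weekday'].astype('category')

    # Build the model with a Patsy formula; only the two weather variables enter as predictors
    formula = 'revenue ~ temp_f + precip_in'
    model = smf.ols(formula, data=subset).fit()

    return model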
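The results object returned by `build_regression_model` exposes per-coefficient p-values (`pvalues`) and the overall F-test p-value (`f_pvalue`), so the 0.05 significance check can be read off directly. The sketch below shows one way to tabulate that; `significance_table` is a hypothetical helper, and the model is fitted on synthetic data invented for illustration rather than on the parks data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

ALPHA = 0.05  # significance level used in this analysis

def significance_table(model, alpha: float = ALPHA) -> pd.DataFrame:
    """Tabulate coefficient, p-value, and a significance flag for each term."""
    return pd.DataFrame({
        "coef": model.params,
        "p_value": model.pvalues,
        "significant": model.pvalues < alpha,
    })

# Synthetic illustration only: every number here is made up.
rng = np.random.default_rng(1)
n = 300
toy = pd.DataFrame({
    "temp_f": rng.normal(75, 8, n),
    "precip_in": rng.exponential(0.2, n),
})
toy["revenue"] = (600 + 8 * toy["temp_f"] - 250 * toy["precip_in"]
                  + rng.normal(0, 50, n))

# Same formula as build_regression_model, fitted on the toy data
toy_model = smf.ols("revenue ~ temp_f + precip_in", data=toy).fit()
table = significance_table(toy_model)
print(table)
print(f"Overall F-test p-value: {toy_model.f_pvalue:.3g}")
```

Applied to the real models, the same call would be `significance_table(golf_model)` once the golf model below has been fitted.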

### Golf

In [5]:
golf_model = build_regression_model(df, 'golf')
print("Facility Type: Golf")
print(golf_model.summary())

Facility Type: Golf
                            OLS Regression Results                            
Dep. Variable:                revenue   R-squared:                       0.128
Model:                            OLS   Adj. R-squared:                  0.125
Method:                 Least Squares   F-statistic:                     40.18
Date:                Thu, 25 Apr 2024   Prob (F-statistic):           5.16e-17
Time:                        09:24:44   Log-Likelihood:                -3811.4
No. Observations:                 552   AIC:                             7629.
Df Residuals:                     549   BIC:                             7642.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    590.8467     89.471

### Campground

In [6]:
campground_model = build_regression_model(df, 'campground')
print("Facility Type: Campground")
print(campground_model.summary())

Facility Type: Campground
                            OLS Regression Results                            
Dep. Variable:                revenue   R-squared:                       0.118
Model:                            OLS   Adj. R-squared:                  0.115
Method:                 Least Squares   F-statistic:                     36.88
Date:                Thu, 25 Apr 2024   Prob (F-statistic):           9.41e-16
Time:                        09:24:44   Log-Likelihood:                -2865.8
No. Observations:                 552   AIC:                             5738.
Df Residuals:                     549   BIC:                             5750.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     41.4988     

### Waterpark

In [7]:
waterpark_model = build_regression_model(df, 'waterpark')
print("Facility Type: Waterpark")
print(waterpark_model.summary())

Facility Type: Waterpark
                            OLS Regression Results                            
Dep. Variable:                revenue   R-squared:                       0.064
Model:                            OLS   Adj. R-squared:                  0.060
Method:                 Least Squares   F-statistic:                     18.73
Date:                Thu, 25 Apr 2024   Prob (F-statistic):           1.35e-08
Time:                        09:24:44   Log-Likelihood:                -3973.9
No. Observations:                 552   AIC:                             7954.
Df Residuals:                     549   BIC:                             7967.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    401.2879    12

In [8]:
df

Unnamed: 0,date,park_name,facility,revenue,temp_f,precip_in,month,year,weekday
2,2022-06-01,Groveland Oaks,campground,108,75,0.42,6,2022,Wednesday
3,2022-06-01,Addison Oaks,campground,80,75,0.42,6,2022,Wednesday
4,2022-06-01,Springfield Oaks,golf,1184,75,0.42,6,2022,Wednesday
5,2022-06-01,Glen Oaks,golf,1248,75,0.42,6,2022,Wednesday
8,2022-06-01,Red Oaks,waterpark,402,75,0.42,6,2022,Wednesday
...,...,...,...,...,...,...,...,...,...
3303,2024-08-31,Addison Oaks,campground,143,65,0.18,8,2024,Saturday
3304,2024-08-31,Springfield Oaks,golf,880,65,0.18,8,2024,Saturday
3305,2024-08-31,Glen Oaks,golf,1147,65,0.18,8,2024,Saturday
3308,2024-08-31,Red Oaks,waterpark,1280,65,0.18,8,2024,Saturday
