# Exploratory Data Analysis (EDA) for OCPR

### Data Overview:
Explore basic features of the data, e.g., ranges, averages, and distributions.
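
As a minimal sketch of this overview step, the snippet below summarizes the range and average of `value` per park and variable. It uses a small hand-built frame (`toy_long`, a hypothetical name) that copies four golf revenue rows shown later in this notebook; it is not the full dataset.

```python
import pandas as pd

# Tiny long-format sample; the four revenue values are copied from the
# golf.head() output in this notebook, everything else is illustrative.
toy_long = pd.DataFrame({
    "park_name": ["Glen Oaks", "Glen Oaks", "Springfield Oaks", "Springfield Oaks"],
    "variable": ["revenue"] * 4,
    "value": [1248, 1344, 1184, 1485],
})

# Range and average of each variable within each park.
summary = (
    toy_long
    .groupby(["park_name", "variable"])["value"]
    .agg(["min", "mean", "max"])
)
print(summary)
```

On the real data, the same `groupby` could be applied to `parks_data` directly, assuming its columns match the `head()` output.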

#### Trend Analysis:
Yearly Trends: Compare the revenue for each month across the three years to assess whether the business is growing, stagnating, or declining.
Monthly Trends: Analyze how revenue moves within each year to identify recurring patterns, peaks, and troughs over the summer.
Trends in aggregate and across parks....
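
One possible way to set up the year-over-year comparison is a month-by-year table of total revenue. The sketch below uses invented revenue figures and a hypothetical `toy` frame; only the column names `date`, `park_name`, and `revenue` follow the notebook's data.

```python
import pandas as pd

# Invented daily revenue for a single park across two summers.
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-06-01", "2022-06-02", "2022-07-01",
        "2023-06-01", "2023-06-02", "2023-07-01",
    ]),
    "park_name": ["Glen Oaks"] * 6,
    "revenue": [100, 200, 400, 150, 250, 300],
})

# Total revenue with months as rows and years as columns.
monthly_by_year = (
    toy
    .assign(year=toy["date"].dt.year, month=toy["date"].dt.month)
    .pivot_table(index="month", columns="year", values="revenue", aggfunc="sum")
)

# Percentage change of each month relative to the same month one year earlier.
yoy_change = monthly_by_year.diff(axis=1) / monthly_by_year.shift(axis=1) * 100

print(monthly_by_year)
print(yoy_change.round(1))
```

Adding `park_name` to the pivot index would give the per-park view alongside the aggregate one.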

#### Seasonal Analysis:
Monthly comparisons: The data covers only June, July, and August, so identify which month consistently performs best and investigate what drives the higher revenue in that month.
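
A rough sketch of how "consistently performs best" could be checked: find the top month in each year, then average each month's rank across years. The monthly totals below are invented for illustration, and `monthly` is a hypothetical name.

```python
import pandas as pd

# Invented monthly revenue totals (rows: month number, columns: year).
monthly = pd.DataFrame(
    {2022: [300, 500, 400], 2023: [350, 450, 420], 2024: [380, 520, 390]},
    index=pd.Index([6, 7, 8], name="month"),
)

# Month with the highest revenue in each year.
best_month = monthly.idxmax()

# Rank months within each year (1 = highest revenue), then average across years.
mean_rank = monthly.rank(ascending=False).mean(axis=1)

print(best_month)
print(mean_rank)
```

A month whose mean rank is close to 1 is the best performer in most years, which is a stronger statement than winning in a single year.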

#### Day of the Week Analysis:
If the data allows, analyze which days of the week are most profitable. This can help optimize staffing and marketing strategies.
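
Since the data is keyed by calendar date, a day-of-week breakdown can be derived from the `date` column. The following is a sketch on invented revenue values; `toy` and `by_day` are hypothetical names.

```python
import pandas as pd

# Invented daily revenue; 2022-06-04 and 2022-06-11 are Saturdays,
# 2022-06-05 is a Sunday, 2022-06-06 and 2022-06-13 are Mondays.
toy = pd.DataFrame({
    "date": pd.to_datetime([
        "2022-06-04", "2022-06-05", "2022-06-06", "2022-06-11", "2022-06-13",
    ]),
    "revenue": [900, 800, 300, 1100, 500],
})

weekday_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Mean revenue per weekday, in calendar order; days absent from the toy data are dropped.
by_day = (
    toy
    .groupby(toy["date"].dt.day_name())["revenue"]
    .mean()
    .reindex(weekday_order)
    .dropna()
)
print(by_day)
```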

#### Weather Impact Analysis (if weather data is available):
Correlate weather conditions with revenue performance, as weather can significantly impact facility usage.
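
A first pass at this could join revenue and weather on `date` and look at a correlation matrix. The sketch below assumes the weather table has a `date` column plus hypothetical `temp_max` and `precip` columns (the actual columns of `weather.csv` are not shown in this notebook), and all values are invented.

```python
import pandas as pd

dates = pd.to_datetime(["2022-06-01", "2022-06-02", "2022-06-03", "2022-06-04"])

# Invented daily revenue.
revenue = pd.DataFrame({"date": dates, "revenue": [800, 1000, 1200, 650]})

# Invented weather; `temp_max` and `precip` are assumed column names.
weather_toy = pd.DataFrame({
    "date": dates,
    "temp_max": [70, 75, 80, 65],
    "precip": [0.2, 0.0, 0.0, 0.5],
})

# Keep only dates present in both tables, then compute Pearson correlations.
merged = revenue.merge(weather_toy, on="date", how="inner")
corr = merged[["revenue", "temp_max", "precip"]].corr()
print(corr.round(2))
```

Correlation only describes a linear association here; it does not separate weather from other factors such as weekends or holidays.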

#### Event Impact Analysis:
If any special events or promotions were held during these months, analyze their impact on revenue. This can help in planning future events or promotions more effectively. Focus on holidays.
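
For the holiday focus, one simple approach is to flag holiday dates and compare average revenue against regular days. The sketch below uses a hand-picked holiday list (only July 4, 2022) and invented revenue values; both are assumptions for illustration.

```python
import pandas as pd

# Invented daily revenue around Independence Day.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-07-02", "2022-07-03", "2022-07-04", "2022-07-05"]),
    "revenue": [1000, 1100, 1600, 900],
})

# Hand-picked holiday dates; extend this list as needed.
holidays = pd.to_datetime(["2022-07-04"])

# Label each day, then compare mean revenue by label.
toy["day_type"] = toy["date"].isin(holidays).map({True: "holiday", False: "regular"})
holiday_effect = toy.groupby("day_type")["revenue"].mean()
print(holiday_effect)
```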

In [1]:
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import scipy.stats as stats


In [9]:
file_path_base = Path('../data/processed/')

weather = pd.read_csv(file_path_base / 'weather.csv')
parks_data = pd.read_csv(file_path_base / 'parks-data-long.csv')
parks_data.head()

Unnamed: 0,date,park_name,facility,variable,value
0,2022-06-01,Groveland Oaks,campground,campers,9
1,2022-06-01,Addison Oaks,campground,campers,10
2,2022-06-01,Groveland Oaks,campground,revenue,108
3,2022-06-01,Addison Oaks,campground,revenue,80
4,2022-06-01,Springfield Oaks,golf,revenue,1184


# Golf Analysis

In [24]:
def pivot(df: pd.DataFrame) -> pd.DataFrame:
    # Reshape long-format rows into one row per (date, park_name),
    # with one column per variable, and parse `date` as datetime.
    index = ['date', 'park_name']
    columns = 'variable'
    values = 'value'
    return (df
            .pivot(index=index, columns=columns, values=values)
            .reset_index()
            .assign(date=lambda x: pd.to_datetime(x['date']))
    )

golf = pivot(parks_data[parks_data['facility'] == 'golf'])
golf.head()

variable,date,park_name,revenue,rounds played
0,2022-06-01,Glen Oaks,1248,32
1,2022-06-01,Springfield Oaks,1184,37
2,2022-06-02,Glen Oaks,1344,42
3,2022-06-02,Springfield Oaks,1485,45
4,2022-06-03,Glen Oaks,962,26


In [36]:
# Summary statistics for the golf data, displayed with pandas' default formatting.
golf.describe()

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>variable</th>
      <th>date</th>
      <th>revenue</th>
      <th>rounds played</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>count</th>
      <td>552</td>
      <td>552.000000</td>
      <td>552.000000</td>
    </tr>
    <tr>
      <th>mean</th>
      <td>2023-07-16 20:00:00</td>
      <td>1105.789855</td>
      <td>31.574275</td>
    </tr>
    <tr>
      <th>min</th>
      <td>2022-06-01 00:00:00</td>
      <td>600.000000</td>
      <td>20.000000</td>
    </tr>
    <tr>
      <th>25%</th>
      <td>2022-08-08 18:00:00</td>
      <td>900.000000</td>
      <td>26.000000</td>
    </tr>
    <tr>
      <th>50%</th>
      <td>2023-07-16 12:00:00</td>
      <td>1088.000000</td>
      <td>32.000000</td>
    </tr>
    <tr>
      <th>75%</th>
      <td>2024-06-23 06:00:00</td>
      <td>1287.000000</td>
      <td>36.000000</td>
    </tr>
    <tr>
      <th>max</th>
      <td>2024-08-31 00:00:00</td>
      <td>1950.000000</td>
      <td>50.000000</td>
    </tr>
    <tr>
      <th>std</th>
      <td>NaN</td>
      <td>258.532090</td>
      <td>6.882862</td>
    </tr>
  </tbody>
</table>