# Final Dataset Cleaning

## Summary of Cleaning Tasks

1. **Merged Sales and Climate Dataframes**  
   - Combined the sales data and climate data into a single unified dataframe.

2. **Added Holiday Information**  
   - Used the `holidays` library to identify Ontario public holidays for 2023 and 2024.
   - Created new columns:  
     - `holiday_name`: Name of the holiday.  
     - `is_holiday`: Indicator (1/0) if the date is a holiday.

3. **Merged Holiday Dataframe**  
   - Merged the holiday dataframe with the final dataframe.

4. **Filled Non-Holiday Days**  
   - Filled `holiday_name` for all non-holiday days with the label **"Regular Day"**.

5. **Filtered Date Range**  
   - Filtered rows to only include dates between **2023-02-01** and **2024-11-30** (inclusive) to ensure data completeness.

6. **Added Holiday-Related Columns**  
   - Added four new columns to indicate proximity to holidays:
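     - **`is_holiday_prev_2`**: 1 if the date two days earlier was a holiday (i.e., two days after a holiday), else 0.  
     - **`is_holiday_prev_1`**: 1 if the previous day was a holiday (i.e., one day after a holiday), else 0.  
     - **`is_holiday_next_1`**: 1 if the next day is a holiday (i.e., one day before a holiday), else 0.  
     - **`is_holiday_next_2`**: 1 if the date two days later is a holiday (i.e., two days before a holiday), else 0.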

7. **Added 'Season' Column**  
   - Added a `season` column based on the date to capture seasonal trends.

8. **Added 'Temp Category' Column**  
   - Created a `temp_category` column to categorize temperature based on retail customer behavior ("Cold", "Cool", "Comfortable", "Hot").


In [1]:
import pandas as pd 
import numpy as np
import holidays
from datetime import date

In [2]:
sales_df = pd.read_csv('../data/clean/merged_sales.csv') # parse_dates=[]
climate_df = pd.read_csv('../data/clean/merged_climate.csv')

In [3]:
# Merge the sales and climate datasets on the 'date' column
df = pd.merge(sales_df, climate_df, on='date', how='inner')
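An inner merge silently drops any date that appears in only one of the two files, and duplicated dates would multiply rows. A minimal sketch of how this could be checked, using small made-up frames (`sales_toy` and `climate_toy` are illustrative stand-ins, not the project data) and the `validate` and `indicator` options of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for sales_df and climate_df (illustrative values only)
sales_toy = pd.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "net_sales": [0.0, 120.5, 98.0],
})
climate_toy = pd.DataFrame({
    "date": ["2023-01-02", "2023-01-03", "2023-01-04"],
    "mean_temp_c": [1.8, 1.7, 2.6],
})

# An outer merge with indicator=True labels where each row came from;
# validate="one_to_one" raises MergeError if a date is duplicated on either side.
check = pd.merge(
    sales_toy, climate_toy,
    on="date", how="outer",
    validate="one_to_one", indicator=True,
)

# Dates that an inner merge would have dropped
unmatched = check.loc[check["_merge"] != "both", ["date", "_merge"]]
print(unmatched)
```

If `unmatched` is empty for the real files, the inner merge above loses no days.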

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 708 entries, 0 to 707
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   date               708 non-null    object 
 1   day                708 non-null    object 
 2   gross_sales        708 non-null    float64
 3   returns            708 non-null    float64
 4   discounts_comps    708 non-null    float64
 5   net_sales          708 non-null    float64
 6   gift_card_sales    708 non-null    float64
 7   tax                708 non-null    float64
 8   tip                708 non-null    float64
 9   refunds_by_amount  708 non-null    float64
 10  total_collected    708 non-null    float64
 11  cash               708 non-null    float64
 12  card               708 non-null    float64
 13  gift_card          708 non-null    float64
 14  fees               708 non-null    float64
 15  is_store_open      708 non-null    int64  
 16  mean_temp_c        708 non

In [5]:
# Create a holidays dataframe using the `holidays` library
ontario_holidays = holidays.CA(subdiv="ON", years=[2023, 2024])
holidays_list = list(ontario_holidays.items())

if holidays_list:
    ontario_holidays_df = pd.DataFrame(holidays_list, columns=['date', 'holiday_name'])
    ontario_holidays_df['is_holiday'] = True
    display(ontario_holidays_df)
else:
    print("No holidays found for 2023 and 2024.")

Unnamed: 0,date,holiday_name,is_holiday
0,2024-01-01,New Year's Day,True
1,2024-03-29,Good Friday,True
2,2024-07-01,Canada Day,True
3,2024-09-02,Labour Day,True
4,2024-12-25,Christmas Day,True
5,2024-02-19,Family Day,True
6,2024-05-20,Victoria Day,True
7,2024-10-14,Thanksgiving Day,True
8,2024-12-26,Boxing Day,True
9,2023-01-01,New Year's Day,True


In [6]:
# Make sure the date columns are datetime objects
df['date'] = pd.to_datetime(df['date'])
ontario_holidays_df['date'] = pd.to_datetime(ontario_holidays_df['date'])

In [7]:
display(df)

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,total_collected,cash,card,gift_card,fees,is_store_open,mean_temp_c,total_rain_mm,total_precip_mm,total_snow_mm
0,2023-01-01,Sunday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0,2.9,0.6,0.6,0.0
1,2023-01-02,Monday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0,1.8,0.0,0.0,0.0
2,2023-01-03,Tuesday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0,1.7,4.8,4.8,0.0
3,2023-01-04,Wednesday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0,2.6,16.6,16.6,0.0
4,2023-01-05,Thursday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0,2.3,0.4,0.4,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
703,2024-12-04,Wednesday,2968.86,0.0,-70.98,2897.88,25.0,288.48,144.41,0.0,3355.60,457.50,2898.10,0.00,-48.33,1,-1.0,0.0,3.4,44.0
704,2024-12-05,Thursday,2743.55,0.0,-55.58,2687.97,140.0,275.64,163.85,0.0,3267.30,338.35,2911.73,17.22,-53.26,1,-3.8,0.0,0.6,10.0
705,2024-12-06,Friday,3454.79,0.0,-140.47,3314.32,0.0,340.48,204.40,0.0,3859.24,327.10,3495.67,36.47,-64.17,1,-7.6,0.0,0.0,0.0
706,2024-12-07,Saturday,5714.15,0.0,-150.58,5563.57,40.0,540.81,326.05,0.0,6470.36,462.60,5999.76,8.00,-106.80,1,0.4,0.0,0.3,3.0


In [8]:
# Perform the merge
df = df.merge(ontario_holidays_df, on='date', how='left')

# Flag holidays from the merge result (avoids the fillna downcasting warning),
# then label non-holiday rows
df['is_holiday'] = df['holiday_name'].notna()
df['holiday_name'] = df['holiday_name'].fillna('Regular Day')



In [9]:
display(df)

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,...,card,gift_card,fees,is_store_open,mean_temp_c,total_rain_mm,total_precip_mm,total_snow_mm,holiday_name,is_holiday
0,2023-01-01,Sunday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,...,0.00,0.00,0.00,0,2.9,0.6,0.6,0.0,New Year's Day,True
1,2023-01-02,Monday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,...,0.00,0.00,0.00,0,1.8,0.0,0.0,0.0,New Year's Day (observed),True
2,2023-01-03,Tuesday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,...,0.00,0.00,0.00,0,1.7,4.8,4.8,0.0,Regular Day,False
3,2023-01-04,Wednesday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,...,0.00,0.00,0.00,0,2.6,16.6,16.6,0.0,Regular Day,False
4,2023-01-05,Thursday,0.00,0.0,0.00,0.00,0.0,0.00,0.00,0.0,...,0.00,0.00,0.00,0,2.3,0.4,0.4,0.0,Regular Day,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
703,2024-12-04,Wednesday,2968.86,0.0,-70.98,2897.88,25.0,288.48,144.41,0.0,...,2898.10,0.00,-48.33,1,-1.0,0.0,3.4,44.0,Regular Day,False
704,2024-12-05,Thursday,2743.55,0.0,-55.58,2687.97,140.0,275.64,163.85,0.0,...,2911.73,17.22,-53.26,1,-3.8,0.0,0.6,10.0,Regular Day,False
705,2024-12-06,Friday,3454.79,0.0,-140.47,3314.32,0.0,340.48,204.40,0.0,...,3495.67,36.47,-64.17,1,-7.6,0.0,0.0,0.0,Regular Day,False
706,2024-12-07,Saturday,5714.15,0.0,-150.58,5563.57,40.0,540.81,326.05,0.0,...,5999.76,8.00,-106.80,1,0.4,0.0,0.3,3.0,Regular Day,False


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 708 entries, 0 to 707
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               708 non-null    datetime64[ns]
 1   day                708 non-null    object        
 2   gross_sales        708 non-null    float64       
 3   returns            708 non-null    float64       
 4   discounts_comps    708 non-null    float64       
 5   net_sales          708 non-null    float64       
 6   gift_card_sales    708 non-null    float64       
 7   tax                708 non-null    float64       
 8   tip                708 non-null    float64       
 9   refunds_by_amount  708 non-null    float64       
 10  total_collected    708 non-null    float64       
 11  cash               708 non-null    float64       
 12  card               708 non-null    float64       
 13  gift_card          708 non-null    float64       
 14  fees      

In [11]:
# Filter rows to keep only dates on or after 2023-01-11
df = df[df['date'] >= '2023-01-11']

In [12]:
# Reset the index after filtering
df.reset_index(drop=True, inplace=True)

In [13]:
df.head()

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,...,card,gift_card,fees,is_store_open,mean_temp_c,total_rain_mm,total_precip_mm,total_snow_mm,holiday_name,is_holiday
0,2023-01-11,Wednesday,972.23,0.0,-12.13,960.1,0.0,94.22,58.52,0.0,...,1056.87,0.0,-17.7,1,-1.7,0.0,0.0,0.0,Regular Day,False
1,2023-01-12,Thursday,1316.28,0.0,-51.6,1264.68,0.0,113.85,66.53,0.0,...,1284.91,42.09,-25.7,1,2.4,3.0,7.0,40.0,Regular Day,False
2,2023-01-13,Friday,1354.81,0.0,-24.1,1330.71,0.0,118.38,78.79,0.0,...,1422.53,25.0,-24.01,1,-3.7,0.0,4.2,42.0,Regular Day,False
3,2023-01-14,Saturday,2419.78,0.0,-31.97,2387.81,75.0,215.01,177.15,0.0,...,2651.45,0.0,-43.19,1,-6.55,0.0,0.0,0.0,Regular Day,False
4,2023-01-15,Sunday,1423.06,-23.0,-107.92,1292.14,0.0,111.4,128.24,0.0,...,1477.18,0.0,-26.27,1,-9.4,0.0,0.0,0.0,Regular Day,False


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 698 entries, 0 to 697
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               698 non-null    datetime64[ns]
 1   day                698 non-null    object        
 2   gross_sales        698 non-null    float64       
 3   returns            698 non-null    float64       
 4   discounts_comps    698 non-null    float64       
 5   net_sales          698 non-null    float64       
 6   gift_card_sales    698 non-null    float64       
 7   tax                698 non-null    float64       
 8   tip                698 non-null    float64       
 9   refunds_by_amount  698 non-null    float64       
 10  total_collected    698 non-null    float64       
 11  cash               698 non-null    float64       
 12  card               698 non-null    float64       
 13  gift_card          698 non-null    float64       
 14  fees      

In [15]:
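# Flag days adjacent to a holiday. shift(n) reads the value from n rows earlier,
# so prev_n marks days n rows after a holiday and next_n marks days n rows before one.
# This assumes one row per consecutive calendar date.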
df['is_holiday_prev_1'] = df['is_holiday'].shift(1, fill_value=False)
df['is_holiday_next_1'] = df['is_holiday'].shift(-1, fill_value=False)
df['is_holiday_prev_2'] = df['is_holiday'].shift(2, fill_value=False)
df['is_holiday_next_2'] = df['is_holiday'].shift(-2, fill_value=False)
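Because `shift` moves values by row position rather than by calendar date, it helps to see what each column marks. A small sketch on a made-up five-day frame (`toy` is illustrative data, not taken from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "date": pd.date_range("2024-06-29", periods=5, freq="D"),
    "is_holiday": [False, False, True, False, False],  # 2024-07-01 flagged
})

# shift(1) looks one row back, shift(-1) looks one row ahead
toy["is_holiday_prev_1"] = toy["is_holiday"].shift(1, fill_value=False)
toy["is_holiday_next_1"] = toy["is_holiday"].shift(-1, fill_value=False)

# The positional shift equals a calendar-day shift only when
# the dates are consecutive with no gaps or duplicates
is_daily = toy["date"].diff().dropna().eq(pd.Timedelta(days=1)).all()

print(toy)
print("consecutive daily dates:", is_daily)
```

In this toy frame `is_holiday_prev_1` is True on 2024-07-02 (the day after the holiday) and `is_holiday_next_1` is True on 2024-06-30 (the day before it).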

In [16]:
df.head()

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,...,mean_temp_c,total_rain_mm,total_precip_mm,total_snow_mm,holiday_name,is_holiday,is_holiday_prev_1,is_holiday_next_1,is_holiday_prev_2,is_holiday_next_2
0,2023-01-11,Wednesday,972.23,0.0,-12.13,960.1,0.0,94.22,58.52,0.0,...,-1.7,0.0,0.0,0.0,Regular Day,False,False,False,False,False
1,2023-01-12,Thursday,1316.28,0.0,-51.6,1264.68,0.0,113.85,66.53,0.0,...,2.4,3.0,7.0,40.0,Regular Day,False,False,False,False,False
2,2023-01-13,Friday,1354.81,0.0,-24.1,1330.71,0.0,118.38,78.79,0.0,...,-3.7,0.0,4.2,42.0,Regular Day,False,False,False,False,False
3,2023-01-14,Saturday,2419.78,0.0,-31.97,2387.81,75.0,215.01,177.15,0.0,...,-6.55,0.0,0.0,0.0,Regular Day,False,False,False,False,False
4,2023-01-15,Sunday,1423.06,-23.0,-107.92,1292.14,0.0,111.4,128.24,0.0,...,-9.4,0.0,0.0,0.0,Regular Day,False,False,False,False,False


In [17]:
df.drop(columns = ['total_collected'], inplace=True)

In [18]:
df.columns

Index(['date', 'day', 'gross_sales', 'returns', 'discounts_comps', 'net_sales',
       'gift_card_sales', 'tax', 'tip', 'refunds_by_amount', 'cash', 'card',
       'gift_card', 'fees', 'is_store_open', 'mean_temp_c', 'total_rain_mm',
       'total_precip_mm', 'total_snow_mm', 'holiday_name', 'is_holiday',
       'is_holiday_prev_1', 'is_holiday_next_1', 'is_holiday_prev_2',
       'is_holiday_next_2'],
      dtype='object')

In [19]:
# List of columns to convert
columns_to_convert = [
    'is_holiday', 
    'is_holiday_prev_1', 
    'is_holiday_next_1', 
    'is_holiday_prev_2', 
    'is_holiday_next_2'
]

# Convert specified columns to integers (1/0)
df[columns_to_convert] = df[columns_to_convert].astype(int)


In [20]:
# Checking the conversion
df.head()

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,...,mean_temp_c,total_rain_mm,total_precip_mm,total_snow_mm,holiday_name,is_holiday,is_holiday_prev_1,is_holiday_next_1,is_holiday_prev_2,is_holiday_next_2
0,2023-01-11,Wednesday,972.23,0.0,-12.13,960.1,0.0,94.22,58.52,0.0,...,-1.7,0.0,0.0,0.0,Regular Day,0,0,0,0,0
1,2023-01-12,Thursday,1316.28,0.0,-51.6,1264.68,0.0,113.85,66.53,0.0,...,2.4,3.0,7.0,40.0,Regular Day,0,0,0,0,0
2,2023-01-13,Friday,1354.81,0.0,-24.1,1330.71,0.0,118.38,78.79,0.0,...,-3.7,0.0,4.2,42.0,Regular Day,0,0,0,0,0
3,2023-01-14,Saturday,2419.78,0.0,-31.97,2387.81,75.0,215.01,177.15,0.0,...,-6.55,0.0,0.0,0.0,Regular Day,0,0,0,0,0
4,2023-01-15,Sunday,1423.06,-23.0,-107.92,1292.14,0.0,111.4,128.24,0.0,...,-9.4,0.0,0.0,0.0,Regular Day,0,0,0,0,0


In [21]:
# Filter rows to only include dates between 2023-02-01 and 2024-11-30 (inclusive)
df = df[(df['date'] >= '2023-02-01') & (df['date'] <= '2024-11-30')]
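One way to confirm that the filtered range has no missing days is to compare the row count with the number of calendar days in the range. A short sketch using only `pd.date_range` (the name `expected_days` is introduced here for illustration):

```python
import pandas as pd

# Every calendar day in the inclusive range used by the filter
expected_days = pd.date_range("2023-02-01", "2024-11-30", freq="D")
print(len(expected_days))  # 669 (334 days in 2023 + 335 days in 2024)
```

A filtered dataframe with 669 rows and no duplicate dates would therefore hold exactly one row per day. With the real data, something like `expected_days.difference(pd.DatetimeIndex(df['date']))` could list any dates that are missing.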


In [22]:
# Check the number of rows after filtering
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 669 entries, 21 to 689
Data columns (total 25 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               669 non-null    datetime64[ns]
 1   day                669 non-null    object        
 2   gross_sales        669 non-null    float64       
 3   returns            669 non-null    float64       
 4   discounts_comps    669 non-null    float64       
 5   net_sales          669 non-null    float64       
 6   gift_card_sales    669 non-null    float64       
 7   tax                669 non-null    float64       
 8   tip                669 non-null    float64       
 9   refunds_by_amount  669 non-null    float64       
 10  cash               669 non-null    float64       
 11  card               669 non-null    float64       
 12  gift_card          669 non-null    float64       
 13  fees               669 non-null    float64       
 14  is_store_open 

In [23]:
# Reset the index after filtering
df.reset_index(drop=True, inplace=True)

In [24]:
# Check the number of rows and the new index range
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 669 entries, 0 to 668
Data columns (total 25 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               669 non-null    datetime64[ns]
 1   day                669 non-null    object        
 2   gross_sales        669 non-null    float64       
 3   returns            669 non-null    float64       
 4   discounts_comps    669 non-null    float64       
 5   net_sales          669 non-null    float64       
 6   gift_card_sales    669 non-null    float64       
 7   tax                669 non-null    float64       
 8   tip                669 non-null    float64       
 9   refunds_by_amount  669 non-null    float64       
 10  cash               669 non-null    float64       
 11  card               669 non-null    float64       
 12  gift_card          669 non-null    float64       
 13  fees               669 non-null    float64       
 14  is_store_o

In [25]:
# Checking for correct indexing
display(df)

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,...,mean_temp_c,total_rain_mm,total_precip_mm,total_snow_mm,holiday_name,is_holiday,is_holiday_prev_1,is_holiday_next_1,is_holiday_prev_2,is_holiday_next_2
0,2023-02-01,Wednesday,919.07,0.0,-33.35,885.72,0.0,84.44,42.35,0.0,...,-7.95,0.0,0.0,0.0,Regular Day,0,0,0,0,0
1,2023-02-02,Thursday,1463.52,0.0,-20.61,1442.91,0.0,108.76,72.70,0.0,...,-5.10,0.0,0.0,2.0,Regular Day,0,0,0,0,0
2,2023-02-03,Friday,1051.04,0.0,-9.60,1041.44,0.0,93.65,49.94,0.0,...,-7.70,0.0,0.3,4.0,Regular Day,0,0,0,0,0
3,2023-02-04,Saturday,2243.72,0.0,-12.43,2231.29,0.0,176.67,186.98,0.0,...,-10.30,0.0,0.0,2.0,Regular Day,0,0,0,0,0
4,2023-02-05,Sunday,1405.99,0.0,-25.12,1380.87,0.0,85.04,77.20,0.0,...,1.40,0.0,0.0,0.0,Regular Day,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
664,2024-11-26,Tuesday,2969.38,0.0,-48.67,2920.71,0.0,312.39,206.18,0.0,...,3.50,1.5,1.5,0.0,Regular Day,0,0,0,0,0
665,2024-11-27,Wednesday,3147.48,0.0,-84.48,3063.00,90.0,284.97,161.00,0.0,...,3.15,0.0,0.0,0.0,Regular Day,0,0,0,0,0
666,2024-11-28,Thursday,3178.31,0.0,-90.69,3087.62,0.0,326.08,180.23,0.0,...,2.80,0.4,0.4,0.0,Regular Day,0,0,0,0,0
667,2024-11-29,Friday,4407.06,-7.5,-187.77,4211.79,15.0,422.73,252.49,0.0,...,-1.10,0.0,0.0,0.0,Regular Day,0,0,0,0,0


In [26]:
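# Function to categorize temperature (degrees Celsius).
# Boundaries are contiguous so fractional values such as 5.5 or 15.5
# do not fall through to 'Hot'.
def categorize_temperature(temp):
    if temp <= 5:
        return 'Cold'
    elif temp <= 15:
        return 'Cool'
    elif temp <= 25:
        return 'Comfortable'
    else:
        return 'Hot'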

# Apply the function to create a new 'temp_category' column
df['temp_category'] = df['mean_temp_c'].apply(categorize_temperature)

# Display the updated DataFrame
print(df.head())


        date        day  gross_sales  returns  discounts_comps  net_sales  \
0 2023-02-01  Wednesday       919.07      0.0           -33.35     885.72   
1 2023-02-02   Thursday      1463.52      0.0           -20.61    1442.91   
2 2023-02-03     Friday      1051.04      0.0            -9.60    1041.44   
3 2023-02-04   Saturday      2243.72      0.0           -12.43    2231.29   
4 2023-02-05     Sunday      1405.99      0.0           -25.12    1380.87   

   gift_card_sales     tax     tip  refunds_by_amount  ...  total_rain_mm  \
0              0.0   84.44   42.35                0.0  ...            0.0   
1              0.0  108.76   72.70                0.0  ...            0.0   
2              0.0   93.65   49.94                0.0  ...            0.0   
3              0.0  176.67  186.98                0.0  ...            0.0   
4              0.0   85.04   77.20                0.0  ...            0.0   

   total_precip_mm  total_snow_mm  holiday_name  is_holiday  \
0          
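The same binning can also be expressed without a row-wise function. A minimal sketch with `pd.cut`, assuming the category boundaries are meant to be contiguous (up to 5, up to 15, up to 25, above 25); the sample temperatures are made up to include values that fall between integers:

```python
import numpy as np
import pandas as pd

# Illustrative temperatures, including values that fall between integers
temps = pd.Series([-7.95, 5.0, 5.5, 15.0, 15.5, 25.0, 30.2], name="mean_temp_c")

# Right-closed bins: (-inf, 5], (5, 15], (15, 25], (25, inf)
temp_category = pd.cut(
    temps,
    bins=[-np.inf, 5, 15, 25, np.inf],
    labels=["Cold", "Cool", "Comfortable", "Hot"],
)
print(temp_category.tolist())
# ['Cold', 'Cold', 'Cool', 'Cool', 'Comfortable', 'Comfortable', 'Hot']
```

One difference worth noting: `pd.cut` leaves missing temperatures as missing, whereas an `if`/`else` chain sends them to the final branch.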

In [31]:
# Define a function to determine the season
def get_season(date):
    month = date.month
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

# Apply the function to create the 'season' column
df['season'] = df['date'].apply(get_season)

# Display the updated DataFrame
print(df)

          date        day  gross_sales  returns  discounts_comps  net_sales  \
0   2023-02-01  Wednesday       919.07      0.0           -33.35     885.72   
1   2023-02-02   Thursday      1463.52      0.0           -20.61    1442.91   
2   2023-02-03     Friday      1051.04      0.0            -9.60    1041.44   
3   2023-02-04   Saturday      2243.72      0.0           -12.43    2231.29   
4   2023-02-05     Sunday      1405.99      0.0           -25.12    1380.87   
..         ...        ...          ...      ...              ...        ...   
664 2024-11-26    Tuesday      2969.38      0.0           -48.67    2920.71   
665 2024-11-27  Wednesday      3147.48      0.0           -84.48    3063.00   
666 2024-11-28   Thursday      3178.31      0.0           -90.69    3087.62   
667 2024-11-29     Friday      4407.06     -7.5          -187.77    4211.79   
668 2024-11-30   Saturday      5651.18      0.0          -279.29    5371.89   

     gift_card_sales     tax     tip  refunds_by_am
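The month-to-season rule can also be written as a lookup applied to the whole column at once. A sketch assuming the same month grouping as `get_season`; the dictionary name `SEASON_BY_MONTH` and the sample dates are introduced here for illustration:

```python
import pandas as pd

# Month number -> season, using the same grouping as get_season
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

dates = pd.Series(pd.to_datetime(["2023-02-01", "2023-05-20", "2024-07-01", "2024-11-30"]))
season = dates.dt.month.map(SEASON_BY_MONTH)
print(season.tolist())  # ['Winter', 'Spring', 'Summer', 'Fall']
```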

In [32]:
df.head()

Unnamed: 0,date,day,gross_sales,returns,discounts_comps,net_sales,gift_card_sales,tax,tip,refunds_by_amount,...,total_precip_mm,total_snow_mm,holiday_name,is_holiday,is_holiday_prev_1,is_holiday_next_1,is_holiday_prev_2,is_holiday_next_2,temp_category,season
0,2023-02-01,Wednesday,919.07,0.0,-33.35,885.72,0.0,84.44,42.35,0.0,...,0.0,0.0,Regular Day,0,0,0,0,0,Cold,Winter
1,2023-02-02,Thursday,1463.52,0.0,-20.61,1442.91,0.0,108.76,72.7,0.0,...,0.0,2.0,Regular Day,0,0,0,0,0,Cold,Winter
2,2023-02-03,Friday,1051.04,0.0,-9.6,1041.44,0.0,93.65,49.94,0.0,...,0.3,4.0,Regular Day,0,0,0,0,0,Cold,Winter
3,2023-02-04,Saturday,2243.72,0.0,-12.43,2231.29,0.0,176.67,186.98,0.0,...,0.0,2.0,Regular Day,0,0,0,0,0,Cold,Winter
4,2023-02-05,Sunday,1405.99,0.0,-25.12,1380.87,0.0,85.04,77.2,0.0,...,0.0,0.0,Regular Day,0,0,0,0,0,Cold,Winter


In [35]:
# Save the cleaned and formatted data to a CSV file
df.to_csv('../data/clean/veronicas_cleaned_df.csv', index=False)
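CSV files do not store dtypes, so the `date` column comes back as plain strings unless it is parsed on load. A small round-trip sketch using an in-memory buffer and a made-up two-row frame (not the project file):

```python
import io
import pandas as pd

toy = pd.DataFrame({
    "date": pd.to_datetime(["2023-02-01", "2023-02-02"]),
    "is_holiday": [0, 0],
})

# Write to an in-memory buffer instead of a file for this illustration
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates the 'date' column would be read back as strings
reloaded = pd.read_csv(buffer, parse_dates=["date"])
print(reloaded.dtypes)
```

The same `parse_dates=["date"]` argument would apply when reading the saved file in a later notebook.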