# Analyzing Economic and Fitness-Related Google Trends During the COVID-19 Pandemic

## Project Overview
This project investigates the relationship between macroeconomic indicators and fitness-related search behavior in the United States during the COVID-19 pandemic. By combining official economic data from FRED (Federal Reserve Economic Data) with Google Trends data on home fitness equipment, traditional gyms, and group fitness, we aim to understand how consumer interest shifted in response to economic and social changes.

## Data Sources
### FRED Economic Indicators (CSV files):
* US Recession Indicator (USREC)
* Unemployment Rate (UNRATE)
* Consumer Price Index (CPI)
* Consumer Sentiment Index
* Personal Consumption Expenditure (PCE) for recreation services and goods

### Google Trends Data (CSV files):
* Home equipment fitness keywords
* Traditional gym keywords
* Search shifts from gym to home workouts
* Group fitness keywords

## Project Goals
* Load and clean multiple datasets consistently
* Reshape Google Trends data into a tidy format for analysis
* Combine economic and search interest data for comparative analysis
* Explore trends in home fitness equipment interest vs. traditional gyms
* Lay groundwork for predictive modeling or visualization

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/data-science-econ-health-port-proj/google_trends_group_fitness.csv
/kaggle/input/data-science-econ-health-port-proj/fred_consumer_sentiment.csv
/kaggle/input/data-science-econ-health-port-proj/fred_pce_recreation_goods.csv
/kaggle/input/data-science-econ-health-port-proj/fred_UNRATE.csv
/kaggle/input/data-science-econ-health-port-proj/google_trends_gym_home_shift.csv
/kaggle/input/data-science-econ-health-port-proj/fred_USREC.csv
/kaggle/input/data-science-econ-health-port-proj/google_trends_traditional_gyms.csv
/kaggle/input/data-science-econ-health-port-proj/fred_pce_recreation_services.csv
/kaggle/input/data-science-econ-health-port-proj/fred_CPI.csv
/kaggle/input/data-science-econ-health-port-proj/google_trends_home_equipment.csv


In [2]:
# Preview the USREC dataset to understand its structure
df_usrec_preview = pd.read_csv('/kaggle/input/data-science-econ-health-port-proj/fred_USREC.csv')
df_usrec_preview.head()

  observation_date  USREC
0       1854-12-01      1
1       1855-01-01      0
2       1855-02-01      0
3       1855-03-01      0
4       1855-04-01      0


In [3]:
# Print column names for reference
df_usrec_preview.columns

Index(['observation_date', 'USREC'], dtype='object')

In [4]:
# Function to load and preprocess FRED economic datasets
def load_fred_data(file_path):
    # Load the CSV and parse the date column
    df = pd.read_csv(file_path, parse_dates=['observation_date'])
    
    # Rename the date column for consistency
    df.rename(columns={'observation_date': 'date'}, inplace=True)
    
    return df

# Load each FRED dataset
usrec = load_fred_data('/kaggle/input/data-science-econ-health-port-proj/fred_USREC.csv')
unrate = load_fred_data('/kaggle/input/data-science-econ-health-port-proj/fred_UNRATE.csv')
cpi = load_fred_data('/kaggle/input/data-science-econ-health-port-proj/fred_CPI.csv')
sentiment = load_fred_data('/kaggle/input/data-science-econ-health-port-proj/fred_consumer_sentiment.csv')
pce_services = load_fred_data('/kaggle/input/data-science-econ-health-port-proj/fred_pce_recreation_services.csv')
pce_goods = load_fred_data('/kaggle/input/data-science-econ-health-port-proj/fred_pce_recreation_goods.csv')
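
The USREC preview above starts in 1854, while the Google Trends preview further down starts in May 2020, so it may help to trim each FRED series to the study window before comparing them. The sketch below is one possible way to do that. `load_fred_window`, the toy CSV, and the chosen dates are illustrative assumptions, not part of the original notebook.

```python
import io
import pandas as pd

# Toy stand-in for a FRED CSV (placeholder values, not real data)
toy_csv = io.StringIO(
    "observation_date,UNRATE\n"
    "2019-12-01,1.0\n"
    "2020-06-01,2.0\n"
    "2021-06-01,3.0\n"
)

def load_fred_window(source, start, end):
    """Hypothetical variant of load_fred_data that keeps rows with start <= date <= end."""
    df = pd.read_csv(source, parse_dates=["observation_date"])
    df = df.rename(columns={"observation_date": "date"})
    # Series.between is inclusive on both ends by default
    mask = df["date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[mask].reset_index(drop=True)

fred_window = load_fred_window(toy_csv, "2020-01-01", "2020-12-31")
print(fred_window)
```

Filtering at load time keeps every FRED frame on the same date range, which should make later joins against the trends data easier to reason about.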

In [5]:
# Preview one of the Google Trends CSV files to inspect column names and format
temp = pd.read_csv('/kaggle/input/data-science-econ-health-port-proj/google_trends_home_equipment.csv')
print(temp.columns)
print(temp.head())

Index(['Category: All categories'], dtype='object')
                                                                                                                                                     Category: All categories
Week       weights for home: (United States) yoga mat: (United States) running shoes: (United States) resistance bands: (United States)  adjustable dumbbell: (United States)
2020-05-24 1                                 19                        94                             21                                                                    4
2020-05-31 1                                 16                        89                             17                                                                    3
2020-06-07 1                                 16                        87                             18                                                                    4
2020-06-14 1                                 16                        89     

In [6]:
# Function to load and clean Google Trends data files
def clean_trends_data(file_path):
    # Skip the 'Category: All categories' title line so the 'Week, <keyword>, ...' row becomes the header
    df = pd.read_csv(file_path, skiprows=1)
    df.rename(columns={df.columns[0]: 'date'}, inplace=True)
    df['date'] = pd.to_datetime(df['date'])
    return df
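
Google Trends exports can report very low interest as the string `<1`, which would leave a column with object dtype instead of numbers. Whether these particular files contain such values is not shown here, so the following is only a defensive sketch. `clean_trends_numeric`, the toy export, and the choice to map `<1` to 0.5 are all assumptions.

```python
import io
import pandas as pd

# Toy export: a title line followed by the header row, as in the preview above
# (the blank line after the title is an assumption; pandas skips blank lines by default)
toy_trends = io.StringIO(
    "Category: All categories\n"
    "\n"
    "Week,yoga mat: (United States),adjustable dumbbell: (United States)\n"
    "2020-05-24,19,<1\n"
    "2020-05-31,16,3\n"
)

def clean_trends_numeric(source):
    """Hypothetical variant of clean_trends_data that also forces numeric keyword columns."""
    df = pd.read_csv(source, skiprows=1)
    df = df.rename(columns={df.columns[0]: "date"})
    df["date"] = pd.to_datetime(df["date"])
    for col in df.columns.drop("date"):
        # Assumption: treat "<1" as 0.5; anything else unparseable becomes NaN
        as_text = df[col].astype(str).str.replace("<1", "0.5", regex=False)
        df[col] = pd.to_numeric(as_text, errors="coerce")
    return df

cleaned = clean_trends_numeric(toy_trends)
print(cleaned.dtypes)
```

Mapping `<1` to 0.5 is a judgment call; 0 or NaN would be equally defensible depending on the analysis.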

In [7]:
# Load and clean each Google Trends dataset by category
trends_home_equipment = clean_trends_data('/kaggle/input/data-science-econ-health-port-proj/google_trends_home_equipment.csv')
trends_traditional_gyms = clean_trends_data('/kaggle/input/data-science-econ-health-port-proj/google_trends_traditional_gyms.csv')
trends_gym_home_shift = clean_trends_data('/kaggle/input/data-science-econ-health-port-proj/google_trends_gym_home_shift.csv')
trends_group_fitness = clean_trends_data('/kaggle/input/data-science-econ-health-port-proj/google_trends_group_fitness.csv')

In [8]:
# Function to reshape wide-format Google Trends data into a long tidy format
def reshape_trends(df, trend_label):
    df_long = df.melt(id_vars='date', var_name='keyword', value_name='search_interest')
    df_long['trend_type'] = trend_label
    return df_long
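
To make the wide-to-long reshape concrete, here is a small worked example on a two-keyword toy frame whose values are taken from the preview rows above. Stripping the `: (United States)` suffix is an optional extra step that the original function does not perform.

```python
import pandas as pd

# Toy wide-format frame: one row per week, one column per keyword
wide = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-24", "2020-05-31"]),
    "yoga mat: (United States)": [19, 16],
    "running shoes: (United States)": [94, 89],
})

# Same melt call as reshape_trends: one row per (date, keyword) pair
tidy = wide.melt(id_vars="date", var_name="keyword", value_name="search_interest")
tidy["trend_type"] = "Home Equipment"

# Optional (not in the original): drop the region suffix from keyword labels
tidy["keyword"] = tidy["keyword"].str.replace(": (United States)", "", regex=False)
print(tidy)
```

Two weeks times two keywords yields four rows, which is the same weeks-by-keywords arithmetic behind the per-category row counts reported at the end of this section.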

In [9]:
# Apply reshaping to each Google Trends dataset, tagging by trend category
home_equipment_long = reshape_trends(trends_home_equipment, 'Home Equipment')
traditional_gyms_long = reshape_trends(trends_traditional_gyms, 'Traditional Gyms')
gym_home_shift_long = reshape_trends(trends_gym_home_shift, 'Gym vs Home Shift')
group_fitness_long = reshape_trends(trends_group_fitness, 'Group Fitness')

In [10]:
# Combine all reshaped trends data into a single DataFrame for unified analysis
all_trends = pd.concat([
    home_equipment_long,
    traditional_gyms_long,
    gym_home_shift_long,
    group_fitness_long
], ignore_index=True)

In [11]:
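# Preview the combined trends data
# (not shown in the output: a notebook cell only echoes its last expression; wrap in display() or print() to see it)
all_trends.head()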

# Check how many records belong to each trend type to verify data completeness
all_trends['trend_type'].value_counts()

trend_type
Home Equipment       1305
Traditional Gyms     1305
Gym vs Home Shift    1305
Group Fitness        1305
Name: count, dtype: int64
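
One of the project goals is to combine the economic and search-interest data. The trends data is weekly, while the FRED preview shows monthly observations dated on the first of each month, so one plausible approach is to average search interest by calendar month and join on the month start. The sketch below uses toy frames with placeholder values; the names `all_trends_toy`, `unrate_toy`, `monthly`, and `combined` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for all_trends and one FRED series (UNRATE values are placeholders)
all_trends_toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-24", "2020-05-31", "2020-06-07", "2020-06-14"]),
    "keyword": ["yoga mat"] * 4,
    "search_interest": [19, 16, 16, 16],
    "trend_type": ["Home Equipment"] * 4,
})
unrate_toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-01", "2020-06-01"]),
    "UNRATE": [1.0, 2.0],
})

# Map each weekly date to the first day of its month, then average within the month
monthly = (
    all_trends_toy
    .assign(month=all_trends_toy["date"].dt.to_period("M").dt.to_timestamp())
    .groupby(["month", "trend_type", "keyword"], as_index=False)["search_interest"]
    .mean()
)

# Left join keeps every trends row even if a FRED month is missing
combined = monthly.merge(
    unrate_toy.rename(columns={"date": "month"}), on="month", how="left"
)
print(combined)
```

Averaging is only one aggregation choice; a monthly maximum or the last weekly value could suit other questions better.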