# Explore files

```python
import os
import pandas as pd

# Specify the directory containing the CSV files
directory = 'data'

# Loop through all files in the directory
for filename in os.listdir(directory):
    if filename.endswith('.csv'):
        file_path = os.path.join(directory, filename)
        # Read the CSV file into a DataFrame
        df = pd.read_csv(file_path)
        # Print the DataFrame
        print(f'Contents of {filename}:')
        print(df)
        print('\n')
```


```text
Contents of checkin_checkout_history_updated.csv:
          user_id  gym_id         checkin_time        checkout_time  \
0       user_3291   gym_6  2023-09-10 15:55:00  2023-09-10 16:34:00   
1       user_1944   gym_2  2023-04-13 20:07:00  2023-04-13 22:43:00   
2        user_958   gym_7  2023-06-10 12:24:00  2023-06-10 13:49:00   
3        user_811   gym_2  2023-05-23 17:11:00  2023-05-23 20:01:00   
4       user_4923  gym_10  2023-02-21 06:20:00  2023-02-21 08:02:00   
...           ...     ...                  ...                  ...   
299995  user_3995   gym_3  2023-08-06 17:25:00  2023-08-06 18:09:00   
299996   user_206   gym_9  2023-06-27 13:14:00  2023-06-27 16:04:00   
299997  user_4983   gym_4  2023-04-08 14:41:00  2023-04-08 15:54:00   
299998  user_1028  gym_10  2023-03-05 06:07:00  2023-03-05 07:04:00   
299999  user_3314   gym_4  2023-01-05 08:58:00  2023-01-05 09:48:00   

         workout_type  calories_burned  
0       Weightlifting              462  
1              
```

# Data Modeling

### Normalized (OLTP) schema
The normalized schema reduces redundancy and protects data integrity by splitting the data into multiple related tables, each linked to the others through foreign keys.

1. **Users**
   - **user_id_pk** (PK)
   - **fk_subscription_plan_id** (FK)
   - **first_name**
   - **last_name**
   - **age**
   - **gender**
   - **birthdate**
   - **sign_up_date**
   - **user_location**

2. **Gyms**
   - **gym_id_pk** (PK)
   - **location**
   - **gym_type**
   - **facilities**

3. **Subscription Plans**
   - **subscription_plan_id_pk** (PK)
   - **plan_name**
   - **price_per_month**
   - **features**

4. **Check-ins**
   - **checkin_id_pk** (PK)
   - **fk_user_id** (FK)
   - **fk_gym_id** (FK)
   - **checkin_time**
   - **checkout_time**
   - **workout_type**
   - **calories_burned**

#### Relationships
- **Users** to **Subscription Plans**:  Many-to-One (many users can share one plan; each user references a single plan)
- **Check-ins** to **Users**:  Many-to-One
- **Check-ins** to **Gyms**:  Many-to-One
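
The many-to-one relationships listed above can be checked with foreign key constraints. The following is a minimal sketch using Python's built-in `sqlite3` module with only two of the tables; SQLite itself, the reduced column set, and the sample plan name `'Basic'` are assumptions made for illustration, not part of the original design.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with users pointing many-to-one at plans."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE subscription_plans (
            subscription_plan_id_pk INTEGER PRIMARY KEY,
            plan_name TEXT
        );
        CREATE TABLE users (
            user_id_pk INTEGER PRIMARY KEY,
            fk_subscription_plan_id INTEGER
                REFERENCES subscription_plans(subscription_plan_id_pk)
        );
        """
    )
    return conn


def insert_user(conn: sqlite3.Connection, user_id: int, plan_id: int) -> bool:
    """Return True if the row is accepted, False if the FK check rejects it."""
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, plan_id))
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
# Hypothetical sample plan, used only for this demonstration.
conn.execute("INSERT INTO subscription_plans VALUES (1, 'Basic')")

# Many users may reference the same plan (the "many" side of the relationship).
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 1), (2, 1)])

accepted = insert_user(conn, 3, 1)   # plan 1 exists
rejected = insert_user(conn, 4, 99)  # plan 99 does not exist
print(accepted, rejected)
```

The same pattern applies to **Check-ins**, which would carry two such foreign keys, one to **Users** and one to **Gyms**.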

#### Schema Definition

```sql
-- Referenced tables are created first so the foreign keys below can resolve.
CREATE TABLE subscription_plans (
    subscription_plan_id_pk INT PRIMARY KEY,
    plan_name VARCHAR(50),
    price_per_month DECIMAL(10, 2),
    features TEXT
);

CREATE TABLE gyms (
    gym_id_pk INT PRIMARY KEY,
    location VARCHAR(100),
    gym_type VARCHAR(50),
    facilities TEXT
);

CREATE TABLE users (
    user_id_pk INT PRIMARY KEY,
    fk_subscription_plan_id INT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    age INT,
    gender VARCHAR(10),
    birthdate DATE,
    sign_up_date DATE,
    user_location VARCHAR(100),
    -- The referenced column must match the primary key name in subscription_plans.
    FOREIGN KEY (fk_subscription_plan_id) REFERENCES subscription_plans(subscription_plan_id_pk)
);

CREATE TABLE checkins (
    checkin_id_pk INT PRIMARY KEY,
    fk_user_id INT,
    fk_gym_id INT,
    checkin_time DATETIME,
    checkout_time DATETIME,
    workout_type VARCHAR(50),
    calories_burned INT,
    -- Referenced columns use the primary key names declared in users and gyms.
    FOREIGN KEY (fk_user_id) REFERENCES users(user_id_pk),
    FOREIGN KEY (fk_gym_id) REFERENCES gyms(gym_id_pk)
);
```
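
The CSV preview shows identifiers such as `user_3291` and `gym_6`, while the schema declares integer keys, so some conversion is needed when loading. The sketch below is one possible approach, assuming the numeric suffix is unique per entity; the helper `to_int_id` is hypothetical, SQLite stands in for the target database, and `checkin_id_pk` is generated by the database because the CSV preview shows no such column.

```python
import sqlite3


def to_int_id(raw: str) -> int:
    """Convert an identifier like 'user_3291' or 'gym_6' to its integer suffix."""
    prefix, _, number = raw.rpartition("_")
    if not prefix or not number.isdigit():
        raise ValueError(f"unexpected id format: {raw!r}")
    return int(number)


# First row of the CSV preview shown earlier.
csv_rows = [
    ("user_3291", "gym_6", "2023-09-10 15:55:00",
     "2023-09-10 16:34:00", "Weightlifting", 462),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE checkins (
        checkin_id_pk INTEGER PRIMARY KEY,  -- assigned automatically by SQLite
        fk_user_id INTEGER,
        fk_gym_id INTEGER,
        checkin_time TEXT,
        checkout_time TEXT,
        workout_type TEXT,
        calories_burned INTEGER
    )
    """
)

conn.executemany(
    """
    INSERT INTO checkins
        (fk_user_id, fk_gym_id, checkin_time, checkout_time,
         workout_type, calories_burned)
    VALUES (?, ?, ?, ?, ?, ?)
    """,
    [
        (to_int_id(user), to_int_id(gym), cin, cout, workout, calories)
        for user, gym, cin, cout, workout, calories in csv_rows
    ],
)

loaded = conn.execute(
    "SELECT checkin_id_pk, fk_user_id, fk_gym_id FROM checkins"
).fetchall()
print(loaded)
```

An alternative is to keep the original strings and declare the key columns as `VARCHAR`; which option fits better depends on whether the prefixes carry meaning elsewhere.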


### Star schema
A star schema simplifies analytical queries by placing a single fact table at the center and joining it directly to denormalized dimension tables.

#### Schema definition

**Fact Table: Check-in/Checkout History**
- **fk_user_id** (FK)
- **fk_gym_id** (FK)
- **checkin_time**
- **checkout_time**
- **workout_type**
- **calories_burned**

**Dimension Tables:**

1. **Users**
   - **user_id_pk** (PK)
   - **first_name**
   - **last_name**
   - **age**
   - **gender**
   - **birthdate**
   - **sign_up_date**
   - **user_location**
   - **subscription_plan**

2. **Gyms**
   - **gym_id_pk** (PK)
   - **location**
   - **gym_type**
   - **facilities**

3. **Subscription Plans**
   - **subscription_plan_pk** (PK)
   - **price_per_month**
   - **features**
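
To illustrate how the fact table and dimensions work together, here is a minimal sketch of a typical star-schema query: total calories burned, grouped by attributes from two dimensions. The table names (`fact_checkins`, `dim_users`, `dim_gyms`), the reduced column set, and every sample row are made up for the example; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_gyms (
        gym_id_pk INTEGER PRIMARY KEY,
        location TEXT,
        gym_type TEXT
    );
    CREATE TABLE dim_users (
        user_id_pk INTEGER PRIMARY KEY,
        gender TEXT,
        subscription_plan TEXT
    );
    CREATE TABLE fact_checkins (
        fk_user_id INTEGER REFERENCES dim_users(user_id_pk),
        fk_gym_id INTEGER REFERENCES dim_gyms(gym_id_pk),
        checkin_time TEXT,
        checkout_time TEXT,
        workout_type TEXT,
        calories_burned INTEGER
    );
    """
)

# Hypothetical sample rows, not taken from the real CSV files.
conn.executemany(
    "INSERT INTO dim_gyms VALUES (?, ?, ?)",
    [(1, "City A", "Premium"), (2, "City B", "Budget")],
)
conn.executemany(
    "INSERT INTO dim_users VALUES (?, ?, ?)",
    [(1, "F", "Basic"), (2, "M", "Pro")],
)
conn.executemany(
    "INSERT INTO fact_checkins VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "2023-01-01 08:00:00", "2023-01-01 09:00:00", "Yoga", 300),
        (2, 1, "2023-01-01 10:00:00", "2023-01-01 11:00:00", "Cardio", 500),
        (1, 2, "2023-01-02 08:00:00", "2023-01-02 09:00:00", "Weightlifting", 400),
    ],
)

# Each dimension is exactly one join away from the fact table.
query = """
    SELECT g.gym_type, u.subscription_plan, SUM(f.calories_burned) AS total_calories
    FROM fact_checkins AS f
    JOIN dim_gyms AS g ON g.gym_id_pk = f.fk_gym_id
    JOIN dim_users AS u ON u.user_id_pk = f.fk_user_id
    GROUP BY g.gym_type, u.subscription_plan
    ORDER BY g.gym_type, u.subscription_plan
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Because `subscription_plan` is stored directly on the users dimension, the query needs no third join to group by plan.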


### Snowflake schema
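A snowflake schema is a normalized form of the star schema: dimension tables are split into further related tables, which reduces redundancy at the cost of additional joins.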
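
As a rough sketch of what this normalization could look like here, the example below moves plan details out of the users dimension into their own table, so a query by plan takes one extra join. The table and column names, the reduced column set, and all sample rows (including the plan names and prices) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_subscription_plans (
        subscription_plan_id_pk INTEGER PRIMARY KEY,
        plan_name TEXT,
        price_per_month REAL
    );
    -- The users dimension stores only a key; plan details live in their own table.
    CREATE TABLE dim_users (
        user_id_pk INTEGER PRIMARY KEY,
        fk_subscription_plan_id INTEGER
            REFERENCES dim_subscription_plans(subscription_plan_id_pk)
    );
    CREATE TABLE fact_checkins (
        fk_user_id INTEGER REFERENCES dim_users(user_id_pk),
        calories_burned INTEGER
    );
    """
)

# Hypothetical sample rows, not taken from the real CSV files.
conn.executemany(
    "INSERT INTO dim_subscription_plans VALUES (?, ?, ?)",
    [(1, "Basic", 20.0), (2, "Pro", 50.0)],
)
conn.executemany("INSERT INTO dim_users VALUES (?, ?)", [(1, 1), (2, 2), (3, 1)])
conn.executemany(
    "INSERT INTO fact_checkins VALUES (?, ?)",
    [(1, 300), (2, 500), (3, 200), (1, 100)],
)

# Reaching plan attributes now takes two hops: fact -> users -> plans.
query = """
    SELECT p.plan_name, p.price_per_month, COUNT(*) AS checkin_count
    FROM fact_checkins AS f
    JOIN dim_users AS u ON u.user_id_pk = f.fk_user_id
    JOIN dim_subscription_plans AS p
        ON p.subscription_plan_id_pk = u.fk_subscription_plan_id
    GROUP BY p.plan_name, p.price_per_month
    ORDER BY p.plan_name
"""
per_plan = conn.execute(query).fetchall()
for row in per_plan:
    print(row)
```

Each plan's price is stored once rather than repeated on every user row, which is the redundancy the snowflake form removes.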