# 🧠 Gym Progress Analysis – User: Kuba

---

## 📋 Overview

This notebook presents an exploratory data analysis (EDA) of gym training data for a single user: **Kuba**.  
The goal is to understand his training patterns, volume progression, performance changes over time, and relationships between training variables (e.g. sleep, mood, stress).

We will use data loaded from an SQLite database that contains structured records of:

- Training sessions (date, bodyweight, mood, measurements, etc.)
- Exercises (name, muscle group, weight, reps)
- Additional context (program phase, duration, stress level)

---

## 🎯 Objectives

- Visualize training frequency and exercise history
- Track progression of specific lifts (e.g. Bench Press, Deadlift)
- Analyze weekly training volume per muscle group
- Examine correlations between training variables
- Prepare data for future modeling (e.g. predicting strength or body changes)

---


In [1]:
import os
import sys
import pandas as pd
import plotly.express as px
from database.queries import get_all_users
from dataframe_builder import build_user_dataframe


In [2]:
person = get_all_users()
print(person)

[(1, 'Kuba', 'kuba@example.com'), (2, 'Anna', 'anna@example.com')]


In [3]:
df = build_user_dataframe(1)
df.head()

Unnamed: 0,session_id,training_day,date,body_weight,mood,dchest,arms,waist,legs,shoulders,training_duration_minutes,sleep_hours,stress_level,program_phase,exercise,muscle_group,weight,reps,volume
0,1,FULLBODY1,2012-01-02,70.1,tired,82.8,25.2,60.4,45.7,98.6,71.5,6.4,3,deload,Back Squat,legs,39.4,9,354.6
1,1,FULLBODY1,2012-01-02,70.1,tired,82.8,25.2,60.4,45.7,98.6,71.5,6.4,3,deload,Barbell Bench Press,chest,39.9,8,319.2
2,1,FULLBODY1,2012-01-02,70.1,tired,82.8,25.2,60.4,45.7,98.6,71.5,6.4,3,deload,Lat Pulldown,back,35.6,10,356.0
3,1,FULLBODY1,2012-01-02,70.1,tired,82.8,25.2,60.4,45.7,98.6,71.5,6.4,3,deload,Romanian Deadlift,hamstrings,37.1,6,222.6
4,1,FULLBODY1,2012-01-02,70.1,tired,82.8,25.2,60.4,45.7,98.6,71.5,6.4,3,deload,Assisted Dip,triceps,20.1,11,221.1


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 19 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   session_id                 14000 non-null  int64  
 1   training_day               14000 non-null  object 
 2   date                       14000 non-null  object 
 3   body_weight                14000 non-null  float64
 4   mood                       14000 non-null  object 
 5   dchest                     14000 non-null  float64
 6   arms                       14000 non-null  float64
 7   waist                      14000 non-null  float64
 8   legs                       14000 non-null  float64
 9   shoulders                  14000 non-null  float64
 10  training_duration_minutes  14000 non-null  float64
 11  sleep_hours                14000 non-null  float64
 12  stress_level               14000 non-null  int64  
 13  program_phase              14000 non-null  obj

# Data Distribution

In [5]:
df['session_id'].nunique()

2000

In [6]:
df['date'].nunique()

1864
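
The gap between 2000 sessions and 1864 distinct dates suggests that some dates hold more than one session. Below is a minimal sketch of how this could be checked. It uses a small invented frame (`toy`) in place of `df`; only the column names `session_id` and `date` come from the notebook, the values are made up for illustration. It also converts `date` from object to datetime, which later time-based steps would likely need.

```python
import pandas as pd

# Toy stand-in for df (invented values); one row per exercise entry.
toy = pd.DataFrame({
    "session_id": [1, 1, 2, 3, 3],
    "date": ["2012-01-02", "2012-01-02", "2012-01-02", "2012-01-04", "2012-01-04"],
})

# `date` is stored as object (string); convert it so time-based operations work.
toy["date"] = pd.to_datetime(toy["date"])

# Number of distinct sessions recorded on each date.
sessions_per_date = toy.groupby("date")["session_id"].nunique()

# Dates that hold more than one session.
multi_session_dates = sessions_per_date[sessions_per_date > 1]
print(multi_session_dates)
```

On the real data, replacing `toy` with `df` should show which dates account for the difference between the two counts.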

In [7]:
count_training_days = df['training_day'].value_counts().reset_index()
fig = px.bar(count_training_days, x='training_day', y='count', color='training_day', template='plotly_dark')
fig.show()

The rows are spread almost evenly across the training_day types.

In [8]:
muscle_group_counts = df['muscle_group'].value_counts().reset_index()
px.bar(muscle_group_counts, x='muscle_group', y='count', color='muscle_group', template='plotly_dark')

The most trained muscle_groups were: back, chest, triceps, legs, hamstrings and shoulders.

In [9]:
fig = px.pie(df, names = "exercise",template='plotly_dark')
fig.show()

The exercises are evenly distributed: each one takes up an equal share of the rows.

In [10]:
fig = px.pie(df, names = "program_phase",template='plotly_dark')
fig.show()

As expected, most of the training falls into the hypertrophy phase.
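
Because `df` holds one row per exercise, the pie chart above counts exercise rows rather than sessions. The sketch below shows how the phase share could be computed per session instead. It assumes that every row of a session carries the same `program_phase`; the `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for df (invented values); sessions have different row counts.
toy = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 3, 3],
    "program_phase": ["deload", "deload", "deload",
                      "hypertrophy", "hypertrophy", "hypertrophy"],
})

# Keep one row per session so each session is counted once.
sessions = toy.drop_duplicates("session_id")

# Share of sessions per program phase.
phase_share = sessions["program_phase"].value_counts(normalize=True)
print(phase_share)
```

If every session has the same number of exercise rows, the row-level and session-level shares coincide; they differ only when session lengths vary, as in the toy data.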

In [11]:
fig = px.scatter(df, x='date', y = 'body_weight', template="plotly_dark")
fig.show()

From this plot we conclude that the user's plan was to gain mass and build muscle: body weight rises over time, and the increases get larger the further we go in time.

In [12]:
px.scatter(df, x='sleep_hours', y='stress_level', color='mood', template='plotly_dark')

From this plot we conclude that the less sleep the user gets, the higher the stress level and the worse the mood.
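
The visual impression can be backed by a number. Below is a minimal sketch that computes the Pearson correlation between `sleep_hours` and `stress_level` on one row per session, so that sessions with many exercises are not over-weighted. The `toy` frame is invented for illustration; only the column names come from the notebook.

```python
import pandas as pd

# Toy stand-in for df (invented values); session-level columns repeat per exercise row.
toy = pd.DataFrame({
    "session_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "sleep_hours": [5.0, 5.0, 6.0, 6.0, 7.5, 7.5, 8.0, 8.0],
    "stress_level": [5, 5, 4, 4, 2, 2, 1, 1],
})

# One row per session, since sleep and stress are recorded per session.
sessions = toy.drop_duplicates("session_id")

# Pearson correlation (the pandas default); a negative value means
# less sleep goes together with more stress.
r = sessions["sleep_hours"].corr(sessions["stress_level"])
print(f"correlation: {r:.2f}")
```

A correlation only describes how the two variables move together; it does not show which one drives the other.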

# How the growth of each body part influences the weight lifted in different exercises

## Chest

In [13]:
chest_df = df[df['muscle_group'] == 'chest']
chest = chest_df['exercise'].value_counts().reset_index()
px.bar(chest, x = 'exercise', y = 'count', color = 'exercise', template='plotly_dark')

### The chest group has 3 major exercises

In [14]:
px.scatter(chest_df, x = 'date', y = 'dchest', template = 'plotly_dark')

The plot shows how the chest measurement (dchest) grew over time.

In [15]:
px.scatter(chest_df, x = 'body_weight', y = 'dchest', template = 'plotly_dark')

Chest growth goes together with a higher body_weight.

In [16]:
px.scatter(chest_df, x = 'dchest', y = 'weight', template = 'plotly_dark')

As the chest grew, the user was able to press heavier weights.
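
The scatter plot above mixes all chest exercises, and each exercise is typically done in a different weight range. The sketch below shows how the relationship between `dchest` and `weight` could be checked per exercise. The `toy_chest` frame is invented for illustration, and "Incline Press" is a hypothetical exercise name, not one confirmed by the data.

```python
import pandas as pd

# Toy stand-in for chest_df (invented values; "Incline Press" is hypothetical).
toy_chest = pd.DataFrame({
    "exercise": ["Barbell Bench Press"] * 4 + ["Incline Press"] * 4,
    "dchest": [82.8, 84.0, 85.5, 87.0] * 2,
    "weight": [39.9, 42.5, 45.0, 47.5, 20.0, 22.0, 23.0, 25.0],
})

# Pearson correlation between chest measurement and lifted weight,
# computed separately for each exercise.
per_exercise_r = {
    name: group["dchest"].corr(group["weight"])
    for name, group in toy_chest.groupby("exercise")
}

for name, r in per_exercise_r.items():
    print(f"{name}: {r:.2f}")
```

Splitting by exercise keeps the different weight ranges from blurring the trend within each lift.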

In [17]:
# Total chest training volume per date.
chest_volume = (
    chest_df.groupby("date")["volume"]
    .sum()
    .reset_index()
)

# Mean chest measurement (dchest) per date.
circumferences = (
    df.groupby("date")["dchest"]
    .mean()
    .reset_index()
)
merged = pd.merge(chest_volume, circumferences, on="date")
px.scatter(merged, x="date", y="volume", template="plotly_dark")

As expected, chest training volume rises over time.
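
To put a number on the rise, the per-date volume can be aggregated by week and fitted with a straight line. This is a minimal sketch under stated assumptions: `toy_volume` is an invented stand-in for the per-date chest volume, and a linear fit is only one simple way to summarise a trend.

```python
import numpy as np
import pandas as pd

# Toy stand-in for per-date chest volume (invented values).
toy_volume = pd.DataFrame({
    "date": pd.to_datetime([
        "2012-01-02", "2012-01-05", "2012-01-09",
        "2012-01-12", "2012-01-16", "2012-01-19",
    ]),
    "volume": [300.0, 320.0, 330.0, 350.0, 360.0, 390.0],
})

# Sum volume per calendar week (weeks end on Sunday with the "W" rule).
weekly = toy_volume.set_index("date")["volume"].resample("W").sum()

# Fit a straight line: the slope is the average change in volume per week.
weeks = np.arange(len(weekly))
slope, intercept = np.polyfit(weeks, weekly.to_numpy(), 1)

print(weekly)
print(f"trend: {slope:.1f} volume units per week")
```

A positive slope supports the reading of the scatter plot; weekly sums also smooth out the day-to-day noise that comes from different exercises being trained on different days.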