# Performance Analysis Project

This project explores how various factors such as sleep, mood, nutrition, and training difficulty influence performance, particularly the catch/throw percentage. The data is analyzed using visualization, correlation, and regression methods.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
import statsmodels.api as sm

## Load and Clean the Data

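We load the dataset, parse the `Date` column as datetimes, and coerce the measurement columns to numeric so that invalid entries become missing values. Missing values in the numeric columns are then filled with each column's mean.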

In [None]:
file_path = "/content/dsa_project_dataset_latestvers.xlsx"
data = pd.read_excel(file_path)
data['Date'] = pd.to_datetime(data['Date'])

numeric_cols = ['Sleep Hours', 'Sleep Quality', 'Caffeine (mg)', 'Mood (1-10)',
                'Training Difficulty', 'Protein (g)', 'Carbohydrates (g)',
                'Body Weight (kg)', 'CatchThrow_Percentage']
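for col in numeric_cols:
    # Coerce to numeric; unparseable entries become NaN
    data[col] = pd.to_numeric(data[col], errors='coerce')

print("Missing values:")
print(data.isnull().sum())

# Impute with each numeric column's mean. Restricting to numeric_cols keeps Date
# (and any text columns) out of the mean; in pandas 2.x, DataFrame.mean() raises
# a TypeError on text columns.
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].mean())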
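The cleaning step can be sketched on a tiny made-up frame. The values and the `clean` helper below are hypothetical, not taken from the project spreadsheet; the sketch only illustrates how coercion turns bad entries into `NaN` and how mean imputation is limited to the numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the real spreadsheet.
raw = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "Sleep Hours": ["7.5", "n/a", "6.5"],
    "CatchThrow_Percentage": [80.0, 75.0, np.nan],
})

def clean(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Hypothetical helper: parse dates, coerce numeric columns, mean-impute them."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    for col in numeric_cols:
        # Strings such as "n/a" become NaN instead of raising an error.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Means are computed over the numeric columns only; Date is left untouched.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out

cleaned = clean(raw, ["Sleep Hours", "CatchThrow_Percentage"])
print(cleaned)
```

Here the missing sleep value is replaced by 7.0 (the mean of 7.5 and 6.5) and the missing percentage by 77.5. Mean imputation keeps every row but shrinks the variance of imputed columns, which slightly affects the correlations computed afterwards.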

## Correlation Matrix

We plot a heatmap of the pairwise Pearson correlation coefficients between all numeric variables in the dataset.

In [None]:
plt.figure(figsize=(10, 8))
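# numeric_only=True restricts the matrix to numeric columns; since pandas 2.0 the
# default is False and non-numeric columns such as Date cause an error
correlation_matrix = data.corr(numeric_only=True)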
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()
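Each cell of the heatmap is a Pearson coefficient, the default method of `DataFrame.corr`: $r = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum (x_i-\bar x)^2 \sum (y_i-\bar y)^2}}$. As a minimal check on a small hypothetical sample (not the project data), the coefficient can be computed from this definition with a hypothetical `pearson_r` helper and compared with pandas:

```python
import numpy as np
import pandas as pd

def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: co-variation divided by the product of spreads."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical values, not taken from the project dataset.
df = pd.DataFrame({
    "Sleep Hours": [6.0, 7.0, 8.0, 5.5, 7.5],
    "CatchThrow_Percentage": [70.0, 78.0, 85.0, 65.0, 80.0],
})

manual = pearson_r(df["Sleep Hours"].to_numpy(), df["CatchThrow_Percentage"].to_numpy())
from_pandas = df.corr(numeric_only=True).loc["Sleep Hours", "CatchThrow_Percentage"]
print(manual, from_pandas)
```

The two numbers agree up to floating-point rounding, so a heatmap cell can be read as exactly this quantity for the corresponding pair of columns.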

## Sleep Hours vs Catch/Throw Percentage

In [None]:
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Sleep Hours', y='CatchThrow_Percentage', data=data)
plt.title('Sleep Hours vs Catch/Throw Percentage')
plt.xlabel('Sleep Hours')
plt.ylabel('Catch/Throw Percentage')
plt.show()

## Linear Regression: Sleep Hours and Performance

We fit an ordinary least squares (OLS) model of catch/throw percentage on sleep hours, including an intercept, and print the regression summary.

In [None]:
X = data['Sleep Hours']
y = data['CatchThrow_Percentage']
X = sm.add_constant(X)  # prepend an intercept column named 'const'
model = sm.OLS(y, X).fit()  # ordinary least squares fit
print(model.summary())
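The coefficients in the summary are the ordinary least squares solution. A minimal NumPy sketch on made-up numbers, assuming the same intercept-plus-slope design that `sm.add_constant` builds, reproduces the intercept, slope, and R²:

```python
import numpy as np

# Hypothetical sample; in the notebook these would be data['Sleep Hours'] and
# data['CatchThrow_Percentage'].
x = np.array([6.0, 7.0, 8.0, 5.5, 7.5])
y = np.array([70.0, 78.0, 85.0, 65.0, 80.0])

# Design matrix with a leading column of ones, like the 'const' column.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ beta ≈ y: beta[0] is the intercept, beta[1] the slope.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2 = 1 - SS_res / SS_tot, the quantity reported as "R-squared" in the summary.
residuals = y - X @ beta
centered = y - y.mean()
r_squared = 1 - (residuals @ residuals) / (centered @ centered)
print("intercept, slope:", beta, "R^2:", r_squared)
```

With a single predictor, R² equals the square of the Pearson correlation between the two variables, which ties this model to the correlation test that follows.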

## Pearson Correlation

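We compute Pearson's correlation coefficient between sleep hours and catch/throw percentage and test whether it differs significantly from zero at the 0.05 level.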

In [None]:
correlation, p_value = pearsonr(data['Sleep Hours'], data['CatchThrow_Percentage'])
print("Correlation (Sleep Hours vs CatchThrow %):", correlation)
print("P-Value:", p_value)
if p_value < 0.05:
    print("Significant relationship.")
else:
    print("No significant relationship.")
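The p-value from `pearsonr` corresponds to a t-test: under the null hypothesis of zero correlation, $t = r\sqrt{(n-2)/(1-r^2)}$ follows a Student's t distribution with $n-2$ degrees of freedom. A minimal sketch on hypothetical data (not the project dataset) recomputes the two-sided p-value this way:

```python
import numpy as np
from scipy import stats

# Hypothetical sample, not taken from the project dataset.
x = np.array([6.0, 7.0, 8.0, 5.5, 7.5, 6.5, 8.5])
y = np.array([70.0, 78.0, 85.0, 65.0, 80.0, 74.0, 83.0])

r, p_scipy = stats.pearsonr(x, y)

n = len(x)
# Test statistic for H0: no correlation, with n - 2 degrees of freedom.
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
# Two-sided p-value, the default alternative used by pearsonr.
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(r, p_scipy, p_manual)
```

The same t statistic appears as the slope's t value in a simple linear regression, so the p-value printed here should match the `Sleep Hours` row of the OLS summary for the same data.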