# 🪢 Covariance and Correlation
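Covariance: A measure of how two variables vary together. Its sign shows the direction of their linear relationship, while its magnitude depends on the units of both variables. The formula below is the sample covariance.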

$Cov(X,Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$ 

Correlation: A standardized measure of the strength and direction of the linear relationship between two variables, ranging from -1 to 1. 

$Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$

- $X_i, Y_i$ = individual data points  
- $\bar{X}, \bar{Y}$ = mean of $X$ and $Y$  
- $n$ = number of data points  
- $\sigma_X, \sigma_Y$ = sample standard deviations of $X$ and $Y$ (computed with the same $n-1$ denominator)
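
The two formulas above can be sketched in plain Python. This is a minimal illustration, not how pandas computes these values internally. The helper names `sample_cov` and `sample_corr` and the toy `bills` / `tips` lists are hypothetical and chosen only for this example.

```python
from math import sqrt

def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Covariance divided by the product of the sample standard deviations."""
    # The sample variance of x is the covariance of x with itself.
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical toy data (not taken from the tips dataset)
bills = [10.0, 20.0, 30.0, 40.0]
tips = [1.0, 3.0, 4.0, 6.0]

cov_bt = sample_cov(bills, tips)
corr_bt = sample_corr(bills, tips)
print(cov_bt, corr_bt)
```

Multiplying `bills` by a constant would change `cov_bt` by the same factor but leave `corr_bt` unchanged, which is why correlation is the easier of the two to compare across variables.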

---

#### » Import warnings to suppress the FutureWarning raised by the summary_cont function

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

#### » Import seaborn, load the "tips" dataset, and preview it

In [11]:
import seaborn as sns
df = sns.load_dataset("tips")
df.head()

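|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |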


In [12]:
df.dtypes

total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object

#### » Import researchpy and summarize the numeric columns with the "summary_cont" function

In [22]:
import researchpy as rp
rp.summary_cont(df[["total_bill","tip","size"]]) 





|   | Variable | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|---|
| 0 | total_bill | 244.0 | 19.7859 | 8.9024 | 0.5699 | 18.6633 | 20.9086 |
| 1 | tip | 244.0 | 2.9983 | 1.3836 | 0.0886 | 2.8238 | 3.1728 |
| 2 | size | 244.0 | 2.5697 | 0.9511 | 0.0609 | 2.4497 | 2.6896 |


#### » Summarize the categorical columns with the "summary_cat" function

In [21]:
rp.summary_cat(df[["sex","smoker","day"]]) 

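|   | Variable | Outcome | Count | Percent |
|---|---|---|---|---|
| 0 | sex | Male | 157 | 64.34 |
| 1 |  | Female | 87 | 35.66 |
| 2 | smoker | No | 151 | 61.89 |
| 3 |  | Yes | 93 | 38.11 |
| 4 | day | Sat | 87 | 35.66 |
| 5 |  | Sun | 76 | 31.15 |
| 6 |  | Thur | 62 | 25.41 |
| 7 |  | Fri | 19 | 7.79 |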


#### » Covariance

In [16]:
df[["tip", "total_bill"]].cov()

|   | tip | total_bill |
|---|---|---|
| tip | 1.914455 | 8.323502 |
| total_bill | 8.323502 | 79.252939 |
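
The diagonal of a covariance matrix holds each variable's variance, so it should match the squared SD reported by `summary_cont`. The short check below hard-codes the values printed in this notebook rather than reloading the dataset. The dictionary names are chosen for this sketch only.

```python
# Values copied from the summary_cont and cov() outputs in this notebook.
sd = {"tip": 1.3836, "total_bill": 8.9024}
cov_diag = {"tip": 1.914455, "total_bill": 79.252939}

for name in sd:
    variance_from_sd = sd[name] ** 2
    # The SDs are rounded to four decimals, so agreement is only approximate.
    print(f"{name}: SD^2 = {variance_from_sd:.4f}, Cov({name}, {name}) = {cov_diag[name]:.4f}")
```

The off-diagonal entry, 8.323502, is the covariance between `tip` and `total_bill`. It is positive, so larger bills tend to come with larger tips, but its size is tied to the units of both columns.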


#### » Correlation

In [17]:
df[["tip", "total_bill"]].corr()

|   | tip | total_bill |
|---|---|---|
| tip | 1.0 | 0.675734 |
| total_bill | 0.675734 | 1.0 |
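
As a sanity check, the correlation can be recovered from the covariance matrix using $Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$. The sketch below plugs in the numbers printed by `cov()` in this notebook. The variable names are illustrative.

```python
from math import sqrt

# Entries copied from the covariance matrix in this notebook.
cov_tip_bill = 8.323502
var_tip = 1.914455          # Cov(tip, tip)
var_total_bill = 79.252939  # Cov(total_bill, total_bill)

# Standard deviations are the square roots of the variances.
corr_tip_bill = cov_tip_bill / (sqrt(var_tip) * sqrt(var_total_bill))
print(round(corr_tip_bill, 4))
```

The result should land very close to the 0.675734 reported by `corr()`, which indicates a moderately strong positive linear relationship between the tip and the total bill.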
