Q1. The Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

In [7]:
import pandas as pd

data = {
    'Study Time': [2, 3, 5, 6, 8],
    'Exam Score': [65, 70, 75, 80, 90]
}

df = pd.DataFrame(data)
df.corr(method='pearson')


            Study Time  Exam Score
Study Time    1.000000    0.990771
Exam Score    0.990771    1.000000


**Result Interpretation:** The Pearson correlation coefficient is approximately 0.991, which indicates a very strong positive linear relationship between the amount of time students spend studying and their final exam scores: as study time increases, exam scores tend to increase as well. With only five observations, this estimate is illustrative rather than conclusive.
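
As a cross-check on the pandas result, the coefficient can also be computed directly from its definition, the sum of products of deviations divided by the square root of the product of the summed squared deviations. The sketch below reuses the Q1 data; the variable names (`study_time`, `exam_score`, `dx`, `dy`, `r`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Same five observations as in the Q1 cell
study_time = np.array([2, 3, 5, 6, 8], dtype=float)
exam_score = np.array([65, 70, 75, 80, 90], dtype=float)

# Deviations of each observation from its column mean
dx = study_time - study_time.mean()
dy = exam_score - exam_score.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(round(r, 6))
```

The printed value should agree with the off-diagonal entry of the `df.corr(method='pearson')` table above.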

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

In [11]:

import pandas as pd

data = {
    'Sleep Hours': [6, 7, 8, 5, 6, 7, 8, 9, 6, 7],
    'Job Satisfaction': [4, 6, 8, 3, 5, 7, 8, 9, 4, 6]
}

df = pd.DataFrame(data)
correlation = df.corr(method='spearman')
print(correlation)

                  Sleep Hours  Job Satisfaction
Sleep Hours          1.000000          0.981307
Job Satisfaction     0.981307          1.000000


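**Result Interpretation:** The Spearman rank correlation is approximately 0.981, which indicates a very strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level. Individuals who get more sleep tend to report higher job satisfaction. Because Spearman's coefficient is computed on ranks, it does not by itself show that the relationship is linear.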
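
Spearman's coefficient is the Pearson coefficient applied to the ranks of the data, with tied values given their average rank. The sketch below checks this on the Q2 data; the names `ranks`, `rho_from_ranks` and `rho_builtin` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Sleep Hours': [6, 7, 8, 5, 6, 7, 8, 9, 6, 7],
    'Job Satisfaction': [4, 6, 8, 3, 5, 7, 8, 9, 4, 6],
})

# Replace each value by its rank; ties receive the average of their ranks
ranks = df.rank(method='average')

# Pearson correlation of the ranks
rho_from_ranks = ranks['Sleep Hours'].corr(ranks['Job Satisfaction'], method='pearson')

# Built-in Spearman correlation on the raw values
rho_builtin = df['Sleep Hours'].corr(df['Job Satisfaction'], method='spearman')

print(rho_from_ranks, rho_builtin)
```

Both values should match the 0.981307 shown in the output above.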

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

In [2]:
import pandas as pd
import numpy as np

np.random.seed(42)
num_samples = 50

# Both columns are independent uniform random draws
data = {
    'Hours of Exercise per Week': np.random.uniform(0, 20, num_samples),
    'BMI': np.random.uniform(18, 35, num_samples)  # BMI (Body Mass Index)
}

df = pd.DataFrame(data)
Spearman_coef = df.corr(method='spearman')
Pearson_coef = df.corr(method='pearson')
print(Spearman_coef)
print(Pearson_coef)

                            Hours of Exercise per Week       BMI
Hours of Exercise per Week                    1.000000  0.083505
BMI                                           0.083505  1.000000
                            Hours of Exercise per Week       BMI
Hours of Exercise per Week                    1.000000  0.062209
BMI                                           0.062209  1.000000


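**Result Interpretation:** Both coefficients are close to zero (Spearman ≈ 0.084, Pearson ≈ 0.062), suggesting little or no linear or monotonic relationship between the number of hours of exercise per week and BMI in this sample. This is expected here, because the two columns were generated as independent uniform random draws. The two coefficients differ slightly because Pearson uses the raw values while Spearman uses their ranks.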
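
The two coefficients agree closely on the Q3 data, but they can diverge when a relationship is monotonic without being linear. The sketch below uses a small made-up example (an exponential curve, chosen only for illustration and unrelated to the exercise and BMI data) in which Spearman's coefficient is 1 while Pearson's is noticeably lower.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y grows exponentially with x (monotonic but not linear)
x = np.arange(1, 11, dtype=float)
demo = pd.DataFrame({'x': x, 'y': np.exp(x)})

pearson_r = demo['x'].corr(demo['y'], method='pearson')
spearman_rho = demo['x'].corr(demo['y'], method='spearman')

print('Pearson: ', pearson_r)
print('Spearman:', spearman_rho)
```

Since the ordering of `y` matches the ordering of `x` exactly, the rank-based coefficient reaches 1, while the Pearson coefficient is pulled down by the curvature.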

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

In [3]:
import pandas as pd
import numpy as np

np.random.seed(42)
num = 50

# Both columns are independent uniform random draws
data = {
    'Watching_Hours': np.random.uniform(0, 8, num),
    'Physical_Activity_Level': np.random.uniform(0, 10, num)
}
df = pd.DataFrame(data)
df.corr(method='pearson')

                         Watching_Hours  Physical_Activity_Level
Watching_Hours                 1.000000                 0.062209
Physical_Activity_Level        0.062209                 1.000000
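
The coefficient of about 0.062 is close to zero, so this simulated sample shows essentially no linear relationship between television hours and physical activity. It is also identical to the Pearson value obtained in Q3. That is expected: both cells use seed 42 and `np.random.uniform`, so they draw the same underlying random numbers and only rescale them to different ranges, and the Pearson coefficient is unchanged by shifting a variable or multiplying it by a positive constant. The sketch below, with illustrative variable names, reproduces both setups side by side.

```python
import numpy as np

# Q3 setup: exercise hours in [0, 20), BMI in [18, 35)
np.random.seed(42)
hours = np.random.uniform(0, 20, 50)
bmi = np.random.uniform(18, 35, 50)
r_q3 = np.corrcoef(hours, bmi)[0, 1]

# Q4 setup: same seed, same draws, different ranges
np.random.seed(42)
tv = np.random.uniform(0, 8, 50)
activity = np.random.uniform(0, 10, 50)
r_q4 = np.corrcoef(tv, activity)[0, 1]

print(r_q3, r_q4)
```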


Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

In [80]:
import pandas as pd

data = pd.DataFrame({
    'Age(Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ["Coke", "Pepsi", "Mountain dew", "Coke", "Pepsi", "Coke"]
})
data

   Age(Years) Soft drink Preference
0          25                  Coke
1          42                 Pepsi
2          37          Mountain dew
3          19                  Coke
4          31                 Pepsi
5          28                  Coke


In [72]:
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the brand so each brand becomes a 0/1 indicator column
encoder = OneHotEncoder()
encoded = encoder.fit_transform(data[['Soft drink Preference']])
new_data = pd.DataFrame(encoded.toarray(), columns=encoder.get_feature_names_out())

# Put age next to the indicator columns
encoded_data = pd.concat([data[['Age(Years)']], new_data], axis=1)

In [58]:
encoded_data.corr(method='pearson')

                                    Age(Years)  Soft drink Preference_Coke  Soft drink Preference_Mountain dew  Soft drink Preference_Pepsi
Age(Years)                            1.000000                   -0.837240                            0.394132                     0.576439
Soft drink Preference_Coke           -0.837240                    1.000000                           -0.447214                    -0.707107
Soft drink Preference_Mountain dew    0.394132                   -0.447214                            1.000000                    -0.316228
Soft drink Preference_Pepsi           0.576439                   -0.707107                           -0.316228                     1.000000


In [81]:
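# Mean (target) encoding: replace each brand with the mean age of its respondents
mean_age_by_brand = data.groupby('Soft drink Preference')['Age(Years)'].mean().to_dict()
data['Soft_drinks_encoded'] = data['Soft drink Preference'].map(mean_age_by_brand)
data.drop('Soft drink Preference', axis=1, inplace=True)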

In [82]:
data

   Age(Years)  Soft_drinks_encoded
0          25                 24.0
1          42                 36.5
2          37                 37.0
3          19                 24.0
4          31                 36.5
5          28                 24.0


In [83]:
data.corr()

                     Age(Years)  Soft_drinks_encoded
Age(Years)              1.00000              0.83753
Soft_drinks_encoded     0.83753              1.00000


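**Result Interpretation:** Age and soft drink preference are strongly associated in this sample: the correlation between age and the mean-encoded preference is about 0.84, and the one-hot table shows that Coke drinkers tend to be younger (about -0.84). Because the encoding is built from the same age values and there are only six respondents, this figure should be treated as an optimistic estimate.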
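
Correlating age with the mean age of each respondent's brand group gives the same number as the correlation ratio (eta), a standard measure of association between a numeric and a categorical variable: the square root of the between-group sum of squares divided by the total sum of squares. The sketch below computes it on the survey data; the names `survey`, `ss_between`, `ss_total` and `eta` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

survey = pd.DataFrame({
    'Age(Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ["Coke", "Pepsi", "Mountain dew", "Coke", "Pepsi", "Coke"],
})

age = survey['Age(Years)']

# Mean age of each respondent's brand group, aligned row by row
group_mean = survey.groupby('Soft drink Preference')['Age(Years)'].transform('mean')

# Between-group and total sums of squares around the overall mean age
ss_between = ((group_mean - age.mean()) ** 2).sum()
ss_total = ((age - age.mean()) ** 2).sum()

eta = np.sqrt(ss_between / ss_total)
print(round(eta, 5))
```

The result should match the 0.83753 reported by `data.corr()` above.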

Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

In [38]:
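import pandas as pd
import numpy as np

np.random.seed(23)

num = 30  # number of sales representatives

# Both columns are independent random integers (upper bound is exclusive)
df = pd.DataFrame({
    'No.of_calls': np.random.randint(10, 30, num),
    'Sales_per_week': np.random.randint(10, 20, num)
})
df.corr(method='pearson')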


                No.of_calls  Sales_per_week
No.of_calls         1.00000        -0.07268
Sales_per_week     -0.07268         1.00000
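
The coefficient of about -0.07 is close to zero, so this simulated sample shows essentially no linear relationship between calls made per day and sales per week. That is expected, since the two columns are drawn independently of each other. As a sanity check, the same value can be obtained for a single pair of columns with `Series.corr` or `np.corrcoef` instead of the full correlation matrix; the sketch below regenerates the data the same way as the cell above, with illustrative names `r_pandas` and `r_numpy`.

```python
import numpy as np
import pandas as pd

np.random.seed(23)
num = 30  # number of sales representatives

df = pd.DataFrame({
    'No.of_calls': np.random.randint(10, 30, num),
    'Sales_per_week': np.random.randint(10, 20, num),
})

# Pearson correlation for one pair of columns, computed two ways
r_pandas = df['No.of_calls'].corr(df['Sales_per_week'], method='pearson')
r_numpy = np.corrcoef(df['No.of_calls'], df['Sales_per_week'])[0, 1]

print(r_pandas, r_numpy)
```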
