Q1. The Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
you have collected data on the amount of time students spend studying for an exam and their final exam
scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

To calculate the Pearson correlation coefficient between the amount of time students spend studying for an exam and their final exam scores, you can use Python with the NumPy library. Here's how you can do it:

import numpy as np

# Sample data representing the amount of time spent studying (in hours) and final exam scores
time_studying = [5, 7, 3, 4, 6]
exam_scores = [85, 90, 75, 80, 88]

# Calculate the Pearson correlation coefficient using NumPy's corrcoef() function.
# corrcoef() returns a 2x2 correlation matrix; the off-diagonal entry [0, 1] is r(x, y).
pearson_corr_coeff = np.corrcoef(time_studying, exam_scores)[0, 1]

# Print the Pearson correlation coefficient, rounded to 4 decimal places
print(f"Pearson Correlation Coefficient between Time Studying and Exam Scores: {pearson_corr_coeff:.4f}")

Output:
Pearson Correlation Coefficient between Time Studying and Exam Scores: 0.9838

Interpretation of the result:
- The Pearson correlation coefficient ranges from -1 to 1. A value close to 1 indicates a strong positive linear relationship between the two variables, while a value close to -1 indicates a strong negative linear relationship. A value close to 0 indicates little to no linear relationship.
- In this case, the Pearson correlation coefficient is approximately 0.98, which is very close to 1. This indicates a very strong positive linear relationship between the amount of time students spend studying for an exam and their final exam scores.
- Therefore, in this sample, students who spent more time studying tended to achieve higher exam scores. With only five students and observational data, this describes an association; it does not by itself show that extra study time causes higher scores.
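As a cross-check on what `np.corrcoef` computes, here is a minimal sketch of the textbook definition r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √(Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²) in plain Python. The helper name `pearson_r` is hypothetical, chosen only for this illustration; on the same five students it should match NumPy's value up to floating-point rounding.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed directly from its definition (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

time_studying = [5, 7, 3, 4, 6]
exam_scores = [85, 90, 75, 80, 88]

r = pearson_r(time_studying, exam_scores)
print(f"r = {r:.4f}")
```

For this data the cross-product sum is 38 and the squared-deviation sums are 10 and 149.2, so r = 38 / √1492 ≈ 0.9838.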

Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
Suppose you have collected data on the amount of sleep individuals get each night and their overall job
satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
variables and interpret the result.

To calculate Spearman's rank correlation between the amount of sleep individuals get each night and their overall job satisfaction level, you can use Python with the SciPy library. Here's how you can do it:

from scipy.stats import spearmanr

# Sample data representing the amount of sleep and overall job satisfaction level
amount_of_sleep = [7, 6, 8, 5, 7]
job_satisfaction = [8, 6, 9, 4, 7]

# Calculate Spearman's rank correlation coefficient and p-value using SciPy's spearmanr() function
spearman_corr_coeff, p_value = spearmanr(amount_of_sleep, job_satisfaction)

# Print the Spearman's rank correlation coefficient, rounded to 4 decimal places
print(f"Spearman's Rank Correlation Coefficient: {spearman_corr_coeff:.4f}")

Output:
Spearman's Rank Correlation Coefficient: 0.9747

Interpretation of the result:
- Spearman's rank correlation coefficient ranges from -1 to 1. A value close to 1 indicates a strong increasing monotonic relationship (as one variable rises, the other tends to rise), while a value close to -1 indicates a strong decreasing monotonic relationship. A value close to 0 indicates little to no monotonic relationship.
- In this case, Spearman's rank correlation coefficient is approximately 0.97, which is very close to 1. This indicates a strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level.
- Therefore, in this sample, individuals who get more sleep tend to report higher overall job satisfaction. With only five people, this is a descriptive result rather than evidence that more sleep causes higher satisfaction.
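Spearman's coefficient is the Pearson coefficient computed on ranks, with tied values (here the two people who sleep 7 hours) given the average of the positions they occupy. The sketch below reproduces that calculation in plain Python; the helper names `average_ranks` and `pearson_r` are hypothetical. SciPy's `spearmanr` also assigns average ranks to ties, so the two results should agree.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation from its definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

amount_of_sleep = [7, 6, 8, 5, 7]
job_satisfaction = [8, 6, 9, 4, 7]

sleep_ranks = average_ranks(amount_of_sleep)          # the two 7s share rank 3.5
satisfaction_ranks = average_ranks(job_satisfaction)  # no ties here
rho = pearson_r(sleep_ranks, satisfaction_ranks)
print(f"rho = {rho:.4f}")
```

The ranks come out as [3.5, 2, 5, 1, 3.5] and [4, 2, 5, 1, 3], giving rho = 9.5 / √(9.5 × 10) ≈ 0.9747.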

Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
between these two variables and compare the results.

To calculate both the Pearson correlation coefficient and Spearman's rank correlation coefficient between the number of hours of exercise per week and body mass index (BMI) in a sample of adults, you can use Python with the SciPy library. Here's how you can do it:

from scipy.stats import pearsonr, spearmanr

# Illustrative data: hours of exercise per week and BMI.
# Note: as written, each list holds 49 values, not the 50 participants described in the question.
exercise_hours = [3, 5, 2, 4, 6, 1, 3, 2, 4, 5, 2, 3, 4, 1, 6, 5, 3, 2, 4, 5,
                  2, 4, 6, 3, 1, 4, 5, 2, 3, 6, 1, 4, 5, 3, 2, 6, 4, 2, 3, 5,
                  1, 4, 6, 2, 3, 5, 1, 4, 6]
bmi = [22, 24, 21, 25, 26, 20, 23, 22, 24, 26, 21, 23, 25, 19, 27, 25, 23, 21, 24, 26,
       21, 24, 27, 23, 20, 25, 26, 21, 23, 27, 20, 24, 26, 22, 21, 28, 25, 21, 23, 25, 19,
       24, 27, 22, 23, 26, 20, 25, 28]

# The correlation functions need exactly one (exercise, BMI) pair per participant
assert len(exercise_hours) == len(bmi)

# Calculate the Pearson correlation coefficient (and its p-value)
pearson_corr_coeff, pearson_p_value = pearsonr(exercise_hours, bmi)

# Calculate Spearman's rank correlation coefficient (and its p-value)
spearman_corr_coeff, spearman_p_value = spearmanr(exercise_hours, bmi)

# Print the results, rounded to 4 decimal places
print(f"Pearson Correlation Coefficient: {pearson_corr_coeff:.4f}")
print(f"Spearman's Rank Correlation Coefficient: {spearman_corr_coeff:.4f}")

Output:
Pearson Correlation Coefficient: 0.9736
Spearman's Rank Correlation Coefficient: 0.9764

Interpretation of the results:
- The Pearson correlation coefficient measures the linear relationship between two variables. Here it is approximately 0.97, indicating a very strong positive linear relationship between the number of hours of exercise per week and BMI.
- Spearman's rank correlation coefficient measures the monotonic relationship between two variables, so it does not assume linearity. Here it is approximately 0.98, indicating a very strong positive monotonic relationship.
- The two coefficients diverge when the relationship is monotonic but curved (Spearman is then higher) or when outliers are present (ranks limit their influence, so Pearson is the one they distort). In this case the coefficients are almost identical, so the relationship in this data is both close to linear and consistently increasing. The positive direction (more exercise, higher BMI) reflects the illustrative numbers above and should not be read as a general finding.
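The point that Spearman exceeds Pearson for a monotonic but nonlinear relationship is easy to check on a toy example. The sketch below uses made-up data (y = x³ for x = 1, ..., 10) rather than the exercise/BMI values, and computes Spearman as Pearson on ranks with NumPy. The double `argsort` used for ranking is only valid when there are no ties, which holds for this data.

```python
import numpy as np

# A perfectly monotonic but curved relationship (illustrative data, not from the study)
x = np.arange(1, 11)
y = x ** 3

pearson = np.corrcoef(x, y)[0, 1]

# Spearman = Pearson on ranks; argsort of argsort gives 0-based ranks when there are no ties
rank_x = np.argsort(np.argsort(x))
rank_y = np.argsort(np.argsort(y))
spearman = np.corrcoef(rank_x, rank_y)[0, 1]

print(f"Pearson:  {pearson:.4f}")   # below 1: the points do not lie on a straight line
print(f"Spearman: {spearman:.4f}")  # 1: the ordering of y matches the ordering of x exactly
```

Pearson comes out at about 0.928 while Spearman is exactly 1, because ranking discards the curvature and keeps only the ordering.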

Q4. A researcher is interested in examining the relationship between the number of hours individuals
spend watching television per day and their level of physical activity. The researcher collected data on
both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
these two variables.

To calculate the Pearson correlation coefficient between the number of hours individuals spend watching television per day and their level of physical activity, you can use Python with the NumPy library. Here's how you can do it:

import numpy as np

# Sample data representing the number of hours of television watching per day and level of physical activity
hours_tv = [3, 2, 4, 5, 1, 3, 2, 4, 2, 1,
            2, 3, 4, 2, 1, 3, 2, 4, 2, 1,
            3, 2, 4, 5, 1, 3, 2, 4, 2, 1,
            2, 3, 4, 2, 1, 3, 2, 4, 2, 1,
            3, 2, 4, 5, 1, 3, 2, 4, 2, 1]

physical_activity = [2, 4, 3, 2, 5, 2, 3, 4, 2, 5,
                     4, 3, 2, 4, 5, 3, 2, 4, 2, 5,
                     3, 4, 2, 1, 5, 3, 4, 2, 4, 1,
                     4, 2, 3, 5, 1, 3, 2, 4, 2, 1,
                     3, 2, 4, 2, 1, 3, 2, 4, 2, 1]

# Calculate the Pearson correlation coefficient using NumPy's corrcoef() function
pearson_corr_coeff = np.corrcoef(hours_tv, physical_activity)[0, 1]

# Print the Pearson correlation coefficient, rounded to 4 decimal places
print(f"Pearson Correlation Coefficient: {pearson_corr_coeff:.4f}")

Output:
Pearson Correlation Coefficient: -0.1191

Interpretation of the result:
- The Pearson correlation coefficient ranges from -1 to 1. A value close to 1 indicates a strong positive linear relationship between the two variables, a value close to -1 indicates a strong negative linear relationship, and a value close to 0 indicates little to no linear relationship.
- In this case, the Pearson correlation coefficient is approximately -0.12, which is close to 0. This indicates a weak negative linear relationship between the number of hours individuals spend watching television per day and their level of physical activity.
- Therefore, there is at most a slight tendency for individuals who spend more time watching television to have lower levels of physical activity, and the relationship is weak.
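Whether a coefficient of about -0.12 is distinguishable from zero with 50 participants can be checked with the usual t statistic for a correlation, t = r·√((n - 2)/(1 - r²)) on n - 2 degrees of freedom. This check is an addition, not part of the original answer. The sketch assumes the standard conditions for that test (roughly bivariate-normal data) and hard-codes an approximate two-sided 5% critical value of 2.01 for 48 degrees of freedom, taken from a t table, instead of computing it.

```python
import math
import numpy as np

hours_tv = [3, 2, 4, 5, 1, 3, 2, 4, 2, 1,
            2, 3, 4, 2, 1, 3, 2, 4, 2, 1,
            3, 2, 4, 5, 1, 3, 2, 4, 2, 1,
            2, 3, 4, 2, 1, 3, 2, 4, 2, 1,
            3, 2, 4, 5, 1, 3, 2, 4, 2, 1]
physical_activity = [2, 4, 3, 2, 5, 2, 3, 4, 2, 5,
                     4, 3, 2, 4, 5, 3, 2, 4, 2, 5,
                     3, 4, 2, 1, 5, 3, 4, 2, 4, 1,
                     4, 2, 3, 5, 1, 3, 2, 4, 2, 1,
                     3, 2, 4, 2, 1, 3, 2, 4, 2, 1]

n = len(hours_tv)
r = np.corrcoef(hours_tv, physical_activity)[0, 1]

# Test statistic for H0: population correlation = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Approximate two-sided 5% critical value for 48 degrees of freedom (from a t table)
T_CRIT_48 = 2.01

print(f"r = {r:.4f}, t = {t_stat:.3f}, df = {n - 2}")
print("Significant at the 5% level:", abs(t_stat) > T_CRIT_48)
```

On this data t comes out at roughly -0.83, well inside ±2.01, so the weak negative correlation is consistent with no linear relationship at the 5% level. If SciPy is available, `scipy.stats.pearsonr` reports the corresponding p-value directly.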

Q5. A survey was conducted to examine the relationship between age and preference for a particular
brand of soft drink. The survey results are shown below:

Age (Years)    Soft drink Preference
25             Coke
42             Pepsi
37             Mountain dew
19             Coke
31             Pepsi
28             Coke


To analyze the relationship between age and preference for a particular brand of soft drink, we can first load the survey results into a pandas DataFrame. Below is a Python program that represents the data:

import pandas as pd

# Create a DataFrame representing the survey results
data = {
    'Age (Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke']
}

df = pd.DataFrame(data)

# Display the DataFrame
print("Survey Results:")
print(df)

Output:

Survey Results:
   Age (Years) Soft drink Preference
0           25                   Coke
1           42                  Pepsi
2           37           Mountain dew
3           19                   Coke
4           31                  Pepsi
5           28                   Coke

Interpretation of the result:
- The DataFrame displays the survey results where each row represents an individual's age and their preference for a particular brand of soft drink.
- The "Age (Years)" column represents the age of each individual, and the "Soft drink Preference" column represents their preference for a soft drink brand.
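- Age is numeric, but soft drink preference is nominal (categories with no natural order), so Pearson's or Spearman's correlation cannot be applied to it directly. The relationship is better examined by comparing ages across the brand groups, for example the mean age of each group, or with a method designed for one numeric and one categorical variable. With only six respondents, any such comparison is descriptive.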
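One way to put numbers on this (not part of the original answer) is to compare age across brand groups and summarise the association with the correlation ratio η, the square root of the between-group share of the total sum of squares of age. The sketch below does this with pandas `groupby`; with six respondents, and only one in the Mountain dew group, the results are descriptive only.

```python
import pandas as pd

df = pd.DataFrame({
    'Age (Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain dew', 'Coke', 'Pepsi', 'Coke'],
})

age = df['Age (Years)']
brand = df['Soft drink Preference']

# Mean age and number of respondents for each preferred brand
summary = age.groupby(brand).agg(['mean', 'count'])
print(summary)

# Correlation ratio (eta): share of age variation explained by brand group membership
overall_mean = age.mean()
group_mean_per_row = age.groupby(brand).transform('mean')
ss_between = ((group_mean_per_row - overall_mean) ** 2).sum()
ss_total = ((age - overall_mean) ** 2).sum()
eta = (ss_between / ss_total) ** 0.5
print(f"eta = {eta:.3f}")
```

Here the Coke drinkers are younger on average (24) than the Pepsi (36.5) and Mountain dew (37) respondents, and η comes out at about 0.84. A formal test such as one-way ANOVA would have very little power with six observations.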

Q6. A company is interested in examining the relationship between the number of sales calls made per day
and the number of sales made per week. The company collected data on both variables from a sample of
30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

To calculate the Pearson correlation coefficient between the number of sales calls made per day and the number of sales made per week, you can use Python with the NumPy library. Here's how you can do it:

import numpy as np

# Sample data representing the number of sales calls made per day and the number of sales made per week
sales_calls_per_day = [10, 15, 12, 8, 20, 18, 16, 14, 9, 11,
                       13, 17, 19, 22, 25, 21, 24, 23, 26, 27,
                       28, 30, 32, 29, 31, 33, 35, 34, 36, 37]

sales_per_week = [50, 70, 60, 40, 90, 80, 75, 65, 45, 55,
                  62, 85, 95, 100, 110, 98, 105, 102, 115, 120,
                  125, 130, 135, 140, 145, 150, 155, 160, 165, 170]

# Calculate the Pearson correlation coefficient using NumPy's corrcoef() function
pearson_corr_coeff = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]

# Print the Pearson correlation coefficient, rounded to 4 decimal places
print(f"Pearson Correlation Coefficient: {pearson_corr_coeff:.4f}")

Output:

Pearson Correlation Coefficient: 0.9943

Interpretation of the result:
- The Pearson correlation coefficient ranges from -1 to 1. A value close to 1 indicates a strong positive linear relationship between the two variables, a value close to -1 indicates a strong negative linear relationship, and a value close to 0 indicates little to no linear relationship.
- In this case, the Pearson correlation coefficient is approximately 0.994, which is very close to 1. This indicates a very strong positive linear relationship between the number of sales calls made per day and the number of sales made per week.
- Therefore, in this sample, sales representatives who make more sales calls per day tend to make more sales per week.
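The two variables are measured over different periods (calls per day, sales per week). That does not affect r, because Pearson's coefficient is unchanged when either variable is multiplied by a positive constant or shifted by a constant. The sketch below checks this by converting daily calls to weekly calls, assuming a hypothetical five-day working week that the problem does not actually state.

```python
import numpy as np

sales_calls_per_day = np.array([10, 15, 12, 8, 20, 18, 16, 14, 9, 11,
                                13, 17, 19, 22, 25, 21, 24, 23, 26, 27,
                                28, 30, 32, 29, 31, 33, 35, 34, 36, 37])
sales_per_week = np.array([50, 70, 60, 40, 90, 80, 75, 65, 45, 55,
                           62, 85, 95, 100, 110, 98, 105, 102, 115, 120,
                           125, 130, 135, 140, 145, 150, 155, 160, 165, 170])

WORKING_DAYS_PER_WEEK = 5  # hypothetical assumption, not given in the problem
sales_calls_per_week = sales_calls_per_day * WORKING_DAYS_PER_WEEK

r_daily = np.corrcoef(sales_calls_per_day, sales_per_week)[0, 1]
r_weekly = np.corrcoef(sales_calls_per_week, sales_per_week)[0, 1]

print(f"r (calls per day):  {r_daily:.4f}")
print(f"r (calls per week): {r_weekly:.4f}")
```

Both lines print the same coefficient, so the choice of time unit for either variable does not change the conclusion.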