In [1]:
# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose
# you have collected data on the amount of time students spend studying for an exam and their final exam
# scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.

r'''
To calculate the Pearson correlation coefficient (also known as Pearson's r) between two variables, you need a dataset with observations of both variables. In this case, you have data on the amount of time students spend studying for an exam and their final exam scores. You can use the following formula to calculate Pearson's correlation coefficient:

\[ r = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sqrt{\sum{(X_i - \bar{X})^2} \sum{(Y_i - \bar{Y})^2}}} \]

Where:
- \( r \) is the Pearson correlation coefficient.
- \( X_i \) and \( Y_i \) are individual data points for the two variables.
- \( \bar{X} \) and \( \bar{Y} \) are the means (averages) of the two variables.
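
The formula above can be sketched term by term in plain Python. This is only a minimal illustration, not a replacement for a library routine; the function name `pearson_r` and the sample values are hypothetical and chosen just for the demonstration.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed directly from the formula (assumes neither input is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the products of deviations from the means
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical values, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(hours, scores):.3f}")  # prints: r = 0.775
```

Writing the sums out this way makes it easy to see that the numerator carries the sign of the relationship, while the denominator only rescales the result into the range from -1 to 1.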

Let's assume you have collected the following simplified data:

```python
import numpy as np

# Example data (simplified)
study_time_hours = np.array([2, 3, 5, 1, 4])
exam_scores = np.array([70, 80, 90, 65, 85])

# Calculate the Pearson correlation coefficient
correlation_coefficient = np.corrcoef(study_time_hours, exam_scores)[0, 1]

print(f"Pearson Correlation Coefficient: {correlation_coefficient:.3f}")
```

Output (values will vary with your actual data):

```
Pearson Correlation Coefficient: 0.991
```

Interpretation of the result:

- The Pearson correlation coefficient, \( r \), ranges from -1 to 1.
- A positive value of \( r \) indicates a positive linear relationship between the two variables. In this case, \( r = 0.991 \), which is very close to 1, suggesting a very strong positive linear relationship.
- As the amount of time students spend studying for the exam increases, their final exam scores tend to increase as well. In this small sample the five points lie almost exactly on a straight line.
- The magnitude of \( r \) indicates how closely the points follow a straight line, not how steep that line is. A value this close to 1 means study time predicts exam score almost perfectly within this sample.
- Correlation does not imply causation. A high correlation between study time and exam scores does not necessarily mean that studying more causes higher scores; other factors could be at play. With only five observations, the estimate is also imprecise.

In summary, a Pearson correlation coefficient of 0.991 indicates a very strong positive linear relationship between the amount of time students spend studying for an exam and their final exam scores. Increased study time is associated with higher exam scores, but the result does not establish causation or account for other factors influencing exam performance.'''


In [2]:
# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables.
# Suppose you have collected data on the amount of sleep individuals get each night and their overall job
# satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two
# variables and interpret the result.

r'''
Spearman's rank correlation coefficient (Spearman's rho or \( \rho \)) is a non-parametric measure of the strength and direction of association between two variables. It assesses the monotonic relationship between variables, which means it measures whether there's a consistent trend (either increasing or decreasing) in the relationship, rather than just a linear relationship. The values of Spearman's rho range from -1 (perfect inverse monotonic relationship) to 1 (perfect monotonic relationship), with 0 indicating no monotonic relationship.

To calculate Spearman's rank correlation coefficient, follow these steps:

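1. Rank the values of each variable, giving tied values the average of the ranks they span.
2. Calculate the difference \( d_i \) between the two ranks for each pair of observations.
3. Square these differences.
4. Calculate the sum of squared differences.
5. Compute \( \rho = 1 - 6 \sum d_i^2 / (n(n^2 - 1)) \), where \( n \) is the number of pairs. This shortcut is exact only when there are no ties; with ties, compute the Pearson correlation of the two rank lists instead, which is what `spearmanr` does.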
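
The ranking steps above can be sketched by hand in plain Python. This is a minimal illustration under stated assumptions: the helper names `average_ranks` and `spearman_rho` and the sample values are hypothetical, ties receive the average of the ranks they occupy, and rho is taken as the Pearson correlation of the two rank lists so that ties are handled correctly.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cross = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical values with one tie in x, for illustration only
x = [10, 20, 20, 40, 50]
y = [1, 3, 2, 5, 4]
print(average_ranks(x))                    # [1.0, 2.5, 2.5, 4.0, 5.0]
print(f"rho = {spearman_rho(x, y):.3f}")   # prints: rho = 0.872
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves rho unchanged.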

Let's assume you have collected simplified data:

```python
import scipy.stats as stats

# Example data (simplified)
sleep_hours = [7, 6, 8, 5, 6]
job_satisfaction = [8, 6, 9, 5, 7]

# Calculate Spearman's rank correlation
rho, _ = stats.spearmanr(sleep_hours, job_satisfaction)

print(f"Spearman's Rank Correlation (rho): {rho:.3f}")
```

Output (values will vary with your actual data):

```
Spearman's Rank Correlation (rho): 0.975
```

Interpretation of the result:

- Spearman's rank correlation coefficient, \( \rho \), ranges from -1 to 1.
- In this case, \( \rho = 0.975 \), which is positive and very close to 1.
- A positive \( \rho \) indicates a positive monotonic relationship, meaning that as the amount of sleep individuals get each night increases, their overall job satisfaction level tends to increase as well.
- A value of 0.975 means the two rankings agree almost perfectly. The only disagreement comes from the two people who both sleep 6 hours but report different satisfaction levels (6 and 7).
- Spearman's rho assesses monotonic relationships and does not assume linearity, so it is useful when the relationship between variables is not strictly linear or when the data are ordinal, as with a 1 to 10 satisfaction scale.

In summary, a Spearman's rank correlation coefficient of 0.975 suggests a very strong positive monotonic relationship between the amount of sleep individuals get each night and their overall job satisfaction level. With only five observations the estimate is imprecise, and other factors may also influence job satisfaction.'''


In [3]:
# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of
# exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables
# for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation
# between these two variables and compare the results.

r'''
To examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults, you can calculate both the Pearson correlation coefficient (r) and the Spearman's rank correlation coefficient (\( \rho \)). These two correlation coefficients provide different insights into the relationship.

Here's how you can calculate and compare Pearson and Spearman correlations:

1. **Pearson Correlation (r)**:

   Pearson correlation measures the linear relationship between two continuous variables. You can use the `pearsonr` function from the `scipy.stats` library to calculate it.

```python
import scipy.stats as stats

# Example data (simplified)
exercise_hours = [3, 5, 2, 4, 6, 1, 2, 4, 3, 5]
bmi = [28, 25, 30, 27, 24, 32, 31, 29, 26, 24]

# Calculate Pearson correlation
pearson_corr, _ = stats.pearsonr(exercise_hours, bmi)

print(f"Pearson Correlation (r): {pearson_corr}")
```

2. **Spearman's Rank Correlation (\( \rho \))**:

   Spearman's rank correlation assesses the monotonic relationship between two variables, which doesn't assume linearity. You can use the `spearmanr` function from the `scipy.stats` library to calculate it.

```python
# Calculate Spearman's rank correlation
spearman_corr, _ = stats.spearmanr(exercise_hours, bmi)

print(f"Spearman's Rank Correlation (rho): {spearman_corr}")
```

Now, let's compare the results:

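- If both Pearson and Spearman correlations are positive and have similar values (close to 1), it indicates a strong positive relationship between exercise hours and BMI that is approximately linear.

- If both correlations are negative and have similar values (close to -1), it indicates a strong negative relationship that is approximately linear.

- If both correlations are close to 0, there is little or no linear or monotonic relationship. A non-monotonic pattern (for example, a U-shape) could still be present, so a scatter plot is worth checking.

- If Pearson is clearly positive (or negative) while Spearman is close to 0, the Pearson value is probably being driven by a few extreme observations, because Spearman uses only the ranks and is much less sensitive to outliers.

- If Spearman is noticeably larger in magnitude than Pearson, it suggests a monotonic but nonlinear relationship, or outliers that weaken the linear fit.

- If Pearson and Spearman have opposite signs, which is unusual, the cause is typically outliers or a very small sample rather than a genuine trend.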
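
To make the monotonic-versus-linear distinction concrete, here is a small plain-Python sketch. The data are hypothetical (a cubic curve, not the exercise and BMI values), the helper names are illustrative, and the simple ranking helper assumes there are no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson's r from summed deviations (assumes equal lengths, non-constant input)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def ranks_no_ties(values):
    """Rank 1 = smallest value; valid only when all values are distinct."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

# Hypothetical data: y rises with x at every step, but along a curve
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = pearson_r(ranks_no_ties(x), ranks_no_ties(y))  # Spearman = Pearson on ranks
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Here rho equals 1 because the ordering of `y` matches the ordering of `x` exactly, while r falls below 1 because the points do not lie on a straight line.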

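By comparing these two correlation coefficients, you can gain a better understanding of the nature of the relationship between exercise hours and BMI in your dataset. For the example data above, Pearson's r is about -0.90 and Spearman's rho is about -0.91: both are strongly negative and nearly equal, so more weekly exercise is associated with lower BMI in a roughly linear way.'''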


In [4]:
# Q4. A researcher is interested in examining the relationship between the number of hours individuals
# spend watching television per day and their level of physical activity. The researcher collected data on
# both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between
# these two variables.

r'''
To calculate the Pearson correlation coefficient (often denoted as "r") between the number of hours individuals spend watching television per day and their level of physical activity, you need paired observations of both variables for each of the 50 participants. Pearson's correlation measures the strength and direction of the linear relationship between two continuous variables.

The simplified example below shows the calculation in Python:

```python
import scipy.stats as stats

# Example data (simplified): 46 paired observations.
# Both lists must have the same length, or pearsonr raises a ValueError.
tv_hours = [2, 3, 1, 4, 2, 5, 3, 2, 1, 4, 5, 6, 3, 2, 4, 3, 2, 1, 5, 6, 2, 3, 4, 5, 3, 2, 1, 4, 5, 6, 3, 2, 4, 3, 2, 1, 5, 6, 2, 3, 4, 5, 3, 2, 1, 4]
physical_activity = [3, 4, 2, 5, 3, 4, 4, 3, 2, 5, 5, 6, 4, 3, 5, 4, 3, 2, 5, 6, 3, 4, 5, 5, 4, 3, 2, 5, 6, 4, 3, 5, 4, 3, 2, 5, 6, 3, 4, 5, 5, 4, 3, 2, 5, 6]

# Calculate Pearson correlation (the second return value is the p-value)
pearson_corr, _ = stats.pearsonr(tv_hours, physical_activity)

print(f"Pearson Correlation (r): {pearson_corr:.3f}")
```

Output (values will vary with your actual data):

```
Pearson Correlation (r): 0.629
```

Interpretation of the result:

- The Pearson correlation coefficient, \( r \), ranges from -1 to 1.
- In this example, \( r = 0.629 \), a moderately strong positive value.
- A positive value of \( r \) indicates a positive linear relationship between the number of hours individuals spend watching television per day and their level of physical activity.
- In this example dataset, as the number of hours of TV watching increases, the level of physical activity tends to increase as well, although with considerable scatter around the linear trend. Real data could easily show the opposite sign.
- Correlation does not imply causation. The association does not mean that watching more TV causes changes in physical activity or vice versa. Other factors may also be at play.

In summary, the Pearson correlation coefficient of 0.629 suggests a moderate positive linear relationship between the number of hours individuals spend watching television per day and their level of physical activity in the provided dataset.'''


In [12]:
# Q5. A survey was conducted to examine the relationship between age and preference for a particular
# brand of soft drink. The survey results are shown below:

# Age(Years)                    Soft drink Preference
# 25                                    Coke
# 42                                    Pepsi
# 37                                  Mountain Dew
# 19                                    Coke
# 31                                    Pepsi
# 28                                    Coke

import pandas as pd
import scipy.stats as stats

# Create the dataset
data = {
    'Age (Years)': [25, 42, 37, 19, 31, 28],
    'Soft drink Preference': ['Coke', 'Pepsi', 'Mountain Dew', 'Coke', 'Pepsi', 'Coke']
}

# Create a DataFrame
df = pd.DataFrame(data)

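# Convert 'Soft drink Preference' to integer codes (label encoding).
# pandas assigns the codes alphabetically (Coke=0, Mountain Dew=1, Pepsi=2).
# The brands are nominal categories, so this ordering is arbitrary and the
# coefficients below depend on it.
df['Soft drink Preference'] = df['Soft drink Preference'].astype('category')
df['Soft drink Preference'] = df['Soft drink Preference'].cat.codes

# Calculate Pearson correlation on the encoded values. pointbiserialr is meant
# for a binary variable; with three codes it is simply the ordinary Pearson r.
pearson_corr, _ = stats.pointbiserialr(df['Age (Years)'], df['Soft drink Preference'])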

print(f"Pearson Correlation (r): {pearson_corr}")

# Calculate Spearman's rank correlation
spearman_corr, _ = stats.spearmanr(df['Age (Years)'], df['Soft drink Preference'])

print(f"Spearman's Rank Correlation (rho): {spearman_corr}")


Pearson Correlation (r): 0.7691751415594736
Spearman's Rank Correlation (rho): 0.8332380897952965
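
Because the soft drink brands have no natural order, the coefficients above depend on which integer each brand happens to receive. The sketch below illustrates this with the same survey data in plain Python; the `reordered` mapping is a hypothetical alternative encoding, and `pearson_r` is an illustrative helper rather than a library function.

```python
import math

def pearson_r(x, y):
    """Pearson's r from summed deviations (assumes equal lengths, non-constant input)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cross / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

ages = [25, 42, 37, 19, 31, 28]
drinks = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Alphabetical codes, matching what cat.codes produces for this column
alphabetical = {"Coke": 0, "Mountain Dew": 1, "Pepsi": 2}
# Hypothetical alternative: same brands, different arbitrary order
reordered = {"Pepsi": 0, "Coke": 1, "Mountain Dew": 2}

r_alpha = pearson_r(ages, [alphabetical[d] for d in drinks])
r_other = pearson_r(ages, [reordered[d] for d in drinks])
print(f"alphabetical codes: r = {r_alpha:.3f}")  # prints 0.769, as above
print(f"reordered codes:    r = {r_other:.3f}")  # negative for this mapping
```

The sign of the coefficient changes with the encoding, so for a nominal variable like brand preference the value is better read as an artifact of the coding than as a measure of association with age.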


In [19]:
# Q6. A company is interested in examining the relationship between the number of sales calls made per day
# and the number of sales made per week. The company collected data on both variables from a sample of
# 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.

import scipy.stats as stats

# Example data (simplified)
sales_calls_per_day = [15, 12, 14, 10, 11, 13, 16, 17, 12, 14, 15, 18, 10, 13, 11, 19, 16, 17, 14, 12, 10, 11, 13, 12, 15, 16, 18, 17, 14, 13]
sales_per_week = [50, 42, 48, 40, 44, 46, 52, 54, 42, 48, 50, 56, 38, 46, 42, 58, 52, 54, 48, 44, 40, 42, 46, 44, 50, 52, 56, 54, 48, 46]

# Calculate Pearson correlation
pearson_corr, _ = stats.pearsonr(sales_calls_per_day, sales_per_week)

print(f"Pearson Correlation (r): {pearson_corr}")



Pearson Correlation (r): 0.9909345727270823
