# Q1. Pearson correlation coefficient is a measure of the linear relationship between two variables. Suppose you have collected data on the amount of time students spend studying for an exam and their final exam scores. Calculate the Pearson correlation coefficient between these two variables and interpret the result.
The Pearson correlation coefficient (r) measures the linear relationship between two continuous variables. It ranges from -1 to 1:

- r = 1: Perfect positive linear correlation
- r = -1: Perfect negative linear correlation
- r = 0: No linear correlation

Steps to Calculate (Conceptually):
Given data:

Variable X: Time spent studying (e.g., in hours)

Variable Y: Final exam scores

You calculate Pearson’s r using:

$$
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
$$
Where:

cov(X, Y) = covariance between X and Y

σ_X, σ_Y = standard deviations of X and Y

Example Data (Hypothetical):

| Hours Studied (X) | Exam Score (Y) |
|-------------------|----------------|
| 2                 | 60             |
| 3                 | 65             |
| 5                 | 80             |
| 6                 | 85             |
| 8                 | 90             |

Computing Pearson’s r from this data gives:

r ≈ 0.98 → This indicates a strong positive linear relationship between hours studied and exam scores.

Interpretation: As students spend more time studying, their exam scores tend to increase almost linearly.
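
As a check, here is a minimal pure-Python sketch of the formula applied to the hypothetical table. `pearson_r` is a helper defined here, not a library function. It uses divide-by-n versions of the covariance and standard deviations; because the n's cancel, divide-by-(n − 1) versions would give the same r.

```python
import math

def pearson_r(x, y):
    """Pearson's r = cov(X, Y) / (sigma_X * sigma_Y), using divide-by-n throughout."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (sd_x * sd_y)

# Hypothetical example data from the table above
hours = [2, 3, 5, 6, 8]
scores = [60, 65, 80, 85, 90]

r = pearson_r(hours, scores)
print(f"r = {r:.2f}")  # r = 0.98
```

`scipy.stats.pearsonr(hours, scores)` should return the same r, together with a p-value.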

# Q2. Spearman's rank correlation is a measure of the monotonic relationship between two variables. Suppose you have collected data on the amount of sleep individuals get each night and their overall job satisfaction level on a scale of 1 to 10. Calculate the Spearman's rank correlation between these two variables and interpret the result.
Spearman’s rank correlation (ρ) measures the monotonic relationship between two variables using their ranks rather than raw values. It’s useful when:

- Data is ordinal or not normally distributed.
- The relationship is not linear but still monotonic (consistently increasing or decreasing).

Steps to Calculate (Conceptually):
Given data:

Variable X: Hours of sleep

Variable Y: Job satisfaction (1–10 scale)

1. Rank the values in each variable.
2. Calculate the difference in ranks d_i for each pair.
3. Use the formula:

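$$
\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

where n is the number of pairs. This shortcut is exact only when there are no tied ranks; with ties, compute Pearson's r on the ranks instead.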
Example Data (Hypothetical):

| Sleep Hours (X) | Job Satisfaction (Y) |
|-----------------|----------------------|
| 4               | 3                    |
| 6               | 6                    |
| 7               | 8                    |
| 5               | 5                    |
| 8               | 9                    |

Ranking both variables gives the same ranks for every person (X: 1, 3, 4, 2, 5; Y: 1, 3, 4, 2, 5), so every d_i = 0 and Σd_i² = 0:

ρ = 1 − 0 = 1.0 → This indicates a perfect positive monotonic relationship in this sample.

Interpretation: As people get more sleep, their job satisfaction increases, though not exactly in a straight line.
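
A minimal sketch of the ranking steps and the shortcut formula for the hypothetical table. `ranks` and `spearman_rho` are helper names defined here, and `ranks` assumes there are no tied values (true for this data), matching the formula's no-ties condition.

```python
def ranks(values):
    """Rank values 1..n (smallest gets rank 1); assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); exact only when there are no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical example data from the table above
sleep = [4, 6, 7, 5, 8]
satisfaction = [3, 6, 8, 5, 9]

print(ranks(sleep))         # [1, 3, 4, 2, 5]
print(ranks(satisfaction))  # [1, 3, 4, 2, 5]
print(spearman_rho(sleep, satisfaction))  # 1.0
```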

# Q3. Suppose you are conducting a study to examine the relationship between the number of hours of exercise per week and body mass index (BMI) in a sample of adults. You collected data on both variables for 50 participants. Calculate the Pearson correlation coefficient and the Spearman's rank correlation between these two variables and compare the results.
Scenario:
You have data on:

X: Hours of exercise per week (continuous)

Y: BMI (continuous)

Collected from 50 adults.

Step-by-step Plan:
Calculate Pearson correlation (r):

Measures the linear relationship.

Its significance test assumes the two variables are approximately bivariate normal; r itself is also sensitive to outliers.

Formula:

$$
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
$$
Use Python:

```python
from scipy.stats import pearsonr

# exercise_hours and bmi: sequences of the 50 participants' values, in matching order
r, p = pearsonr(exercise_hours, bmi)
```

Calculate Spearman’s rank correlation (ρ):

Measures the monotonic relationship.

Works on ranked data, doesn't assume linearity.

Use Python:

```python
from scipy.stats import spearmanr

# exercise_hours and bmi: sequences of the 50 participants' values, in matching order
rho, p = spearmanr(exercise_hours, bmi)
```

Comparison & Interpretation:
If both coefficients are similar and strong (close to ±1), the relationship is approximately linear, and therefore also monotonic.

If Spearman is strong but Pearson is noticeably weaker, the relationship is monotonic but not linear (e.g., BMI decreases with exercise up to a point, then plateaus).

One would typically expect a negative correlation (more exercise → lower BMI). Illustrative values (not computed from real data) might look like:

Pearson r ≈ -0.65

Spearman ρ ≈ -0.70
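
To see how the two coefficients can disagree, here is a sketch on made-up data (11 points, not the study's 50 participants) where BMI falls quickly over the first few hours of exercise and then levels off, i.e. the plateau case. Spearman's ρ is computed as Pearson's r on the ranks, with tied values getting their average rank; this agrees with the shortcut formula from Q2 when there are no ties. `pearson_r`, `average_ranks`, and `spearman_rho` are hypothetical helper names defined here, not library functions.

```python
import math

def pearson_r(x, y):
    """Pearson's r; the 1/n factors cancel, so plain sums of deviations are used."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up illustrative data: BMI drops fast at first, then plateaus
hours = list(range(11))                    # 0..10 hours of exercise per week
bmi = [22 + 12 / (h + 1) for h in hours]   # strictly decreasing, but curved

r = pearson_r(hours, bmi)
rho = spearman_rho(hours, bmi)
print(f"Pearson r    = {r:.2f}")
print(f"Spearman rho = {rho:.2f}")  # -1.00
```

Because BMI here decreases with every extra hour, the ranks are perfectly reversed and ρ = −1, while Pearson's r is noticeably weaker (around −0.8) because the decline is curved.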

# Q4. A researcher is interested in examining the relationship between the number of hours individuals spend watching television per day and their level of physical activity. The researcher collected data on both variables from a sample of 50 participants. Calculate the Pearson correlation coefficient between these two variables.
Scenario:
X: Hours of TV per day

Y: Level of physical activity (e.g., METs or steps per day)

You collected data from 50 participants.

Calculate Pearson r:
Again assuming both are continuous and normally distributed:

```python
from scipy.stats import pearsonr

# tv_hours and physical_activity: sequences of the 50 participants' values, in matching order
r, p = pearsonr(tv_hours, physical_activity)
```

Expected Outcome:
Likely a negative linear correlation (more TV → less activity).

For example (illustrative values, not computed from real data):
r ≈ -0.60, p < 0.01

Interpretation:
Strong negative linear relationship: The more time people spend watching TV, the less physically active they tend to be.

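If p < 0.05, the correlation is statistically significant at the 5% level: a correlation this strong would be unlikely if there were no linear association in the population.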
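
The p-value for Pearson's r tests whether the population correlation is zero; it is equivalent to a t-test with t = r·√(n − 2)/√(1 − r²) and n − 2 degrees of freedom. A small sketch using the illustrative r = −0.60 and n = 50 from above (not real results); `t_statistic` is a helper name defined here.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Illustrative values from the example above, not real study results
r, n = -0.60, 50
t = t_statistic(r, n)
print(f"t = {t:.2f} with {n - 2} degrees of freedom")  # t = -5.20 with 48 degrees of freedom

# The two-sided 5% critical value for 48 df is roughly 2.01, so |t| is far beyond it
print(abs(t) > 2.01)  # True
```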

# Q5. A survey was conducted to examine the relationship between age and preference for a particular brand of soft drink. The survey results are shown below:

| Age (Years) | Soft Drink Preference |
|-------------|-----------------------|
| 25          | Coke                  |
| 42          | Pepsi                 |
| 37          | Mountain Dew          |
| 19          | Coke                  |
| 31          | Pepsi                 |
| 28          | Coke                  |

---

### **Q5: Analyzing the Relationship Between Age and Soft Drink Preference**

Since **Soft Drink Preference** is **categorical** and **Age** is **numerical**, here's how we can examine their relationship:

---

### ✅ **Approach 1: Descriptive Statistics**

We can look at **average ages** for each group:

| Soft Drink     | Ages (Years)       | Mean Age |
|----------------|--------------------|-----------|
| Coke           | 25, 19, 28         | 24        |
| Pepsi          | 42, 31             | 36.5      |
| Mountain Dew   | 37                 | 37        |

- **Observation** (only six respondents, so this is descriptive, not conclusive):  
  - Coke drinkers in this sample are the **youngest** (mean age 24).
  - Pepsi and Mountain Dew drinkers are in their **30s and 40s** (mean ages 36.5 and 37).
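
The group means in the table can be reproduced from the raw survey rows with the standard library. `responses`, `ages_by_drink`, and `means` are names chosen here for illustration.

```python
from collections import defaultdict
from statistics import mean

# Survey table from above: (age, preferred soft drink)
responses = [(25, "Coke"), (42, "Pepsi"), (37, "Mountain Dew"),
             (19, "Coke"), (31, "Pepsi"), (28, "Coke")]

# Group ages by the drink each respondent prefers
ages_by_drink = defaultdict(list)
for age, drink in responses:
    ages_by_drink[drink].append(age)

means = {drink: mean(ages) for drink, ages in ages_by_drink.items()}
for drink, ages in ages_by_drink.items():
    print(drink, ages, means[drink])
```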

---

### ✅ **Approach 2: ANOVA or Kruskal-Wallis Test**

To test if age significantly differs by soft drink preference:
- Use **ANOVA** if age is normally distributed.
- Use **Kruskal-Wallis** if not (non-parametric).

In Python:
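```python
from scipy.stats import kruskal

coke_ages = [25, 19, 28]
pepsi_ages = [42, 31]
dew_ages = [37]

# H statistic and p-value for the null hypothesis that the groups' population medians are equal
h_stat, p_value = kruskal(coke_ages, pepsi_ages, dew_ages)
print("H =", h_stat, "p =", p_value)
```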

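If **p < 0.05**, there’s a significant difference in age among the soft drink groups. With only six respondents (a single one in the Mountain Dew group), however, the test has very little power and the chi-square approximation behind its p-value is rough, so a non-significant result says little.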
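
To show where the Kruskal-Wallis result comes from, here is a by-hand sketch of the H statistic for the six responses. With k = 3 groups, H is compared to a chi-square distribution with 2 degrees of freedom, whose upper tail is exactly e^(−H/2). There are no tied ages, so no tie correction is needed. For this data it gives H ≈ 3.86 and p ≈ 0.15 (not significant); scipy's `kruskal` should report the same values.

```python
import math

groups = {
    "Coke": [25, 19, 28],
    "Pepsi": [42, 31],
    "Mountain Dew": [37],
}

# Rank all ages together (1 = youngest); there are no tied ages in this data
all_ages = sorted(age for ages in groups.values() for age in ages)
rank_of = {age: i + 1 for i, age in enumerate(all_ages)}
N = len(all_ages)

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), where R_i is the rank sum of group i
rank_term = sum(sum(rank_of[a] for a in ages) ** 2 / len(ages) for ages in groups.values())
H = 12 / (N * (N + 1)) * rank_term - 3 * (N + 1)

# Chi-square with 2 df has survival function exp(-x / 2)
p = math.exp(-H / 2)
print(f"H = {H:.2f}, p = {p:.3f}")  # H = 3.86, p = 0.145
```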

---

### ✅ **Approach 3: Encoding + Correlation (Spearman)**

If you encode soft drink choices as:
- Coke = 1
- Pepsi = 2
- Mountain Dew = 3

Then compute **Spearman’s rank correlation** between Age and Preference.

In Python:
```python
from scipy.stats import spearmanr

ages = [25, 42, 37, 19, 31, 28]
preferences = [1, 2, 3, 1, 2, 1]  # Encoded

spearmanr(ages, preferences)
```

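This measures whether the coded preference tends to **increase or decrease with age**. Caveat: soft drink brands are nominal categories with no natural order, so ρ depends entirely on the arbitrary coding, and a different assignment of 1, 2, 3 can give a very different value. Approaches 1 and 2 are better suited to a categorical variable.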
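
A sketch of the coding problem: the same six responses, two equally arbitrary codings, two different values of ρ. Spearman's ρ is computed here as Pearson's r on tie-averaged ranks (needed because several respondents share a code). The helpers and the second coding (`coding_b`) are hypothetical choices made for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r; the 1/n factors cancel, so plain sums of deviations are used."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

ages = [25, 42, 37, 19, 31, 28]
drinks = ["Coke", "Pepsi", "Mountain Dew", "Coke", "Pepsi", "Coke"]

# Two equally arbitrary numeric codings of the same nominal categories
coding_a = {"Coke": 1, "Pepsi": 2, "Mountain Dew": 3}
coding_b = {"Mountain Dew": 1, "Coke": 2, "Pepsi": 3}

rho_a = spearman_rho(ages, [coding_a[d] for d in drinks])
rho_b = spearman_rho(ages, [coding_b[d] for d in drinks])
print(f"coding A: rho = {rho_a:.2f}")  # coding A: rho = 0.83
print(f"coding B: rho = {rho_b:.2f}")  # coding B: rho = 0.28
```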

---


# Q6. A company is interested in examining the relationship between the number of sales calls made per day and the number of sales made per week. The company collected data on both variables from a sample of 30 sales representatives. Calculate the Pearson correlation coefficient between these two variables.
The goal is to examine the **linear relationship** between:

- **X**: Number of sales calls made per day  
- **Y**: Number of sales made per week  

Given data for **30 sales representatives**, the **Pearson correlation coefficient** is an appropriate way to quantify the strength and direction of a **linear** relationship between these two **numerical variables**.

---

### 🔢 **Steps to Calculate Pearson Correlation Coefficient (r):**

1. **Collect your data**  
   Let's say you have two lists:
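   ```python
   # One value per representative, in the same order in both lists.
   # Only the first four of the 30 values are shown; fill in the rest before running.
   sales_calls_per_day = [12, 15, 10, 18]
   sales_per_week = [8, 11, 7, 13]
   ```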

2. **Use Python and `scipy` to calculate Pearson’s r**:
   ```python
   from scipy.stats import pearsonr

   r_value, p_value = pearsonr(sales_calls_per_day, sales_per_week)
   print("Pearson r:", r_value)
   print("P-value:", p_value)
   ```

---

### 📊 **Interpretation:**

- **r ≈ 1** → Strong positive linear relationship (more calls → more sales)
- **r ≈ -1** → Strong negative linear relationship
- **r ≈ 0** → No linear relationship

#### Example:
If `r = 0.85` and `p < 0.01`:
- There’s a **strong positive linear correlation**, and it’s **statistically significant**.
- Interpretation: Reps who make more calls per day tend to make more sales per week.
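
A sketch of the whole calculation on simulated data. The 30 reps' numbers are random values generated here purely for illustration; the slope 0.6 and the noise level are arbitrary assumptions, not properties of the company's data. A hand-written `pearson_r` keeps it to the standard library; `scipy.stats.pearsonr` should give the same r.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r; the 1/n factors cancel, so plain sums of deviations are used."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

random.seed(0)  # fixed seed so the simulation is reproducible

# Simulated, made-up data for 30 reps: weekly sales rise with daily calls, plus noise
sales_calls_per_day = [random.randint(5, 25) for _ in range(30)]
sales_per_week = [max(0, round(0.6 * calls + random.gauss(0, 1.5)))
                  for calls in sales_calls_per_day]

r = pearson_r(sales_calls_per_day, sales_per_week)
print(f"Pearson r on simulated data: {r:.2f}")
```

With a built-in linear trend and modest noise, r should come out strongly positive; raising the noise level in `random.gauss` weakens it.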

---
