
---

### **Q1. Difference Between Ordinal Encoding and Label Encoding**

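**Label Encoding:**
- Converts **categorical values to integers**.
- Assigns the integers **without regard to any order** among categories (scikit-learn's `LabelEncoder` sorts the classes alphabetically).
- Used for **nominal** data and for target labels; note that some models may still read the integer codes as an order.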

**Ordinal Encoding:**
- Converts **ordered categories to integers**, preserving **rank**.
- Used for **ordinal** data (data with a clear hierarchy/order).

#### **Example:**

- **Label Encoding:**  
  `Color = [Red, Green, Blue]` → Red: 0, Green: 1, Blue: 2  
  No real order implied.

- **Ordinal Encoding:**  
  `Size = [Small, Medium, Large]` → Small: 0, Medium: 1, Large: 2  
  Order matters (Small < Medium < Large).

**Choose Ordinal Encoding** when the **categories have a meaningful order**, e.g., education level or customer satisfaction.
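
The `Size` example above can be sketched with scikit-learn's `OrdinalEncoder`. This is a minimal illustration, assuming a hypothetical four-row column; the key point is that the category order is passed explicitly, because otherwise the encoder sorts categories alphabetically.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample column (not from the original notes)
df = pd.DataFrame({"Size": ["Medium", "Small", "Large", "Small"]})

# State the rank explicitly: Small < Medium < Large
encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])

# OrdinalEncoder expects 2D input and returns a 2D float array
df["Size_encoded"] = encoder.fit_transform(df[["Size"]]).ravel()

size_codes = df["Size_encoded"].tolist()
print(df)
```

With the order given this way, `Small` maps to 0, `Medium` to 1, and `Large` to 2, matching the ranking described above.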

---

### **Q2. Target Guided Ordinal Encoding**

This encoding technique **orders categories based on the mean of the target variable**, then assigns integer values accordingly.

#### **Steps:**
1. Calculate the **mean target** for each category.
2. Rank categories by target mean.
3. Assign ordinal values based on rank.

#### **Example:**
Predicting loan default (`0 = no`, `1 = yes`):

```
Education Level:        Default Rate:
High School              0.4
Bachelor's               0.3
Master's                 0.1
```

Encoding result:
- Master's → 0
- Bachelor's → 1
- High School → 2

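✅ **Use this when:**
You want to **leverage target information** during encoding (typically with decision tree models or linear models). Compute the category means on the **training data only**, so that target information does not leak into validation or test data.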
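
The three steps can be sketched in pandas as follows. The toy dataset is an assumption, built so that the default rates match the example (0.4, 0.3, 0.1); the column names `education` and `default` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 10 borrowers per education level, default = 1 means the loan defaulted
df = pd.DataFrame({
    "education": ["High School"] * 10 + ["Bachelor's"] * 10 + ["Master's"] * 10,
    "default": [1] * 4 + [0] * 6 + [1] * 3 + [0] * 7 + [1] * 1 + [0] * 9,
})

# Step 1: mean of the target for each category
target_means = df.groupby("education")["default"].mean()

# Step 2: rank categories by target mean (lowest default rate first)
ordered_categories = target_means.sort_values().index

# Step 3: assign integers according to rank
mapping = {category: rank for rank, category in enumerate(ordered_categories)}
df["education_encoded"] = df["education"].map(mapping)

print(mapping)
```

In practice the `mapping` would be learned on the training split and then applied to new data with `.map(mapping)`.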

---

### **Q3. Covariance: Definition & Calculation**

**Covariance** measures how two variables vary **together**:

- **Positive** covariance: variables increase together.
- **Negative** covariance: one increases, the other decreases.
- **Zero** covariance: no linear relationship.

#### **Formula:**
\[
\text{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})
\]

#### **Importance:**
- Used in **PCA** to find principal components.
- Helps understand the **relationship between variables**.
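
As a quick check of the formula, the sketch below computes the sample covariance by hand and compares it with `np.cov`, which also divides by `n - 1` by default. The two small arrays are made-up values chosen for easy arithmetic.

```python
import numpy as np

# Made-up values for illustration
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

n = len(x)

# Sample covariance: sum of products of deviations, divided by n - 1
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(X, Y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)
```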

---

### **Q4. Label Encoding with scikit-learn (Color, Size, Material)**

```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Sample data
df = pd.DataFrame({
    'Color': ['red', 'green', 'blue'],
    'Size': ['small', 'medium', 'large'],
    'Material': ['wood', 'metal', 'plastic']
})

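# Apply label encoding, fitting a separate encoder per column
# so that each column's mapping can be recovered later
encoders = {}

for col in ['Color', 'Size', 'Material']:
    encoders[col] = LabelEncoder()
    df[col + '_encoded'] = encoders[col].fit_transform(df[col])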

print(df)
```

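#### **Output Explanation:**
`LabelEncoder` sorts the categories **alphabetically** and numbers them from 0, so the codes carry no real-world meaning:
- `Color`: blue = 0, green = 1, red = 2
- `Size`: large = 0, medium = 1, small = 2
- `Material`: metal = 0, plastic = 1, wood = 2

✅ Useful when category order **doesn't matter**. Note that `Size` ends up as large < medium < small, so use Ordinal Encoding if the real order should be preserved.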

---

### **Q5. Covariance Matrix for Age, Income, Education Level**

The example below uses a small sample dataset of numerical values:

```python
import pandas as pd

# Sample dataset
data = {
    'Age': [25, 30, 45, 35, 50],
    'Income': [50000, 60000, 120000, 80000, 150000],
    'Education': [1, 2, 3, 2, 3]  # Assume: 1=High School, 2=Bachelor, 3=Master
}

df = pd.DataFrame(data)

# Covariance matrix
cov_matrix = df.cov()
print(cov_matrix)
```

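#### **Interpreting Results:**
- Age and Income: a positive covariance means that, in this sample, older people tend to earn more.
- Education and Income: a positive covariance means that higher education codes tend to go with higher income.
- The magnitude of a covariance depends on the units of the variables (Income is in the tens of thousands), so read the **sign** rather than comparing raw values across pairs.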

---

### **Q6. Encoding Categorical Variables (Gender, Education, Employment Status)**

| Variable           | Type      | Encoding Method            | Why? |
|--------------------|-----------|-----------------------------|------|
| Gender             | Binary    | Label Encoding (0/1)        | Only 2 values |
| Education Level    | Ordinal   | Ordinal Encoding            | Clear order (High School < PhD) |
| Employment Status  | Nominal   | One-Hot Encoding            | No order, more than 2 categories |
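
The choices in the table can be sketched with pandas alone. The sample rows, the category values, and the full education order (`High School < Bachelor's < Master's < PhD`) are assumptions made for illustration; only the endpoints of that order appear in the table.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Education": ["High School", "PhD", "Bachelor's"],
    "Employment": ["Employed", "Unemployed", "Self-Employed"],
})

# Binary variable: map the two values to 0/1
df["Gender_encoded"] = df["Gender"].map({"Female": 0, "Male": 1})

# Ordinal variable: codes follow the explicitly stated order (assumed here)
education_order = ["High School", "Bachelor's", "Master's", "PhD"]
df["Education_encoded"] = pd.Categorical(
    df["Education"], categories=education_order, ordered=True
).codes

# Nominal variable: one indicator column per category
employment_dummies = pd.get_dummies(df["Employment"], prefix="Employment")

encoded = pd.concat(
    [df[["Gender_encoded", "Education_encoded"]], employment_dummies], axis=1
)
print(encoded)
```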

---

### **Q7. Covariance Between Mixed Variable Types**

**You can only calculate covariance between numerical variables**. So:

- **Temperature & Humidity** → Yes
- **Weather Condition & Wind Direction** → No (must be encoded first)

#### **Step-by-step:**

```python
import pandas as pd

# Sample data
data = {
    'Temperature': [30, 32, 35, 28, 31],
    'Humidity': [45, 50, 55, 40, 48],
}

df = pd.DataFrame(data)

# Covariance matrix
print(df.cov())
```

#### **Interpretation:**
- For this sample, the covariance between Temperature and Humidity is positive (14.35), so the two tend to increase together.
- A negative covariance would mean that one tends to decrease as the other increases.

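If you want to include **Weather Condition and Wind Direction**, you must:
1. Encode them first. One-Hot Encoding suits nominal categories; use Ordinal Encoding only if the categories have a meaningful order.
2. Then calculate the covariance between the encoded columns and the numeric variables.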
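
A minimal sketch of those two steps, assuming a hypothetical `Weather` column added to the same sample data (Wind Direction would be handled the same way):

```python
import pandas as pd

df = pd.DataFrame({
    "Temperature": [30, 32, 35, 28, 31],
    "Humidity": [45, 50, 55, 40, 48],
    # Hypothetical categorical values, not from the original data
    "Weather": ["Sunny", "Cloudy", "Sunny", "Rainy", "Cloudy"],
})

# Step 1: one-hot encode the nominal column as float indicator columns
encoded = pd.get_dummies(df, columns=["Weather"], dtype=float)

# Step 2: covariance matrix over numeric and encoded columns together
cov_matrix = encoded.cov()
print(cov_matrix)
```

Each `Weather_*` row of the matrix shows how the presence of that condition co-varies with Temperature and Humidity in the sample.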