### Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.
Ans: \

###  **Definition:**

**Min-Max Scaling** (also known as **Normalization**) is a **feature scaling technique** that transforms features to a **fixed range**, typically **[0, 1]**.

It is used in data preprocessing to put all features on a comparable scale, so that no feature dominates the model simply because of its units, by **rescaling them based on their minimum and maximum values**.

---

###  **Formula:**
$$
X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}
$$

- $X$ = Original feature value  
- $X_{\text{min}}$, $X_{\text{max}}$ = Minimum and maximum values of that feature  
- $X_{\text{scaled}} \in [0, 1]$

---

###  **Why Use Min-Max Scaling?**

- Some algorithms (like **KNN**, **SVM**, **Neural Networks**) are sensitive to the **scale of input features**.
- If features have very different ranges, the model may be **biased toward the larger-scaled features**.
- Helps gradient-based models **converge faster** and often improves the accuracy of distance-based ones.

---

###  **Example:**

Let’s say you have a feature: `House Size (in sq ft)`

| Original Size |
|---------------|
| 1000          |
| 1500          |
| 2000          |

**Step 1:** Identify min and max:  
- Min = 1000  
- Max = 2000

**Step 2:** Apply Min-Max Scaling:
$$
\text{Scaled Size} = \frac{\text{Size} - 1000}{2000 - 1000}
$$

| Original | Scaled   |
|----------|----------|
| 1000     | 0.0      |
| 1500     | 0.5      |
| 2000     | 1.0      |

Now the sizes are all between 0 and 1 — ready for ML modeling.
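
The same calculation can be written directly from the formula. The sketch below is a minimal NumPy version; the helper name `min_max_scale` is an assumption introduced for illustration and is not part of any library.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array to [0, 1] using its own min and max (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # A constant feature would cause division by zero, so return zeros instead.
        return np.zeros_like(values)
    return (values - v_min) / (v_max - v_min)

house_sizes = [1000, 1500, 2000]
scaled_sizes = min_max_scale(house_sizes)
print(scaled_sizes)  # values 0.0, 0.5, 1.0, matching the table above
```

The guard for a constant feature is a design choice of this sketch; the formula itself is undefined when the minimum equals the maximum.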

---

###  **In Python (using sklearn):**
```python
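import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# House sizes from the example above
data = pd.DataFrame({'House Size': [1000, 1500, 2000]})

scaler = MinMaxScaler()  # default feature_range is (0, 1)
# Double brackets keep the input 2-D, as scikit-learn expects
scaled_data = scaler.fit_transform(data[['House Size']])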
```

### Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.
Ans: \
###  **Definition: Unit Vector Scaling (Normalization to Unit Norm)**

The **Unit Vector technique** (also called **Vector Normalization**) is a feature scaling method where each data point (i.e., **row**) is **scaled to have a unit norm** (length = 1).  

This is typically used when the **direction of the data vector** matters more than its magnitude — such as in **text classification**, **cosine similarity**, or **clustering** tasks.

---

###  **Formula:**

If a data point (row) is represented as a vector:
$$
\vec{x} = [x_1, x_2, \dots, x_n]
$$
Then the **normalized vector** is:
$$
\vec{x}_{\text{norm}} = \frac{\vec{x}}{\|\vec{x}\|} = \frac{[x_1, x_2, \dots, x_n]}{\sqrt{x_1^2 + x_2^2 + \dots + x_n^2}}
$$

This ensures that:
$$
\|\vec{x}_{\text{norm}}\| = 1
$$

---

###  **Difference from Min-Max Scaling:**

| Aspect                | Min-Max Scaling                    | Unit Vector Scaling                        |
|-----------------------|------------------------------------|--------------------------------------------|
| Works on              | Individual **features (columns)**  | Individual **samples (rows)**              |
| Output range          | Scales values to [0, 1]            | Norm (length) of each row becomes 1        |
| Use case              | Numeric feature normalization      | Text data, cosine similarity, KNN, etc.    |
| Preserves             | Relative spacing between values    | Direction of data vectors                  |
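
The column-versus-row distinction in the table can be checked with a short sketch. The matrix below (house size and number of bedrooms) is made-up data used only for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Hypothetical data: 3 samples (rows) x 2 features (columns)
X_demo = np.array([[1000.0, 2.0],
                   [1500.0, 3.0],
                   [2000.0, 5.0]])

# Min-Max scaling works column by column: each feature ends up spanning [0, 1].
X_minmax = MinMaxScaler().fit_transform(X_demo)

# Unit vector scaling works row by row: each sample ends up with L2 norm 1.
X_unit_demo = Normalizer(norm="l2").fit_transform(X_demo)

column_ranges = X_minmax.max(axis=0) - X_minmax.min(axis=0)
row_norms_demo = np.linalg.norm(X_unit_demo, axis=1)
print(column_ranges)   # one value per feature, each equal to 1
print(row_norms_demo)  # one value per sample, each equal to 1
```

Note that row-wise normalization does not balance the features: in this example the house-size column still dominates each row's direction, which is why the two techniques are not interchangeable.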

---

###  **Example:**

Suppose we have two data points (rows) with 2 features:

```
X = [[3, 4],
     [1, 2]]
```

#### ▶ Applying **Unit Vector Scaling**:

For row 1:  
$$
\sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5
\quad \Rightarrow \quad [3/5, 4/5] = [0.6, 0.8]
$$

For row 2:  
$$
\sqrt{1^2 + 2^2} = \sqrt{5} \approx 2.236
\quad \Rightarrow \quad [1/2.236, 2/2.236] \approx [0.447, 0.894]
$$

So the transformed matrix becomes:

```
[[0.6, 0.8],
 [0.447, 0.894]]
```

Each row now has **length = 1**.
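
The hand calculation can be reproduced with plain NumPy. This is a minimal sketch that assumes no row is all zeros, since such a row has norm 0 and cannot be normalized.

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [1.0, 2.0]])

# L2 norm of each row, kept as a column so it broadcasts across the features.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / row_norms

print(np.round(X_unit, 3))             # rows approx. [0.6, 0.8] and [0.447, 0.894]
print(np.linalg.norm(X_unit, axis=1))  # each row now has length 1
```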

---

###  **In Python (using sklearn):**

```python
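import numpy as np
from sklearn.preprocessing import Normalizer

# Example matrix from above: two samples (rows), two features
X = np.array([[3, 4],
              [1, 2]])

normalizer = Normalizer()  # Uses L2 norm by default
normalized_data = normalizer.fit_transform(X)
```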

### Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.
Ans: \

**Principal Component Analysis (PCA)** is a **linear dimensionality reduction** technique that transforms a high-dimensional dataset into a lower-dimensional one **while retaining as much variance (information) as possible**.

It does this by finding new axes (called **principal components**) that are **orthogonal** (uncorrelated) and **ordered by the amount of variance they capture** from the data.

---

###  **Why Use PCA?**

- To reduce the **number of features** (dimensions) while keeping essential information
- To **remove multicollinearity** between features
- To improve **model training speed** and, in many cases, reduce **overfitting**
- For **data visualization** (e.g., projecting data into 2D or 3D)

---

###  **How PCA Works (Conceptually):**

1. **Standardize** the dataset (mean = 0, std = 1)
2. **Compute the covariance matrix**
3. **Find eigenvectors and eigenvalues** of the covariance matrix
4. **Sort eigenvectors** by descending eigenvalues (variance explained)
5. **Project the data** onto the top *k* eigenvectors (principal components)
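
The five steps above can be sketched in NumPy as follows. This is an illustrative implementation, not how scikit-learn computes PCA internally; the function name `pca_from_scratch` and the synthetic data are assumptions, and the sketch assumes no feature is constant (a zero standard deviation would break step 1).

```python
import numpy as np

def pca_from_scratch(X, k):
    """Project X onto its top-k principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # 1. Standardize each feature to mean 0 and std 1
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition (eigh is meant for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenvectors by descending eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Project the data onto the top-k eigenvectors
    scores = X_std @ eigenvectors[:, :k]
    explained_ratio = eigenvalues[:k] / eigenvalues.sum()
    return scores, explained_ratio

# Synthetic data: two strongly correlated features plus one independent feature
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_synth = np.hstack([base,
                     2 * base + rng.normal(scale=0.1, size=(200, 1)),
                     rng.normal(size=(200, 1))])

scores, ratio = pca_from_scratch(X_synth, k=2)
print(scores.shape)  # (200, 2)
print(ratio)         # share of variance captured by PC1 and PC2
```

Because two of the three synthetic features are nearly collinear, the first component should capture well over half of the total variance.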

---

###  **Example:**

Suppose you have a dataset with 3 highly correlated features:  
- `height`, `weight`, and `BMI`

Instead of using all 3, you can use PCA to reduce it to **1 or 2 principal components** that still explain most of the variance.

---

###  **Before PCA (3 features):**

| Height | Weight | BMI |
|--------|--------|-----|
| 170    | 70     | 24.2|
| 180    | 80     | 24.7|
| 160    | 60     | 23.4|
| ...    | ...    | ... |

These features are correlated, so PCA can transform them into a smaller set of components:

---

###  **After PCA (2 components):**

| PC1   | PC2   |
|-------|-------|
| 2.34  | -0.12 |
| 3.12  |  0.05 |
| 1.89  | -0.08 |
| ...   | ...   |

Now you have **2 new features** (principal components) that capture most of the variance with **less redundancy**. The component values shown are illustrative.

---

###  **In Python (using `scikit-learn`):**

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Step 1: Standardize data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 2: Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained Variance:", pca.explained_variance_ratio_)
```

---

###  **Key Notes:**

- PCA is **unsupervised** (doesn’t use target labels)
- Best used when features are **correlated**
- You lose some interpretability, but usually gain training **efficiency**; predictive **performance** may also improve when the discarded components are mostly noise

### Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.
Ans: \

**Feature extraction** is the process of **transforming raw data into new features** that capture the **most relevant information** for modeling — often reducing redundancy or dimensionality in the process.

Instead of selecting features (like in feature selection), you **create new ones** that better represent the data.

---

###  **How PCA Relates to Feature Extraction**

**PCA is a classic feature extraction technique.**

- It takes a set of possibly **correlated features** and transforms them into a **new set of uncorrelated features** called **Principal Components**.
- These principal components are **linear combinations** of the original features.
- Each new feature (component) captures **maximum variance** from the original dataset.

 So, **PCA doesn’t just pick features — it creates new ones** that summarize your data in fewer dimensions.
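
To make the "linear combination" point concrete, the sketch below rebuilds the PCA output by hand from the weights stored in `pca.components_`. The data is synthetic and exists only for this illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 samples, 4 features, with features 0 and 1 correlated
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(100, 4))
X_raw[:, 1] += 0.8 * X_raw[:, 0]

X_std = StandardScaler().fit_transform(X_raw)
pca_demo = PCA(n_components=2)
X_extracted = pca_demo.fit_transform(X_std)

# Each row of components_ holds the weights of one principal component,
# so every extracted feature is a weighted sum of the (centered) originals.
manual = (X_std - pca_demo.mean_) @ pca_demo.components_.T

print(np.allclose(manual, X_extracted))  # True
print(pca_demo.components_.shape)        # (2, 4): 2 components, 4 weights each
```

Inspecting `components_` is also the usual way to recover some interpretability, since large weights show which original features drive each component.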

---

###  **How PCA Works for Feature Extraction**

1. **Standardize the data**  
2. **Apply PCA** to find principal components (new features)  
3. Choose the top *k* components that explain the majority of the variance  
4. Use these components as **input features** for your machine learning model

---

###  **Example:**

Suppose you’re predicting customer behavior based on:

- Age  
- Income  
- Spending Score  
- Credit Score  

These may be **correlated**, and using them all might introduce **redundancy**.

 You apply PCA and extract **2 principal components** that capture 95% of the variance:

| PC1   | PC2   |
|-------|-------|
| 1.23  | 0.45  |
| 2.12  | -0.36 |
| 0.98  | 0.12  |

These new features (PC1, PC2) are used as inputs to your model — they are **extracted features**, not selected ones.

---

###  **In Python (Example):**

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assume X is your original feature set
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Extracted Features (PCA):", X_pca)
```

---

###  **In Summary:**

- **Feature selection** = choose the best original features  
- **Feature extraction** = **create new features** (like PC1, PC2) from existing ones  
- **PCA is a feature extraction method** that transforms data into a new space of fewer, uncorrelated features with maximum variance

### Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.
Ans: \
We're working on a **recommendation system** where the dataset has features like:

- **Price** of the food item  
- **Rating** given by customers  
- **Delivery time** in minutes  

These features are **on different scales**. To make sure that no single feature dominates the others (especially in distance-based algorithms like **KNN**, **SVM**, or **clustering**), we **normalize** them using **Min-Max scaling**.

---

###  **Why Min-Max Scaling?**

- It transforms values to a common scale: **[0, 1]**
- Keeps distance-based methods (like KNN or neighborhood-based collaborative filtering) from being dominated by the feature with the largest raw values
- Useful for optimization algorithms that are sensitive to **feature magnitude**

---

###  **Step-by-Step Preprocessing with Min-Max Scaling:**

---

#### **🔹 Original Data Example (before scaling):**

| Price (₹) | Rating (1–5) | Delivery Time (min) |
|-----------|--------------|----------------------|
| 150       | 4.5          | 30                   |
| 300       | 3.0          | 50                   |
| 200       | 5.0          | 20                   |

---

#### **Apply Min-Max Scaling:**

Use the formula:
$$
X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
$$
#####  Step 1: Find min and max for each column:

- **Price:** min = 150, max = 300  
- **Rating:** min = 3.0, max = 5.0  
- **Delivery Time:** min = 20, max = 50

####  **Scaled Dataset:**

| Price | Rating | Delivery Time |
|-------|--------|----------------|
| 0.00  | 0.75   | 0.33           |
| 1.00  | 0.00   | 1.00           |
| 0.33  | 1.00   | 0.00           |

---

###  **How to Do It in Python (with scikit-learn):**

```python
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Price': [150, 300, 200],
    'Rating': [4.5, 3.0, 5.0],
    'DeliveryTime': [30, 50, 20]
})

# Initialize and apply scaler
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

# Convert back to DataFrame
scaled_df = pd.DataFrame(scaled_data, columns=data.columns)
print(scaled_df)
```
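
In a live recommendation system, new items arrive after the scaler has been fitted. The sketch below reuses the minimum and maximum learned from the existing items instead of refitting; the new item's values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Items the scaler is fitted on (same sample data as above)
train_items = pd.DataFrame({
    'Price': [150, 300, 200],
    'Rating': [4.5, 3.0, 5.0],
    'DeliveryTime': [30, 50, 20]
})
item_scaler = MinMaxScaler().fit(train_items)

# Hypothetical new item; its price lies above the fitted maximum of 300
new_item = pd.DataFrame({'Price': [360], 'Rating': [4.0], 'DeliveryTime': [35]})

# transform() reuses the fitted min/max, so values can fall outside [0, 1]
scaled_new = item_scaler.transform(new_item)
print(scaled_new)  # price scales to 1.4, rating and delivery time to 0.5
```

If out-of-range values are a problem for the downstream model, `MinMaxScaler(clip=True)` clips transformed values to the target range.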

---

###  **Summary:**

- **Min-Max scaling** is essential for **equal treatment** of features with different units.
- In this recommendation system, it ensures **fair comparisons** between price, rating, and delivery time.
- It helps the model learn better and make more balanced suggestions.

### Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.
Ans: \

We're building a **machine learning model to predict stock prices**, and the dataset includes many features like:

- Financial ratios (P/E, debt-to-equity, EPS, etc.)  
- Historical prices  
- Volume  
- Market indicators  
- Macroeconomic signals  

This high-dimensional data can be **redundant**, **correlated**, and **noisy**, which can lead to:

- **Overfitting**  
- **Slower training**  
- Difficulty in visualizing or interpreting the model  

That's where **Principal Component Analysis (PCA)** comes in.

---

###  **How PCA Helps:**

PCA reduces the number of features by transforming them into **principal components** — new variables that are **uncorrelated** and capture the **maximum variance** in the data.

This helps by:
- Removing multicollinearity  
- Speeding up training  
- Reducing the risk of overfitting  
- Making the data easier to visualize (e.g., in 2D or 3D)

---

###  **Steps to Apply PCA in Your Stock Prediction Project:**

---

####  **Step 1: Standardize the Data**
Because PCA is sensitive to scale, first **standardize all features**:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # X is your feature matrix
```

---

####  **Step 2: Apply PCA**

Choose the number of principal components to keep (e.g., enough to retain 95% of the variance):

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95)  # Keep components that explain 95% of the variance
X_pca = pca.fit_transform(X_scaled)
```

---

####  **Step 3: Analyze Results**

Check how much variance each principal component explains:

```python
print(pca.explained_variance_ratio_)
print(pca.n_components_)  # Number of selected components
```

Suppose your original data had **50 features**, and PCA reduced it to **10 components** while retaining **95% of the variance**. The model then trains on a fifth of the original number of inputs.

---

####  **Step 4: Train the Model on Transformed Features**

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_pca, y)  # y is the target stock price
```
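
One caveat with the four steps as written is that fitting the scaler and PCA on the full dataset lets information from later dates leak into training. A possible way around this, sketched below, is to chain the steps in a scikit-learn `Pipeline` and fit it on the earlier part of the data only. The data here is synthetic (50 features driven by 5 hidden factors) and the 80/20 chronological split is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stock dataset: 50 correlated features, 5 hidden factors
rng = np.random.default_rng(0)
n_samples, n_features = 300, 50
latent = rng.normal(size=(n_samples, 5))
mixing = rng.normal(size=(5, n_features))
X_all = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
y_all = 2.0 * latent[:, 0] + rng.normal(scale=0.1, size=n_samples)

# Chronological split: the first 80% of rows for training, the rest for testing
split = int(n_samples * 0.8)
X_train, X_test = X_all[:split], X_all[split:]
y_train, y_test = y_all[:split], y_all[split:]

# Scaler and PCA are fitted on the training rows only, then reused on the test rows
stock_model = make_pipeline(StandardScaler(), PCA(n_components=0.95), LinearRegression())
stock_model.fit(X_train, y_train)

n_kept = stock_model.named_steps['pca'].n_components_
predictions = stock_model.predict(X_test)
print(n_kept)             # number of components needed for 95% of the variance
print(predictions.shape)  # (60,)
```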

---

###  **Before vs After PCA:**

| Feature Set      | Number of Features | Multicollinearity | Training Time | Overfitting Risk |
|------------------|--------------------|-------------------|----------------|------------------|
| Original         | 50                 | High              | Slow           | High             |
| After PCA        | 10                 | None              | Faster         | Lower            |

---

###  **Summary:**

- Use PCA to reduce dimensionality when working with **many financial features**  
- It creates new features (principal components) that capture most of the variance  
- It can reduce noise and the risk of overfitting, which may improve **model performance**  
- Especially helpful in **stock prediction**, where features often overlap or correlate

### Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.
Ans: \
To perform **Min-Max scaling** on the dataset `[1, 5, 10, 15, 20]` and **scale it to the range [-1, 1]**, we use the following formula:

---

###  **Min-Max Scaling Formula (Custom Range [a, b]):**
$$
X_{\text{scaled}} = a + \frac{(X - X_{\min}) \times (b - a)}{X_{\max} - X_{\min}}
$$
Where:
- $a = -1$
- $b = 1$
- $X_{\min} = 1$
- $X_{\max} = 20$

---

###  Apply the Formula:

We'll transform each value in `[1, 5, 10, 15, 20]`.

---

#### 1. For X = 1:

$$
-1 + \frac{(1 - 1) \times (1 - (-1))}{20 - 1} = -1 + 0 = \mathbf{-1.0}
$$

#### 2. For X = 5:

$$
-1 + \frac{(5 - 1) \times 2}{19} = -1 + \frac{8}{19} \approx \mathbf{-0.579}
$$

#### 3. For X = 10:

$$
-1 + \frac{(10 - 1) \times 2}{19} = -1 + \frac{18}{19} \approx \mathbf{-0.053}
$$

#### 4. For X = 15:

$$
-1 + \frac{(15 - 1) \times 2}{19} = -1 + \frac{28}{19} \approx \mathbf{0.474}
$$

#### 5. For X = 20:

$$
-1 + \frac{(20 - 1) \times 2}{19} = -1 + \frac{38}{19} = -1 + 2 = \mathbf{1.0}
$$

---

### **Scaled Output (Range -1 to 1):**

| Original | Scaled   |
|----------|----------|
| 1        | -1.000   |
| 5        | -0.579   |
| 10       | -0.053   |
| 15       | 0.474    |
| 20       | 1.000    |
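
The hand-computed values can be cross-checked in code. The sketch below uses scikit-learn's `feature_range` parameter and compares the result with the custom-range formula applied directly.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# scikit-learn expects a 2-D array: one column, five rows
values = np.array([1, 5, 10, 15, 20], dtype=float).reshape(-1, 1)

range_scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = range_scaler.fit_transform(values).ravel()

# Same result from the formula a + (X - X_min) * (b - a) / (X_max - X_min)
a, b = -1.0, 1.0
manual_scaled = a + (values.ravel() - 1.0) * (b - a) / (20.0 - 1.0)

print(np.round(scaled, 3))  # approx. -1, -0.579, -0.053, 0.474, 1
print(np.allclose(scaled, manual_scaled))  # True
```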

### Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?
Ans: \

###  **Step 1: Understand the Dataset**

You have **5 features**:

- **Height** – continuous  
- **Weight** – continuous  
- **Age** – continuous  
- **Gender** – categorical (likely binary: male/female or 0/1)  
- **Blood Pressure** – continuous  

 So, 4 numerical + 1 categorical (gender), which must be **encoded numerically** for PCA.

---

###  **Step 2: Preprocess the Data**

- **Standardize** all numeric features (mean = 0, std = 1)
- **Encode** categorical variables (e.g., one-hot or label encoding for gender)

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import pandas as pd

# Sample DataFrame (assuming gender already encoded as 0/1)
data = pd.DataFrame({
    'height': [...],
    'weight': [...],
    'age': [...],
    'gender': [...],
    'blood_pressure': [...]
})

# Standardize features
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# Apply PCA
pca = PCA()
pca_data = pca.fit_transform(scaled_data)
```

---

###  **Step 3: Decide How Many Principal Components to Retain**

Use **explained variance ratio** to determine how many components to keep:

```python
explained_variance = pca.explained_variance_ratio_
cumulative_variance = pca.explained_variance_ratio_.cumsum()
```

Then plot or check the cumulative variance (the values below are illustrative, not computed from real data):

| PC # | Variance Explained | Cumulative Variance |
|------|--------------------|---------------------|
| PC1  | 0.45               | 0.45                |
| PC2  | 0.25               | 0.70                |
| PC3  | 0.15               | 0.85                |
| PC4  | 0.10               | 0.95                |
| PC5  | 0.05               | 1.00                |

---

###  **How Many Principal Components to Choose?**

 Choose **enough components to retain ~95%** of the variance.

So in this case:
- Retaining **4 components** keeps **95% of the variance**
- That’s a good trade-off between dimensionality reduction and information retention
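
The same choice can be made programmatically. The sketch below applies a 95% threshold to the illustrative variance ratios from the table above; with real data, the array would come from `pca.explained_variance_ratio_` instead.

```python
import numpy as np

# Illustrative ratios copied from the table above (not computed from real data)
variance_ratios = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
cumulative = np.cumsum(variance_ratios)

threshold = 0.95
# Index of the first component whose cumulative variance reaches the threshold;
# the small tolerance guards against floating-point rounding in the running sum.
n_to_keep = int(np.argmax(cumulative >= threshold - 1e-9) + 1)

print(n_to_keep)  # 4
```

In scikit-learn, passing a fraction such as `PCA(n_components=0.95)` performs this selection automatically.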

---

###  **Final PCA Application:**

```python
pca = PCA(n_components=4)  # Retain 95% variance
reduced_data = pca.fit_transform(scaled_data)
```

---

###  **Summary:**

- PCA helps compress 5 features into **~4 principal components**  
- You reduce redundancy and keep essential patterns in the data  
- The number of components is chosen based on how much **variance you want to retain** (commonly ≥ 95%)