---

## ✅ Theoretical Questions and Answers

---

#### **1. What is Logistic Regression, and how does it differ from Linear Regression?**

**Logistic Regression** is a classification algorithm used to predict discrete outcomes (e.g., binary labels 0 or 1).  
**Linear Regression** predicts continuous outcomes.

**Key Difference:**  
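- Logistic Regression passes a linear combination of the features through the **sigmoid function**, so it outputs probabilities between 0 and 1 (thresholded, typically at 0.5, to get a class label).  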
- Linear Regression outputs real numbers without bounds.
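A minimal sketch of the difference on a toy one-feature dataset (the data is made up for illustration): fitting `LinearRegression` to 0/1 labels can produce predictions below 0 or above 1, while `LogisticRegression.predict_proba` always stays between 0 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (illustrative only): one feature, label is 1 when x >= 5
X = np.arange(10).reshape(-1, 1)
y = (X.ravel() >= 5).astype(int)

# Linear Regression treats the 0/1 labels as continuous targets
lin_pred = LinearRegression().fit(X, y).predict(X)

# Logistic Regression passes the linear score through the sigmoid
log_prob = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

print("Linear predictions:", np.round(lin_pred, 2))      # some values fall outside [0, 1]
print("Logistic probabilities:", np.round(log_prob, 2))  # all values lie in (0, 1)
```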

---

#### **2. What is the mathematical equation of Logistic Regression?**

\[
P(y=1|x) = \frac{1}{1 + e^{-(w^Tx + b)}}
\]

Where:  
- \( x \): input features  
- \( w \): weights  
- \( b \): bias  
- \( P(y=1|x) \): probability of class 1
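To see the equation in action, this sketch (using scikit-learn's bundled breast cancer dataset as an arbitrary example) recomputes \( P(y=1|x) \) by hand from the fitted `coef_` (\( w \)) and `intercept_` (\( b \)) and compares it with `predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the solver converge
model = LogisticRegression(max_iter=1000).fit(X, y)

w, b = model.coef_[0], model.intercept_[0]
z = X @ w + b                    # linear score w^T x + b (the log-odds)
p_manual = 1 / (1 + np.exp(-z))  # sigmoid, as in the equation above
p_sklearn = model.predict_proba(X)[:, 1]

print("Max difference:", np.abs(p_manual - p_sklearn).max())
```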

---

#### **3. Why do we use the Sigmoid function in Logistic Regression?**

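The **sigmoid function** \( \sigma(z) = \frac{1}{1 + e^{-z}} \) maps any real number to the open interval (0, 1), so the model's output can be read as a **probability**. It is also the inverse of the log-odds (logit) function, which is why the log-odds of the prediction is a linear function of the features.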
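A minimal NumPy sketch of the function: large negative scores approach 0, large positive scores approach 1, and a score of 0 maps to exactly 0.5, the usual decision threshold. (For very large negative inputs `np.exp` can overflow with a warning; `scipy.special.expit` is a numerically safer alternative.)

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probs = sigmoid(scores)
print(np.round(probs, 4))

# Symmetry: sigmoid(-z) = 1 - sigmoid(z)
print(np.allclose(sigmoid(-scores), 1 - probs))
```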

---

#### **4. What is the cost function of Logistic Regression?**

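Logistic Regression uses **Log Loss (Binary Cross-Entropy)** as the cost function. Here \( m \) is the number of training examples, \( h(x^{(i)}) = P(y=1|x^{(i)}) \) is the sigmoid output from Question 2, and \( \theta \) denotes the parameters \( (w, b) \):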

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right]
\]
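The formula can be checked numerically. This sketch uses made-up labels and predicted probabilities, computes \( J \) directly, and compares it with scikit-learn's `log_loss`, which computes the same average binary cross-entropy.

```python
import numpy as np
from sklearn.metrics import log_loss

# Illustrative labels y^(i) and predicted probabilities h(x^(i))
y = np.array([1, 0, 1, 1, 0])
h = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

# J = -(1/m) * sum[ y log h + (1 - y) log(1 - h) ]
m = len(y)
J = -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

print("Manual log loss:", J)
print("sklearn log_loss:", log_loss(y, h))
```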

---

#### **5. What is Regularization in Logistic Regression? Why is it needed?**

**Regularization** adds a penalty to the loss function to **prevent overfitting**.

- It controls model complexity by shrinking the coefficients toward zero.
- It improves generalization to unseen data.
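In symbols, one common way to write the penalized cost adds a term to the log loss \( J(\theta) \) from Question 4, with \( n \) features, weights \( w_j \), and strength \( \lambda \). The bias \( b \) is usually left unpenalized, and textbooks and libraries scale these terms slightly differently:

\[
J_{\text{L1}}(\theta) = J(\theta) + \lambda \sum_{j=1}^{n} |w_j|, \qquad
J_{\text{L2}}(\theta) = J(\theta) + \lambda \sum_{j=1}^{n} w_j^2
\]

Elastic Net mixes the two with a ratio \( \alpha \in [0, 1] \) (called `l1_ratio` in scikit-learn):

\[
J_{\text{EN}}(\theta) = J(\theta) + \lambda \left[ \alpha \sum_{j=1}^{n} |w_j| + (1 - \alpha) \sum_{j=1}^{n} w_j^2 \right]
\]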

---

#### **6. Explain the difference between Lasso, Ridge, and Elastic Net Regression.**

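| Type        | Penalty          | Effect on Coefficients                  | Feature Selection |
|-------------|------------------|-----------------------------------------|-------------------|
| Lasso       | L1               | Can shrink some exactly to 0            | Yes               |
| Ridge       | L2               | Shrinks toward 0, but rarely exactly 0  | No                |
| Elastic Net | L1 + L2 combined | Mix of both; can zero out coefficients  | Yes               |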
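The feature-selection column can be observed directly. This sketch (breast cancer dataset, standardized, `C=0.1` chosen arbitrarily) counts how many coefficients each penalty sets exactly to zero: with L1 several features typically drop out, while L2 only shrinks them.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Same regularization strength, different penalty; liblinear supports both L1 and L2
l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X, y)

n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))
print(f"L1: {n_zero_l1} of {X.shape[1]} coefficients are exactly 0")
print(f"L2: {n_zero_l2} of {X.shape[1]} coefficients are exactly 0")
```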

---

#### **7. When should we use Elastic Net instead of Lasso or Ridge?**

Use **Elastic Net** when:
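- You suspect **multiple correlated features**: Lasso tends to keep one feature from a correlated group and drop the rest somewhat arbitrarily, while the L2 part of Elastic Net tends to keep or drop correlated features together.
- You want the **feature selection of Lasso** together with the **stability of Ridge**.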

---

#### **8. What is the impact of the regularization parameter (λ) in Logistic Regression?**

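- **λ** controls regularization strength; scikit-learn instead uses **C = 1/λ**, so a smaller `C` means *stronger* regularization.
- A **higher λ** (lower `C`) adds more penalty: smaller coefficients and a simpler model, with a risk of underfitting.
- A **lower λ** (higher `C`) reduces the penalty: a more flexible model, with a risk of overfitting.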
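A short sketch of the effect, assuming scikit-learn's convention that `C` is the inverse of λ: as `C` grows (weaker penalty), the overall size of the L2-regularized coefficient vector grows too. The dataset and the `C` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

norms = []
for C in [0.001, 0.1, 10]:
    # Small C = large lambda = strong penalty = coefficients pulled toward 0
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))
    print(f"C={C:<6} ||w||_2 = {norms[-1]:.3f}")
```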

---

#### **9. What are the key assumptions of Logistic Regression?**

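- The response variable is **binary** (or multiclass, with the extensions in Question 15).
- Observations are independent of one another.
- Little or no multicollinearity among the features.
- The **log-odds** of the outcome is a linear function of the features.
- A reasonably large sample size, so the coefficient estimates are stable.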

---

#### **10. What are some alternatives to Logistic Regression for classification tasks?**

- Decision Trees  
- Random Forest  
- Support Vector Machines (SVM)  
- K-Nearest Neighbors (KNN)  
- Naive Bayes  
- Neural Networks  

---

#### **11. What are Classification Evaluation Metrics?**

- **Accuracy**  
- **Precision**  
- **Recall**  
- **F1-Score**  
- **ROC-AUC**  
- **Confusion Matrix**  
- **Cohen’s Kappa**  
- **Matthews Correlation Coefficient (MCC)**  
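Several of these metrics are simple ratios of confusion-matrix counts. This sketch, with made-up labels and predictions, derives precision, recall, and F1 by hand from the true/false positives and negatives and checks them against scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Illustrative true labels and predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of predicted positives, how many are correct
recall = tp / (tp + fn)     # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Precision={precision:.3f} Recall={recall:.3f} F1={f1:.3f}")
print(np.isclose(f1, f1_score(y_true, y_pred)))
```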

---

#### **12. How does class imbalance affect Logistic Regression?**

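- It can cause the model to **favor the majority class**, giving poor recall on the minority class while overall accuracy still looks high.
- Common remedies:
  - **Class weights** (e.g., `class_weight='balanced'` in scikit-learn), which make minority-class errors cost more
  - **Resampling**: oversampling the minority class or undersampling the majority class
  - **SMOTE**, an oversampling method that creates synthetic minority examples (available in the `imbalanced-learn` package)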
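For the class-weight option, scikit-learn's `class_weight='balanced'` gives each class the weight `n_samples / (n_classes * n_samples_in_class)`. This sketch, on a made-up 90/10 label vector, computes those weights with `compute_class_weight` and by hand.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative imbalanced labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)
classes = np.unique(y)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
manual = len(y) / (len(classes) * np.bincount(y))

print("sklearn weights:", weights)  # the minority class gets the larger weight
print("manual weights: ", manual)
```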

---

#### **13. What is Hyperparameter Tuning in Logistic Regression?**

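It involves choosing the hyperparameter values that give the best validation performance, for example:
- `C` (inverse of regularization strength; smaller values mean stronger regularization)
- `penalty` (`l1`, `l2`, `elasticnet`)
- `solver` (optimization algorithm; not every solver supports every penalty)
- `l1_ratio` (L1/L2 mix, used only with `penalty='elasticnet'`)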

Tools: `GridSearchCV`, `RandomizedSearchCV`
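Because not every solver supports every penalty, one practical pattern is to pass `GridSearchCV` a list of grids, each pairing penalties with a solver that supports them. A sketch, assuming the breast cancer dataset and arbitrary `C` values; the scaler sits in a `Pipeline` (step names `scaler` and `clf` are arbitrary) so it is refit inside each CV fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=5000))])

# Each dict only combines penalties with a solver that supports them
param_grid = [
    {"clf__solver": ["liblinear"], "clf__penalty": ["l1", "l2"], "clf__C": [0.1, 1, 10]},
    {"clf__solver": ["lbfgs"], "clf__penalty": ["l2"], "clf__C": [0.1, 1, 10]},
    {"clf__solver": ["saga"], "clf__penalty": ["elasticnet"], "clf__l1_ratio": [0.5], "clf__C": [0.1, 1, 10]},
]

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", round(grid.best_score_, 4))
```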

---

#### **14. What are different solvers in Logistic Regression? Which one should be used?**

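| Solver      | Supported Penalties        | Best For                                          |
|-------------|----------------------------|---------------------------------------------------|
| liblinear   | l1, l2                     | Small datasets; binary (multiclass only via OvR)  |
| saga        | l1, l2, elasticnet, None   | Large datasets; the only option for elastic net   |
| lbfgs       | l2, None                   | Default solver; multiclass (multinomial)          |
| newton-cg   | l2, None                   | Multiclass (multinomial)                          |

As a rule of thumb, start with the default `lbfgs`; use `liblinear` for small datasets or L1 on binary problems, and `saga` for very large datasets or elastic net.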

---

#### **15. How is Logistic Regression extended for multiclass classification?**

- **One-vs-Rest (OvR):** One classifier per class vs. all others.  
- **Softmax (Multinomial):** Generalizes Logistic Regression to multiple classes.
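Both strategies are available in scikit-learn. A sketch on the Iris dataset: `OneVsRestClassifier` wraps one binary model per class, while a plain `LogisticRegression` with the default `lbfgs` solver fits a single multinomial (softmax) model for multiclass targets in recent scikit-learn versions. In both cases the per-class probabilities sum to 1 (for OvR, scikit-learn normalizes the per-class scores).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# OvR: three independent binary classifiers (class k vs. the rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Softmax: one multinomial model (default with lbfgs on multiclass targets)
softmax = LogisticRegression(max_iter=1000).fit(X, y)

print("OvR binary models:", len(ovr.estimators_))    # 3
print("Softmax coef_ shape:", softmax.coef_.shape)  # (3, 4): one weight row per class
print("OvR probs sum to 1:", np.allclose(ovr.predict_proba(X).sum(axis=1), 1))
print("Softmax probs sum to 1:", np.allclose(softmax.predict_proba(X).sum(axis=1), 1))
```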

---

#### **16. What are the advantages and disadvantages of Logistic Regression?**

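✅ Advantages:  
- Simple and fast to train  
- Easy to interpret (coefficients map to changes in log-odds)  
- Works well when the classes are separated by a roughly linear boundary

❌ Disadvantages:  
- Assumes a linear decision boundary in feature space  
- Struggles with complex, non-linear patterns unless features are engineered (e.g., interaction or polynomial terms)  
- Sensitive to outliers and class imbalance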

---

#### **17. What are some use cases of Logistic Regression?**

- Email spam detection  
- Fraud detection  
- Customer churn prediction  
- Disease diagnosis (e.g., cancer prediction)

---

#### **18. What is the difference between Softmax Regression and Logistic Regression?**

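- **Logistic Regression** handles **binary classification** using the sigmoid function.
- **Softmax Regression** (a.k.a. Multinomial Logistic Regression) handles **multiclass classification** by using the softmax function to turn one linear score per class into probabilities that sum to 1. With two classes it reduces to ordinary Logistic Regression.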
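One way to see that softmax generalizes the sigmoid: with two classes and scores \( z_0, z_1 \), the softmax probability of class 1 is \( e^{z_1} / (e^{z_0} + e^{z_1}) = \sigma(z_1 - z_0) \). A small NumPy check with arbitrary scores:

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis; subtracting the max keeps exp() numerically stable."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary two-class scores [z0, z1] for four examples
scores = np.array([[0.0, 1.5], [2.0, -1.0], [0.3, 0.3], [-4.0, 2.0]])

p_softmax = softmax(scores)[:, 1]
p_sigmoid = sigmoid(scores[:, 1] - scores[:, 0])
print(np.allclose(p_softmax, p_sigmoid))  # True
```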

---

#### **19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?**

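- Use **OvR** for simplicity and interpretability: it trains one independent binary model per class, and those models can be trained in parallel.
- Use **Softmax** when the classes are **mutually exclusive**: it fits all classes jointly in a single model whose class probabilities sum to 1, and it is often the better default for multiclass problems.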

---

#### **20. How do we interpret coefficients in Logistic Regression?**

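- Each coefficient is the **change in the log-odds** of the outcome for a one-unit increase in that predictor, holding the other predictors constant.
- Exponentiating a coefficient gives an **odds ratio**: \( \text{Odds Ratio} = e^{\text{coefficient}} \)
  - Odds ratio > 1: the odds of the outcome increase
  - Odds ratio < 1: the odds of the outcome decrease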
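A sketch of the odds-ratio reading on the breast cancer data (standardized, so "one unit" means one standard deviation; the choice of feature 0 and of the starting sample is arbitrary): raising one feature by one unit multiplies the model's predicted odds by \( e^{\text{coefficient}} \), whatever the other features are.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # one unit = one standard deviation
model = LogisticRegression(max_iter=1000).fit(X, y)

odds_ratios = np.exp(model.coef_[0])  # one odds ratio per feature

def odds(x):
    """Predicted odds P(y=1|x) / P(y=0|x) for a single sample."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    return p / (1 - p)

# Start from an "average" sample (all standardized features at 0),
# then raise feature 0 by one unit while holding every other feature fixed
x = np.zeros(X.shape[1])
x_plus = x.copy()
x_plus[0] += 1

print("exp(coef_0):        ", odds_ratios[0])
print("observed odds ratio:", odds(x_plus) / odds(x))
```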


---

## ✅ Practical Questions and Answers

---

### **1. Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic Regression, and prints the model accuracy**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply logistic regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict and print accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)
```

---

### **2. Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1') and print the model accuracy**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=ConvergenceWarning)  # silence only convergence warnings

# Load data
X, y = load_breast_cancer(return_X_y=True)

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L1 Regularization
model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

# Predict & Accuracy
y_pred = model.predict(X_test)
print("L1 Regularization Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **3. Write a Python program to train Logistic Regression with L2 regularization (Ridge) using LogisticRegression(penalty='l2'). Print model accuracy and coefficients**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_breast_cancer(return_X_y=True)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L2 Regularization (default)
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=1000)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("L2 Regularization Accuracy:", accuracy_score(y_test, y_pred))
print("Model Coefficients:", model.coef_)
```

---

### **4. Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet')**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load dataset
X, y = load_breast_cancer(return_X_y=True)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Elastic Net Regularization (only the 'saga' solver supports it).
# saga converges slowly on unscaled features, so standardize them first.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=1000),
)
model.fit(X_train, y_train)

# Predict & Accuracy
y_pred = model.predict(X_test)
print("Elastic Net Accuracy:", accuracy_score(y_test, y_pred))
```

---

### **5. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr'**

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load multiclass dataset
X, y = load_iris(return_X_y=True)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One-vs-Rest Logistic Regression.
# LogisticRegression(multi_class='ovr') is deprecated in recent scikit-learn releases (1.5+);
# wrapping the estimator in OneVsRestClassifier applies the same one-vs-rest strategy.
model = OneVsRestClassifier(LogisticRegression(solver='lbfgs', max_iter=1000))
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("OvR Multiclass Accuracy:", accuracy_score(y_test, y_pred))
```



---

### **6. Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic Regression. Print the best parameters and accuracy**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

# Grid Search
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

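# Best model: mean CV accuracy on the training folds, then accuracy on the held-out test set
print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)
print("Test Accuracy:", grid.score(X_test, y_test))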
```

---

### **7. Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the average accuracy**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# Load data
X, y = load_breast_cancer(return_X_y=True)

# Stratified K-Fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)

# Cross-validation
scores = cross_val_score(model, X, y, cv=skf)
print("Cross-Validation Accuracy Scores:", scores)
print("Average Accuracy:", scores.mean())
```

---

### **8. Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its accuracy**

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset from CSV
df = pd.read_csv('data.csv')  # Replace with your CSV file
X = df.drop('target', axis=1)  # Replace 'target' with your label column; remaining columns must be numeric
y = df['target']

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print("Accuracy from CSV data:", accuracy_score(y_test, y_pred))
```

---

### **9. Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in Logistic Regression. Print the best parameters and accuracy**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale inside a pipeline so 'saga' can converge and the scaler is refit in each CV fold
pipe = Pipeline([('scaler', StandardScaler()), ('logreg', LogisticRegression(max_iter=1000))])

# Parameter distributions (step name + '__' + parameter name)
param_dist = {
    'logreg__C': np.logspace(-3, 3, 10),
    'logreg__penalty': ['l1', 'l2'],
    'logreg__solver': ['liblinear', 'saga']
}

# Randomized Search (random_state makes the sampled candidates reproducible)
rs = RandomizedSearchCV(pipe, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
rs.fit(X_train, y_train)

# Best results
print("Best Parameters:", rs.best_params_)
print("Test Accuracy:", rs.score(X_test, y_test))
```

---

### **10. Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy**

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.metrics import accuracy_score

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# OvO Logistic Regression
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000))
ovo.fit(X_train, y_train)

# Accuracy
y_pred = ovo.predict(X_test)
print("One-vs-One Accuracy:", accuracy_score(y_test, y_pred))
```

---



###  **11. Visualize the confusion matrix for binary classification**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.title("Confusion Matrix")
plt.show()
```

---

###  **12. Evaluate model using Precision, Recall, and F1-Score**

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Reuses y_test and y_pred from Question 11
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))
```

---

###  **13. Train a Logistic Regression model on imbalanced data and apply class weights**

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Create imbalanced dataset (~90% class 0, ~10% class 1)
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Class weights to handle imbalance
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy alone is misleading on imbalanced data, so also check per-class precision and recall
print("Accuracy on imbalanced data:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

---

###  **14. Train Logistic Regression on the Titanic dataset, handle missing values, and evaluate performance**

```python
import seaborn as sns
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load Titanic dataset (seaborn fetches it from its online data repository)
titanic = sns.load_dataset("titanic")

# Select features and drop rows with missing target or key features
titanic = titanic[["age", "sex", "fare", "survived"]].dropna(subset=["fare", "sex", "survived"])

# Encode categorical column (male -> 0, female -> 1)
titanic["sex"] = titanic["sex"].map({"male": 0, "female": 1})

# Split features and label
X = titanic.drop("survived", axis=1)
y = titanic["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Handle missing age values: learn the mean from the training set only, to avoid test-set leakage
imputer = SimpleImputer(strategy="mean")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Train and evaluate model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Titanic Dataset Accuracy:", accuracy_score(y_test, y_pred))
```

---

###  **15. Apply feature scaling (Standardization) before training a Logistic Regression model and compare results**

```python
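from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Reuses X_train, X_test, y_train, y_test from Question 14 (Titanic)
# Without Scaling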
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
acc_no_scaling = accuracy_score(y_test, model.predict(X_test))

# With Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_scaled, y_train)
acc_scaled = accuracy_score(y_test, model_scaled.predict(X_test_scaled))

print("Accuracy without scaling:", acc_no_scaling)
print("Accuracy with scaling:", acc_scaled)
```

---



###  **16. Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score**

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict probabilities
y_prob = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)
print("ROC-AUC Score:", auc)
```

---

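###  **17. Write a Python program to train Logistic Regression with a custom inverse regularization strength (C=0.5) and evaluate accuracy**

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Reuses X_train, X_test, y_train, y_test from Question 16.
# C is the inverse of regularization strength, not a learning rate: smaller C means stronger regularization.
model = LogisticRegression(C=0.5, max_iter=1000)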
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy with C=0.5:", accuracy_score(y_test, y_pred))
```

---

###  **18. Write a Python program to train Logistic Regression and identify important features based on model coefficients**

```python
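import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize inside a pipeline so coefficient magnitudes are comparable across features
# (reuses X_train, y_train from Question 16, i.e. the breast cancer data)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Rank features by |coefficient|: larger magnitude = larger change in log-odds per standard deviation
feature_names = load_breast_cancer().feature_names
importance = model[-1].coef_[0]
for i in np.argsort(np.abs(importance))[::-1]:
    print(f"{feature_names[i]}: Coefficient = {importance[i]:.4f}")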
```

---

###  **19. Write a Python program to train Logistic Regression and evaluate its performance using Cohen’s Kappa Score**

```python
from sklearn.metrics import cohen_kappa_score

y_pred = model.predict(X_test)
kappa = cohen_kappa_score(y_test, y_pred)
print("Cohen’s Kappa Score:", kappa)
```

---

###  **20. Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary classification**

```python
from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay
import matplotlib.pyplot as plt

y_scores = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, y_scores)
disp = PrecisionRecallDisplay(precision=precision, recall=recall)
disp.plot()
plt.title("Precision-Recall Curve")
plt.show()
```

---

###  **21. Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare their accuracy**

```python
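from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Reuses the Question 16 split; features are standardized because saga converges slowly on unscaled data
solvers = ['liblinear', 'saga', 'lbfgs']
for solver in solvers:
    model = make_pipeline(StandardScaler(), LogisticRegression(solver=solver, max_iter=1000))
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"Solver: {solver}, Accuracy: {acc:.4f}")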
```

---

###  **22. Write a Python program to train Logistic Regression and evaluate its performance using Matthews Correlation Coefficient (MCC)**

```python
from sklearn.metrics import matthews_corrcoef

y_pred = model.predict(X_test)
mcc = matthews_corrcoef(y_test, y_pred)
print("Matthews Correlation Coefficient (MCC):", mcc)
```

---

###  **23. Write a Python program to train Logistic Regression on both raw and standardized data. Compare their accuracy to see the impact of feature scaling**

```python
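from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Reuses X_train, X_test, y_train, y_test from Question 16
# Raw data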
model_raw = LogisticRegression(max_iter=1000)
model_raw.fit(X_train, y_train)
acc_raw = accuracy_score(y_test, model_raw.predict(X_test))

# Scaled data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_scaled, y_train)
acc_scaled = accuracy_score(y_test, model_scaled.predict(X_test_scaled))

print("Accuracy (Raw):", acc_raw)
print("Accuracy (Standardized):", acc_scaled)
```

---

###  **24. Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using cross-validation**

```python
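import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Reuses X, y from Question 16 (breast cancer data)
C_values = [0.01, 0.1, 1, 10]
mean_scores = []
for C in C_values:
    model = LogisticRegression(C=C, max_iter=1000)
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores.append(np.mean(scores))
    print(f"C={C}, Cross-Validation Accuracy: {mean_scores[-1]:.4f}")

# The optimal C is the one with the highest mean CV accuracy
best_C = C_values[int(np.argmax(mean_scores))]
print("Optimal C:", best_C)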
```

---

###  **25. Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to make predictions**

```python
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Save model
joblib.dump(model, "logistic_model.pkl")

# Load and predict
loaded_model = joblib.load("logistic_model.pkl")
y_pred = loaded_model.predict(X_test)
print("Accuracy of loaded model:", accuracy_score(y_test, y_pred))
```

---
