# Assignment Code: DA-AG-011  
## Logistic Regression | Assignment  


---




### **Question 1: What is Logistic Regression, and how does it differ from Linear Regression?**

**Answer:**  
Logistic Regression is a **supervised learning algorithm** used for **classification problems**, most commonly binary ones.  
It predicts the probability that a given input belongs to a particular class.  

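- **Linear Regression** → Output is *continuous* and unbounded (e.g., predicting salary); the model is usually fit by minimizing squared error.  
- **Logistic Regression** → Output is a *class probability* that is thresholded into a category (e.g., yes/no, 0/1); the model is fit by minimizing log loss (maximum likelihood).  

Logistic Regression applies the **Sigmoid function** to a linear combination of the features, mapping it to a value between **0 and 1** that can be read as a probability.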


### **Question 2: Explain the role of the Sigmoid function in Logistic Regression.**

**Answer:**  
The **Sigmoid function** converts any real-valued number into a value between **0 and 1**, representing a **probability**.  
It ensures that Logistic Regression outputs probabilities rather than raw linear values.

The formula is:

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

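where  
\( z = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n \) is the linear combination of the input features \( x_i \), with intercept \( b_0 \) and coefficients \( b_1, \dots, b_n \).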

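- If \( \sigma(z) > 0.5 \) → Class 1  
- If \( \sigma(z) \le 0.5 \) → Class 0  

The 0.5 cut-off is the conventional default; it can be moved when false positives and false negatives have different costs.

Strictly speaking, the sigmoid is the **inverse of the logit link function**: it connects the linear model to a probabilistic output, while the logit \( \log\frac{p}{1-p} \) maps the probability back to \( z \).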
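
The formula and decision rule above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `sigmoid` and `predict_class` and the coefficient values are assumptions made up for the example, not something fitted to data.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # so that exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_class(features, coefficients, intercept, threshold=0.5):
    """Return (probability, label) for a single sample."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    p = sigmoid(z)
    return p, int(p > threshold)


# Hypothetical coefficients, chosen only so that z = 0.5.
prob, label = predict_class([2.0, 1.0], coefficients=[0.8, -0.5], intercept=-0.6)
print(f"probability={prob:.3f}, class={label}")  # probability=0.622, class=1
```

Passing a different `threshold` changes only the final label, not the probability.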


### **Question 3: What is Regularization in Logistic Regression and why is it needed?**

**Answer:**  
Regularization adds a **penalty term** to the loss function to avoid overfitting by discouraging large coefficients.

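- **L1 (Lasso):** Penalizes the sum of absolute coefficient values → can drive some coefficients exactly to zero, giving sparse models.  
- **L2 (Ridge):** Penalizes the sum of squared coefficient values → shrinks large weights smoothly without zeroing them.

Regularization improves model **generalization** and **stability** on unseen data. In scikit-learn the strength is set through `C`, the *inverse* of the regularization strength, so a smaller `C` means a stronger penalty.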
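
To make the penalty term concrete, here is a minimal sketch that adds an L1 or L2 penalty to the log loss. The labels, probabilities, weights, and the strength `lam` are hypothetical values picked for illustration; scikit-learn parameterizes the same trade-off through `C` rather than a multiplier like `lam`.

```python
import math


def log_loss(y_true, probs):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, probs)
    )
    return -total / len(y_true)


def l1_penalty(weights):
    """Sum of absolute values (Lasso)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squares (Ridge)."""
    return sum(w * w for w in weights)


# Hypothetical labels, predicted probabilities, and model weights.
y_true = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.7, 0.6]
weights = [3.0, -0.5, 0.0]
lam = 0.1  # hypothetical regularization strength

data_term = log_loss(y_true, probs)
loss_l1 = data_term + lam * l1_penalty(weights)
loss_l2 = data_term + lam * l2_penalty(weights)
print("Unregularized loss:", round(data_term, 4))
print("Loss with L1 penalty:", round(loss_l1, 4))
print("Loss with L2 penalty:", round(loss_l2, 4))
```

The large weight `3.0` contributes 3.0 to the L1 penalty but 9.0 to the L2 penalty, which is why L2 pushes hardest against big coefficients.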


### **Question 4: What are some common evaluation metrics for classification models, and why are they important?**

**Answer:**  

| Metric | Formula / Meaning | Use |
|---------|-------------------|-----|
| **Accuracy** | (TP + TN) / (TP + TN + FP + FN) | Reliable mainly when classes are balanced |
| **Precision** | TP / (TP + FP) | Reliability of positive predictions |
| **Recall** | TP / (TP + FN) | Ability to find all positives |
| **F1-Score** | 2 × Precision × Recall / (Precision + Recall) | Balances Precision and Recall |
| **ROC-AUC** | Area under the ROC curve (TPR vs. FPR over all thresholds) | Threshold-independent measure of discrimination ability |

Here TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives. Looking at several metrics matters because accuracy alone can be misleading, especially on imbalanced datasets.
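
A small worked example shows why accuracy alone can mislead. The confusion-matrix counts below are hypothetical (1000 samples, 5% positives), and `classification_metrics` is a helper written just for this sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical model that finds 30 of the 50 positives.
model_metrics = classification_metrics(tp=30, fp=70, fn=20, tn=880)

# Baseline that predicts "negative" for every sample.
baseline_metrics = classification_metrics(tp=0, fp=0, fn=50, tn=950)

for name, m in [("model", model_metrics), ("all-negative baseline", baseline_metrics)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The baseline reaches 0.95 accuracy while finding no positives at all (recall 0), whereas the model has lower accuracy (0.91) but a recall of 0.6.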


### **Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy. (Use Dataset from sklearn package)**


In [1]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, y_pred))


Model Accuracy: 1.0
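
The program above builds the DataFrame directly from `load_iris()`. If the CSV step in the question is meant literally, one possible variant is to export the same dataset to CSV and read it back with `pd.read_csv`. The sketch below uses an in-memory buffer in place of a file on disk (an assumption made to keep it self-contained); a real file path would work the same way.

```python
import io

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Export the iris data (four features plus a "target" column) as CSV text.
csv_buffer = io.StringIO()
load_iris(as_frame=True).frame.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)

# Load the CSV into a Pandas DataFrame.
df = pd.read_csv(csv_buffer)
X = df.drop(columns="target")
y = df["target"]

# Same split and model settings as the cell above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print("Model Accuracy:", accuracy)
```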


### **Question 6: Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy. (Use Dataset from sklearn package)**

(Include your Python code and output in the code box below.)


In [2]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

# L2 Regularization
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)

print("Coefficients:\n", model.coef_)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))


Coefficients:
 [[-0.39086526  0.92121444 -2.33169483 -0.97997422]
 [ 0.49862408 -0.30952764 -0.21642637 -0.73163847]
 [-0.10775882 -0.6116868   2.5481212   1.71161268]]
Accuracy: 1.0
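
To see the L2 penalty at work, the following sketch refits the model for a few values of `C` and prints the overall size of the coefficient matrix. The three `C` values are arbitrary choices for illustration, and the model is fit on the full dataset because only the coefficients are of interest here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

norms = {}
for C in (0.01, 1.0, 100.0):
    # L2 is the default penalty; C is the inverse of the regularization strength.
    model = LogisticRegression(C=C, max_iter=5000)
    model.fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: coefficient norm = {norms[C]:.3f}")
```

Smaller `C` means a stronger penalty, so the coefficient norm should grow as `C` increases.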


### **Question 7: Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)**

(Include your Python code and output in the code box below.)


In [3]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

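# One-vs-rest: fit one binary classifier per class.
# Note: the multi_class argument is deprecated in recent scikit-learn releases.
model = LogisticRegression(multi_class='ovr', max_iter=200)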
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38
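
Because the `multi_class` argument is deprecated in recent scikit-learn releases, the same one-vs-rest scheme can also be expressed with `OneVsRestClassifier`. This is a sketch of that alternative, not the code that produced the report above, so its numbers may differ slightly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

# Wrap a binary logistic regression so that one classifier is fit per class.
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```

After fitting, `ovr_model.estimators_` holds the three underlying binary models, one per iris species.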





### **Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy. (Use Dataset from sklearn package)**

(Include your Python code and output in the code box below.)


In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

param_grid = {'C': [0.1, 1, 10],
              'penalty': ['l1', 'l2'],
              'solver': ['liblinear']}

grid = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid.fit(X_train, y_train)

# best_score_ is the mean 5-fold cross-validated accuracy of the best combination
print("Best Parameters:", grid.best_params_)
print("Validation Accuracy:", grid.best_score_)


Best Parameters: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
Validation Accuracy: 0.9640316205533598
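
The validation accuracy above is an average over cross-validation folds of the training data, so the test set is still unused. A possible extension, sketched below, scores the tuned model on the held-out test set as well. It makes two assumptions that differ from the cell above: the `saga` solver is swapped in (it supports both `l1` and `l2` for multiclass problems), and a `StandardScaler` is placed in a pipeline so that scaling is re-fit inside each fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# make_pipeline names each step after its class in lowercase,
# hence the "logisticregression__" prefix in the grid below.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", max_iter=5000),
)
param_grid = {
    "logisticregression__C": [0.1, 1, 10],
    "logisticregression__penalty": ["l1", "l2"],
}

grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# The refit best model is evaluated once on data it has never seen.
test_accuracy = grid.score(X_test, y_test)
print("Best Parameters:", grid.best_params_)
print("Mean CV Accuracy:", grid.best_score_)
print("Test Accuracy:", test_accuracy)
```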


### **Question 9: Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling. (Use Dataset from sklearn package)**

(Include your Python code and output in the code box below.)


In [5]:
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

# Without scaling
model1 = LogisticRegression(max_iter=200)
model1.fit(X_train, y_train)
acc1 = accuracy_score(y_test, model1.predict(X_test))

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model2 = LogisticRegression(max_iter=200)
model2.fit(X_train_scaled, y_train)
acc2 = accuracy_score(y_test, model2.predict(X_test_scaled))

print("Accuracy without scaling:", acc1)
print("Accuracy with scaling:", acc2)


Accuracy without scaling: 1.0
Accuracy with scaling: 1.0
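
Both models score the same here, which is not surprising: all four iris features are measured in centimetres on similar scales, and a single test split of 38 samples cannot resolve small differences. A slightly more robust comparison, sketched below under the assumption that 5-fold cross-validation on the full dataset is acceptable for this exercise, averages accuracy over several folds and keeps the scaler inside a pipeline so it is fit only on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

unscaled_model = LogisticRegression(max_iter=200)
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# Accuracy on each of 5 stratified folds.
scores_unscaled = cross_val_score(unscaled_model, X, y, cv=5)
scores_scaled = cross_val_score(scaled_model, X, y, cv=5)

print("CV accuracy without scaling:", scores_unscaled.mean())
print("CV accuracy with scaling:", scores_scaled.mean())
```

On datasets whose features have very different ranges, the gap between the two is usually larger than on iris.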


### **Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.**

**Answer:**  

Steps to build the Logistic Regression model:

1. **Data Handling**  
   - Clean missing values, encode categorical variables, and remove outliers.

2. **Feature Scaling**  
   - Standardize numerical features using `StandardScaler`.

3. **Handle Class Imbalance**  
   - Since only 5% customers respond, apply  
     - **SMOTE (oversampling)** or  
     - **class_weight='balanced'** during model training.

4. **Model Training**  
   - Train `LogisticRegression(penalty='l2')` with balanced class weights.

5. **Hyperparameter Tuning**  
   - Use **GridSearchCV** to optimize `C`, `penalty`, and `solver`.

6. **Model Evaluation**  
   - Focus on **Precision**, **Recall**, **F1-score**, and **ROC-AUC** (not just accuracy).

7. **Business Use**  
   - Identify high-probability customers and target them to improve marketing ROI.
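
The steps above can be sketched end to end as follows. Everything about the data is an assumption: `make_classification` generates a synthetic stand-in with roughly 5% positives, since no real campaign data is available here. The sketch uses `class_weight='balanced'` rather than SMOTE (which lives in the separate imbalanced-learn package) and tunes only `C` to stay short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data: about 5% responders (class 1).
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

# Stratify so that train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling and the class-weighted model live in one pipeline to avoid leakage.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Tune C with stratified folds, scoring by average precision instead of accuracy.
grid = GridSearchCV(
    pipeline,
    {"logisticregression__C": [0.01, 0.1, 1, 10]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="average_precision",
)
grid.fit(X_train, y_train)

# Evaluate with imbalance-aware metrics on the held-out test set.
probs = grid.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, probs)
print("Best C:", grid.best_params_["logisticregression__C"])
print("ROC-AUC:", round(auc, 3))
print(classification_report(y_test, grid.predict(X_test), digits=3))

# Business use: rank customers by predicted response probability.
top_customers = probs.argsort()[::-1][:100]
```

`top_customers` holds the test-set indices of the 100 customers with the highest predicted response probability, which is the group a campaign would target first.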
