**Q1.** What is Logistic Regression, and how does it differ from Linear Regression?
Answer: Logistic Regression is a supervised learning algorithm used for classification problems. It predicts the probability that an observation belongs to a particular category, using the logistic (sigmoid) function to map output values between 0 and 1.

Linear Regression predicts continuous numerical values and uses the least squares method.

Logistic Regression predicts discrete categories (e.g., yes/no) and uses maximum likelihood estimation.

In logistic regression, the dependent variable is categorical, whereas in linear regression, it’s continuous.
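
To make the difference in fitting criteria concrete, the sketch below evaluates both objectives on a tiny hand-made dataset: the sum of squared errors that least squares minimizes, and the negative log-likelihood that maximum likelihood estimation minimizes. The data, the weights `w` and `b`, and the helper names are illustrative assumptions, not part of the original answer.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(xs, ys, w, b):
    """Least-squares objective for a linear model y ~ w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

def negative_log_likelihood(xs, labels, w, b):
    """Objective minimized by logistic regression (same as maximizing the likelihood)."""
    total = 0.0
    for x, label in zip(xs, labels):
        p = sigmoid(w * x + b)  # predicted probability of class 1
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total

xs = [-2.0, -1.0, 1.0, 2.0]
continuous_targets = [-4.1, -1.9, 2.2, 3.8]  # what linear regression predicts
binary_labels = [0, 0, 1, 1]                 # what logistic regression predicts

print("Squared error:", squared_error(xs, continuous_targets, w=2.0, b=0.0))
print("Negative log-likelihood:", negative_log_likelihood(xs, binary_labels, w=2.0, b=0.0))
```

A lower value is better for both objectives; they differ in what kind of target they score.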

**Q2.** Role of the Sigmoid Function in Logistic Regression
Answer: The sigmoid function transforms a linear combination of features into a probability between 0 and 1:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$


This mapping allows the model's outputs to be interpreted as probabilities, which can then be thresholded (e.g., at 0.5) to make binary predictions.
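
As a minimal sketch of that pipeline (linear combination, then sigmoid, then threshold), the snippet below uses made-up coefficients. The `weights`, `bias`, and function names are assumptions chosen for illustration only.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear combination
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 prediction by thresholding."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

weights = [0.8, -0.4]  # hypothetical coefficients
bias = -0.2            # hypothetical intercept

for features in ([3.0, 1.0], [0.0, 2.0]):
    p = predict_proba(features, weights, bias)
    print(features, "probability:", round(p, 3), "label:", predict_label(features, weights, bias))
```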

**Q3.** Regularization in Logistic Regression and Why It’s Needed
Answer: Regularization adds a penalty term to the loss function to prevent overfitting by discouraging overly complex models:

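L1 (Lasso): Adds the sum of the absolute values of the coefficients to the loss, which encourages sparsity by driving some coefficients to exactly zero.

L2 (Ridge): Adds the sum of the squared coefficients to the loss, which shrinks all weights toward zero.

Both improve generalization by controlling the magnitude of the feature coefficients.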
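
The two penalty terms can be sketched in a few lines. The coefficient values, the `data_loss` number, and the strength `lam` below are hypothetical; the functions only illustrate how each penalty is added to an unregularized loss.

```python
def l1_penalty(coefs):
    """Sum of absolute coefficient values (Lasso)."""
    return sum(abs(c) for c in coefs)

def l2_penalty(coefs):
    """Sum of squared coefficient values (Ridge)."""
    return sum(c ** 2 for c in coefs)

def penalized_loss(data_loss, coefs, lam, kind="l2"):
    """Unregularized loss plus lam times the chosen penalty."""
    penalty = l1_penalty(coefs) if kind == "l1" else l2_penalty(coefs)
    return data_loss + lam * penalty

coefs = [0.5, -2.0, 0.0, 3.0]  # hypothetical model coefficients
data_loss = 1.2                # hypothetical log loss on the training data

print("L1 penalty:", l1_penalty(coefs))
print("L2 penalty:", l2_penalty(coefs))
print("Loss with L1:", penalized_loss(data_loss, coefs, lam=0.1, kind="l1"))
print("Loss with L2:", penalized_loss(data_loss, coefs, lam=0.1, kind="l2"))
```

In scikit-learn the strength is set through `C`, the inverse of the regularization strength, so a smaller `C` corresponds to a larger `lam` in this sketch.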

**Q4.** Common Evaluation Metrics for Classification Models
Answer: The most common metrics are:

Accuracy: Proportion of all instances that are classified correctly.

Precision: Proportion of positive predictions that are correct.

Recall (Sensitivity): Proportion of actual positives that are correctly identified.

F1-Score: Harmonic mean of precision and recall.

ROC-AUC: Area under the ROC curve; measures how well the model ranks positives above negatives across all classification thresholds.

These metrics give different perspectives on performance, which matters most for imbalanced datasets, where accuracy alone can be misleading.
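
A small hand-rolled sketch shows why the choice of metric matters on imbalanced data. The label vectors are made up (5 positives among 100 instances), and `classification_metrics` is a hypothetical helper, not a library function.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 by convention when undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# 5 positives followed by 95 negatives
y_true = [1] * 5 + [0] * 95

# A model that always predicts the majority class
always_negative = [0] * 100

# A model that finds 4 of the 5 positives at the cost of 6 false alarms
some_model = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

print("Always negative:", classification_metrics(y_true, always_negative))
print("Some model:     ", classification_metrics(y_true, some_model))
```

The always-negative model scores higher on accuracy even though it never finds a positive, which is what precision, recall, and F1 expose.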

**Q5** Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
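import pandas as pd

# Load the dataset into a pandas DataFrame.
# Iris stands in for a CSV file here; for a real file, use pd.read_csv with the file's path.
df = load_iris(as_frame=True).frame
X, y = df.drop(columns="target"), df["target"]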

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predictions & accuracy
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0
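
Because the question asks for a CSV file, the sketch below shows the same workflow going through `pd.read_csv`. To stay self-contained it first writes the Iris data to a temporary CSV; the file name, the `target` column name, and the `*_csv` variable names are assumptions for illustration. With real data, skip the writing step and pass your own path.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Create a CSV to read from (stand-in for a real data file).
csv_path = os.path.join(tempfile.mkdtemp(), "iris.csv")
load_iris(as_frame=True).frame.to_csv(csv_path, index=False)

# Load the CSV into a DataFrame and separate features from the label column.
df_csv = pd.read_csv(csv_path)
X_csv = df_csv.drop(columns="target")
y_csv = df_csv["target"]

# Train/test split, model fit, and accuracy, as in the cell above.
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_csv, y_csv, test_size=0.3, random_state=42
)
model_csv = LogisticRegression(max_iter=200)
model_csv.fit(Xc_train, yc_train)

csv_accuracy = accuracy_score(yc_test, model_csv.predict(Xc_test))
print("Accuracy:", csv_accuracy)
```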


**Q6** Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.

In [2]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Model with L2 regularization (Ridge) - default is L2
model_l2 = LogisticRegression(penalty='l2', max_iter=200)
model_l2.fit(X_train, y_train)

# Predictions & accuracy
y_pred_l2 = model_l2.predict(X_test)
print("Accuracy with L2 regularization:", accuracy_score(y_test, y_pred_l2))

# Print coefficients
print("Model coefficients:", model_l2.coef_)

Accuracy with L2 regularization: 1.0
Model coefficients: [[-0.40538546  0.86892246 -2.2778749  -0.95680114]
 [ 0.46642685 -0.37487888 -0.18745257 -0.72127133]
 [-0.06104139 -0.49404358  2.46532746  1.67807247]]


**Q7** Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)

In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

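# Model with multi_class='ovr' (one-vs-rest).
# Note: the multi_class argument is deprecated since scikit-learn 1.5;
# sklearn.multiclass.OneVsRestClassifier is the suggested replacement.
model_ovr = LogisticRegression(multi_class='ovr', max_iter=200)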
model_ovr.fit(X_train, y_train)

# Predictions
y_pred_ovr = model_ovr.predict(X_test)

# Print classification report
print("Classification Report (multi_class='ovr'):\n", classification_report(y_test, y_pred_ovr))



Classification Report (multi_class='ovr'):
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.85      0.92        13
           2       0.87      1.00      0.93        13

    accuracy                           0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
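
Since scikit-learn 1.5 the `multi_class` argument is deprecated, so on recent versions the cell above may emit a warning or stop working. A sketch of the same one-vs-rest idea with `OneVsRestClassifier` follows; the variable names are assumptions, and the printed numbers are not guaranteed to match the report above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Same dataset and split settings as the earlier cells.
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, test_size=0.3, random_state=42)

# One binary Logistic Regression is fitted per class (class k vs. the rest).
ovr_wrapper = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr_wrapper.fit(X_tr, y_tr)

report = classification_report(y_te, ovr_wrapper.predict(X_te))
print("Classification Report (OneVsRestClassifier):\n", report)
```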



**Q8** Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.

In [5]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Define the parameter grid
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
              'penalty': ['l1', 'l2']}  # 'elasticnet' and None (no penalty) are also options, depending on the solver

# Create a Logistic Regression model
# Note: Some solvers don't support all penalties. 'liblinear' supports 'l1' and 'l2'.
model = LogisticRegression(solver='liblinear', max_iter=200)

# Create GridSearchCV object
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')

# Fit the grid search to the data
grid_search.fit(X_train, y_train)

# Print the best parameters and best score
print("Best parameters:", grid_search.best_params_)
print("Best validation accuracy:", grid_search.best_score_)

# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy with best model:", test_accuracy)

Best parameters: {'C': 10, 'penalty': 'l2'}
Best validation accuracy: 0.9523809523809523
Test accuracy with best model: 1.0


**Q9** Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.

In [6]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Without scaling
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("Accuracy without scaling:", accuracy_score(y_test, model.predict(X_test)))

# With scaling: fit the scaler on the training data only, then apply it to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=200)
model_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", accuracy_score(y_test, model_scaled.predict(X_test_scaled)))


Accuracy without scaling: 1.0
Accuracy with scaling: 1.0


**Q10** Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.

Here's an approach to building a Logistic Regression model for predicting customer response in an imbalanced e-commerce dataset:

**1. Data Handling and Exploration:**

*   **Load the data:** Load the customer data into a pandas DataFrame.
*   **Understand the data:** Explore the features, identify potential issues like missing values, outliers, and the distribution of the target variable (customer response).
*   **Feature Engineering:** Create new features that could be relevant for predicting response, such as:
    *   Customer purchase history (frequency, recency, monetary value).
    *   Customer demographics.
    *   Website activity (time spent, pages visited).
    *   Previous campaign interactions.

**2. Feature Scaling:**

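*   **Standardize numerical features:** Regularized Logistic Regression is sensitive to feature scale, because the penalty is applied to all coefficients alike and therefore weighs features on different scales unevenly. Use `StandardScaler` to standardize or `MinMaxScaler` to normalize numerical features, fitting the scaler on the training data only.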
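
One way to keep the scaling step tied to the training data is a scikit-learn pipeline. The sketch below uses synthetic data from `make_classification` as a stand-in for the customer table (roughly 5% positives); the sample size, feature count, and variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data: about 5% responders.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0
)

# The pipeline fits the scaler on the training rows only, then fits the model
# on the scaled training rows. At predict time the stored scaling is reused.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

fitted_scaler = pipe.named_steps["standardscaler"]
print("Feature means learned from the training data:", fitted_scaler.mean_)
print("Test-set predictions (first 10):", pipe.predict(X_te[:10]))
```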

**3. Handling Class Imbalance:**

*   **Understand the problem:** With only 5% of customers responding, the dataset is highly imbalanced. Simply training a model on this data will likely result in a model that predicts the majority class (non-responders) most of the time, leading to high accuracy but poor performance on the minority class (responders).
*   **Choose a resampling technique:** Several techniques can address class imbalance:
    *   **Oversampling the minority class:**
        *   **SMOTE (Synthetic Minority Over-sampling Technique):** Creates synthetic samples of the minority class.
        *   **Random oversampling:** Duplicates random instances of the minority class.
    *   **Undersampling the majority class:**
        *   **Random undersampling:** Removes random instances of the majority class.
        *   **NearMiss:** Selects samples from the majority class that are close to the minority class.
*   **Apply the technique:** Apply the chosen resampling technique to the training data *before* training the model. It's crucial *not* to apply it to the test data, as this would lead to an unrealistic evaluation of the model's performance on real-world data.
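
Random oversampling, the simplest of the techniques listed above, can be sketched with `sklearn.utils.resample`. The data is synthetic (an assumed 5% positive rate) and the variable names are illustrative; SMOTE and NearMiss live in the separate imbalanced-learn package and are not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the customer data: about 5% responders.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0
)
# Split first, so the test set keeps the original class balance.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0
)

# Separate the training rows by class.
X_major, X_minor = X_tr[y_tr == 0], X_tr[y_tr == 1]

# Random oversampling: draw minority rows with replacement until the classes match.
X_minor_up = resample(X_minor, replace=True, n_samples=len(X_major), random_state=0)

X_bal = np.vstack([X_major, X_minor_up])
y_bal = np.concatenate([
    np.zeros(len(X_major), dtype=int),
    np.ones(len(X_minor_up), dtype=int),
])

print("Train class counts before:", np.bincount(y_tr))
print("Train class counts after: ", np.bincount(y_bal))
print("Test class counts (untouched):", np.bincount(y_te))
```

An alternative that avoids resampling altogether is `LogisticRegression(class_weight="balanced")`, which reweights the loss by class frequency instead of duplicating rows.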

**4. Model Selection and Training:**

*   **Choose Logistic Regression:** As requested, use Logistic Regression. It's a good choice for binary classification and provides interpretable coefficients.
*   **Train the model:** Train the Logistic Regression model on the balanced training data.

**5. Hyperparameter Tuning:**

*   **Identify key hyperparameters:** For Logistic Regression, important hyperparameters include:
    *   `C`: Inverse of regularization strength. Smaller values specify stronger regularization.
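    *   `penalty`: The type of regularization (`'l1'`, `'l2'`, `'elasticnet'`, or `None` for no penalty; older scikit-learn releases used the string `'none'`).
    *   `solver`: Algorithm to use for optimization ('liblinear', 'lbfgs', 'sag', 'saga', 'newton-cg'). Not every solver supports every penalty; for example, 'liblinear' supports 'l1' and 'l2'.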
*   **Use GridSearchCV or RandomizedSearchCV:** Employ cross-validation with `GridSearchCV` or `RandomizedSearchCV` to find the best combination of hyperparameters that maximizes a relevant evaluation metric (see below).
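
A sketch of such a search on imbalanced data follows. It assumes synthetic data, a small grid, ROC-AUC as the scoring metric, and `class_weight="balanced"` in place of resampling (a swapped-in choice that keeps the example short); none of these settings come from the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data: about 5% responders.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0
)

# Scaling lives inside the pipeline so it is refitted on each training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear", class_weight="balanced", max_iter=1000)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__penalty": ["l1", "l2"],
}

# Stratified folds keep the 5% positive rate in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="roc_auc")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Best cross-validated ROC-AUC:", search.best_score_)
```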

**6. Model Evaluation:**

*   **Choose appropriate metrics:** For imbalanced datasets, accuracy is not a reliable metric. Instead, focus on metrics that assess the model's ability to correctly identify the minority class:
    *   **Precision:** Of all the customers the model predicted would respond, what proportion actually responded?
    *   **Recall (Sensitivity):** Of all the customers who actually responded, what proportion did the model correctly identify?
    *   **F1-Score:** The harmonic mean of precision and recall, providing a balance between the two.
    *   **ROC-AUC:** Measures the model's ability to distinguish between positive and negative classes. A higher AUC indicates better discrimination.
    *   **Confusion Matrix:** Provides a detailed breakdown of true positives, true negatives, false positives, and false negatives.
*   **Evaluate on the test set:** Evaluate the tuned model on the *original, unbalanced* test set using the chosen metrics. This provides a realistic assessment of how the model will perform in a real-world scenario.
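
The metrics listed above are all available in `sklearn.metrics`. The sketch below applies them to a small made-up set of labels and predicted probabilities; the numbers are illustrative and do not come from a trained model.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical test labels (2 responders out of 10) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90]

# Hard predictions at the default 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses the probabilities, not the hard labels

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
print("ROC-AUC:", auc)
```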

**7. Interpretation and Deployment:**

*   **Interpret coefficients:** Analyze the model's coefficients to understand which features are most influential in predicting customer response. This can provide valuable business insights.
*   **Set a probability threshold:** Instead of using the default 0.5 threshold for classification, consider adjusting it based on the business objective. For example, if the cost of a false negative (failing to identify a responder) is higher than the cost of a false positive (incorrectly identifying a non-responder), you might lower the threshold to increase recall.
*   **Deploy the model:** Once satisfied with the model's performance, deploy it to predict customer responses for future marketing campaigns.
*   **Monitor and retrain:** Continuously monitor the model's performance in production and retrain it periodically with new data to ensure it remains accurate and relevant.
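
The threshold adjustment described above can be illustrated without any library. The labels and probabilities are made up, and `precision_recall_at` is a hypothetical helper; the point is only to show how lowering the threshold trades precision for recall.

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when predicting 1 for scores >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels (2 responders out of 10) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90]

for threshold in (0.5, 0.4):
    precision, recall = precision_recall_at(y_true, y_score, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In practice the threshold would be chosen on validation data, not on the test set, so that the final evaluation stays unbiased.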

Following these steps yields a Logistic Regression model that is evaluated on realistic, imbalanced data, and whose coefficients and decision threshold can be tied back to the costs and goals of the marketing campaign.